On 03-02-2014 3:35 pm, Victor Stinner wrote:
2014-02-03 Phil Thompson <p...@riverbankcomputing.com>:
For example, a string created with a maxchar of 255 (i.e. a Latin-1 string)
must contain at least one character in the range 128-255, otherwise you get
an assertion failure.
Yes, that is what PEP 393 specifies.
As it stands, when converting Latin-1 strings in my C extension module I
must first check each character and specify a maxchar of 127 if the string
happens to contain only ASCII characters.
Use PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, latin1_str, length),
which computes the maximum character for you.
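As a rough illustration of the per-character check discussed above, the
following self-contained C sketch scans a Latin-1 buffer and reports which
maxchar (127 or 255) would satisfy the PEP 393 requirement. The helper name
latin1_maxchar is hypothetical and not part of the CPython API; in a real
extension module, PyUnicode_FromKindAndData() should make it unnecessary to
write this by hand.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the CPython API: return the smallest
 * maxchar (127 or 255) that describes a Latin-1 buffer. */
static unsigned int
latin1_maxchar(const unsigned char *s, size_t length)
{
    size_t i;

    for (i = 0; i < length; i++) {
        if (s[i] > 127) {
            /* At least one character lies in the range 128-255. */
            return 255;
        }
    }
    /* Only ASCII characters were found (an empty buffer counts as ASCII). */
    return 127;
}
```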
What is the reasoning behind the checks being so strict?
Different Python functions rely on the exact kind to compare strings.
For example, if you search for a Latin-1 substring in an ASCII string, the
search returns immediately instead of scanning the string: a Latin-1
string cannot be found in an ASCII string.
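To illustrate why such a shortcut depends on maxchar being exact, here is a
minimal, self-contained C sketch. The Str struct and str_find() are
hypothetical and much simpler than CPython's real string objects; they only
model a one-byte string that records its maxchar.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a 1-byte string; not CPython's layout. */
typedef struct {
    const unsigned char *data;
    size_t length;
    unsigned int maxchar;   /* 127 for ASCII, 255 for Latin-1 */
} Str;

/* Return the index of the first occurrence of needle in haystack, or -1. */
static long
str_find(const Str *haystack, const Str *needle)
{
    size_t i;

    /* Shortcut: a needle claiming a non-ASCII character cannot occur in a
     * haystack that only contains ASCII. This is only correct if maxchar
     * is exact. */
    if (needle->maxchar > haystack->maxchar) {
        return -1;
    }
    if (needle->length > haystack->length) {
        return -1;
    }
    for (i = 0; i + needle->length <= haystack->length; i++) {
        if (memcmp(haystack->data + i, needle->data, needle->length) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

In this model, searching for "ell" in "hello" gives index 1 when the needle
is labelled with maxchar 127, but -1 when the same bytes are mislabelled with
maxchar 255. That is the kind of wrong answer an inexact maxchar could
produce.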
The main reason is in PEP 393 itself: a string must be compact so that it
does not waste memory.
Victor
Are you saying that code will fail if a particular Latin-1 string just
happens not to contain any character greater than 127?
I would be very surprised if that was the case. If it isn't the case
then I think that particular check shouldn't be made.
Phil
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev