On 03-02-2014 3:35 pm, Victor Stinner wrote:
2014-02-03 Phil Thompson <p...@riverbankcomputing.com>:
For example, a string created with a maxchar of 255 (i.e. a Latin-1 string) must contain at least one character in the range 128-255, otherwise you get
an assertion failure.

Yes, that is what PEP 393 specifies.

As it stands, when converting Latin-1 strings in my C extension module I must first check each character and specify a maxchar of 127 if the string
happens to only contain ASCII characters.

Use PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, latin1_str,
length) which computes the kind for you.
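
[Editor's illustration, not part of the original message.] A minimal sketch of the call suggested above. To keep it self-contained it reaches the C API function from Python through ctypes rather than from a C extension; the helper name latin1_to_str is hypothetical, and the constant 1 is assumed to match the PyUnicode_1BYTE_KIND macro in CPython's headers. In an extension module the function would simply be called directly from C.

```python
import ctypes

# Assumed to mirror the PyUnicode_1BYTE_KIND macro from CPython's headers.
PyUnicode_1BYTE_KIND = 1

# PyObject *PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size)
_from_kind_and_data = ctypes.pythonapi.PyUnicode_FromKindAndData
_from_kind_and_data.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_ssize_t]
_from_kind_and_data.restype = ctypes.py_object


def latin1_to_str(data: bytes) -> str:
    """Build a str from Latin-1 bytes (hypothetical helper).

    The caller passes no maxchar: the C function inspects the buffer
    and picks the narrowest representation itself.
    """
    return _from_kind_and_data(PyUnicode_1BYTE_KIND, data, len(data))


ascii_only = latin1_to_str(b"abc")        # every byte is below 128
with_accent = latin1_to_str(b"caf\xe9")   # contains byte 0xE9
```

Both calls use the same kind argument; no per-character check for ASCII-only input is needed on the caller's side.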

What is the reasoning behind the checks being so strict?

Different Python functions rely on the exact kind to compare strings.
For example, if you search a latin1 substring in an ASCII string, the
search returns immediately instead of searching in the string. A latin1
string cannot be found in an ASCII string.
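
[Editor's illustration, not part of the original message.] The shortcut described above can be sketched in pure Python. This is not CPython's implementation: the interpreter reads the kind from the string header in constant time, whereas this sketch recomputes the largest code point on every call. The names max_code_point and find_with_shortcut are hypothetical.

```python
def max_code_point(s: str) -> int:
    """Largest code point in s, or 0 for the empty string."""
    return max(map(ord, s), default=0)


def find_with_shortcut(haystack: str, needle: str) -> int:
    """Like str.find, but bail out early when no match is possible.

    If the needle contains a character larger than anything in the
    haystack, it cannot occur there, so no scan is needed.
    """
    if max_code_point(needle) > max_code_point(haystack):
        return -1
    return haystack.find(needle)
```

The early exit is only sound if the stored maximum is exact: a string labelled Latin-1 that actually holds only ASCII characters would make such comparisons unreliable, which is the invariant the strict check protects.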

The main reason is PEP 393 itself: a string must be compact so as not to
waste memory.

Victor

Are you saying that code will fail if a particular Latin-1 string just happens not to contain any character greater than 127?

I would be very surprised if that was the case. If it isn't the case then I think that particular check shouldn't be made.

Phil
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev