On 2/1/2014 2:26 AM, Chris Angelico wrote:
On Sat, Feb 1, 2014 at 4:46 PM, Terry Reedy <tjre...@udel.edu> wrote:
On 1/31/2014 10:36 PM, Chris Angelico wrote:

On Sat, Feb 1, 2014 at 1:54 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:

I think that some years ago I heard about a variation on UTF-8
(Microsoft?) where codepoint U+0000 is encoded as 0xC0 0x80 so that the
null byte can be used as the string terminator.

I had a look on Wikipedia and found this:

http://en.wikipedia.org/wiki/Null-terminated_string


Yeah, it's a common abuse of UTF-8. It's a violation of spec, but an
understandable one. However, I don't understand why the first part -
why should \0 become U+0000 but (presumably) the \a later on
(...cs\accel...) doesn't become U+0007, etc?


Because only \0 has a special meaning in a C string,

I should have added 'to C itself', as the string terminator.

and Tk is written in C and uses C strings.

Eh? I've used \a in C programs (not often but I have used it).

It's possible that \0 is the only one that actually bombs anything
(because of C0 80 representation).

\0 can bomb C byte processing by terminating it sooner than it should. Its unexpected replacement bombs utf-8 decoding.
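
A rough Python sketch of both failure modes, for illustration only. It is not the Tk or tkinter code, which I have not looked at. The ctypes buffer stands in for a C string, and decode_modified_utf8 is a hypothetical helper name, not anything from tkinter:

```python
import ctypes

raw = b"abc\x00def"

# .value reads up to the first NUL byte, the way C string functions do,
# so everything after the embedded \0 is silently lost.
truncated = ctypes.create_string_buffer(raw).value

# The "modified UTF-8" trick: encode U+0000 as C0 80 so that no NUL byte
# appears in the middle of the data.
modified = raw.replace(b"\x00", b"\xc0\x80")
survives = ctypes.create_string_buffer(modified).value

# A strict UTF-8 decoder rejects the overlong C0 80 sequence.
try:
    modified.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc


def decode_modified_utf8(data: bytes) -> str:
    """Hypothetical helper: undo the C0 80 substitution, then decode.

    The byte 0xC0 never occurs in valid UTF-8, so the replacement
    cannot damage any other character.
    """
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


print(truncated)                              # bytes before the first NUL
print(survives)                               # full data, C0 80 in place of NUL
print(type(decode_error).__name__)            # the strict decoder's complaint
print(repr(decode_modified_utf8(modified)))   # original text restored
```

Whether the actual fix works this way is an assumption on my part. The sketch only shows why an embedded \0 and its C0 80 replacement each cause trouble.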

But since \7 and \a both represent
0x07 in a C string, I would expect there to be other problems, if it's
interpreting it as source. Ah well! Weird weird.

While other control codes may have special meaning to a terminal or other device, they do not have special meaning to the operation of C string functions themselves (except possibly for a 'getline' function looking for \n -- but I do not remember if the C stdlib has any such functions).
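
A small check of that claim from the Python side, again only as a sketch. The ctypes buffer approximates what a C string function would see, and c_visible_length is a made-up helper name:

```python
import ctypes


def c_visible_length(data: bytes) -> int:
    """Number of bytes before the first NUL, i.e. what strlen would report."""
    return len(ctypes.create_string_buffer(data).value)


# \a, \7 and \x07 are three spellings of the same byte (BEL).
same_byte = b"\a" == b"\7" == b"\x07"

# Put each C0 control code in the middle of "abcd" and see which ones
# cut the string short.
truncating = [
    code
    for code in range(32)
    if c_visible_length(b"ab" + bytes([code]) + b"cd") != 5
]

print(same_byte)
print(truncating)  # control codes that terminate the string early
```

Only code 0 should end up in the list. BEL and the other control bytes pass through as ordinary data, whatever a terminal might later do with them.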

I am speaking from my memory of C. I have not looked at the Tk C code to see just what it did where to create the exception. I am just happy that Serhiy was able to fix tkinter without causing another test to fail.

--
Terry Jan Reedy

