On 2/1/2014 2:26 AM, Chris Angelico wrote:
On Sat, Feb 1, 2014 at 4:46 PM, Terry Reedy <tjre...@udel.edu> wrote:
On 1/31/2014 10:36 PM, Chris Angelico wrote:
On Sat, Feb 1, 2014 at 1:54 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
I think that some years ago I heard about a variation on UTF-8
(Microsoft?) where codepoint U+0000 is encoded as 0xC0 0x80 so that the
null byte can be used as the string terminator.
I had a look on Wikipedia and found this:
http://en.wikipedia.org/wiki/Null-terminated_string
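To make the C0 80 trick concrete, here is a minimal Python sketch of the variation described above. The helper names (encode_nul_safe, decode_nul_safe, strict_decode_fails) are made up for illustration and are not taken from Tk or tkinter; the sketch only assumes that U+0000 is written as the two bytes 0xC0 0x80 and everything else is ordinary UTF-8.

```python
# Sketch only: hypothetical helpers, not code from Tk or tkinter.

def encode_nul_safe(text):
    """Encode to UTF-8, but write U+0000 as the overlong pair C0 80.

    In real UTF-8 the byte 0x00 only ever encodes U+0000, so a plain
    byte replacement is enough.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_safe(data):
    """Undo the C0 80 substitution, then decode as ordinary UTF-8.

    0xC0 never appears in well-formed UTF-8, so the replacement cannot
    hit a legitimate character.
    """
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


def strict_decode_fails(data):
    """Return True if a strict UTF-8 decoder rejects the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


encoded = encode_nul_safe("a\x00b")
```

The encoded bytes contain no null byte, so a C string holding them is not cut short, but a strict decoder such as Python's rejects them unless the substitution is undone first.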
Yeah, it's a common abuse of UTF-8. It's a violation of spec, but an
understandable one. However, I don't understand why the first part -
why should \0 become U+0000 but (presumably) the \a later on
(...cs\accel...) doesn't become U+0007, etc?
Because only \0 has a special meaning in a C string,
I should have added 'to C itself', as the string terminator.
and Tk is written in C and uses C strings.
Eh? I've used \a in C programs (not often but I have used it).
It's possible that \0 is the only one that actually bombs anything
(because of C0 80 representation).
\0 can bomb C byte processing by terminating it sooner than it should.
Its unexpected replacement bombs utf-8 decoding.
But since \7 and \a both represent
0x07 in a C string, I would expect there to be other problems, if it's
interpreting it as source. Ah well! Weird weird.
While other control codes may have special meaning to a terminal or
other device, they do not have special meaning to the operation of C
string functions themselves (except possibly for a 'getline' function
looking for \n -- but I do not remember if the C stdlib has any such
functions).
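A rough way to see the difference from Python, without looking at the Tk source, is to use ctypes, whose buffer .value attribute reads the bytes the way a C string function would (stopping at the first null byte). The variable names and sample strings below are only illustrative.

```python
import ctypes

# Copy the bytes into C char arrays; ctypes appends a trailing NUL itself.
with_nul = ctypes.create_string_buffer(b"abc\x00def")
with_bell = ctypes.create_string_buffer(b"cs\x07ccel")  # \x07 is \a (bell)

# .value treats the buffer as a NUL-terminated C string;
# .raw returns every byte in the buffer regardless of content.
nul_value = with_nul.value
nul_raw = with_nul.raw
bell_value = with_bell.value
```

Here nul_value stops at the embedded null byte while nul_raw still holds the whole buffer, and bell_value comes back intact because 0x07 means nothing to the string-reading logic.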
I am speaking from my memory of C. I have not looked at the Tk C code to
see just what it did where to create the exception. I am just happy that
Serhiy was able to fix tkinter without causing another test to fail.
--
Terry Jan Reedy