On 07/08/2013 01:53 PM, ferdy.blat...@gmail.com wrote:
Hi Steven,
thank you for your reply... I really needed another Python guru who is
also an English teacher! Sorry, English is not my mother tongue... I wrote
"uncorrect" instead of "incorrect" (I misapplied a "similarity
principle", as in "unpleasant" -> "uncorrect").
Apart from these trifles, you said:
All characters are UTF-8 characters. "a" is a UTF-8 character. So is "ă".
Not using Python 3, for me (a programmer who was present at the beginning of
computer science, struggling with many languages from assembler to
Fortran and from C to Pascal and so on) it was a hard job to handle the
abrupt transition from characters simply equal to bytes to special
characters defined with 2, 3 bytes and even more.
Characters do not have a width. They are Unicode code points, an
abstraction. It's only when you encode them in byte strings that a code
point takes on any specific width. And some encodings map each code point
to one byte (and raise errors for characters they can't represent), some
to two bytes each, some to a variable number of bytes, and so on.
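Here's a rough Python 3 sketch of what I mean (the character and the
encodings are just ones I picked for illustration):

s = "ă"                        # one character: U+0103, LATIN SMALL LETTER A WITH BREVE

print(len(s))                  # 1 -- one code point, no width implied
print(s.encode("utf-8"))       # b'\xc4\x83' -- two bytes in UTF-8
print(s.encode("utf-16-le"))   # b'\x03\x01' -- two bytes in UTF-16
print(s.encode("iso-8859-2"))  # b'\xe3'     -- one byte in Latin-2
try:
    s.encode("ascii")          # no ASCII byte exists for this character
except UnicodeEncodeError as e:
    print("ascii can't encode it:", e)

The string itself never changes; only the encoded byte form does.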
I would have preferred another solution... but I'm not Guido...!
But Unicode has nothing to do with Guido, and it has existed for about
25 years (if I recall correctly). It's only that Python 3 is finally
embracing it, and making it the default type for characters, as it
should be. As far as I'm concerned, the only reason it shouldn't have
been done long ago was that programs were trying to fit on 640k DOS
machines. Even before Unicode, there were multi-byte encodings around
(e.g., Microsoft's MBCS), and each was thoroughly incompatible with all
the others. And the problem with one-byte encodings is that if you need
to use a Greek currency symbol in a document that's mostly Norwegian (or
some such combination of characters), there might not be ANY valid way
to do it within a single "character set."
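A hedged sketch of that situation, in Python 3 (the text is made up,
mixing Norwegian letters with Greek ones):

text = "øre og δραχμή"   # made-up snippet: Norwegian plus Greek

for codec in ("iso-8859-1", "iso-8859-7", "utf-8"):
    try:
        print(codec, "works:", text.encode(codec))
    except UnicodeEncodeError as e:
        print(codec, "fails:", e)

Latin-1 has the 'ø' but no Greek letters, ISO-8859-7 has the Greek letters
but no 'ø'; only a Unicode encoding like UTF-8 covers the whole string.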
Python 2 supports all the same Unicode features as 3; it's just that it
defaults to byte strings. So it's HARDER to get it right.
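A small illustration, in Python 3 syntax, of the two string types I'm
talking about (in Python 2 the default str behaved like the bytes object
below, and you needed the u'' prefix to get the Unicode type):

u = "naïve"                    # str: 5 Unicode code points
b = u.encode("utf-8")          # bytes: b'na\xc3\xafve', 6 bytes

print(len(u), len(b))          # 5 6
print(b.decode("utf-8") == u)  # True -- decoding gets the same text back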
Except for special-purpose programs like a file dumper, it's usually
unnecessary for a Python 3 programmer to deal with individual bytes from
a byte string. Text files are a bunch of bytes, and somebody has to
interpret them as characters. If you let open() handle it, and if you
give it the correct encoding, it just works. Internally, all strings
are Unicode, and you don't care where they came from, or what human
language they may have characters from. You can combine strings from
multiple places, without much worry that they might interfere.
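For instance, something like this (the file name and contents are
invented) is usually all it takes:

with open("notes.txt", "w", encoding="utf-8") as f:
    f.write("smørbrød og ελιές\n")   # mixed languages, no special handling

with open("notes.txt", encoding="utf-8") as f:
    for line in f:
        text = line.rstrip()
        print(len(text), text)       # length in characters, not bytes

Give open() the right encoding and the bytes never show up in your code
at all.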
Windows NT/2000/XP/Vista/7 has used Unicode for its file system (NTFS)
from the beginning (approximately 1992), and has had Unicode versions of
each of its APIs for nearly as long.
I appreciate you've been around a long time, and worked in a lot of
languages. I've programmed professionally in at least 35 languages
since 1967. But we've come a long way from the 6-bit characters I used
in 1968. At that time, we packed ten characters into each word.
--
DaveA