Oleg Broytman writes:

 > This is the core of the problem. Python2 favors Unix model but
 > Windows people pays the price. Python3 reverses that

This is certainly not true. What is true is that Python 3 makes no attempt to make it easy to write crappy software in the old Unix style, software that breaks when unexpected character encodings are encountered. Python 3 is designed to make it easier to write reliable software, even if it will only ever be used on one platform.

Nevertheless, it's still a reasonable language for writing byte-shoveling software, with the last piece in place as of the acceptance of PEP 461. As of that PEP, you can use regexps for tokenizing byte streams and %-formatting to conveniently produce them. If you want to treat them piecewise as character streams with different encodings, you have a large library of codecs, which provide an incremental decoder interface. While AFAIK no codec implements a decode-until-error mode, that's not all that much of a loss, as many encodings overlap. E.g., if you start decoding using a latin-1 codec, decoding the whole document will succeed, even if it switches to windows-1251 in the meantime.

Oleg, I gather Russian is your native language. That's moderately complicated, I admit. But the Russians are a distant second to the Japanese in self-destructive proliferation of incompatible character coding standards and non-standard variants. After 24 years of dealing with the mess that is East Asian encodings (which is even bound up with the "religion" of Japanese exceptionalism -- some Japanese have argued that there is a spiritual superiority to Japanese JIS codes!), I cannot believe you are going to find a better environment for dealing with these issues than Python 3.
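
To make the byte-shoveling points above concrete, here is a minimal sketch, assuming Python 3.5 or later (the first release with PEP 461's %-formatting for bytes). The sample data and all variable names are made up for illustration; only the `re` and `codecs` calls come from the standard library.

```python
import codecs
import re

# 1. Tokenize a byte stream with a bytes regexp; nothing is decoded.
#    (The request below is invented sample data.)
request = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
tokens = re.findall(rb"\S+", request)

# 2. Produce bytes with %-formatting (PEP 461, Python 3.5+).
status_line = b"%s %d %s\r\n" % (b"HTTP/1.1", 200, b"OK")

# 3. Decode piecewise with an incremental decoder. The two-byte UTF-8
#    sequence for U+00E9 is split across chunks; the decoder buffers the
#    incomplete byte until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(b"caf\xc3")
second = decoder.decode(b"\xa9", final=True)

# 4. The overlap problem: latin-1 maps every byte value to a character,
#    so it never raises, even after the data switches to windows-1251.
mixed = ("caf\u00e9 ".encode("latin-1")
         + "\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251"))
as_latin1 = mixed.decode("latin-1")  # succeeds, but the tail is mojibake

# The latin-1 decode is lossless, so the original bytes can be recovered
# and the windows-1251 part re-decoded once its boundary is known.
recovered = as_latin1.encode("latin-1")
cyrillic = recovered[5:].decode("cp1251")
```

Step 4 is why a decode-until-error mode would help less than one might hope: the wrong codec often raises no error at all, so the boundary between encodings has to come from somewhere else (here, the hard-coded offset 5).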