Hyeshik Chang <hyes...@gmail.com> added the comment:

Hello, everyone!

The reason I chose to embed the test strings in Python source code was that 
I wanted them to be treated as text files that can be tracked in CVS or 
Subversion, and to keep the Python source free of non-ASCII characters. I no 
longer feel the need for "text file" status, so STINNER's suggestion works 
for me.

Actually, all "stateful" encodings supported by cjkcodecs lack adequate test 
code. (There are seven more iso-2022 stateful encodings in Python in addition 
to hz.)  "cjkencoding_tests.py" is used for random chunk coding tests, and most 
stateful encodings are not compatible with random chunk coding. For those 
reasons, I didn't include test strings for them there. But they clearly 
still need appropriate simple string coding and stream coding tests.
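
As a rough illustration of the point above (a sketch only, not code from the 
test suite; the sample text and variable names are assumptions), the following 
shows simple string coding and stream-style decoding for hz, and why encoding 
independent chunks of a stateful encoding does not give the same bytes as 
encoding the whole string:

```python
import codecs

# Sample text is an assumption: ASCII mixed with two GB2312 characters.
text = "ab\u4e2d\u6587cd"

# Simple string coding: one-shot round trip.
encoded = text.encode("hz")
roundtrip = encoded.decode("hz")

# Stream coding: feed the bytes one at a time to an incremental decoder,
# which has to remember the shift state between calls.
decoder = codecs.getincrementaldecoder("hz")()
pieces = [decoder.decode(encoded[i:i + 1]) for i in range(len(encoded))]
pieces.append(decoder.decode(b"", final=True))
streamed = "".join(pieces)

# Encoding chunks independently returns to ASCII mode after every chunk,
# so the concatenated bytes differ from the one-shot result.
whole = "\u4e2d\u6587".encode("hz")
chunked = "\u4e2d".encode("hz") + "\u6587".encode("hz")
```

Here `roundtrip` and `streamed` should both equal `text`, while `whole` and 
`chunked` differ even though both decode to the same two characters.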

STINNER Victor wrote:
> I don't understand why different texts are used. Why not just using the
> same original text for all testcases? One reason can be that some
> encodings (e.g. ISO 2022) use escape sequences to change the current
> encoding. Or maybe because the characters are different (chinese vs
> japanese characters?).

Almost every encoding in cjkcodecs has a different set of characters. They 
support different languages (Chinese, Japanese, Korean), different scripts 
(Hanja, Kanji, Traditional and Simplified Chinese), different standards (johab 
and KS X 1001 in Korean), and different versions/variants (JIS X 0201 and 
JIS X 0213 in Japanese).  Strikingly, one of them, gb18030, is actually a 
"superset" of Unicode so far.
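
A small sketch of why one shared test text cannot work for every codec (the 
helper `can_encode` and the sample characters are assumptions chosen for 
illustration):

```python
def can_encode(ch, encoding):
    """Return True if the codec can represent the character ch."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

hangul = "\ud55c"          # a Hangul syllable
non_bmp = "\U0001f600"     # a character outside the Basic Multilingual Plane

encodings = ["euc_kr", "johab", "gb2312", "hz", "big5", "shift_jis", "gb18030"]

# Map each encoding to whether it can represent each sample character.
hangul_support = {enc: can_encode(hangul, enc) for enc in encodings}
non_bmp_support = {enc: can_encode(non_bmp, enc) for enc in encodings}
```

The Korean codecs and gb18030 accept the Hangul syllable while the Chinese and 
Japanese legacy codecs reject it, and only gb18030 accepts the non-BMP 
character.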


Terry J. Reedy wrote:
> Perhaps there should be a separate test like the above to be sure that hz 
> really uses GB2312-80, as specified.

You're right.
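
One possible shape for such a check, as a sketch only (the helper name 
`hz_agrees_with_gb2312` and the sample characters are assumptions, and ASCII 
input is deliberately left out): a non-ASCII character should be encodable in 
hz exactly when it is encodable in gb2312, and the hz bytes should be the 
gb2312 bytes with the high bit cleared, wrapped in `~{` and `~}`.

```python
def hz_agrees_with_gb2312(ch):
    """Return True if hz and gb2312 treat the non-ASCII character ch alike."""
    try:
        gb = ch.encode("gb2312")
    except UnicodeEncodeError:
        gb = None
    try:
        hz = ch.encode("hz")
    except UnicodeEncodeError:
        hz = None
    if gb is None or hz is None:
        # Both codecs must reject the character together.
        return gb is None and hz is None
    # hz carries the GB2312 code in 7-bit form between the shift sequences.
    expected = b"~{" + bytes(b & 0x7F for b in gb) + b"~}"
    return hz == expected

# Two GB2312 characters, one character found in GBK but not GB2312,
# and one Hangul syllable that neither Chinese codec covers.
samples = "\u4e2d\u6587\u9f8d\ud55c"
results = [hz_agrees_with_gb2312(ch) for ch in samples]
```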


By the way, my previous e-mail address <pe...@freebsd.org> is no longer 
reachable. Please use <hyes...@gmail.com> if you need to contact me.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12057>
_______________________________________