[issue5640] Wrong print() result when unicode error handler is not 'strict'

2009-04-02 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Right. I'm uploading a patch that fixes the reported problem in cjkcodecs. Please test whether the patch corrects the behavior. -- keywords: +patch Added file: http://bugs.python.org/file13572/cjkcodecs-fix-statefulenc.diff
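The combination this issue is about, a stateful CJK encoder used with a non-strict error handler, can be exercised with a minimal sketch. It assumes Python 3 with the bundled `iso2022_jp` codec as a representative stateful encoding; the helper name `encode_in_chunks` and the sample characters are illustrative and do not come from the report or the patch.

```python
import codecs

def encode_in_chunks(chunks, encoding="iso2022_jp", errors="replace"):
    """Encode text pieces with one incremental encoder so that the
    shift state carries over from one piece to the next."""
    encoder = codecs.getincrementalencoder(encoding)(errors=errors)
    out = [encoder.encode(chunk) for chunk in chunks]
    # final=True lets the encoder flush any pending state.
    out.append(encoder.encode("", final=True))
    return b"".join(out)

# U+3042 is encodable in ISO-2022-JP; U+20AC (euro sign) is not,
# so the "replace" handler has to substitute it.
data = encode_in_chunks(["\u3042", "\u20ac"])
print(data)
print(data.decode("iso2022_jp"))
```

On an interpreter that handles this correctly, the output should decode cleanly, with the unencodable character replaced.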

[issue5640] Wrong print() result when unicode error handler is not 'strict'

2009-04-02 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Sorry. I just found that the fix breaks a few other unit tests. I'll check. -- Python tracker http://bugs.python.org/issue5640

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2009-03-17 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: When I asked Taiwanese developers how often they use these character sets, it turned out that they are almost useless in the usual computing environment in Taiwan. This will only serve for historical compatibility and literal standard

[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-09-03 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: pitrou, that's because Python source code can't be correctly tokenized when it's encoded in a few odd encodings like iso-2022 or shift-jis, which use \, (, ) and as the second byte of a two-byte character sequence. For example, '\x81
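The trail-byte collision mentioned here can be checked directly. This is a minimal sketch, assuming Python 3 and its bundled `shift_jis` codec; the helper name and the sample character (KATAKANA LETTER SO, U+30BD) are my own choices, not the example from the truncated comment.

```python
# Look for characters whose two-byte encoding ends in a byte that a
# byte-oriented tokenizer would read as an ASCII metacharacter.
SPECIAL = {0x5C: "backslash"}

def trail_byte_collisions(text, encoding="shift_jis"):
    """Return (char, encoded bytes, name) for each colliding trail byte."""
    hits = []
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) == 2 and raw[1] in SPECIAL:
            hits.append((ch, raw, SPECIAL[raw[1]]))
    return hits

hits = trail_byte_collisions("\u30bdA")  # U+30BD followed by ASCII "A"
for ch, raw, name in hits:
    print(f"U+{ord(ch):04X} -> {raw!r}: trail byte is an ASCII {name}")
```

A tokenizer that scans such bytes without decoding them first would treat the second byte as an escape character, which is why these encodings cannot be tokenized as raw bytes.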

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-08-23 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Committed patch cjkmactemporary.diff as r65988 in the py3k branch. I'll open another issue for the cjkcodecs implementation of the Mac codecs. -- resolution: -> fixed status: open -> closed
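The `LookupError` in the issue title is what `codecs.lookup` raises for a name it cannot resolve. The sketch below, with the hypothetical helper `resolve_codec`, reports what a given interpreter does with the name; whether `X-MAC-JAPANESE` resolves, and to which codec, depends on the Python build, so no result is assumed here.

```python
import codecs

def resolve_codec(name):
    """Return the canonical codec name for `name`, or None if the
    interpreter does not know the encoding."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None

for label in ("utf-8", "X-MAC-JAPANESE", "no-such-encoding-xyz"):
    print(label, "->", resolve_codec(label))
```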

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-08-20 Thread Hye-Shik Chang
Changes by Hye-Shik Chang: Added file: http://bugs.python.org/file11170/cjkmactemporary.diff Python tracker http://bugs.python.org/issue1276

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-06-26 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Added a patch that implements codecs for CJK Macintosh encodings. I tried to implement them just like the other existing CJK codecs, but that required many inefficient mapping tables due to their odd mappings (like this: u'ABCDE' -> 'ab

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-06-26 Thread Hye-Shik Chang
Changes by Hye-Shik Chang: Added file: http://bugs.python.org/file10749/maccjkcodecs-1-py3k.diff Python tracker http://bugs.python.org/issue1276

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-02-24 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I'll take this. -- assignee: lemburg -> hyeshik.chang nosy: +hyeshik.chang Tracker http://bugs.python.org/issue1276

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-14 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I have generated compressed mapping tables in several ways. I extracted the mapping data into individual files and reorganized them by translating them into Python source code or archiving them into a zip file. The following table shows the result: (in kilobytes) (also

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-14 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I couldn't find an appropriate method to implement an in situ compressed mapping table. AFAIK, Python has the smallest mapping table footprint for each charset among the major open source transcoding programs. I have thought about the compression many times

[issue2066] Adding new CNS11643 support, a *huge* charset, in cjkcodecs

2008-02-11 Thread Hye-Shik Chang
New submission from Hye-Shik Chang: This patch adds CNS11643 support to the Python unicode codecs. CNS11643 is a huge character set which is used in EUC-TW and ISO-2022-CN. CJKCodecs has had CNS11643 support for at least 4 years, but I dropped it because of its huge size in integrating

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-11 Thread Hye-Shik Chang
Changes by Hye-Shik Chang: -- title: Adding new CNS11643 support, a *huge* charset, in cjkcodecs -> Adding new CNS11643, a *huge* charset, support in cjkcodecs Tracker http://bugs.python.org/issue2066

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-11 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I've generated the mapping table from ICU's CNS11643-1992 mapping. I see that CNS11643 is quite rarely used on the Internet, but it's the only national standard character set in Taiwan. When I asked Taiwanese Python users, even they didn't think that it's necessary

[issue1037] Ill-coded identifier crashes python when coding spec is utf-8

2007-08-27 Thread Hye-Shik Chang
New submission from Hye-Shik Chang: An illegal identifier makes Python crash on UTF-8 source code/interpreters. Python 3.0x (py3k:57555M, Aug 27 2007, 21:23:47) [GCC 3.4.6 [FreeBSD] 20060305] on freebsd6 >>> compile(b'#coding:utf-8\n\xfc', '', 'exec') zsh: segmentation fault (core dumped) ./python
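As a rough regression check, the same input can be fed to `compile` inside a guard. This is a sketch only: the helper `try_compile` is hypothetical, and it assumes an interpreter where the crash no longer occurs. On an affected build the process would die before any exception could be caught.

```python
def try_compile(source: bytes):
    """Compile `source` and return the name of the exception raised,
    or None if compilation succeeds."""
    try:
        compile(source, "<test>", "exec")
    except (SyntaxError, ValueError) as exc:
        # UnicodeDecodeError is a ValueError subclass, so it lands here too.
        return type(exc).__name__
    return None

# Same bytes as in the report: a UTF-8 coding cookie followed by an
# invalid UTF-8 byte where an identifier would start.
result = try_compile(b"#coding:utf-8\n\xfc")
print(result)
```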