[issue23050] Add Japanese legacy encodings
Tetsuya Morimoto added the comment:

> These character encodings are legacy, but are still used.
>
> Do you have an idea of how many users still have documents stored or exchanged using these encodings?

Hmm, I guess the iso-2022-jp codec is still the default charset of MUAs (Mail User Agents) on the Japanese Windows platform. But I'm not sure how many users that is, so I'll investigate; please wait a few days.

> The patch is not trivial, the legacy japanese codecs are complex and so error prone :-/

Yes, this patch includes some refactoring. However, the existing tests pass, and adding new codecs basically shouldn't affect the other codecs. Why do you think it's error prone?

> For previous requests to add new codecs, we closed issues as wontfix and we suggested to share the codecs at the Python Cheeseshop (PyPI). Here it's more complex because C code is modified to implement the new encodings.

Could you show me the previous requests? I understand that modified C code costs more to review. However, we have codec tests and it wouldn't affect other codecs, I think.

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23050 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23050] Add Japanese legacy encodings
Tetsuya Morimoto added the comment:

> By error prone, I mean that it's easy to introduce a bug or a regression, since the code is complex and almost nobody maintains it.

Indeed. Actually, I ran into some faults myself when I migrated the original patch. Character encoding is a specialty area. This patch was written by Masayuki Moriyama, who is an expert in character encodings and has been contributing to various communities for a long time. He is also helping me migrate the original patch (for Python 2.4.3) to Python 3.5. You can see in the commit log that he fixed some bugs.
https://bitbucket.org/t2y/cpython/commits/all

> I'm not strongly opposed to any change. I'm just trying to understand the context.

Thanks. I'll help by explaining the context.
[issue23050] Add Japanese legacy encodings
Tetsuya Morimoto added the comment:

> Another traditional issue with Japanese codecs is that people have different opinions on what the encoding should do. It may be that when we release the codec, somebody comes up and says that the codec is incorrect, and it should do something different for some code points, citing some other applications which he considers right. In particular for the Microsoft ones, people may claim that some version of Windows did things differently.

As for e-mail encoding, Japanese users should use utf-8, which would resolve most problems. However, for historical and compatibility reasons, that is still not the case today. I don't think these legacy codecs are needed inside an individual application, but we sometimes run into an encoding issue when an application has to work with an external system such as e-mail.

> Now, for this set, the ones that got registered with IANA sound ok (in the sense that it is our bug if they fail to conform to the IANA spec, and IANA's fault if they fail to do what users expect). For the other ones, I wonder whether there is some official source that can be consulted for correctness.

Exactly. I'm currently looking for an English specification of euc-jp-ms and iso-2022-jp-ms. There are, of course, volunteer-written documents in Japanese:
http://www.wdic.org/w/WDIC/eucJP-ms
http://www.wdic.org/w/WDIC/ISO-2022-JP-MS
I could agree to dropping any character encoding for which an official source is difficult to find.

> On a different note: why do you claim that the code is written by Perky? (it's not you, is it?)

Right! Because the credit belongs to him. I'm an assistant.
[issue23050] Add Japanese legacy encodings
New submission from Tetsuya Morimoto:

This patch adds the Japanese legacy encodings listed below.
https://bitbucket.org/t2y/cpython/branches/compare/japanese-legacy-encoding..default

* eucjp_ms (euc-jp compatible with cp932)
* iso2022_jp_ms (yet another iso-2022-jp compatible with cp932, similar to cp50220)
* cp50220 (http://www.iana.org/assignments/charset-reg/CP50220)
* cp50221 (a variant of cp50220)
* cp50222 (a variant of cp50220)
* cp51932 (http://www.iana.org/assignments/charset-reg/CP51932)

The patch for these character encodings was originally created by Masayuki Moriyama in 2005, as the result of an IPA project. The result was contributed to several communities: libiconv, glibc, perl, PHP, Ruby, PostgreSQL, MySQL, nkf. He made a patch for Python 2.4.3 at that time, but somehow no one worked on integrating it. That's a crying shame.

These character encodings are legacy, but are still used. Many end users don't care about character encodings. Unfortunately, for historical reasons, e-mails are encoded with these legacy encodings on the Japanese Windows platform. In fact, a customer of mine recently reported Mojibake that appears to occur because their e-mail data is encoded with cp50220 (iso-2022-jp-ms).

References:
* About IPA: http://www.ipa.go.jp/english/about/summary.html
* Mojibake: http://en.wikipedia.org/wiki/Mojibake
* Java encoding names: http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html

References in Japanese:
* Japanese Legacy Encoding Project: http://legacy-encoding.sourceforge.jp/wiki/
* Project details: http://www.ipa.go.jp/about/jigyoseika/05fy-pro/open/2005-1467d.pdf

--
components: Library (Lib)
files: add-japanese-legacy-encoding1.patch
hgrepos: 285
keywords: patch
messages: 232638
nosy: ishimoto, naoki, t2y
priority: normal
severity: normal
status: open
title: Add Japanese legacy encodings
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file37447/add-japanese-legacy-encoding1.patch
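As a rough illustration of the kind of mismatch that leads to the Mojibake report above, the sketch below uses only codecs that already ship with CPython (`cp932` and `iso2022_jp`); it does not use any of the codecs proposed in the patch. The sample character U+2460 (a circled digit one, part of the cp932 vendor extensions) and the helper name `can_encode` are assumptions chosen for demonstration.

```python
# A character from the cp932 vendor extensions (assumed sample, not from the patch).
SAMPLE = "\u2460"  # CIRCLED DIGIT ONE


def can_encode(text, codec):
    """Return True if `text` can be encoded with `codec`, False otherwise."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Compare a Windows code page with the strict JIS-based stdlib codec.
for codec in ("cp932", "iso2022_jp"):
    print(codec, can_encode(SAMPLE, codec))
```

On a stock CPython 3 the `cp932` codec is expected to accept the character while `iso2022_jp` rejects it, which is the gap that cp932-compatible variants of iso-2022-jp are meant to close.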
[issue23050] Add Japanese legacy encodings
Tetsuya Morimoto added the comment:

On Mon, Dec 15, 2014 at 1:04 AM, R. David Murray rep...@bugs.python.org wrote:

> In emails these are labeled as, say, iso-2022-jp-ms?

No. They are labeled just 'iso-2022-jp', and we (Japanese) choose the proper charset encoding to decode the encoded text. You can see that there are several variants of iso-2022-jp. Yes, that's very strange, but it is so for historical reasons.
http://en.wikipedia.org/wiki/ISO/IEC_2022#ISO.2FIEC_2022_character_sets

> See also issue 8898 with regards to email encodings.

Therefore, this is a different issue.
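The practice described here, where a message is labeled just 'iso-2022-jp' and the reader picks whichever variant actually decodes the text, can be sketched as follows. This is a minimal sketch and not part of the patch: the helper name `decode_iso2022jp_variants` and the candidate order are assumptions, and only codecs already present in CPython are listed.

```python
# Candidate codecs are stdlib names; the order is an assumption for illustration.
CANDIDATES = ("iso2022_jp", "iso2022_jp_ext", "iso2022_jp_2004")


def decode_iso2022jp_variants(data, candidates=CANDIDATES):
    """Try each candidate codec in turn and return (text, codec_used)."""
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid for this variant.
            # LookupError: the codec name is not registered in this Python.
            continue
    raise ValueError("none of the candidate codecs could decode the data")


raw = "日本語".encode("iso2022_jp")
text, used = decode_iso2022jp_variants(raw)
print(text, used)
```

Because unknown codec names are skipped rather than treated as fatal, a name such as `iso2022_jp_ms` could be added to the candidate tuple and would only take effect on an interpreter where that codec is actually available.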
[issue14002] distutils2 fails to install a package from PyPI on Python 2.7.2
Tetsuya Morimoto tetsuya.morim...@gmail.com added the comment:

I can reproduce it on Mac OS X. I made a patch which checks that the function has a func_name attribute before referring to it. It works for me. However, I wonder whether a function can have both func.im_self and func.func_name? Please tell me the background, because I'm new to distutils2.

--
keywords: +patch
nosy: +t2y
Added file: http://bugs.python.org/file24801/distutils2_pypi_wrapper.patch

___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14002 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
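The kind of guard described in this comment (checking for a name attribute before using it) might look like the sketch below. It is not the attached patch: `describe_callable` and `fetch` are hypothetical names, and the helper checks both the Python 2 spelling `func_name` and `__name__` so that it also runs on Python 3.

```python
import functools


def describe_callable(func):
    """Return a name for `func` without assuming it has a name attribute."""
    for attr in ("func_name", "__name__"):
        name = getattr(func, attr, None)
        if name is not None:
            return name
    # Objects such as functools.partial instances carry no name attribute.
    return repr(func)


def fetch(url):
    """Hypothetical stand-in for a wrapped function."""
    return url


print(describe_callable(fetch))
print(describe_callable(functools.partial(fetch, "http://example.invalid")))
```

Using `getattr` with a default keeps the lookup from raising AttributeError for callables that are not plain functions.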
[issue14002] AttributeError in distutils2.pypi.wrapper
Changes by Tetsuya Morimoto tetsuya.morim...@gmail.com:

Removed file: http://bugs.python.org/file24801/distutils2_pypi_wrapper.patch