[issue7856] cannot decode from or encode to big5 \xf9\xd8

2021-03-09 Thread STINNER Victor

STINNER Victor added the comment:

> It looks like the F9D6–F9FE characters all come from the Big5-ETen extension

One option would be to add a new big5eten encoding to Python. Someone has to 
implement the code.
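
A rough pure-Python sketch of how such a codec could be prototyped before anyone writes the real implementation. The codec name big5eten_sketch and all helper names are hypothetical, and the table holds only the two mappings reported in this thread (F9D6 -> U+7881, F9D8 -> U+88CF), not the full ETen range. Everything else is delegated to the existing big5 codec.

```python
import codecs

# Assumption: only the two ETen code points confirmed in this thread.
_ETEN_EXTRA = {b"\xf9\xd6": "\u7881", b"\xf9\xd8": "\u88cf"}
_ETEN_EXTRA_REV = {char: raw for raw, char in _ETEN_EXTRA.items()}


def _decode(data, errors="strict"):
    data = bytes(data)
    out = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Single-byte ASCII passes through unchanged.
            out.append(chr(data[i]))
            i += 1
            continue
        pair = data[i:i + 2]
        char = _ETEN_EXTRA.get(pair)
        if char is None:
            # Not in the extra table: defer to the stock big5 codec.
            char = pair.decode("big5", errors)
        out.append(char)
        i += 2
    return "".join(out), len(data)


def _encode(text, errors="strict"):
    out = bytearray()
    for char in text:
        raw = _ETEN_EXTRA_REV.get(char)
        out += raw if raw is not None else char.encode("big5", errors)
    return bytes(out), len(text)


def _search(name):
    # Hypothetical codec name, used only for this sketch.
    if name == "big5eten_sketch":
        return codecs.CodecInfo(name=name, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

print(ascii(b"\xf9\xd8".decode("big5eten_sketch")))
```

Decoding pair by pair is slow and reports error positions relative to the pair rather than the whole input, so this is only useful for experimenting with the mapping table.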

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7856] cannot decode from or encode to big5 \xf9\xd8

2021-03-09 Thread Max Bolingbroke

Max Bolingbroke added the comment:

As of Python 3.7.9 this also affects \xf9\xd6 which should be \u7881 in 
Unicode. This character is the second character of 宏碁 which is the name of the 
Taiwanese electronics manufacturer Acer.

You can work around the issue using big5hkscs just like with the original 
\xf9\xd8 problem.
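
A minimal sketch of that workaround in Python 3, assuming the input is mostly plain Big5 and only fails because of code points that big5hkscs happens to cover. The function name decode_big5_lenient is made up for illustration. Note that the retry decodes the whole input with big5hkscs, which may map a few other code points differently from plain Big5.

```python
def decode_big5_lenient(data: bytes) -> str:
    """Decode Big5 bytes, retrying with big5hkscs if plain big5 fails."""
    try:
        return data.decode("big5")
    except UnicodeDecodeError:
        # Assumption: the failure is caused by ETen/HKSCS code points
        # such as F9D6 or F9D8, which big5hkscs can decode.
        return data.decode("big5hkscs")


# ascii() keeps the output printable on any terminal encoding.
print(ascii(decode_big5_lenient(b"\xf9\xd6")))
```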

It looks like the F9D6–F9FE characters all come from the Big5-ETen extension 
(https://en.wikipedia.org/wiki/Big5#ETEN_extensions, 
https://moztw.org/docs/big5/table/eten.txt) which is so popular that it is a 
de facto standard. Big5-2003 (mentioned in a comment below) seems to be an 
extension of Big5-ETen. For what it's worth, whatwg includes these mappings in 
their own big5 reference tables: https://encoding.spec.whatwg.org/big5.html. 

Unfortunately Big5 is still in common use in Taiwan. It's pretty funny that 
Python fails to decode Big5 documents containing the name of one of Taiwan's 
largest multinationals :-)

--
nosy: +batterseapower




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2014-05-19 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Inndy, you might also be talking about big5-2003, from

http://www.csie.ntu.edu.tw/~r92030/project/big5/

Python currently does not support big5-2003, but a contribution of such an 
encoding would surely be welcome.

--




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2014-05-19 Thread Martin v . Löwis

Martin v. Löwis added the comment:

I'm still looking for an official source of that.

>>> u"\u88cf".encode("big5hkscs")
'\xf9\xd8'

works fine (and always has worked fine), and the character clearly is in 
big5hkscs. According to 

http://en.wikipedia.org/wiki/Big5

F9D8 is "Reserved for user-defined characters", so this suggests that the 
character does *not* have a fixed meaning in BIG-5. However, it is part of the 
Hong Kong Supplementary Character Set.
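
The same check can be written for Python 3 as a small probe that tries several codecs and records which ones can encode the character. This is only a sketch: the helper name probe_encoders is hypothetical, and the default codec list is just the two codecs discussed here.

```python
def probe_encoders(char: str, codec_names=("big5", "big5hkscs")) -> dict:
    """Return {codec name: encoded bytes, or None if encoding fails}."""
    results = {}
    for name in codec_names:
        try:
            results[name] = char.encode(name)
        except UnicodeEncodeError:
            results[name] = None
    return results


for name, encoded in probe_encoders("\u88cf").items():
    print(name, encoded)
```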

--




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2014-05-19 Thread Inndy

Inndy added the comment:

I'm Taiwanese; F9D8 in big5 should be mapped to E8A38F in UTF-8.
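
For reference, E8A38F is the UTF-8 form of U+88CF, the code point named elsewhere in this thread. The arithmetic can be checked by hand with a short sketch; the helper name utf8_three_bytes is made up, and it only handles the three-byte range (it does not reject surrogates).

```python
def utf8_three_bytes(code_point: int) -> bytes:
    """Hand-encode a code point in the U+0800..U+FFFF range as UTF-8."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("only the three-byte UTF-8 range is handled here")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


encoded = utf8_three_bytes(0x88CF)
print(encoded.hex().upper())  # E8A38F
```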

--
nosy: +Inndy




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2012-02-01 Thread Antoine Pitrou

Changes by Antoine Pitrou:


--
nosy: +haypo
versions: +Python 2.7, Python 3.2, Python 3.3 -Python 2.6




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2012-01-31 Thread Kang-Hao (Kenny) Lu

Changes by Kang-Hao (Kenny) Lu:


--
nosy: +kennyluck




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2010-02-05 Thread Roumen Petrov

Roumen Petrov added the comment:

> That iconv supports it is not convincing, ...

GNU libc is not convincing? What are you talking about?

--
nosy: +rpetrov




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2010-02-04 Thread Martin v . Löwis

Martin v. Löwis added the comment:

perky, what do you think?

--
assignee:  -> hyeshik.chang
nosy: +hyeshik.chang




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2010-02-04 Thread Xuefer x

Xuefer x added the comment:

Sure. Your URL pointed me in the right direction, but that table is OBSOLETE;
see: http://www.unicode.org/Public/MAPPINGS/EASTASIA/ReadMe.txt
From there I found http://unicode.org/charts/unihan.html,
then http://www.unicode.org/Public/UNIDATA/,
then http://www.unicode.org/Public/UNIDATA/Unihan.zip.
Inside the zip, open Unihan_OtherMappings.txt.
The Big5-related fields listed in that file are:
#   kBigFive
#   kHKSCS
HKSCS is one of the Big5 encodings.
Searching for F9D8, I got:
U+88CF  kHKSCS  F9D8

You may also want to update the other encoding map tables to catch up with 
Unihan_OtherMappings.txt.

Thanks for your quick reply, by the way.

--




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2010-02-04 Thread Martin v . Löwis

Martin v. Löwis added the comment:

In particular, the Unicode consortium mapping table, now at

http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT

doesn't map f9d8 to anything; the current version of that table (in unihan.zip) 
has these mappings for U+88CF:

U+88CF  kCCCII            232E61
U+88CF  kCNS1986          E-444E
U+88CF  kCNS1992          3-444E
U+88CF  kEACC             215763
U+88CF  kGB1              3279
U+88CF  kHKSCS            F9D8
U+88CF  kJis0             4602
U+88CF  kKPS0             D9E0
U+88CF  kKSC0             5574
U+88CF  kTaiwanTelegraph  5937
U+88CF  kXerox            241:102

As you can see, it isn't supported in big5.
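
The lookup described above can be sketched as a small parser over Unihan-style lines (code point, field name, value, separated by whitespace). The sample below copies three of the entries listed in this message rather than reading the real Unihan_OtherMappings.txt, and the names SAMPLE and parse_unihan are hypothetical.

```python
# Three of the entries quoted in this message, tab-separated as in Unihan.
SAMPLE = """\
U+88CF\tkCCCII\t232E61
U+88CF\tkHKSCS\tF9D8
U+88CF\tkJis0\t4602
"""


def parse_unihan(text: str) -> dict:
    """Group Unihan mapping lines as {code point: {field: value}}."""
    mappings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        code_point, field, value = line.split(None, 2)
        mappings.setdefault(code_point, {})[field] = value
    return mappings


fields = parse_unihan(SAMPLE)["U+88CF"]
# A kHKSCS entry exists for U+88CF, but no kBigFive entry does.
print("kBigFive" in fields, fields.get("kHKSCS"))
```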

--




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2010-02-04 Thread Martin v . Löwis

Martin v. Löwis added the comment:

That iconv supports it is not convincing, IMO. Do you have other sources (like 
tables on the web somewhere) that support your request?

--
nosy: +loewis




[issue7856] cannot decode from or encode to big5 \xf9\xd8

2010-02-04 Thread Xuefer x

New submission from Xuefer x:

using iconv:
$ printf "\xf9\xd8" | iconv -f big5 -t utf-8 | xxd
000: e8a3 8f  ...
$ printf "\xe8\xa3\x8f" | iconv -f utf-8 -t big5 | xxd
000: f9d8 ..

using python
>>> print "\xf9\xd8".decode("big5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'big5' codec can't decode bytes in position 0-1: illegal multibyte sequence
>>> print "\xe8\xa3\x8f".decode("utf-8").encode("big5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'big5' codec can't encode character u'\u88cf' in position 0: illegal multibyte sequence
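
The session above uses Python 2 syntax. A rough Python 3 equivalent that reports the outcome instead of raising might look like the sketch below. The function name roundtrip is hypothetical, and the exact error wording can differ between Python versions.

```python
def roundtrip(raw: bytes, codec: str = "big5"):
    """Decode raw with codec, re-encode it, and return (ok, detail)."""
    try:
        text = raw.decode(codec)
    except UnicodeDecodeError as exc:
        return False, "decode failed: " + exc.reason
    try:
        encoded = text.encode(codec)
    except UnicodeEncodeError as exc:
        return False, "encode failed: " + exc.reason
    if encoded != raw:
        return False, "round trip changed the bytes"
    return True, "ok"


for codec in ("big5", "big5hkscs"):
    print(codec, roundtrip(b"\xf9\xd8", codec))
```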

--
components: Unicode
messages: 98865
nosy: Xuefer.x
severity: normal
status: open
title: cannot decode from or encode to big5 \xf9\xd8
type: behavior
versions: Python 2.6
