Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
Hi,

We have a customer in Japan who would be interested in this fix, in the future. Would you like me to enter it as an official Postgres bug?

Sincerely,
Kasia

-----Original Message-----
From: Tatsuo Ishii [mailto:is...@postgresql.org]
Sent: Tuesday, March 22, 2011 10:17 PM
To: itagaki.takah...@gmail.come
Cc: Kasia Tuszynska; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP

> Agreed if the encoding is added as a user-defined encoding. I don't want to add built-in encodings only for the Japanese language any more.

I do not agree here. Adding one more encoding/conversion is not a big deal.

Anyway, these solutions would become real after one or two releases at the earliest. The realistic solution available today is replacing the default conversion for EUC-JP and UTF-8.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
On Fri, Mar 25, 2011 at 03:33, Kasia Tuszynska ktuszyn...@esri.com wrote:
> We have a customer in Japan who would be interested in this fix, in the future. Would you like me to enter it as an official Postgres bug?

Not a bug at all; there are at least 3 versions of EUC-JP encodings, and postgres supports just one of them. I don't think this will change in the near term, so for now you would need to define a CONVERSION for your purpose.

However, I think we could have an extension, outside the core, that provides a set of conversion procedures for the easily confused Japanese encodings.
--
Itagaki Takahiro
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
> We have a customer in Japan who would be interested in this fix, in the future. Would you like me to enter it as an official Postgres bug?
> Sincerely,

As I stated before, I don't regard this as a bug.

BTW, I wonder why you don't use CREATE CONVERSION, which can be used to solve the customer's problem today...
--
Tatsuo Ishii
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
> Hi, I was wondering if this was considered a bug, and if so what were the plans to fix it:
> http://archives.postgresql.org/pgsql-bugs/2005-08/msg00211.php
> I searched the:
> pgsql-bug archive and found nothing
> I also searched the wiki to do list and found nothing
> But I could have missed it.

I don't consider it a bug. We map WAVE DASH of EUC-JP (0xa1c1) to U+FF5E, not U+301C. U+FF5E and U+301C look the same, but they are different code points for some reason I don't know. On the other hand, EUC-JP has only one code point for WAVE DASH. So if we want to do a round trip conversion between EUC-JP and UTF-8, we have to choose either U+FF5E OR U+301C. We have chosen U+FF5E. If we change the mapping, many existing applications would break. The same thing can be said of the MINUS sign.
--
Tatsuo Ishii
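A minimal Python sketch of the mapping described above, assuming a hypothetical one-entry table in which EUC-JP 0xa1c1 pairs only with U+FF5E (FULLWIDTH TILDE). It is not PostgreSQL's conversion code, and `utf8_to_eucjp_strict` is an illustrative name. It also shows that 0xe3809c, the byte sequence in the subject line, is simply the UTF-8 encoding of U+301C (WAVE DASH), for which the table has no entry, so a strict converter has to reject it.

```python
# Hypothetical one-entry excerpt of a round-trip EUC-JP <-> Unicode table.
# Per the message above, EUC-JP 0xa1c1 is paired with U+FF5E, not U+301C.
EUCJP_TO_UNICODE = {0xA1C1: "\uFF5E"}

# The encoding direction is the exact inverse, so the mapping round-trips.
UNICODE_TO_EUCJP = {ch: code for code, ch in EUCJP_TO_UNICODE.items()}


class ConversionError(Exception):
    """Raised when a character has no entry in the conversion table."""


def utf8_to_eucjp_strict(ch: str) -> int:
    """Return the EUC-JP code for one character or raise ConversionError."""
    if ch not in UNICODE_TO_EUCJP:
        # Message text imitates the error quoted in the thread's subject line.
        raise ConversionError(
            f"character 0x{ch.encode('utf-8').hex()} of encoding UTF8 "
            "has no equivalent in EUC_JP"
        )
    return UNICODE_TO_EUCJP[ch]


if __name__ == "__main__":
    print(hex(utf8_to_eucjp_strict("\uFF5E")))  # 0xa1c1
    print("\u301C".encode("utf-8").hex())       # e3809c
    try:
        utf8_to_eucjp_strict("\u301C")
    except ConversionError as err:
        print(err)
```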
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
On Wed, Mar 23, 2011 at 08:05, Kasia Tuszynska ktuszyn...@esri.com wrote:
> I was wondering if this was considered a bug, and if so what were the plans to fix it:
> http://archives.postgresql.org/pgsql-bugs/2005-08/msg00211.php

The wave dash issue is not postgres-specific; some other converters just replace it with '?', whereas postgres throws an error. I guess there is no way to support ambiguous character mappings in the default conversions, but you can define more relaxed conversion procedures for your purpose.

BTW, we cannot use non-default conversion procedures from SQL commands, right? If that were allowed, we could use some relaxed conversions for the initial loading, like this:

=# SET character_conversion TO utf8_to_eucjp_relaxed;
=# COPY tbl FROM '/file_with_wave_dashes.utf8.tsv';
=# RESET character_conversion;

Another idea is to allow creating new encoding names and defining the above conversion procs as the default:

=# CREATE ENCODING eucjp_relaxed;
=# CREATE DEFAULT CONVERSION xxx FOR utf8 TO eucjp_relaxed FROM utf8_to_eucjp_relaxed;

I think an overhaul of conversion support is a TODO item.
--
Itagaki Takahiro
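The difference between the strict behaviour and the "replace it with '?'" behaviour of some other converters can be sketched in Python. The table, the function name `to_eucjp`, and the `relaxed` flag are all hypothetical; an actual relaxed conversion procedure for PostgreSQL would be a function registered with CREATE CONVERSION, which this sketch does not attempt.

```python
# Hypothetical table: only the U+FF5E <-> 0xa1c1 pair discussed in the thread.
UNICODE_TO_EUCJP = {"\uFF5E": b"\xa1\xc1"}


def to_eucjp(text: str, relaxed: bool = False) -> bytes:
    """Encode text to EUC-JP using the table above.

    Strict mode raises on unmapped characters (PostgreSQL-like behaviour);
    relaxed mode substitutes '?' (the behaviour of some other converters).
    """
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            # ASCII bytes are identical in UTF-8 and EUC-JP.
            out += ch.encode("ascii")
        elif ch in UNICODE_TO_EUCJP:
            out += UNICODE_TO_EUCJP[ch]
        elif relaxed:
            out += b"?"  # lossy: the original character is discarded
        else:
            raise ValueError(f"U+{ord(ch):04X} has no equivalent in EUC_JP")
    return bytes(out)


if __name__ == "__main__":
    print(to_eucjp("a\uFF5Eb"))                # b'a\xa1\xc1b'
    print(to_eucjp("a\u301Cb", relaxed=True))  # b'a?b'
```

The '?' substitution is lossy: once the data is loaded, the original character cannot be recovered from the stored bytes.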
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
On Wed, Mar 23, 2011 at 10:58, Tatsuo Ishii is...@postgresql.org wrote:
> So if we want to do a round trip conversion between EUC-JP and UTF-8, we have to choose either U+FF5E OR U+301C. We have chosen U+FF5E. If we change the mapping, many existing applications would break.

I have heard a request a few times for an additional one-directional conversion from U+301C to EUC-JP (0xa1c1). It should not break existing applications. We already have non-round-trip conversions for IBM and NEC extended characters in SJIS, so the policy does not seem so strict to me.

Anyway, we might need to revisit this area in the near term for the Unicode Emoji issue.
--
Itagaki Takahiro
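Both the one-directional mapping proposed here and the round-trip objection raised in the reply can be seen in a small Python sketch. The tables and the `round_trip` helper are hypothetical, not PostgreSQL's, and assume only the mappings named in the thread. The decoding table is untouched, so existing EUC-JP data decodes exactly as before, but U+301C no longer survives a trip from UTF-8 to EUC-JP and back unchanged.

```python
# Hypothetical excerpt: the existing round-trip pair 0xa1c1 <-> U+FF5E.
EUCJP_TO_UNICODE = {0xA1C1: "\uFF5E"}
UNICODE_TO_EUCJP = {ch: code for code, ch in EUCJP_TO_UNICODE.items()}

# Proposed addition: U+301C -> 0xa1c1 in the encoding direction only.
# The decoding table is deliberately left unchanged.
UNICODE_TO_EUCJP_ONE_WAY = {**UNICODE_TO_EUCJP, "\u301C": 0xA1C1}


def round_trip(ch: str, encode_table: dict) -> str:
    """Encode ch to EUC-JP with encode_table, then decode it back."""
    return EUCJP_TO_UNICODE[encode_table[ch]]


if __name__ == "__main__":
    # U+FF5E still round-trips, so existing data behaves as before.
    print(round_trip("\uFF5E", UNICODE_TO_EUCJP_ONE_WAY) == "\uFF5E")  # True
    # U+301C is now accepted, but comes back as U+FF5E.
    print(hex(ord(round_trip("\u301C", UNICODE_TO_EUCJP_ONE_WAY))))  # 0xff5e
```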
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
>> So if we want to do a round trip conversion between EUC-JP and UTF-8, we have to choose either U+FF5E OR U+301C. We have chosen U+FF5E. If we change the mapping, many existing applications would break.
>
> I have heard a request a few times for an additional one-directional conversion from U+301C to EUC-JP (0xa1c1). It should not break existing applications. We already have non-round-trip conversions for IBM and NEC extended characters in SJIS, so the policy does not seem so strict to me.

Doesn't breaking round-trip conversion between EUC-JP and UTF-8 itself break backward compatibility? I think the best we can do here is to add a new encoding and default conversion.
--
Tatsuo Ishii
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
On Wed, Mar 23, 2011 at 13:02, Tatsuo Ishii is...@postgresql.org wrote:
> I think the best we can do here is to add a new encoding and default conversion.

Agreed, if the encoding is added as a user-defined encoding. I don't want to add built-in encodings only for the Japanese language any more.
--
Itagaki Takahiro
Re: [BUGS] ERROR: character 0xe3809c of encoding UTF8 has no equivalent in EUC_JP
> Agreed, if the encoding is added as a user-defined encoding. I don't want to add built-in encodings only for the Japanese language any more.

I do not agree here. Adding one more encoding/conversion is not a big deal.

Anyway, these solutions would become real after one or two releases at the earliest. The realistic solution available today is replacing the default conversion for EUC-JP and UTF-8.
--
Tatsuo Ishii