> The EUC family has direct encoding of 7-bit ASCII and then 3
> selectable character sets represented by sequences with the high bit
> set, with details varying between the Chinese (simplified Chinese),
> Taiwanese (traditional Chinese), Japanese (2 kinds) and Korean
> variants. I don't know if the pg_wchar encoding we're producing in
> pg_euc*2wchar_with_len() has a name, but it doesn't appear to match
> the description of the standard "fixed" representation on the
> Wikipedia page for Extended Unix Code (it's too wide for starters,
> looking at the shift distances).

Yes. pg_euc*2wchar_with_len() takes the "variable length" representation
of EUC, in which each character occupies 1 to 4 bytes, and expands each
character into a pg_wchar. The result can also be converted back to the
multibyte representation easily.

Note that the standard "fixed" representation of EUC includes ASCII-range
bytes in *non*-ASCII characters, so I think it is not easy to use as a
backend-safe encoding.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
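
To illustrate the variable-length packing described in the reply, here is a minimal sketch in C for the EUC-JP case only (1 to 3 bytes per character). It is not the PostgreSQL source: the names wide_char, euc_jp_pack and euc_jp_unpack are hypothetical, and the sketch assumes well-formed input. It only shows how the bytes of one character can be shifted into a single integer and recovered again.

```c
#include <stddef.h>
#include <stdint.h>

/* Single-shift bytes that select code sets 2 and 3 in EUC-JP. */
#define SS2 0x8e
#define SS3 0x8f

/* Hypothetical stand-in for pg_wchar; assumed to be at least 32 bits. */
typedef uint32_t wide_char;

/*
 * Pack one EUC-JP character starting at from[0] into *to.
 * 'len' is the number of bytes available. Returns the bytes consumed.
 */
static size_t
euc_jp_pack(const unsigned char *from, size_t len, wide_char *to)
{
    if (from[0] == SS2 && len >= 2)
    {
        /* SS2 + 1 byte: half-width kana */
        *to = ((wide_char) SS2 << 8) | from[1];
        return 2;
    }
    if (from[0] == SS3 && len >= 3)
    {
        /* SS3 + 2 bytes: JIS X 0212 */
        *to = ((wide_char) SS3 << 16) | ((wide_char) from[1] << 8) | from[2];
        return 3;
    }
    if ((from[0] & 0x80) && len >= 2)
    {
        /* 2 bytes with the high bit set: JIS X 0208 */
        *to = ((wide_char) from[0] << 8) | from[1];
        return 2;
    }
    /* Otherwise treat the byte as plain ASCII. */
    *to = from[0];
    return 1;
}

/*
 * Reverse of euc_jp_pack: write the multibyte form of 'wc' into 'to'
 * (which must have room for 3 bytes). Returns the bytes written.
 */
static size_t
euc_jp_unpack(wide_char wc, unsigned char *to)
{
    size_t n = 0;

    if (wc & 0xff0000)
        to[n++] = (unsigned char) ((wc >> 16) & 0xff);
    if (wc & 0xffff00)
        to[n++] = (unsigned char) ((wc >> 8) & 0xff);
    to[n++] = (unsigned char) (wc & 0xff);
    return n;
}
```

Because the leading byte of every non-ASCII character keeps its high bit set, the packed value never collides with an ASCII code, and the original byte sequence falls out of the integer by shifting alone. EUC-TW would need a fourth byte, which this sketch does not cover.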
