Something very strange is going on on my machine with UTF8:

postgres=# show server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

postgres=# select length(convert_from(E'\343\203\251\343\202\244\343\202\273\343\203\263','utf8'));
 length
--------
      8
(1 row)

postgres=# select 'substring(s,'||i||',1)',convert_to(substring(s,i,1),'utf8') from (select convert_from(E'\343\203\251\343\202\244\343\202\273\343\203\263','utf8') as s)a, (select generate_series(1,8) as i)b;
     ?column?     | convert_to
------------------+------------
 substring(s,1,1) | \343
 substring(s,2,1) | \203\251
 substring(s,3,1) | \343
 substring(s,4,1) | \202\244
 substring(s,5,1) | \343
 substring(s,6,1) | \202\273
 substring(s,7,1) | \343
 substring(s,8,1) | \203\263
(8 rows)

I believe this is in fact only four katakana characters (namely U+30E9
U+30A4 U+30BB U+30F3), so length() should return 4, not 8. \343 is merely
the first byte of the three-byte encoding of each individual character, yet
substring() is returning it as a character on its own.

Dave doesn't see the same behaviour on his three machines, so I think it's
something unique to my machine. Possibly not a Postgres bug at all but some
kind of install gotcha.

I'm running Debian unstable with glibc 2.6.1-4, so it is a bit bleeding edge.
But as I understand it the utf8 decoding is all our code anyways, so I can't
quite figure out how it could be glibc's fault.

Does anybody else see anything like this?

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
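
For anyone who wants to cross-check the expected answer outside Postgres, here is a minimal Python sketch. It is not part of the session above and says nothing about where the fault lies. It only decodes the same twelve bytes with Python's own UTF-8 decoder and reports the code points and the lead byte of each character. The variable names are purely illustrative.

```python
# The same octal-escaped bytes passed to convert_from() in the query above.
raw = b"\343\203\251\343\202\244\343\202\273\343\203\263"

# Decode with Python's UTF-8 decoder (independent of Postgres and glibc).
text = raw.decode("utf-8")

# Code point of each decoded character, formatted as U+XXXX.
codepoints = [f"U+{ord(ch):04X}" for ch in text]

# First byte of each character's UTF-8 encoding.
lead_bytes = [ch.encode("utf-8")[0] for ch in text]

print(len(raw), len(text))            # 12 4
print(codepoints)                     # ['U+30E9', 'U+30A4', 'U+30BB', 'U+30F3']
print([oct(b) for b in lead_bytes])   # ['0o343', '0o343', '0o343', '0o343']
```

So a correct decoder sees twelve bytes, four characters, and \343 only ever as a lead byte, which matches what Dave's machines report.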