[ This email to hackers from last night got lost so I am remailing. ]

Tom Lane wrote:
> "John Hansen" <[EMAIL PROTECTED]> writes:
> >> That is backpatched to 8.0.X.  Does that not fix the problem reported?
>
> > No, as andrew said, what this patch does, is allow values > 0xffff and
> > at the same time validates the input to make sure it's valid utf8.
>
> The impression I get is that most of the 'Unicode characters above
> 0x10000' reports we've seen did not come from people who actually needed
> more-than-16-bit Unicode codepoints, but from people who had screwed up
> their encoding settings and were trying to tell the backend that Latin1
> was Unicode or some such.  So I'm a bit worried that extending the
> backend support to full 32-bit Unicode will do more to mask encoding
> mistakes than it will do to create needed functionality.
>
> Not that I'm against adding the functionality.  I'm just doubtful that
> the reports we've seen really indicate that we need it, or that adding
> it will cut down on the incidence of complaints :-(

OK, I got on the IRC server and talked to folks who actually understand
this.  They say there are Chinese users who are reporting this problem,
so I Googled and found this:

	http://www.yale.edu/chinesemac/pages/charset_encoding.html#Unicode

See the paragraph with "Supplementary Ideographic Plane".  That
paragraph says:

	The Supplementary Ideographic Plane (SIP) currently contains
	42,711 additional characters in "CJK Unified Ideographs
	Extension B" (U+20000-2A6D6).  The PDF chart for this is
	available at:

		http://www.unicode.org/charts/PDF/U20000.pdf

I assume it is that U+20000-2A6D6 range that people are complaining
about.

So, we do have a bug, and we are probably going to need to fix it in
8.0.X.

I apologize to the people who reported this problem; I wasn't attentive
to its seriousness.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
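
To make the two cases in this thread concrete, here is a minimal Python sketch, not the backend's actual validation code. It assumes the U+20000-2A6D6 range quoted above and uses Python's built-in UTF-8 codec as the validator. The helper names (in_cjk_extension_b, utf8_length, is_valid_utf8) are hypothetical and exist only for this illustration. It shows that a real Extension B character is a valid four-byte UTF-8 sequence, while Latin1 bytes mislabelled as UTF-8 are rejected by validation rather than accepted as a high codepoint.

```python
# Illustration only: hypothetical helpers, not PostgreSQL code.
# Range taken from the "CJK Unified Ideographs Extension B" paragraph quoted above.
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6D6


def in_cjk_extension_b(ch: str) -> bool:
    """Return True if the single character lies in U+20000-2A6D6."""
    return EXT_B_FIRST <= ord(ch) <= EXT_B_LAST


def utf8_length(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    ideograph = "\U00020000"  # first codepoint of Extension B
    print(in_cjk_extension_b(ideograph), utf8_length(ideograph))

    # Latin1-encoded text that a misconfigured client labels as UTF-8.
    latin1_bytes = "caf\u00e9".encode("latin-1")
    print(is_valid_utf8(latin1_bytes))
```

Under these assumptions, a check that both validates the byte sequence and permits codepoints above 0xffff would accept the Extension B character and still reject the mislabelled Latin1 input.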