Hello Everyone,

I very much understand why SJIS is not a server encoding. It contains ASCII-range second bytes (including \, which together with ' can be really nasty inside normal SQL), and further, half-width katakana are represented as single-byte characters, two of which can incidentally coincide with a kanji.
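To make the backslash problem concrete, here is a minimal Python sketch (not PostgreSQL code). It assumes Python's built-in shift_jis codec and uses the kanji 表, whose second byte in SJIS is 0x5C, the ASCII backslash. The helper naive_escape_quotes is hypothetical; it only stands in for any byte-level escaping that is unaware of multibyte characters.

```python
# Illustration only: an SJIS trail byte colliding with the ASCII backslash.

KANJI = "表"
sjis_bytes = KANJI.encode("shift_jis")  # lead byte 0x95, trail byte 0x5C ('\')


def naive_escape_quotes(raw: bytes) -> bytes:
    """Hypothetical byte-level escaper: puts a backslash before each quote."""
    return raw.replace(b"'", b"\\'")


# Attacker-controlled input: a lone SJIS lead byte followed by a quote.
malicious = b"\x95' OR 1=1 --"
escaped = naive_escape_quotes(malicious)

# Read as SJIS, the lead byte 0x95 pairs up with the inserted backslash,
# so the quote after it is left unescaped.
decoded = escaped.decode("shift_jis")

print(sjis_bytes.hex())
print(decoded)
```

The lone lead byte swallows the backslash that the escaper added, so the quote that follows is no longer escaped once the bytes are interpreted as SJIS.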
My question, however, is this: what would be the best practice if it were imperative to use SJIS encoding for texts and none of the built-in conversions are useful?

To elaborate, I need to support Japanese emoji characters, which are special emoticons for mobile phones. These characters usually sit in a region that standard SJIS does not define, so they are not converted properly to either EUC or UTF8. (UTF8 would be my preferred choice, but unfortunately not all mobile phones support it, so conversion is still necessary. From what I've seen, the new SJIS_2004 map seems to define these entities, but I'm not 100% sure they all get converted properly.)

I inherited a system in which this problem is "bypassed" by setting the server encoding to SQL_ASCII, but that is not the best solution: full text search is rendered useless, and occasionally the special character issue rears its ugly head, so we have to deal not only with normal SQL injection but also with encoding-based injection. And for the real WTF, my predecessor converted everything to EUC before inserting, eventually losing all the emojis and creating all sorts of strange phenomena, like tables with one column in EUC until a certain date and SJIS from then on, while all other columns are EUC.

Is there a way to properly deal with SJIS plus the emoji extensions (a patch I'm not aware of, for example)? Is it considered a TODO for future releases, or should I consider augmenting Postgres myself? If the latter, could you provide any pointers on how to proceed?

Thank you,
Zaki

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Monday, March 26, 2007 11:20 AM
To: ITAGAKI Takahiro
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Server-side support of all encodings

ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
PostgreSQL supports SJIS, BIG5, GBK, UHC and GB18030 as client encodings, but we cannot use them as server encodings. Is there any reason for it?
Very much so --- they aren't safe ASCII-supersets, and thus for example the parser will fail on them. Backend encodings must have the property that all bytes of a multibyte character are >= 128.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
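The rule in Tom Lane's reply, that every byte of a multibyte character must be >= 128, can be spot-checked with a short Python sketch. The function name is_ascii_safe and the sample string are made up for illustration, and the codecs are Python's built-in ones, not PostgreSQL's conversion tables. Passing on a sample proves nothing about a whole encoding; the sketch only shows that UTF-8 and EUC-JP keep these particular characters out of the ASCII range while SJIS does not.

```python
def is_ascii_safe(text: str, encoding: str) -> bool:
    """Return True if no multibyte character in `text` contains a byte < 0x80.

    Hypothetical helper: it checks only the characters it is given.
    """
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


# Kanji and katakana whose SJIS trail byte is 0x5C, the ASCII backslash.
SAMPLE = "表ソ能"

results = {enc: is_ascii_safe(SAMPLE, enc)
           for enc in ("utf-8", "euc_jp", "shift_jis")}

for enc, ok in results.items():
    print(f"{enc}: {'all bytes >= 0x80' if ok else 'has ASCII-range bytes'}")
```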