Luke Lonergan wrote:
> Bruce,
>
> Is there a good source of multi-byte copy data test cases?  What is
> currently done to test the trans-coding support?  (where client and server
> encodings are different)
>
> I notice that the regression data in the CVS version of postgres does not
> seem to include cases other than the ASCII data, is there another source of
> data/cases we're missing?
>
> Also - Alon's looking into this, but it would appear that the presumption on
> EOL for two-byte encodings is 0x0a+0xNN, where 0x0a is followed by any byte.
> Similar for other current control characters (escape, delimiter).  Is there
> a definition of format and semantics for COPY with 2-byte encodings we
> should look at?
>
> I've looked at the code and the docs like sql-copy.html and the question is
> relevant because of the following case:
>   if newline were defined as 0x0a+0x00 as opposed to 0x0a+0xNN where N is
>   arbitrary, we could parse using 16-bit logic.
> however
>   if newline were defined as 0x0a+0xNN, we must use byte-wise parsing

We have two- and three-byte encodings, so 16-bit parsing seems like it
wouldn't work.  I am not aware of any specs except the C code itself.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
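
To illustrate why mixed one-, two- and three-byte characters rule out fixed
16-bit parsing, here is a minimal sketch.  It is not the COPY code: it uses
UTF-8 only as a convenient variable-width encoding, and the function names
(char_len, newlines_bytewise, newlines_16bit) are made up for this example.

```python
def char_len(lead):
    """Byte length of a UTF-8 character, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte: 0x%02x" % lead)


def newlines_bytewise(buf):
    """Walk character by character and record offsets of 0x0a."""
    offsets = []
    i = 0
    while i < len(buf):
        if buf[i] == 0x0A:
            offsets.append(i)
        i += char_len(buf[i])  # step over the whole character
    return offsets


def newlines_16bit(buf):
    """Naive scan that only inspects the first byte of each 16-bit unit."""
    return [i for i in range(0, len(buf), 2) if buf[i] == 0x0A]


# Two rows of tab-delimited data mixing 1-, 2- and 3-byte characters.
sample = "a\t\u00e9\n\u65e5\u672c\tb\n".encode("utf-8")

print(newlines_bytewise(sample))
print(newlines_16bit(sample))
```

The three-byte characters push the second newline to an odd offset, so the
16-bit scan never looks at it, while the byte-wise walk finds both row ends.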