On Nov 12, 2004, at 12:20 AM, Tom Lane wrote:

> Patrick B Kelly <[EMAIL PROTECTED]> writes:
>> I may not be explaining myself well or I may fundamentally
>> misunderstand how copy works.
>
> Well, you're definitely ignoring the character-set-conversion issue.


I was not trying to ignore the character set and encoding issues, but perhaps my assumptions are naive or overly optimistic. I realized that quotes are not as consistent as the NL characters, but I was assuming that some encodings would escape to ASCII or a similar encoding like JIS Roman, which would simplify recognition of the quote character. Unicode files make recognizing other punctuation like the quote fairly straightforward, and to the naive observer the code in CopyReadLine as it is currently written appears to handle multi-byte encodings such as SJIS that may present byte values below 127 in trailing bytes.
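To make the trailing-byte concern concrete, here is a minimal Python sketch. It is not the CopyReadLine code. The function names and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC, which I believe cover the common Shift JIS variants) are assumptions for illustration only. It shows a byte-wise scan for backslash reporting a false hit inside a two-byte SJIS character, while a scan that skips trailing bytes does not.

```python
# Illustrative sketch only; names and byte ranges are assumptions,
# not taken from the PostgreSQL source.

BACKSLASH = 0x5C


def is_sjis_lead_byte(b):
    # Assumed Shift JIS lead-byte ranges for two-byte characters.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def naive_find(data, target):
    # Byte-wise scan that knows nothing about the encoding.
    return [i for i, b in enumerate(data) if b == target]


def sjis_aware_find(data, target):
    # Skip the trailing byte of every two-byte character so that it
    # is never mistaken for an ASCII character.
    hits = []
    i = 0
    while i < len(data):
        b = data[i]
        if is_sjis_lead_byte(b):
            i += 2
            continue
        if b == target:
            hits.append(i)
        i += 1
    return hits


# U+30BD (katakana SO) encodes as 0x83 0x5C in Shift JIS, so its
# trailing byte has the same value as an ASCII backslash.
sample = "\u30bd\\n".encode("shift_jis")

print(naive_find(sample, BACKSLASH))
print(sjis_aware_find(sample, BACKSLASH))
```

The naive scan reports both the trailing byte of the katakana character and the real backslash; the encoding-aware scan reports only the real one. The double quote (0x22) lies below the Shift JIS trailing-byte range, so in this particular encoding the problem shows up for backslash rather than for the quote itself.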


As I said, perhaps I was oversimplifying. Is there a set of regression test input files that I could review to see all of the supported encodings?


Patrick B. Kelly ------------------------------------------------------ http://patrickbkelly.org

