Re: [HACKERS] multiline CSV fields

Patrick B Kelly Thu, 11 Nov 2004 18:41:10 -0800


On Nov 11, 2004, at 6:16 PM, Tom Lane wrote:

> Patrick B Kelly <[EMAIL PROTECTED]> writes:
>> What about just coding a FSM into backend/commands/copy.c:CopyReadLine() that does not process any flavor of NL characters when it is inside of a data field?
>
> CopyReadLine has no business tracking that.  One reason why not is that
> it is dealing with data not yet converted out of the client's encoding,
> which makes matching to user-specified quote/escape characters
> difficult.
>
>                         regards, tom lane

I appreciate what you are saying about the encoding, and you are, of course, right. But CopyReadLine is already processing the NL characters, and it does so without considering the context in which they appear. Unfortunately, the same character(s) are used for two different purposes in the files in question: as line terminators and as data inside fields. Without considering whether they appear inside or outside of data fields, CopyReadLine will mistake one for the other and cannot correctly do what it is already trying to do, which is break the input file into lines.

My suggestion is simply to have CopyReadLine recognize these two states (in-field and out-of-field) and execute the current logic only while in the second state. It would not be too hard, though as you mentioned it is non-trivial.
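
To make the two-state idea concrete, here is a rough standalone sketch. It is not the copy.c code: csv_logical_line_length() is a hypothetical helper, and it assumes a single-byte quote character that is escaped by doubling. It deliberately ignores the client-encoding problem raised above and any separate escape character, so treat it as an illustration of the state tracking only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL code.
 *
 * Returns the number of bytes in the first logical line of buf, not
 * counting the newline that ends it.  A CR or LF ends the line only
 * when it appears outside a quoted field.  Returns len if no such
 * newline is found.
 *
 * Assumption: the quote character is one byte and is escaped by
 * doubling it, so toggling the state on every quote is sufficient
 * (a doubled quote toggles twice and leaves the state unchanged).
 */
size_t
csv_logical_line_length(const char *buf, size_t len, char quote)
{
    bool   in_field = false;    /* true while inside a quoted field */
    size_t i;

    assert(buf != NULL);

    for (i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == quote)
            in_field = !in_field;
        else if (!in_field && (c == '\n' || c == '\r'))
            break;              /* out-of-field newline: end of line */
    }
    return i;
}
```

With this, a newline inside a quoted field is passed over as data, and only an out-of-field newline is treated as the end of the line.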


Patrick B. Kelly
------------------------------------------------------
                              http://patrickbkelly.org


