On 02/04/2011 05:49 AM, Itagaki Takahiro wrote:
> Here is a demonstration of support for jagged input files. It's a patch
> on top of the latest patch. The newly added API is:
>
>    bool NextLineCopyFrom(
>          [IN] CopyState cstate,
>          [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)
>
> It just returns the separated fields of the next line. Fortunately, I needed
> no extra code for it, because it is just extracted from NextCopyFrom().

Thanks, I'll have a look at it after an emergency job I need to attend to. But the API looks weird. Why are fields and nfields OUT params? The issue isn't decomposing the line into raw fields; the code for doing that works fine as is, including on jagged files. See commit af1a614ec6d074fdea46de2e1c462f23fc7ddc6f, which was done for exactly this purpose. The issue is taking those fields and composing them into the expected tuple.

> I'm willing to include the change in the copy APIs,
> but we still have a few issues. See below.

> On Fri, Feb 4, 2011 at 16:53, Andrew Dunstan <and...@dunslane.net> wrote:
>> The problem with COPY FROM is that nobody's come up with a good syntax for
>> allowing it as a FROM target. Doing what I want via FDW neatly gets us
>> around that problem. But I'm quite OK with doing the hard work inside the
>> COPY code - that's what my working prototype does, in fact.
> I think it is not only a syntax issue. I found that it is hard to
> support the FORCE_NOT_NULL option for extra fields. See the FIXME in the patch.
> It is a fundamental problem with supporting jagged fields.

It's not a problem at all if you turn the line into a text array. That's exactly why we've been proposing a text array for this: the array simply has however many elements the line has.
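To make the idea concrete, here is a minimal sketch (in Python, as a stand-in for the actual COPY code) of the text-array representation: no target column count is imposed, so jagged rows are all representable. The function name and the naive comma split are illustrative assumptions; real CSV quoting and escaping are ignored for brevity.

```python
def line_to_text_array(line, delimiter=","):
    """Split one raw CSV-ish line into a text array.

    Unlike COPY's normal tuple-building path, no column count is
    imposed: the array has as many elements as the line has fields.
    (Quoting/escaping is deliberately ignored in this sketch.)
    """
    return line.rstrip("\n").split(delimiter)

# Jagged input: rows of different widths are all representable.
rows = [line_to_text_array(l) for l in ["a,b,c\n", "a\n", "a,b,c,d,e\n"]]
```

The point is that a single `text[]` column absorbs the jaggedness, and any policy for fitting the values into a fixed-width table is then ordinary SQL on the array.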

>> One thing I'd like is to have file_fdw do something we can't do another
>> way. Currently it doesn't, so it's nice but uninteresting.
> BTW, how do you determine which field is shifted in your broken CSV file?
> For example, the case where you find "AB,CD,EF" for a 2-column table.
> I could provide a raw CSV reader for jagged files, but you would still have
> to cook the returned fields into a proper tuple...


See above. My client who deals with this situation, and has been doing so for years, treats underflowing fields as null and ignores overflowing fields. They would do the same if the data were delivered as a text array. It works very well for them.
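That underflow/overflow policy can be sketched in a few lines (Python as an illustration only; `None` plays the role of SQL NULL, and `fit_row` is a hypothetical name, not anything in the patch):

```python
def fit_row(fields, ncols):
    """Coerce a jagged field list to exactly ncols values.

    Policy described above: missing trailing fields become NULL
    (None here), and extra fields beyond ncols are dropped.
    """
    padded = fields + [None] * (ncols - len(fields))
    return padded[:ncols]
```

With a text-array representation this rule needs no special support in COPY itself; it is just a deterministic mapping applied on top of whatever fields each line actually contains.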


See <https://github.com/adunstan/postgresql-dev/tree/sqlmed2> for my dev branch on this.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
