On 02/04/2011 05:49 AM, Itagaki Takahiro wrote:
> Here is a demonstration of support for jagged input files. It's a patch
> on top of the latest patch. The newly added API is:
>
>    bool NextLineCopyFrom(
>          [IN] CopyState cstate,
>          [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)
>
> It just returns the separated fields of the next line. Fortunately, I needed
> no extra code for it, because it is just extracted from NextCopyFrom().

Thanks, I'll have a look at it after an emergency job I need to attend to. But the API looks weird. Why are fields and nfields OUT params? The issue isn't decomposing the line into raw fields; the code for doing that works fine as is, including on jagged files. See commit af1a614ec6d074fdea46de2e1c462f23fc7ddc6f, which was done for exactly this purpose. The issue is taking those fields and composing them into the expected tuple.

> I'm willing to include the change in the copy APIs,
> but we still have a few issues. See below.

> On Fri, Feb 4, 2011 at 16:53, Andrew Dunstan <and...@dunslane.net> wrote:
>> The problem with COPY FROM is that nobody's come up with a good syntax for
>> allowing it as a FROM target. Doing what I want via FDW neatly gets us
>> around that problem. But I'm quite OK with doing the hard work inside the
>> COPY code - that's what my working prototype does, in fact.
> I think it is not only a syntax issue. I found that it is hard to
> support the FORCE_NOT_NULL option for extra fields. See the FIXME in the patch.
> It is a fundamental problem with supporting jagged fields.

It's not a problem at all if you turn the line into a text array. That's exactly why we've been proposing a text array for this: the array simply has however many elements the line has.
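To make the idea concrete, here is a minimal sketch (in Python, as a stand-in for the actual COPY code) of the text-array representation: no target column count is imposed, so jagged rows are all representable. The function name and the naive comma split are illustrative assumptions; real CSV quoting and escaping are ignored for brevity.

```python
def line_to_text_array(line, delimiter=","):
    """Split one raw CSV-ish line into a text array.

    Unlike COPY's normal tuple-building path, no column count is
    imposed: the array has as many elements as the line has fields.
    (Quoting/escaping is deliberately ignored in this sketch.)
    """
    return line.rstrip("\n").split(delimiter)

# Jagged input: rows of different widths are all representable.
rows = [line_to_text_array(l) for l in ["a,b,c\n", "a\n", "a,b,c,d,e\n"]]
```

The point is that a single `text[]` column absorbs the jaggedness, and any policy for fitting the values into a fixed-width table is then ordinary SQL on the array.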

>> One thing I'd like is to have file_fdw do something we can't do another
>> way. Currently it doesn't, so it's nice but uninteresting.
> BTW, how do you determine which field is shifted in your broken CSV file?
> For example, the case where you find "AB,CD,EF" for a 2-column table.
> I could provide a raw CSV reader for jagged files, but you would still have
> to cook the returned fields into a proper tuple...


See above. My client who deals with this situation, and has been doing so for years, treats underflowing fields as null and ignores overflowing fields. They would do the same if the data were delivered as a text array. It works very well for them.
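That underflow/overflow policy can be sketched in a few lines (Python as an illustration only; `None` plays the role of SQL NULL, and `fit_row` is a hypothetical name, not anything in the patch):

```python
def fit_row(fields, ncols):
    """Coerce a jagged field list to exactly ncols values.

    Policy described above: missing trailing fields become NULL
    (None here), and extra fields beyond ncols are dropped.
    """
    padded = fields + [None] * (ncols - len(fields))
    return padded[:ncols]
```

With a text-array representation this rule needs no special support in COPY itself; it is just a deterministic mapping applied on top of whatever fields each line actually contains.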


See <https://github.com/adunstan/postgresql-dev/tree/sqlmed2> for my dev branch on this.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
