Re: [HACKERS] exposing COPY API

2011-02-09 Thread Shigeru HANADA
On Tue, 8 Feb 2011 08:49:36 -0500
Robert Haas robertmh...@gmail.com wrote:
 On Tue, Feb 8, 2011 at 4:42 AM, Shigeru HANADA
 han...@metrosystems.co.jp wrote:
  I'll submit revised file_fdw patch after removing IsForeignTable()
  catalog lookup along Heikki's proposal.
 
 So I'm a bit confused.  I don't see the actual copy API change patch
 anywhere here.  Are we close to getting something committed there?

I'm sorry but I might have missed your point...

I replied here to answer to Itagaki-san's mention about typos in
file_fdw patch.

Or, would you mean that file_fdw should not depend on copy API change
patch?

Regards,
--
Shigeru Hanada



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-09 Thread Robert Haas
On Wed, Feb 9, 2011 at 7:38 AM, Shigeru HANADA
han...@metrosystems.co.jp wrote:
 On Tue, 8 Feb 2011 08:49:36 -0500
 Robert Haas robertmh...@gmail.com wrote:
 On Tue, Feb 8, 2011 at 4:42 AM, Shigeru HANADA
 han...@metrosystems.co.jp wrote:
  I'll submit revised file_fdw patch after removing IsForeignTable()
  catalog lookup along Heikki's proposal.

 So I'm a bit confused.  I don't see the actual copy API change patch
 anywhere here.  Are we close to getting something committed there?

 I'm sorry but I might have missed your point...

 I replied here to answer to Itagaki-san's mention about typos in
 file_fdw patch.

 Or, would you mean that file_fdw should not depend on copy API change
 patch?

I mean that this thread is entitled exposing copy API, and I'm
wondering when and if the patch to expose the COPY API is going to be
committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-09 Thread Andrew Dunstan



On 02/09/2011 12:26 PM, Robert Haas wrote:

On Wed, Feb 9, 2011 at 7:38 AM, Shigeru HANADA
han...@metrosystems.co.jp  wrote:

On Tue, 8 Feb 2011 08:49:36 -0500
Robert Haasrobertmh...@gmail.com  wrote:

On Tue, Feb 8, 2011 at 4:42 AM, Shigeru HANADA
han...@metrosystems.co.jp  wrote:

I'll submit revised file_fdw patch after removing IsForeignTable()
catalog lookup along Heikki's proposal.

So I'm a bit confused.  I don't see the actual copy API change patch
anywhere here.  Are we close to getting something committed there?

I'm sorry but I might have missed your point...

I replied here to answer to Itagaki-san's mention about typos in
file_fdw patch.

Or, would you mean that file_fdw should not depend on copy API change
patch?

I mean that this thread is entitled exposing copy API, and I'm
wondering when and if the patch to expose the COPY API is going to be
committed.



Itagaki-san published a patch for this about about 12 hours ago in the 
file_fdw thread that looks pretty committable to me.


This whole API thing is a breakout from file_fdw, because the original 
file_fdw submission copied huge chunks of copy.c instead of trying to 
leverage it.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-09 Thread Robert Haas
On Wed, Feb 9, 2011 at 12:45 PM, Andrew Dunstan and...@dunslane.net wrote:
 Itagaki-san published a patch for this about about 12 hours ago in the
 file_fdw thread that looks pretty committable to me.

OK, excellent.

 This whole API thing is a breakout from file_fdw, because the original
 file_fdw submission copied huge chunks of copy.c instead of trying to
 leverage it.

Yeah, I remembered that, I just got mixed up because the two patches
were on the same thread, and the one that is the topic of this thread
was posted elsewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-08 Thread Itagaki Takahiro
On Tue, Feb 8, 2011 at 09:38, Andrew Dunstan and...@dunslane.net wrote:
 Here is a patch against the latest revision of file_fdw to exercise this
 API. It includes some regression tests, and I think apart from one or two
 small details plus a requirement for documentation, is complete.

The patch contains a few fixes for typo in the original patch.
Hanada-san, could you take them into the core file_fdw patch?

   CREATE FOREIGN TABLE jagged_text (
        t   text[]
   ) SERVER file_server
   OPTIONS (format 'csv', filename
   '/home/andrew/pgl/pg_head/contrib/file_fdw/data/jagged.csv', header
   'true', textarray 'true');

There might be another approach -- we could have jagged_file_fdw aside
from file_fdw, because we cannot support some features in textarray mode
like force_not_null option and multiple non-text[] columns.

I'll include NextCopyFromRawFields() in COPY API patch to complete
raw_fields support in CopyState. (Or, we should also revert changes
related to raw_fields.)  However, we'd better postpone jagged csv
support to 9.2. The design is still under discussion.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-08 Thread Andrew Dunstan



On 02/08/2011 03:49 AM, Itagaki Takahiro wrote:

On Tue, Feb 8, 2011 at 09:38, Andrew Dunstanand...@dunslane.net  wrote:

Here is a patch against the latest revision of file_fdw to exercise this
API. It includes some regression tests, and I think apart from one or two
small details plus a requirement for documentation, is complete.

The patch contains a few fixes for typo in the original patch.
Hanada-san, could you take them into the core file_fdw patch?


   CREATE FOREIGN TABLE jagged_text (
t   text[]
   ) SERVER file_server
   OPTIONS (format 'csv', filename
   '/home/andrew/pgl/pg_head/contrib/file_fdw/data/jagged.csv', header
   'true', textarray 'true');

There might be another approach -- we could have jagged_file_fdw aside
from file_fdw, because we cannot support some features in textarray mode
like force_not_null option and multiple non-text[] columns.

I'll include NextCopyFromRawFields() in COPY API patch to complete
raw_fields support in CopyState. (Or, we should also revert changes
related to raw_fields.)  However, we'd better postpone jagged csv
support to 9.2. The design is still under discussion.



Please do include NextCopyFromRawFields(). I think the API is very 
limiting without it, but very flexible with it.


I also think that it's better to have contrib examples of the use of an 
API than not.


FORCE NOT NULL is much more of an issue for the *non* raw fields case 
than the reverse. In the raw fields case the caller can manage it 
themselves.


Multiple non-text[] columns strikes me as a red herring. This isn't the 
only possible use of NextCopyFromRawFields(), but it is one significant 
(and very useful as it stands) use.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-08 Thread Shigeru HANADA
On Tue, 8 Feb 2011 17:49:09 +0900
Itagaki Takahiro itagaki.takah...@gmail.com wrote:

 On Tue, Feb 8, 2011 at 09:38, Andrew Dunstan and...@dunslane.net wrote:
  Here is a patch against the latest revision of file_fdw to exercise this
  API. It includes some regression tests, and I think apart from one or two
  small details plus a requirement for documentation, is complete.
 
 The patch contains a few fixes for typo in the original patch.
 Hanada-san, could you take them into the core file_fdw patch?

Thanks, I've applied to my local repo.

s/Centinel/Sentinel/
s/Vaidate/Validate/
s/reaind/reading/

Recent file_fdw is available here:
http://git.postgresql.org/gitweb?p=users/hanada/postgres.git;a=summary

I'll submit revised file_fdw patch after removing IsForeignTable()
catalog lookup along Heikki's proposal.

Regards,
--
Shigeru Hanada



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-08 Thread Robert Haas
On Tue, Feb 8, 2011 at 4:42 AM, Shigeru HANADA
han...@metrosystems.co.jp wrote:
 On Tue, 8 Feb 2011 17:49:09 +0900
 Itagaki Takahiro itagaki.takah...@gmail.com wrote:

 On Tue, Feb 8, 2011 at 09:38, Andrew Dunstan and...@dunslane.net wrote:
  Here is a patch against the latest revision of file_fdw to exercise this
  API. It includes some regression tests, and I think apart from one or two
  small details plus a requirement for documentation, is complete.

 The patch contains a few fixes for typo in the original patch.
 Hanada-san, could you take them into the core file_fdw patch?

 Thanks, I've applied to my local repo.

 s/Centinel/Sentinel/
 s/Vaidate/Validate/
 s/reaind/reading/

 Recent file_fdw is available here:
 http://git.postgresql.org/gitweb?p=users/hanada/postgres.git;a=summary

 I'll submit revised file_fdw patch after removing IsForeignTable()
 catalog lookup along Heikki's proposal.

So I'm a bit confused.  I don't see the actual copy API change patch
anywhere here.  Are we close to getting something committed there?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-07 Thread Andrew Dunstan



On 02/04/2011 05:49 AM, Itagaki Takahiro wrote:

Here is a demonstration to support jagged input files. It's a patch
on the latest patch. The new added API is:

   bool NextLineCopyFrom(
 [IN] CopyState cstate,
 [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)

It just returns separated fields in the next line. Fortunately, I need
no extra code for it because it is just extracted from NextCopyFrom().

I'm willing to include the change into copy APIs,
but we still have a few issues. See below.



I've looked at this, and I think it will do what I want. I haven't had 
time to play with it, although I hope to soon.  AIUI, it basically hands 
back the raw parsed strings to the user, who then has the responsibility 
of constructing Datum and Nulls arrays to form the tuple.  That should 
be all I need. So +1 from me for including it. In fact, +10. And many 
thanks.



I think we need a better name though. NextCopyFromRawFields maybe.


On Fri, Feb 4, 2011 at 16:53, Andrew Dunstanand...@dunslane.net  wrote:

The problem with COPY FROM is that nobody's come up with a good syntax for
allowing it as a FROM target. Doing what I want via FDW neatly gets us
around that problem. But I'm quite OK with doing the hard work inside the
COPY code - that's what my working prototype does in fact.

I think it is not only syntax issue. I found an issue that we hard to
support FORCE_NOT_NULL option for extra fields. See FIXME in the patch.
It is a fundamental problem to support jagged fields.


I don't think we need to worry about it. The caller will have access to 
the raw strings so they can handle it. In fact, I'd take out that bit of 
code in NextCopyLine_From, and replace it with a comment about how it's 
the caller's responsibility to handle.



One thing I'd like is to to have file_fdw do something we can't do another
way. currently it doesn't, so it's nice but uninteresting.

BTW, how do you determine which field is shifted in your broken CSV file?
For example, the case you find AB,CD,EF for 2 columns tables.
I could provide a raw CSV reader for jagged files, but you still have to
cook the returned fields into a proper tuple...




I answered this previously, but in the case of a text array it won't 
even arise - the array will have however many fields have been read.



cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-07 Thread Andrew Dunstan



On 02/07/2011 11:34 AM, Andrew Dunstan wrote:



On 02/04/2011 05:49 AM, Itagaki Takahiro wrote:

Here is a demonstration to support jagged input files. It's a patch
on the latest patch. The new added API is:

   bool NextLineCopyFrom(
 [IN] CopyState cstate,
 [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)

It just returns separated fields in the next line. Fortunately, I need
no extra code for it because it is just extracted from NextCopyFrom().

I'm willing to include the change into copy APIs,
but we still have a few issues. See below.



I've looked at this, and I think it will do what I want. I haven't had 
time to play with it, although I hope to soon.  AIUI, it basically 
hands back the raw parsed strings to the user, who then has the 
responsibility of constructing Datum and Nulls arrays to form the 
tuple.  That should be all I need. So +1 from me for including it. In 
fact, +10. And many thanks.



I think we need a better name though. NextCopyFromRawFields maybe.



Here is a patch against the latest revision of file_fdw to exercise this 
API. It includes some regression tests, and I think apart from one or 
two small details plus a requirement for documentation, is complete.


This work is also published at 
https://github.com/adunstan/postgresql-dev/tree/sqlmed3


Here's an excerpt from the regression tests:

   CREATE FOREIGN TABLE jagged_text (
t   text[]
   ) SERVER file_server
   OPTIONS (format 'csv', filename
   '/home/andrew/pgl/pg_head/contrib/file_fdw/data/jagged.csv', header
   'true', textarray 'true');
   SELECT * FROM jagged_text;
 t
   
 {one field}
 {two,fields}
 {three,NULL,fields of which one is null}
 {next line has no fields}
 {}
   (5 rows)

   SELECT t[3] AS a, t[1] AS b, t[99] AS c  FROM jagged_text;
  a  |b| c
   -+-+---
 | one field   |
 | two |
 fields of which one is null | three   |
 | next line has no fields |
 | |
   (5 rows)

Overall, this API works quite nicely, and does exactly what I want, so 
again many thanks.


cheers

andrew


*** /dev/null
--- b/contrib/file_fdw/data/jagged.csv
***
*** 0 
--- 1,6 
+ header for a file with varying numbers of fields
+ one field
+ two,fields
+ three,,fields of which one is null
+ next line has no fields
+ 
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***
*** 13,18 
--- 13,22 
  
  #include postgres.h
  
+ #include sys/types.h
+ #include sys/stat.h
+ #include unistd.h
+ 
  #include access/reloptions.h
  #include catalog/pg_foreign_table.h
  #include catalog/pg_foreign_server.h
***
*** 26,31 
--- 30,36 
  #include optimizer/cost.h
  #include parser/parsetree.h
  #include storage/fd.h
+ #include utils/array.h
  #include utils/builtins.h
  
  PG_MODULE_MAGIC;
***
*** 61,78  static struct FileFdwOption valid_options[] = {
  
  	/* FIXME: implement force_not_null option */
  
! 	/* Centinel */
  	{ NULL,			InvalidOid }
  };
  
  /*
   * FDW-specific information for FdwExecutionState.
   */
  typedef struct FileFdwPrivate {
  	char		   *filename;
  	Relation		rel;		/* scan target relation */
! 	CopyState		cstate;		/* state of reaind file */
  	List		   *options;	/* merged generic options, excluding filename */
  } FileFdwPrivate;
  
  /*
--- 66,92 
  
  	/* FIXME: implement force_not_null option */
  
! 	/* Local option */
! 	{ textarray,  ForeignTableRelationId },
! 
! 	/* Sentinel */
  	{ NULL,			InvalidOid }
  };
  
+ #define FILE_FDW_TEXTARRAY_STASH_INIT 64
  /*
   * FDW-specific information for FdwExecutionState.
   */
  typedef struct FileFdwPrivate {
  	char		   *filename;
+ 	booltextarray;  /* make a text array rather than a tuple */
  	Relation		rel;		/* scan target relation */
! 	CopyState		cstate;		/* state of read in file */
  	List		   *options;	/* merged generic options, excluding filename */
+ /* stash for processing text arrays - not used otherwise */
+ 	int text_array_stash_size;
+ 	Datum  *text_array_values;
+ 	bool   *text_array_nulls;
  } FileFdwPrivate;
  
  /*
***
*** 91,96  static void fileIterate(FdwExecutionState *festate, TupleTableSlot *slot);
--- 105,113 
  static void fileEndScan(FdwExecutionState *festate);
  static void fileReScan(FdwExecutionState *festate);
  
+ /* text array support */
+ static void makeTextArray(FileFdwPrivate *fdw_private,
+ 		   TupleTableSlot *slot, char **raw_fields, int nfields);
  /*
   * Helper functions
   */
***
*** 142,148  

Re: [HACKERS] exposing COPY API

2011-02-04 Thread Itagaki Takahiro
Here is a demonstration to support jagged input files. It's a patch
on the latest patch. The new added API is:

  bool NextLineCopyFrom(
[IN] CopyState cstate,
[OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)

It just returns separated fields in the next line. Fortunately, I need
no extra code for it because it is just extracted from NextCopyFrom().

I'm willing to include the change into copy APIs,
but we still have a few issues. See below.

On Fri, Feb 4, 2011 at 16:53, Andrew Dunstan and...@dunslane.net wrote:
 The problem with COPY FROM is that nobody's come up with a good syntax for
 allowing it as a FROM target. Doing what I want via FDW neatly gets us
 around that problem. But I'm quite OK with doing the hard work inside the
 COPY code - that's what my working prototype does in fact.

I think it is not only syntax issue. I found an issue that we hard to
support FORCE_NOT_NULL option for extra fields. See FIXME in the patch.
It is a fundamental problem to support jagged fields.

 One thing I'd like is to to have file_fdw do something we can't do another
 way. currently it doesn't, so it's nice but uninteresting.

BTW, how do you determine which field is shifted in your broken CSV file?
For example, the case you find AB,CD,EF for 2 columns tables.
I could provide a raw CSV reader for jagged files, but you still have to
cook the returned fields into a proper tuple...

-- 
Itagaki Takahiro


jagged_csv_api-20110204.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-04 Thread Andrew Dunstan



On 02/04/2011 05:49 AM, Itagaki Takahiro wrote:

Here is a demonstration to support jagged input files. It's a patch
on the latest patch. The new added API is:

   bool NextLineCopyFrom(
 [IN] CopyState cstate,
 [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)

It just returns separated fields in the next line. Fortunately, I need
no extra code for it because it is just extracted from NextCopyFrom().


Thanks, I'll have a look at it, after an emergency job I need to attend 
to. But the API looks weird. Why are fields and nfields OUT params. The 
issue isn't decomposing the line into raw fields. The code for doing 
that works fine as is, including on jagged files. See commit 
af1a614ec6d074fdea46de2e1c462f23fc7ddc6f which was done for exactly this 
purpose. The issue is taking those and composing them into the expected 
tuple.



I'm willing to include the change into copy APIs,
but we still have a few issues. See below.

On Fri, Feb 4, 2011 at 16:53, Andrew Dunstanand...@dunslane.net  wrote:

The problem with COPY FROM is that nobody's come up with a good syntax for
allowing it as a FROM target. Doing what I want via FDW neatly gets us
around that problem. But I'm quite OK with doing the hard work inside the
COPY code - that's what my working prototype does in fact.

I think it is not only syntax issue. I found an issue that we hard to
support FORCE_NOT_NULL option for extra fields. See FIXME in the patch.
It is a fundamental problem to support jagged fields.


It's not a problem at all if you turn the line into a text array. That's 
exactly why we've been proposing it for this. The array has however many 
elements are on the line.



One thing I'd like is to to have file_fdw do something we can't do another
way. currently it doesn't, so it's nice but uninteresting.

BTW, how do you determine which field is shifted in your broken CSV file?
For example, the case you find AB,CD,EF for 2 columns tables.
I could provide a raw CSV reader for jagged files, but you still have to
cook the returned fields into a proper tuple...



See above. My client who deals with this situation and has been doing so 
for years treats underflowing fields as null and ignores overflowing 
fields. They would do he same if the data were delivered with a text 
array. It works very well for them.



See https://github.com/adunstan/postgresql-dev/tree/sqlmed2 for my dev 
branch on this.



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Itagaki Takahiro
On Fri, Feb 4, 2011 at 09:48, Andrew Dunstan and...@dunslane.net wrote:
 I'd like to be able to add a callback function to construct the values for
 the tuple. So it would become something like:
   typedef void (*copy_make_values) (CopyState cstate, NumFieldsRead int);

You can do nothing interesting in the callback probably
because the details of CopyState is not exported yet.
Also, we should pass through user context for such kind of callback.
The prototype of would have void *userdata.

 Of course, I want this so I could construct a text array from the read in
 data, but I could also imagine a foreign data wrapper wanting to mangle the
 data before handing it to postgres, say by filling in a field or hashing it.

Could you explain the actual use-cases and examples?  I think we need to have
SQL-level extensibility if we provide such flexibility. I guess typical users
don't want to write functions with C for each kind of input files.

Note that pg_bulkload has a similar feature like as:
  CREATE FUNCTION my_function(...) RETURNS record AS ...;
  COPY tbl FROM 'file' WITH (make_record_from_line = my_function)

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Andrew Dunstan




Of course, I want this so I could construct a text array from the read in
data, but I could also imagine a foreign data wrapper wanting to mangle the
data before handing it to postgres, say by filling in a field or hashing it.

Could you explain the actual use-cases and examples?  I think we need to have
SQL-level extensibility if we provide such flexibility. I guess typical users
don't want to write functions with C for each kind of input files.

Note that pg_bulkload has a similar feature like as:
   CREATE FUNCTION my_function(...) RETURNS record AS ...;
   COPY tbl FROM 'file' WITH (make_record_from_line = my_function)



Umm, where? I can't find this in the documentation 
http://pgbulkload.projects.postgresql.org/pg_bulkload.html nor in the 
source code. And how would  module like that provide an extra copy option?


The object, as I have explained previously, is to have a FDW that 
returns a text array from a (possibly irregularly shaped) file.



So, given this file:

1,,2,3
4,5,6

select t[4] as a,t[2] as b from my_fdw_table;

would return

a | b
-
3 |
  | 5



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Itagaki Takahiro
On Fri, Feb 4, 2011 at 11:32, Andrew Dunstan and...@dunslane.net wrote:
 Umm, where? I can't find this in the documentation
 http://pgbulkload.projects.postgresql.org/pg_bulkload.html

Here:
http://pgbulkload.projects.postgresql.org/pg_bulkload.html#filter

 The object, as I have explained previously, is to have a FDW that returns a
 text array from a (possibly irregularly shaped) file.

I remember the text array proposal, but if the extension is written in C,
it can only handle one kind of input files. If another file is broken
in a different way, you need to rewrite the C code, no?

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Andrew Dunstan



On 02/03/2011 09:43 PM, Itagaki Takahiro wrote:

On Fri, Feb 4, 2011 at 11:32, Andrew Dunstanand...@dunslane.net  wrote:

Umm, where? I can't find this in the documentation
http://pgbulkload.projects.postgresql.org/pg_bulkload.html

Here:
http://pgbulkload.projects.postgresql.org/pg_bulkload.html#filter


The object, as I have explained previously, is to have a FDW that returns a
text array from a (possibly irregularly shaped) file.

I remember the text array proposal, but if the extension is written in C,
it can only handle one kind of input files. If another file is broken
in a different way, you need to rewrite the C code, no?



AFAICT, this doesn't support ragged tables with too many columns, which 
is my prime use case. If it supported variadic arguments in filter 
functions it might come closer.


But where does the COPY syntax you showed come in?

Also, a FDW allows the COPY to be used as a FROM target, giving it great 
flexibility. AFAICT this does not.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Itagaki Takahiro
On Fri, Feb 4, 2011 at 12:17, Andrew Dunstan and...@dunslane.net wrote:
 http://pgbulkload.projects.postgresql.org/pg_bulkload.html#filter
 AFAICT, this doesn't support ragged tables with too many columns, which is
 my prime use case. If it supported variadic arguments in filter functions it
 might come closer.

It will be good improvement for pg_bulkload, but it's off-topic ;-)

 Also, a FDW allows the COPY to be used as a FROM target, giving it great
 flexibility. AFAICT this does not.

BTW, which do you want to improve, FDW or COPY FROM?  If FDW, the better
API would be raw version of NextCopyFrom(). For example:
  bool NextRawFields(CopyState cstate, char **fields, int *nfields)
The caller FDW has responsibility to form tuples from the raw fields.
If you need to customize how to form the tuples from various fields,
the FDW also need to have such extensibility.

If COPY FROM, we should implement all the features in copy.c
rather than exported APIs.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Andrew Dunstan



On 02/03/2011 10:43 PM, Itagaki Takahiro wrote:



Also, a FDW allows the COPY to be used as a FROM target, giving it great
flexibility. AFAICT this does not.

BTW, which do you want to improve, FDW or COPY FROM?  If FDW, the better
API would be raw version of NextCopyFrom(). For example:
   bool NextRawFields(CopyState cstate, char **fields, int *nfields)
The caller FDW has responsibility to form tuples from the raw fields.
If you need to customize how to form the tuples from various fields,
the FDW also need to have such extensibility.

If COPY FROM, we should implement all the features in copy.c
rather than exported APIs.




The problem with COPY FROM is that nobody's come up with a good syntax 
for allowing it as a FROM target. Doing what I want via FDW neatly gets 
us around that problem. But I'm quite OK with doing the hard work inside 
the COPY code - that's what my working prototype does in fact.


One thing I'd like is to to have file_fdw do something we can't do 
another way. currently it doesn't, so it's nice but uninteresting.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers