Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-29 Thread Tom Lane
Corey Huinker  writes:
> [ file_fdw_program_v3.diff ]

Pushed with cosmetic adjustments, mostly more work on the comments and
documentation.

I did not push the proposed test case; it's unportable.  The upthread
suggestion to add a TAP test would have been all right, because
--enable-tap-tests requires Perl, but the basic regression tests must
not.  I'm a bit dubious that it'd be worth the work to create such a
test anyway, when COPY FROM PROGRAM itself hasn't got one.

What *would* be worth some effort is allowing ANALYZE on a file_fdw
table reading from a program.  I concur that that probably can't be
the default behavior, but always falling back to the 10-block default
with no pg_stats stats is a really horrid prospect.

One idea is to invent another table-level FDW option "analyze".
If we could make that default to true for files and false for programs,
it'd preserve the desired default behavior, but it would add a feature
for plain files too: if they're too unstable to be worth analyzing,
you could turn it off.

Another thought is that maybe manual ANALYZE should go through in any
case, and the FDW option would only be needed to control auto-analyze.
Although I'm not sure what to think about scripted cases like
vacuumdb --analyze.  Maybe we'd need two flags, one permitting explicit
ANALYZE and one for autoanalyze, which could have different defaults.

Another thing that felt a little unfinished was the cost estimation
behavior.  Again, it's not clear how to do any better by default,
but is it worth the trouble to provide an FDW option to let the user
set the cost estimate for reading the table?  I'm not sure honestly.
Since there's only one path for the FDW itself, the cost estimate
doesn't matter in simple cases, and I'm not sure how much it matters
even in more complicated ones.  It definitely sucks that we don't
have a rows estimate that has anything to do with reality, but allowing
ANALYZE would be enough to handle that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-12 Thread Amit Langote
On 2016/09/13 2:01, Corey Huinker wrote:
> Thanks for the review!
> 
> I agree with all the code cleanups suggested and have made then in the
> attached patch, to save the committer some time.

Thanks.  Have already marked the patch as ready for the committer.

> Also in this patch, I changed sgml para to
>  
>   Changing table-level options requires superuser privileges, for security
>   reasons: only a superuser should be able to determine which file is read
>   or which program is run.  In principle non-superusers could be allowed to
>   change the other options, but that's not supported at present.
>  
> 
> "Determine" is an odd word in this context. I understand it to mean
> "set/change", but I can see where a less familiar reader would take the
> meaning to be "has permission to see the value already set". Either way,
> it now mentions program as an option in addition to filename.

Agreed.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-12 Thread Corey Huinker
On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote  wrote:

> On 2016/09/11 8:04, Corey Huinker wrote:
> > V2 of this patch:
> >
> > Changes:
> > * rebased to most recent master
> > * removed non-tap test that assumed the existence of Unix sed program
> > * added non-tap test that assumes the existence of perl
> > * switched from filename/program to filename/is_program to more closely
> > follow patterns in copy.c
> > * slight wording change in C comments
>
> This version looks mostly good to me.  Except some whitespace noise in
> some hunks:
>
> @@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
> *root, RelOptInfo *rel,
>   */
>  static bool is_valid_option(const char *option, Oid context);
>  static void fileGetOptions(Oid foreigntableid,
> -   char **filename, List **other_options);
> +   char **filename,
> +   bool *is_program,
>
> Space after "is_program,"
>
> @@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>
>  /*
>   * Only superusers are allowed to set options of a file_fdw foreign
> table.
> - * This is because the filename is one of those options, and we don't
> want
> - * non-superusers to be able to determine which file gets read.
> + * The reason for this is to prevent non-superusers from changing the
>
> Space after "the"
>
> -if (stat(filename, _buf) == 0)
> +if ((! is_program) && (stat(filename, _buf) == 0)))
>
> Space between ! and is_program.
>
>
> -if (strcmp(def->defname, "filename") == 0)
> +if ((strcmp(def->defname, "filename") == 0) ||
> (strcmp(def->defname, "program") == 0))
>
> I think the usual style would be to split the if statement into two lines
> as follows to keep within 80 characters per line [1]:
>
> +if ((strcmp(def->defname, "filename") == 0) ||
> +(strcmp(def->defname, "program") == 0))
>
> And likewise for:
>
> -   _private->filename, _private->options);
> +   _private->filename, _private->is_program,
> _private->options);
>
> By the way, doesn't the following paragraph in file-fdw.sgml need an
> update?
>
>  
>   Changing table-level options requires superuser privileges, for security
>   reasons: only a superuser should be able to determine which file is read.
>   In principle non-superusers could be allowed to change the other options,
>   but that's not supported at present.
>  
>
>
> I would like to mark this now as "Ready for Committer".
>
> Thanks,
> Amit
>
> [1] https://www.postgresql.org/docs/devel/static/source-format.html
>
>
>
(reposting non-top-posted...sorry)

Thanks for the review!

I agree with all the code cleanups suggested and have made then in the
attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
 
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read
  or which program is run.  In principle non-superusers could be allowed to
  change the other options, but that's not supported at present.
 

"Determine" is an odd word in this context. I understand it to mean
"set/change", but I can see where a less familiar reader would take the
meaning to be "has permission to see the value already set". Either way, it
now mentions program as an option in addition to filename.
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..7f534b1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
/* File options */
{"filename", ForeignTableRelationId},
+   {"program", ForeignTableRelationId},
 
/* Format options */
/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   BlockNumber pages;  /* estimate of file's physical 
size */
-   double  ntuples;/* estimate of number of rows 
in file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   List   *options;/* merged COPY options, excluding 
filename and program */
+   BlockNumber pages;  /* estimate of file or program 
output's physical size */
+   double  ntuples;/* estimate of number of rows 
in file/program output */
 } FileFdwPlanState;
 
 /*
@@ -96,9 +98,10 @@ typedef struct FileFdwPlanState
  */
 typedef struct FileFdwExecutionState
 {
-   char   *filename;   /* file to read */
-  

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-12 Thread Corey Huinker
Thanks for the review!

I agree with all the code cleanups suggested and have made then in the
attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
 
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read
  or which program is run.  In principle non-superusers could be allowed to
  change the other options, but that's not supported at present.
 

"Determine" is an odd word in this context. I understand it to mean
"set/change", but I can see where a less familiar reader would take the
meaning to be "has permission to see the value already set". Either way, it
now mentions program as an option in addition to filename.


On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote  wrote:

> On 2016/09/11 8:04, Corey Huinker wrote:
> > V2 of this patch:
> >
> > Changes:
> > * rebased to most recent master
> > * removed non-tap test that assumed the existence of Unix sed program
> > * added non-tap test that assumes the existence of perl
> > * switched from filename/program to filename/is_program to more closely
> > follow patterns in copy.c
> > * slight wording change in C comments
>
> This version looks mostly good to me.  Except some whitespace noise in
> some hunks:
>
> @@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
> *root, RelOptInfo *rel,
>   */
>  static bool is_valid_option(const char *option, Oid context);
>  static void fileGetOptions(Oid foreigntableid,
> -   char **filename, List **other_options);
> +   char **filename,
> +   bool *is_program,
>
> Space after "is_program,"
>
> @@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>
>  /*
>   * Only superusers are allowed to set options of a file_fdw foreign
> table.
> - * This is because the filename is one of those options, and we don't
> want
> - * non-superusers to be able to determine which file gets read.
> + * The reason for this is to prevent non-superusers from changing the
>
> Space after "the"
>
> -if (stat(filename, _buf) == 0)
> +if ((! is_program) && (stat(filename, _buf) == 0)))
>
> Space between ! and is_program.
>
>
> -if (strcmp(def->defname, "filename") == 0)
> +if ((strcmp(def->defname, "filename") == 0) ||
> (strcmp(def->defname, "program") == 0))
>
> I think the usual style would be to split the if statement into two lines
> as follows to keep within 80 characters per line [1]:
>
> +if ((strcmp(def->defname, "filename") == 0) ||
> +(strcmp(def->defname, "program") == 0))
>
> And likewise for:
>
> -   _private->filename, _private->options);
> +   _private->filename, _private->is_program,
> _private->options);
>
> By the way, doesn't the following paragraph in file-fdw.sgml need an
> update?
>
>  
>   Changing table-level options requires superuser privileges, for security
>   reasons: only a superuser should be able to determine which file is read.
>   In principle non-superusers could be allowed to change the other options,
>   but that's not supported at present.
>  
>
>
> I would like to mark this now as "Ready for Committer".
>
> Thanks,
> Amit
>
> [1] https://www.postgresql.org/docs/devel/static/source-format.html
>
>
>
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..7f534b1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
/* File options */
{"filename", ForeignTableRelationId},
+   {"program", ForeignTableRelationId},
 
/* Format options */
/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   BlockNumber pages;  /* estimate of file's physical 
size */
-   double  ntuples;/* estimate of number of rows 
in file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   List   *options;/* merged COPY options, excluding 
filename and program */
+   BlockNumber pages;  /* estimate of file or program 
output's physical size */
+   double  ntuples;/* estimate of number of rows 
in file/program output */
 } FileFdwPlanState;
 
 /*
@@ -96,9 +98,10 @@ typedef struct FileFdwPlanState
  */
 typedef struct FileFdwExecutionState
 {
-   char   *filename;   /* file to read */
-   List   *options;

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-12 Thread Amit Langote
On 2016/09/11 8:04, Corey Huinker wrote:
> V2 of this patch:
> 
> Changes:
> * rebased to most recent master
> * removed non-tap test that assumed the existence of Unix sed program
> * added non-tap test that assumes the existence of perl
> * switched from filename/program to filename/is_program to more closely
> follow patterns in copy.c
> * slight wording change in C comments

This version looks mostly good to me.  Except some whitespace noise in
some hunks:

@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
*root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-   char **filename, List **other_options);
+   char **filename,
+   bool *is_program,

Space after "is_program,"

@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)

 /*
  * Only superusers are allowed to set options of a file_fdw foreign
table.
- * This is because the filename is one of those options, and we don't
want
- * non-superusers to be able to determine which file gets read.
+ * The reason for this is to prevent non-superusers from changing the

Space after "the"

-if (stat(filename, _buf) == 0)
+if ((! is_program) && (stat(filename, _buf) == 0)))

Space between ! and is_program.


-if (strcmp(def->defname, "filename") == 0)
+if ((strcmp(def->defname, "filename") == 0) ||
(strcmp(def->defname, "program") == 0))

I think the usual style would be to split the if statement into two lines
as follows to keep within 80 characters per line [1]:

+if ((strcmp(def->defname, "filename") == 0) ||
+(strcmp(def->defname, "program") == 0))

And likewise for:

-   _private->filename, _private->options);
+   _private->filename, _private->is_program,
_private->options);

By the way, doesn't the following paragraph in file-fdw.sgml need an update?

 
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read.
  In principle non-superusers could be allowed to change the other options,
  but that's not supported at present.
 


I would like to mark this now as "Ready for Committer".

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/source-format.html




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-10 Thread Corey Huinker
V2 of this patch:

Changes:
* rebased to most recent master
* removed non-tap test that assumed the existence of Unix sed program
* added non-tap test that assumes the existence of perl
* switched from filename/program to filename/is_program to more closely
follow patterns in copy.c
* slight wording change in C comments


On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer 
wrote:

> On 9 Sep. 2016 03:45, "Corey Huinker"  wrote:
> >
> >
>
> > Stylistically, would a separate .pl file for the emitter be preferable
> to something inline like
> >
> >> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'
>
> I'd be fine with that and a suitable comment. Just be careful with
> different platforms' shell escaping rules.
>
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..bf9753a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
/* File options */
{"filename", ForeignTableRelationId},
+   {"program", ForeignTableRelationId},
 
/* Format options */
/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   BlockNumber pages;  /* estimate of file's physical 
size */
-   double  ntuples;/* estimate of number of rows 
in file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   List   *options;/* merged COPY options, excluding 
filename and program */
+   BlockNumber pages;  /* estimate of file or program 
output's physical size */
+   double  ntuples;/* estimate of number of rows 
in file/program output */
 } FileFdwPlanState;
 
 /*
@@ -96,9 +98,10 @@ typedef struct FileFdwPlanState
  */
 typedef struct FileFdwExecutionState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   CopyState   cstate; /* state of reading file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   List   *options;/* merged COPY options, excluding 
filename and is_program */
+   CopyState   cstate; /* state of reading file or 
program */
 } FileFdwExecutionState;
 
 /*
@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo 
*root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-  char **filename, List **other_options);
+  char **filename,
+  bool *is_program,
+  List **other_options);
 static List *get_file_fdw_attribute_options(Oid relid);
 static bool check_selective_binary_conversion(RelOptInfo *baserel,
  Oid 
foreigntableid,
@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 
/*
 * Only superusers are allowed to set options of a file_fdw foreign 
table.
-* This is because the filename is one of those options, and we don't 
want
-* non-superusers to be able to determine which file gets read.
+* The reason for this is to prevent non-superusers from changing the 
+* definition to access an arbitrary file not visible to that user
+* or to run programs not accessible to that user.
 *
 * Putting this sort of permissions check in a validator is a bit of a
 * crock, but there doesn't seem to be any other place that can enforce
 * the check more cleanly.
 *
-* Note that the valid_options[] array disallows setting filename at any
-* options level other than foreign table --- otherwise there'd still 
be a
-* security hole.
+* Note that the valid_options[] array disallows setting filename and
+* program at any options level other than foreign table --- otherwise
+* there'd still be a security hole.
 */
if (catalog == ForeignTableRelationId && !superuser())
ereport(ERROR,
@@ -247,11 +253,11 @@ file_fdw_validator(PG_FUNCTION_ARGS)
}
 
/*
-* Separate out filename and column-specific options, since

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-08 Thread Corey Huinker
On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer 
wrote:

> On 9 Sep. 2016 03:45, "Corey Huinker"  wrote:
> >
> >
>
> > Stylistically, would a separate .pl file for the emitter be preferable
> to something inline like
> >
> >> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'
>
> I'd be fine with that and a suitable comment. Just be careful with
> different platforms' shell escaping rules.
>
Do perl command switches on windows/VMS use /e instead of -e?  If so,
that'd be a great argument doing just "perl filename".


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-08 Thread Craig Ringer
On 9 Sep. 2016 03:45, "Corey Huinker"  wrote:
>
>

> Stylistically, would a separate .pl file for the emitter be preferable to
something inline like
>
>> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'

I'd be fine with that and a suitable comment. Just be careful with
different platforms' shell escaping rules.


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-08 Thread Corey Huinker
On Tue, Sep 6, 2016 at 11:44 PM, Craig Ringer 
wrote:

> On 7 September 2016 at 11:37, Corey Huinker 
> wrote:
> > On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer <
> craig.rin...@2ndquadrant.com>
> > wrote:
> >>
> >> On 7 September 2016 at 11:21, Corey Huinker 
> >> wrote:
> >> > On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer
> >> > 
> >>
> >> > And the TAP test would detect the operating system and know to create
> an
> >> > FDW
> >> > that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
> >> > test_data.csv'
> >> > on windows, and 'type test_data.csv;1' on VMS?
> >>
> >> Right. Or just "perl emit_test_data.pl" that works for all of them,
> >> since TAP is perl so you can safely assume you have Perl.
> >
> >
> > Thanks. I was mentally locked in more basic OS commands. Am I right in
> > thinking perl is about the *only* OS command you can be sure is on every
> > architecture?
>
> Probably, there's a lot of crazy out there.
>
> TAP tests can be conditionally run based on architecture, but
> something like this is probably worth testing as widely as possible.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Stylistically, would a separate .pl file for the emitter be preferable to
something inline like

perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'


?

I'm inclined to go inline to cut down on the number of moving parts, but I
can see where perl's readability is a barrier to some, and either way I
want to follow established patterns.


[*] For those who don't perl, the command prints:

a   b   cc  4
b   c   dd  5


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-07 Thread Amit Langote
On 2016/09/07 12:29, Corey Huinker wrote:
> On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote wrote:
>> OK.
> Well...maybe not, depending on what Craig and other can do to educate me
> about the TAP tests.

Sure.

>>> Changing table-level options requires superuser privileges, for security
>>> reasons: only a superuser should be able to determine which file is read,
>>> or which program is run. In principle non-superusers could be allowed to
>>> change the other options, but that's not supported at present.
>>
>> Hmm, just a little modification would make it better for me:
>>
>> ... for security reasons.  For example, only superuser should be able to
>> determine which file read or which program is run.
>>
>> Although that could be just me.
> 
> In this case "determine" is unclear whether a non-superuser can set the
> program to be run, or is capable of knowing which program is set to be run
> by the fdw.

Hmm, it is indeed unclear.

How about:

... for security reasons.  For example, only superuser should be able to
*change* which file is read or which program is run.

I just realized this is not just about a C comment.  There is a line in
documentation as well which needs an update.  Any conclusion here should
be applied there.

> We may want some more opinions on what is the most clear.

Certainly.

>> But as you said above, this could be deferred to the committer.
>>
> 
> Yeah, and that made for zero storage savings: a char pointer which is never
> assigned a string takes up as much space as a 4-byte-aligned boolean. And
> the result is that "file" really means program, which I found slightly
> awkward.

My only intent to push for that approach is to have consistency with other
code implementing a similar feature although it may not be that important.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Craig Ringer
On 7 September 2016 at 11:37, Corey Huinker  wrote:
> On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer 
> wrote:
>>
>> On 7 September 2016 at 11:21, Corey Huinker 
>> wrote:
>> > On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer
>> > 
>>
>> > And the TAP test would detect the operating system and know to create an
>> > FDW
>> > that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
>> > test_data.csv'
>> > on windows, and 'type test_data.csv;1' on VMS?
>>
>> Right. Or just "perl emit_test_data.pl" that works for all of them,
>> since TAP is perl so you can safely assume you have Perl.
>
>
> Thanks. I was mentally locked in more basic OS commands. Am I right in
> thinking perl is about the *only* OS command you can be sure is on every
> architecture?

Probably, there's a lot of crazy out there.

TAP tests can be conditionally run based on architecture, but
something like this is probably worth testing as widely as possible.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer 
wrote:

> On 7 September 2016 at 11:21, Corey Huinker 
> wrote:
> > On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <
> craig.rin...@2ndquadrant.com>
>
> > And the TAP test would detect the operating system and know to create an
> FDW
> > that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
> test_data.csv'
> > on windows, and 'type test_data.csv;1' on VMS?
>
> Right. Or just "perl emit_test_data.pl" that works for all of them,
> since TAP is perl so you can safely assume you have Perl.
>

Thanks. I was mentally locked in more basic OS commands. Am I right in
thinking perl is about the *only* OS command you can be sure is on every
architecture?

The platforms page says we support S/390 but no mention of VM/MVS/CMS. Did
we do an OS/400 port yet? ::ducks::


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote 
wrote:

> On 2016/09/07 3:12, Corey Huinker wrote:
> > On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
> >> I am not familiar with win32 stuff too, so I don't have much to say
> about
> >> that.  Maybe someone else can chime in to help with that.
> >
> > The regressions basically *can't* test this because we'd need a shell
> > command we know works on any architecture.
>
> OK.
>

Well...maybe not, depending on what Craig and other can do to educate me
about the TAP tests.


> >
> > Changing table-level options requires superuser privileges, for security
> > reasons: only a superuser should be able to determine which file is read,
> > or which program is run. In principle non-superusers could be allowed to
> > change the other options, but that's not supported at present.
>
> Hmm, just a little modification would make it better for me:
>
> ... for security reasons.  For example, only superuser should be able to
> determine which file read or which program is run.
>
> Although that could be just me.
>

In this case "determine" is unclear whether a non-superuser can set the
program to be run, or is capable of knowing which program is set to be run
by the fdw.

We may want some more opinions on what is the most clear.


>
> >> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
> >> eflags)
> >>   * Create CopyState from FDW options.  We always acquire all
> columns,
> >> so
> >>   * as to match the expected ScanTupleSlot signature.
> >>   */
> >> -cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> -   filename,
> >> -   false,
> >> -   NIL,
> >> -   options);
> >> +if(filename)
> >> +cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> +   filename,
> >> +   false,
> >> +   NIL,
> >> +   options);
> >> +else
> >> +cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> +   program,
> >> +   true,
> >> +   NIL,
> >> +   options)
> >>
> >> Like I mentioned above, if there was a is_program Boolean flag instead
> of
> >> separate filename and program, this would be just:
> >>
> >> +cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> +   filename,
> >> +   is_program,
> >> +   NIL,
> >> +   options);
> >>
> >> That would get rid of if-else blocks here and a couple of other places.
> >
> > It would, pushing the complexity out to the user. Which could be fine,
> but
> > if we do that then "filename" is a misnomer.
>
> This is an internal state so I'm not sure how this would be pushing
> complexity out to the user.  Am I perhaps misreading what you said?
>

Indeed, it is internal state. Maybe we rename the variable file_command or
something.


>
> What a user sees is that there are two separate foreign table options -
> filename and program.  That we handle them internally using a string to
> identify either file or program and a Boolean flag to show which of the
> two is just an internal implementation detail.
>
> COPY does it that way internally and I just saw that psql's \copy does it
> the same way too.  In the latter's case, following is the options struct
> (src/bin/psql/copy.c):
>
> struct copy_options
> {
> char   *before_tofrom;  /* COPY string before TO/FROM */
> char   *after_tofrom;   /* COPY string after TO/FROM filename */
> char   *file;   /* NULL = stdin/stdout */
> boolprogram;/* is 'file' a program to popen? */
> boolpsql_inout; /* true = use psql stdin/stdout */
> boolfrom;   /* true = FROM, false = TO */
> };
>
> But as you said above, this could be deferred to the committer.
>

Yeah, and that made for zero storage savings: a char pointer which is never
assigned a string takes up as much space as a 4-byte-aligned boolean. And
the result is that "file" really means program, which I found slightly
awkward.


> >> diff --git a/contrib/file_fdw/input/file_fdw.source
> >> b/contrib/file_fdw/input/file_fdw.source
> >> index 685561f..eae34a4 100644
> >> --- a/contrib/file_fdw/input/file_fdw.source
> >> +++ b/contrib/file_fdw/input/file_fdw.source
> >>
> >> You forgot to include expected output diffs.
> >
> > Having regression tests for this is extremely problematic, because the
> > program invoked would need to be an invokable command on any architecture
> > supported by postgres. I'm pretty sure no such command exists.
>
> Craig and Michael elsewhere suggested something about adding TAP tests. If
> that helps the situation, maybe you could.
>

Yeah, I 

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Craig Ringer
On 7 September 2016 at 11:21, Corey Huinker  wrote:
> On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer 

> And the TAP test would detect the operating system and know to create an FDW
> that has the PROGRAM value 'cat test_data.csv' on Unix, 'type test_data.csv'
> on windows, and 'type test_data.csv;1' on VMS?

Right. Or just "perl emit_test_data.pl" that works for all of them,
since TAP is perl so you can safely assume you have Perl.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer 
wrote:

> On 7 Sep. 2016 02:14, "Corey Huinker"  wrote:
> >
>
> > Having regression tests for this is extremely problematic, because the
> program invoked would need to be an invokable command on any architecture
> supported by postgres. I'm pretty sure no such command exists.
>
> Your best bet will be using the TAP framework. There you can use Perl
> logic.
>
> I'm not sure where to put such a test though. It doesn't really make sense
> in src/test/recovery/ .
>

And the TAP test would detect the operating system and know to create an
FDW that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
test_data.csv' on windows, and 'type test_data.csv;1' on VMS?


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Amit Langote
On 2016/09/07 3:12, Corey Huinker wrote:
> On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
>> I am not familiar with win32 stuff too, so I don't have much to say about
>> that.  Maybe someone else can chime in to help with that.
> 
> The regressions basically *can't* test this because we'd need a shell
> command we know works on any architecture.

OK.

>> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
>>  typedef struct FileFdwExecutionState
>>  {
>>  char   *filename;   /* file to read */
>> -List   *options;/* merged COPY options, excluding
>> filename */
>> -CopyState   cstate; /* state of reading file */
>> +char   *program;/* program to read output from */
>> +List   *options;/* merged COPY options, excluding
>> filename and program */
>> +CopyState   cstate; /* state of reading file or program */
>>
>> Have you considered a Boolean flag is_program instead of char **program
>> similar to how copy.c does it?  (See a related comment further below)
> 
> Considered it yes, but decided against it when I started to write my
> version. When Adam delivered his version of the patch, I noticed he had
> made the same design decision. Only one of them will be initialized, and
> the boolean will byte align to 4 bytes, so it's the same storage allocated
> either way.
> 
> Either we're fine with two variables, or we think file_name is poorly
> named. I have only a slight preference for the two variables, and defer to
> the group for a preference.

OK, let's defer it to the committer.

>> - * This is because the filename is one of those options, and we don't
>> want
>> - * non-superusers to be able to determine which file gets read.
>> + * This is because the filename or program string are two of those
>> + * options, and we don't want non-superusers to be able to determine
>> which
>> + * file gets read or what command is run.
>>
>> I'm no native English speaker, but I wonder if this should read: filename
>> or program string *is* one of those options ... OR filename *and* program
>> are two of those options ... ?  Also, "are two of those options" sounds a
>> bit odd to me because I don't see that used as often as "one of those
>> whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)
> 
> Given that C comments constitute a large portion of our documentation, I
> fully support making them as clear as possible.
> 
> I don't remember if this is Adam's comment or mine. Adam - if you're out
> there, chime in.
> 
> The original paragraph was:
> 
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read.
> In principle non-superusers could be allowed to change the other options,
> but that's not supported at present.
> 
> 
> How does this paragraph sound?:
> 
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read,
> or which program is run. In principle non-superusers could be allowed to
> change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons.  For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

>> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>>  ereport(ERROR,
>>  (errcode(ERRCODE_SYNTAX_ERROR),
>>   errmsg("conflicting or redundant options")));
>> +if (program)
>> +ereport(ERROR,
>> +(errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("conflicting or redundant options")));
>>  filename = defGetString(def);
>>  }
>>
>> +else if (strcmp(def->defname, "program") == 0)
>> +{
>> +if (filename)
>> +ereport(ERROR,
>> +(errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("conflicting or redundant options")));
>> +if (program)
>> +ereport(ERROR,
>> +(errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("conflicting or redundant options")));
>> +program = defGetString(def);
>> +}
>>
>> Why does it check for filename here?  Also the other way around above.
> 
> We don't want them to specify both program and filename, nor do we allow 2
> programs, or 2 filenames.

Ah, I forgot about the mutually exclusive options part.

>> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
>> eflags)
>>   * Create CopyState from FDW options.  We always acquire all columns,
>> so
>>   * as to match the expected ScanTupleSlot signature.
>>   */
>> -cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> 

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Michael Paquier
On Wed, Sep 7, 2016 at 7:53 AM, Craig Ringer
 wrote:
> On 7 Sep. 2016 02:14, "Corey Huinker"  wrote:
>>
>
>> Having regression tests for this is extremely problematic, because the
>> program invoked would need to be an invokable command on any architecture
>> supported by postgres. I'm pretty sure no such command exists.
>
> Your best bet will be using the TAP framework. There you can use Perl logic.
>
> I'm not sure where to put such a test though. It doesn't really make sense
> in src/test/recovery/ .

There is nothing preventing the addition of a TAP test where there are
normal regression tests, so if you want a test for file_fdw you should
add it there, then change its Makefile to have the following target
rules:
check: prove-check

prove-check:
$(prove_check)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Craig Ringer
On 7 Sep. 2016 02:14, "Corey Huinker"  wrote:
>

> Having regression tests for this is extremely problematic, because the
program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

Your best bet will be using the TAP framework. There you can use Perl logic.

I'm not sure where to put such a test though. It doesn't really make sense
in src/test/recovery/ .


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote 
wrote:

>
> I am not familiar with win32 stuff too, so I don't have much to say about
> that.  Maybe someone else can chime in to help with that.
>

The regressions basically *can't* test this because we'd need a shell
command we know works on any architecture.


>
> About the patch:
>
> * applies cleanly, compiles fine
> * basic functionality seems to work (have not done any extensive tests
> though)
>
>
> - Specifies the file to be read.  Required.  Must be an absolute path
> name.
> + Specifies the file to be read.  Must be an absolute path name.
> + Either filename or program
> must be
> + specified.  They are mutually exclusive.
> +
> +   
> +  
>
> The note about filename and program being mutually exclusive could be
> placed at the end of list of options.  Or perhaps mention this note as
> part of the description of program option, because filename is already
> defined by that point.
>
> +   
> +
> + Specifies the command to executed.
>
> s/to executed/to be executed/g
>

Correct. I will fix that when other issues below are resolved.


>
> + Either program or filename
> must be
> + specified.  They are mutually exclusive.
>  
>
> Oh I see this has already been mentioned in program option description.
> Did you intend to specify the same in both filename and program
> descriptions?
>

It's been a while since I repackaged Adam's code, but generally I'm in
favor of some redundancy if the two mutually exclusive things are
documented far enough apart.



>
> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
>  typedef struct FileFdwExecutionState
>  {
>  char   *filename;   /* file to read */
> -List   *options;/* merged COPY options, excluding
> filename */
> -CopyState   cstate; /* state of reading file */
> +char   *program;/* program to read output from */
> +List   *options;/* merged COPY options, excluding
> filename and program */
> +CopyState   cstate; /* state of reading file or program */
>
> Have you considered a Boolean flag is_program instead of char **program
> similar to how copy.c does it?  (See a related comment further below)
>

Considered it yes, but decided against it when I started to write my
version. When Adam delivered his version of the patch, I noticed he had
made the same design decision. Only one of them will be initialized, and
the boolean will byte align to 4 bytes, so it's the same storage allocated
either way.

Either we're fine with two variables, or we think file_name is poorly
named. I have only a slight preference for the two variables, and defer to
the group for a preference.


>
> - * This is because the filename is one of those options, and we don't
> want
> - * non-superusers to be able to determine which file gets read.
> + * This is because the filename or program string are two of those
> + * options, and we don't want non-superusers to be able to determine
> which
> + * file gets read or what command is run.
>
> I'm no native English speaker, but I wonder if this should read: filename
> or program string *is* one of those options ... OR filename *and* program
> are two of those options ... ?  Also, "are two of those options" sounds a
> bit odd to me because I don't see that used as often as "one of those
> whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)
>

Given that C comments constitute a large portion of our documentation, I
fully support making them as clear as possible.

I don't remember if this is Adam's comment or mine. Adam - if you're out
there, chime in.

The original paragraph was:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.


How does this paragraph sound?:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read,
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.




> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>  ereport(ERROR,
>  (errcode(ERRCODE_SYNTAX_ERROR),
>   errmsg("conflicting or redundant options")));
> +if (program)
> +ereport(ERROR,
> +(errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
>  filename = defGetString(def);
>  }
>
> +else if (strcmp(def->defname, "program") == 0)
> +{
> +if (filename)
> +ereport(ERROR,
> +

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-02 Thread Amit Langote
Hi Corey,

Here are some comments and a review of the patch.

On 2016/06/03 5:48, Corey Huinker wrote:
> A while back, there was a push to make COPY gzip-aware. That didn't happen,
> but COPY FROM PROGRAM did, and it scratches the same itch.
> 
> I have a similar need, but with file_fdw foreign tables. I have .csv.gz
> files downloaded to the server, but those CSVs have 100+ columns in them,
> and in this case I only really care about a half dozen of those columns.
> I'd like to avoid:
> - the overhead of writing the uncompressed file to disk and then
> immediately re-reading it
> - writing unwanted columns to a temp/work table via COPY, and then
> immediately re-reading them
> - multicorn fdw because it ends up making a python string out of all data
> cells
> - a csv parsing tool like csvtool or mlr, because they output another CSV
> which must be reparsed from scratch

This feature seems desirable to me too.

> Since file_fdw leverages COPY, it seemed like it would be easy to add the
> FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
> IRC, only to discover that Adam Gomaa ( akgo...@gmail.com ) had already
> written such a thing, but hadn't submitted it. Attached is a small rework
> of his patch, along with documentation.

Yeah, the fact that file_fdw leverages COPY makes including this feature a
breeze.  Various concerns such as security, popen() vs execve() were
addressed when COPY FROM PROGRAM/STDIN/STDOUT feature was proposed [1], so
we will not have to go through that again (hopefully).

> NOTE: The regression test includes unix commands in the program option. I
> figured that wouldn't work for win32 systems, so I checked to see what the
> regression tests do to test COPY FROM PROGRAM...and I couldn't find any. So
> I guess the test exists as a proof of concept that will get excised before
> final commit.

I am not familiar with win32 stuff too, so I don't have much to say about
that.  Maybe someone else can chime in to help with that.

About the patch:

* applies cleanly, compiles fine
* basic functionality seems to work (have not done any extensive tests though)


- Specifies the file to be read.  Required.  Must be an absolute path
name.
+ Specifies the file to be read.  Must be an absolute path name.
+ Either filename or program must be
+ specified.  They are mutually exclusive.
+
+   
+  

The note about filename and program being mutually exclusive could be
placed at the end of list of options.  Or perhaps mention this note as
part of the description of program option, because filename is already
defined by that point.

+   
+
+ Specifies the command to executed.

s/to executed/to be executed/g

+ Either program or filename must be
+ specified.  They are mutually exclusive.
 

Oh I see this has already been mentioned in program option description.
Did you intend to specify the same in both filename and program descriptions?

@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
 typedef struct FileFdwExecutionState
 {
 char   *filename;   /* file to read */
-List   *options;/* merged COPY options, excluding filename */
-CopyState   cstate; /* state of reading file */
+char   *program;/* program to read output from */
+List   *options;/* merged COPY options, excluding
filename and program */
+CopyState   cstate; /* state of reading file or program */

Have you considered a Boolean flag is_program instead of char **program
similar to how copy.c does it?  (See a related comment further below)

- * This is because the filename is one of those options, and we don't
want
- * non-superusers to be able to determine which file gets read.
+ * This is because the filename or program string are two of those
+ * options, and we don't want non-superusers to be able to determine
which
+ * file gets read or what command is run.

I'm no native English speaker, but I wonder if this should read: filename
or program string *is* one of those options ... OR filename *and* program
are two of those options ... ?  Also, "are two of those options" sounds a
bit odd to me because I don't see that used as often as "one of those
whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)

@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 ereport(ERROR,
 (errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("conflicting or redundant options")));
+if (program)
+ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
 filename = defGetString(def);
 }

+else if (strcmp(def->defname, "program") == 0)
+{
+if (filename)
+ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ 

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-06-06 Thread Adam Gomaa
I'm fine with the code being released under the PostgreSQL license.

Thanks,
Adam
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-06-06 Thread Robert Haas
On Thu, Jun 2, 2016 at 4:48 PM, Corey Huinker  wrote:
> A while back, there was a push to make COPY gzip-aware. That didn't happen,
> but COPY FROM PROGRAM did, and it scratches the same itch.
>
> I have a similar need, but with file_fdw foreign tables. I have .csv.gz
> files downloaded to the server, but those CSVs have 100+ columns in them,
> and in this case I only really care about a half dozen of those columns. I'd
> like to avoid:
> - the overhead of writing the uncompressed file to disk and then immediately
> re-reading it
> - writing unwanted columns to a temp/work table via COPY, and then
> immediately re-reading them
> - multicorn fdw because it ends up making a python string out of all data
> cells
> - a csv parsing tool like csvtool or mlr, because they output another CSV
> which must be reparsed from scratch
>
> Since file_fdw leverages COPY, it seemed like it would be easy to add the
> FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
> IRC, only to discover that Adam Gomaa ( akgo...@gmail.com ) had already
> written such a thing, but hadn't submitted it. Attached is a small rework of
> his patch, along with documentation.

His failure to submit that here himself raises the question of whether
he is OK with that code being released under the PostgreSQL license.
If this patch is going to be considered, I think we should have a post
from him clarifying that matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-06-03 Thread Corey Huinker
On Fri, Jun 3, 2016 at 1:06 AM, Craig Ringer  wrote:

> On 3 June 2016 at 04:48, Corey Huinker  wrote:
>
>> A while back, there was a push to make COPY gzip-aware. That didn't
>> happen, but COPY FROM PROGRAM did, and it scratches the same itch.
>>
>
>
>> - writing unwanted columns to a temp/work table via COPY, and then
>> immediately re-reading them
>>
>
> Without wanting to take away from the value of letting file FDW access
> FROM PROGRAM, I think this really merits a solution that applies to COPY as
> well. Variants on "how do I COPY just some columns from a CSV" is a real
> FAQ, and it doesn't seem like it'd be excessively hard to support. We'd
> just need some way to pass a list of column-ordinals or header-names.
>
> Not asking you to do that work, just pointing out that this particular
> issue applies to COPY its self as well.
>

I agree, they are two different but slightly overlapping issues. file_fdw
needs a way to handle compressed/filtered files, and COPY needs the ability
to skip columns. But right now COPY is all about getting the shape of the
input from the shape of the destination table.

I had the bright idea of creating a custom datatype, call it "skip" /
"filler" / "nada" / "devnull" or something like that, that would map any
and all values to NULL, thus allowing COPY to naively "store" the unwanted
values into nothingness.

That idea falls apart when it hits InputFunctionCall() in fmgr.c, which
prevents any text string from coercing into NULL. I didn't think I was up
to the task of making InputFunctionCall respect a special case type (void?)
or somehow promoting the pseudo type void into a legit column type (unknown
side-effects there). I did create a type that coerced all input into the
empty string, but the disk usage of that was still pretty significant.

So maybe if there was a subspecies of COPY that returned SETOF RECORD:

SELECT important_column1, important_column2
FROM copy_srf( program := 'zcat filename', format := 'csv', header := true
) AS t(bogus_col1 text, important_column1 integer, important_column2 date,
bogus_col2 text, ... );


That would allow COPY to keep it's current efficiency and simplicity, while
(hopefully) avoiding the unwanted data from ever hitting disk. It would
also be vaguely reminiscent of SQL*Loader.


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-06-02 Thread Craig Ringer
On 3 June 2016 at 04:48, Corey Huinker  wrote:

> A while back, there was a push to make COPY gzip-aware. That didn't
> happen, but COPY FROM PROGRAM did, and it scratches the same itch.
>


> - writing unwanted columns to a temp/work table via COPY, and then
> immediately re-reading them
>

Without wanting to take away from the value of letting file FDW access FROM
PROGRAM, I think this really merits a solution that applies to COPY as
well. Variants on "how do I COPY just some columns from a CSV" is a real
FAQ, and it doesn't seem like it'd be excessively hard to support. We'd
just need some way to pass a list of column-ordinals or header-names.

Not asking you to do that work, just pointing out that this particular
issue applies to COPY its self as well.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services