Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-19 Thread Tom Lane
Etsuro Fujita  writes:
> I agree that it's better to keep the BeginCopyFrom API as-is.  Also, I 
> think your version would handle SIGPIPE in COPY FROM PROGRAM more 
> properly than what I proposed.  So, +1 from me.

Thanks for reviewing!  I've pushed it now, though at the last minute
I reconsidered resetting SIGUSR2 as my previous patch did.  The trouble
with resetting that is that it results in a small window where receipt
of SIGUSR2 would result in backend termination, which we surely don't
want.  Now, it doesn't look to me like the postmaster will ever send
SIGUSR2 to ordinary backends, but it wouldn't be terribly surprising
if someone makes a change that relies on the expectation of SIGUSR2
being SIG_IGN'd by backends.  I don't see any real upside to resetting
SIGUSR2 for the called program that would justify taking any risk
there.  (Note that this concern doesn't apply to SIGPIPE since we
only expect that to be raised synchronously, ie during a write.)

As Kyotaro-san said upthread, it's odd that exec() provides no
way to reset all the handlers to SIG_DFL on the child side.
But since it doesn't, we're not really able to do much more than
this.
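
For illustration, a minimal sketch of the child-side handling being described,
assuming a plain fork()/exec() path and an invented helper name (the real work
happens in OpenPipeStream): POSIX exec() leaves ignored signals ignored, so the
child has to put SIGPIPE back to SIG_DFL itself, while SIGUSR2 is left alone
per the above.

    #include <signal.h>
    #include <unistd.h>

    /* Child side, between fork() and exec(): restore default SIGPIPE. */
    static void
    exec_called_program(const char *shell_cmd)
    {
        signal(SIGPIPE, SIG_DFL);   /* undo the backend's SIG_IGN */
        /* SIGUSR2 is deliberately left as inherited, per the reasoning above */
        execl("/bin/sh", "sh", "-c", shell_cmd, (char *) NULL);
        _exit(127);                 /* reached only if exec fails */
    }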

regards, tom lane



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-18 Thread Etsuro Fujita

(2018/11/17 8:10), Tom Lane wrote:

I wrote:

I'm not quite sure whether the reached_eof test should be
"if (bytesread == 0)" or "if (bytesread < maxread)".  The former
seems more certain to indicate real EOF; are there other cases where
the fread might return a short read?  On the other hand, if we
support in-band EOF indicators (\. for instance) then we might
stop without having made an extra call to CopyGetData that would
see bytesread == 0.  On the third hand, if we do do that it's
not quite clear to me whether SIGPIPE is ignorable or not.


After still further thought, it seems like "if (bytesread == 0)"
is the right way to proceed.  That protects us against incorrectly
setting reached_eof if the pipe is being run in unbuffered or
line-buffered mode.  (Which copy.c doesn't do, at the moment,
but I'd just as soon this code didn't need to assume that.
Also, I'm not 100% convinced that Windows or other platforms
might not act that way.)



So I think this version is probably good, although maybe it could
use an additional comment explaining the above reasoning.


I agree that it's better to keep the BeginCopyFrom API as-is.  Also, I 
think your version would handle SIGPIPE in COPY FROM PROGRAM more 
properly than what I proposed.  So, +1 from me.


Thanks!

Best regards,
Etsuro Fujita



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-18 Thread Kyotaro HORIGUCHI
At Sat, 17 Nov 2018 11:14:49 -0500, Tom Lane  wrote in 
<23489.1542471...@sss.pgh.pa.us>
> I wrote:
> > After still further thought, it seems like "if (bytesread == 0)"
> > is the right way to proceed.  That protects us against incorrectly
> > setting reached_eof if the pipe is being run in unbuffered or
> > line-buffered mode.  (Which copy.c doesn't do, at the moment,
> > but I'd just as soon this code didn't need to assume that.
> > Also, I'm not 100% convinced that Windows or other platforms
> > might not act that way.)  In the case where we terminate early
> > because we saw an in-band EOF marker, we would also not set reached_eof,
> > and again that seems like a good outcome: if we see SIGPIPE it
> > must mean that the program kept sending data after the EOF marker,
> > and that seems like a good thing to complain about.  (It's only
> > guaranteed to fail if the program sends more than the current pipe
> > buffer-ful, but I don't feel a need to add extra code to check for
> > nonempty buffer contents as we exit.)
> 
> Oh, wait, that last bit is backwards: if we see an in-band EOF mark,
> and as a consequence exit without having set reached_eof, then the
> exit code will think that SIGPIPE is ignorable.  So transmitting
> more data after an EOF mark will not be complained of, whether
> it's within the same bufferload or not.
> 
> Still, I can't get very excited about that.  Potentially we could
> improve it by having the places that recognize EOF marks set 
> reached_eof, but I'm unconvinced that it's worth the trouble.
> I'm thinking that it's better to err in the direction of reporting
> SIGPIPE less often not more often, considering that up to now
> we've never reported it at all and there've been so few complaints.

My opinion here is that when we execute an external program on the
other end of a pipe, we should behave as loosely (tolerantly) as
ordinary un*x programs are expected to.  If we're connecting to
another PostgreSQL server, we should be as stringent as the current
behavior.

In other words, we don't need to change the behavior of anything
other than the COPY_FILE case, but ClosePipeToProgram shouldn't
complain about any kind of error, not only SIGPIPE.
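
As a rough sketch only (invented names, not an actual patch from this thread),
that more tolerant rule at pipe-close time could be summarized as:

    #include <stdbool.h>

    /*
     * Hypothetical decision for the COPY_FILE (program) case: once we have
     * deliberately stopped reading early, any nonzero exit status from the
     * called program is tolerated; otherwise report it as before.
     */
    static bool
    program_exit_is_error(int pclose_rc, bool is_copy_from_program,
                          bool stopped_reading_early)
    {
        if (pclose_rc == 0)
            return false;
        if (is_copy_from_program && stopped_reading_early)
            return false;       /* we hung up on the program; don't complain */
        return true;
    }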

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-17 Thread Tom Lane
I wrote:
> After still further thought, it seems like "if (bytesread == 0)"
> is the right way to proceed.  That protects us against incorrectly
> setting reached_eof if the pipe is being run in unbuffered or
> line-buffered mode.  (Which copy.c doesn't do, at the moment,
> but I'd just as soon this code didn't need to assume that.
> Also, I'm not 100% convinced that Windows or other platforms
> might not act that way.)  In the case where we terminate early
> because we saw an in-band EOF marker, we would also not set reached_eof,
> and again that seems like a good outcome: if we see SIGPIPE it
> must mean that the program kept sending data after the EOF marker,
> and that seems like a good thing to complain about.  (It's only
> guaranteed to fail if the program sends more than the current pipe
> buffer-ful, but I don't feel a need to add extra code to check for
> nonempty buffer contents as we exit.)

Oh, wait, that last bit is backwards: if we see an in-band EOF mark,
and as a consequence exit without having set reached_eof, then the
exit code will think that SIGPIPE is ignorable.  So transmitting
more data after an EOF mark will not be complained of, whether
it's within the same bufferload or not.

Still, I can't get very excited about that.  Potentially we could
improve it by having the places that recognize EOF marks set 
reached_eof, but I'm unconvinced that it's worth the trouble.
I'm thinking that it's better to err in the direction of reporting
SIGPIPE less often not more often, considering that up to now
we've never reported it at all and there've been so few complaints.

regards, tom lane



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-16 Thread Tom Lane
I wrote:
> I'm not quite sure whether the reached_eof test should be
> "if (bytesread == 0)" or "if (bytesread < maxread)".  The former
> seems more certain to indicate real EOF; are there other cases where
> the fread might return a short read?  On the other hand, if we
> support in-band EOF indicators (\. for instance) then we might
> stop without having made an extra call to CopyGetData that would
> see bytesread == 0.  On the third hand, if we do do that it's
> not quite clear to me whether SIGPIPE is ignorable or not.

After still further thought, it seems like "if (bytesread == 0)"
is the right way to proceed.  That protects us against incorrectly
setting reached_eof if the pipe is being run in unbuffered or
line-buffered mode.  (Which copy.c doesn't do, at the moment,
but I'd just as soon this code didn't need to assume that.
Also, I'm not 100% convinced that Windows or other platforms
might not act that way.)  In the case where we terminate early
because we saw an in-band EOF marker, we would also not set reached_eof,
and again that seems like a good outcome: if we see SIGPIPE it
must mean that the program kept sending data after the EOF marker,
and that seems like a good thing to complain about.  (It's only
guaranteed to fail if the program sends more than the current pipe
buffer-ful, but I don't feel a need to add extra code to check for
nonempty buffer contents as we exit.)

So I think this version is probably good, although maybe it could
use an additional comment explaining the above reasoning.

regards, tom lane



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-16 Thread Tom Lane
I wrote:
> I just had a thought about that: suppose we add a flag to CopyState
> to indicate whether we reached EOF while reading.  ...
> Then the logic in ClosePipeToProgram could be "if we did not reach
> EOF, then a SIGPIPE termination is unsurprising.  If we did reach
> EOF, then SIGPIPE is an error."  If the called program gets SIGPIPE
> for some reason other than us closing the pipe early, then we would
> see EOF next time we try to read, and correctly report that there's
> a problem.

Concretely, something like the attached.

I'm not quite sure whether the reached_eof test should be
"if (bytesread == 0)" or "if (bytesread < maxread)".  The former
seems more certain to indicate real EOF; are there other cases where
the fread might return a short read?  On the other hand, if we
support in-band EOF indicators (\. for instance) then we might
stop without having made an extra call to CopyGetData that would
see bytesread == 0.  On the third hand, if we do do that it's
not quite clear to me whether SIGPIPE is ignorable or not.

regards, tom lane

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6588ebd..8975446 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** typedef struct CopyStateData
*** 114,120 
  	FILE	   *copy_file;		/* used if copy_dest == COPY_FILE */
  	StringInfo	fe_msgbuf;		/* used for all dests during COPY TO, only for
   * dest == COPY_NEW_FE in COPY FROM */
! 	bool		fe_eof;			/* true if detected end of copy data */
  	EolType		eol_type;		/* EOL type of input */
  	int			file_encoding;	/* file or remote side's character encoding */
  	bool		need_transcoding;	/* file encoding diff from server? */
--- 114,122 
  	FILE	   *copy_file;		/* used if copy_dest == COPY_FILE */
  	StringInfo	fe_msgbuf;		/* used for all dests during COPY TO, only for
   * dest == COPY_NEW_FE in COPY FROM */
! 	bool		is_copy_from;	/* COPY TO, or COPY FROM? */
! 	bool		reached_eof;	/* true if we read to end of copy data (not
!  * all copy_dest types maintain this) */
  	EolType		eol_type;		/* EOL type of input */
  	int			file_encoding;	/* file or remote side's character encoding */
  	bool		need_transcoding;	/* file encoding diff from server? */
*** CopyGetData(CopyState cstate, void *data
*** 575,580 
--- 577,584 
  ereport(ERROR,
  		(errcode_for_file_access(),
  		 errmsg("could not read from COPY file: %m")));
+ 			if (bytesread == 0)
+ cstate->reached_eof = true;
  			break;
  		case COPY_OLD_FE:
  
*** CopyGetData(CopyState cstate, void *data
*** 595,601 
  			bytesread = minread;
  			break;
  		case COPY_NEW_FE:
! 			while (maxread > 0 && bytesread < minread && !cstate->fe_eof)
  			{
  int			avail;
  
--- 599,605 
  			bytesread = minread;
  			break;
  		case COPY_NEW_FE:
! 			while (maxread > 0 && bytesread < minread && !cstate->reached_eof)
  			{
  int			avail;
  
*** CopyGetData(CopyState cstate, void *data
*** 623,629 
  			break;
  		case 'c':	/* CopyDone */
  			/* COPY IN correctly terminated by frontend */
! 			cstate->fe_eof = true;
  			return bytesread;
  		case 'f':	/* CopyFail */
  			ereport(ERROR,
--- 627,633 
  			break;
  		case 'c':	/* CopyDone */
  			/* COPY IN correctly terminated by frontend */
! 			cstate->reached_eof = true;
  			return bytesread;
  		case 'f':	/* CopyFail */
  			ereport(ERROR,
*** ProcessCopyOptions(ParseState *pstate,
*** 1050,1055 
--- 1054,1061 
  	if (cstate == NULL)
  		cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
  
+ 	cstate->is_copy_from = is_from;
+ 
  	cstate->file_encoding = -1;
  
  	/* Extract options from the statement node tree */
*** ClosePipeToProgram(CopyState cstate)
*** 1727,1737 
--- 1733,1755 
  (errcode_for_file_access(),
   errmsg("could not close pipe to external command: %m")));
  	else if (pclose_rc != 0)
+ 	{
+ 		/*
+ 		 * If we ended a COPY FROM PROGRAM before reaching EOF, then it's
+ 		 * expectable for the called program to fail with SIGPIPE, and we
+ 		 * should not report that as an error.  Otherwise, SIGPIPE indicates a
+ 		 * problem.
+ 		 */
+ 		if (cstate->is_copy_from && !cstate->reached_eof &&
+ 			WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE)
+ 			return;
+ 
  		ereport(ERROR,
  (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
   errmsg("program \"%s\" failed",
  		cstate->filename),
 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+ 	}
  }
  
  /*
*** BeginCopyFrom(ParseState *pstate,
*** 3169,3175 
  	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
  
  	/* Initialize state variables */
! 	cstate->fe_eof = false;
  	cstate->eol_type = EOL_UNKNOWN;
  	cstate->cur_relname = RelationGetRelationName(cstate->rel);
  	cstate->cur_lineno = 0;
--- 31

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-16 Thread Tom Lane
I wrote:
> Are we sufficiently convinced that we must have the dont-allow-partial
> option to not fix this in the back branches?  I'm not.

I just had a thought about that: suppose we add a flag to CopyState
to indicate whether we reached EOF while reading.  This wouldn't be
hugely expensive, just something like

switch (cstate->copy_dest)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
 errmsg("could not read from COPY file: %m")));
+   cstate->reached_eof |= (bytesread < maxread);
break;
case COPY_OLD_FE:

Then the logic in ClosePipeToProgram could be "if we did not reach
EOF, then a SIGPIPE termination is unsurprising.  If we did reach
EOF, then SIGPIPE is an error."  If the called program gets SIGPIPE
for some reason other than us closing the pipe early, then we would
see EOF next time we try to read, and correctly report that there's
a problem.
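
In code form, the close-time test being proposed might look roughly like this
(a sketch under the assumptions above, using pclose()'s wait status; the real
check belongs in ClosePipeToProgram):

    #include <signal.h>
    #include <stdbool.h>
    #include <sys/wait.h>

    /*
     * Sketch: a SIGPIPE death of the called program is "expected" only if we
     * stopped reading before seeing EOF; any other nonzero status, or a
     * SIGPIPE seen after EOF, should still be reported as an error.
     */
    static bool
    sigpipe_is_ignorable(int pclose_rc, bool reached_eof)
    {
        return !reached_eof &&
               WIFSIGNALED(pclose_rc) &&
               WTERMSIG(pclose_rc) == SIGPIPE;
    }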

There are race-ish conditions in cases like the called program
getting an unrelated SIGPIPE at about the same time that we decide
to stop reading early, but I don't think it's really possible
to disambiguate that.

regards, tom lane



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-16 Thread Tom Lane
Etsuro Fujita  writes:
> * I think it's better to ignore the SIGPIPE failure in 
> ClosePipeToProgram if we were in a COPY FROM PROGRAM that was allowed to 
> terminate early and keep the behavior as-is otherwise.  If we ignore 
> that failure unconditionally in that function, eg, COPY TO PROGRAM would 
> fail to get a (better) error message in CopySendEndOfRow or EndCopy when 
> the invoked program was terminated on SIGPIPE, as discussed before [1]. 
>   And to do so, I added a new argument to BeginCopyFrom to specify 
> whether COPY FROM PROGRAM can terminate early or not.

If we do that, it makes this not back-patchable, I fear --- the fact
that file_fdw is calling BeginCopyFrom seems like sufficient evidence
that there might be third-party callers who would object to an API
break in minor releases.  That seems unfortunate for a bug fix.

Are we sufficiently convinced that we must have the dont-allow-partial
option to not fix this in the back branches?  I'm not.

regards, tom lane



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-13 Thread Etsuro Fujita

(2018/11/12 20:41), Etsuro Fujita wrote:

(2018/11/12 18:52), Kyotaro HORIGUCHI wrote:

I read Thomas's messages as "TTIN/TTOU are not
needed for our business and we don't have a reason to restore them
before calling external programs, other than for superficial
consistency."  And I agree with that analysis, and I agree with
you on the point that it doesn't need consideration just now.


OK


Attached is an updated version of Tom's POC patch.  Here are changes:

* I modified his patch so that the called program inherits SIG_DFL for 
SIGUSR2 as well, as discussed upthread.


* I think it's better to ignore the SIGPIPE failure in 
ClosePipeToProgram if we were in a COPY FROM PROGRAM that was allowed to 
terminate early and keep the behavior as-is otherwise.  If we ignore 
that failure unconditionally in that function, eg, COPY TO PROGRAM would 
fail to get a (better) error message in CopySendEndOfRow or EndCopy when 
the invoked program was terminated on SIGPIPE, as discussed before [1]. 
 And to do so, I added a new argument to BeginCopyFrom to specify 
whether COPY FROM PROGRAM can terminate early or not.


Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5BE18409.2070004%40lab.ntt.co.jp
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***
*** 678,683  fileBeginForeignScan(ForeignScanState *node, int eflags)
--- 678,684 
  		   filename,
  		   is_program,
  		   NULL,
+ 		   true,
  		   NIL,
  		   options);
  
***
*** 754,759  fileReScanForeignScan(ForeignScanState *node)
--- 755,761 
  	festate->filename,
  	festate->is_program,
  	NULL,
+ 	true,
  	NIL,
  	festate->options);
  }
***
*** 1117,1124  file_acquire_sample_rows(Relation onerel, int elevel,
  	/*
  	 * Create CopyState from FDW options.
  	 */
! 	cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL,
! 		   options);
  
  	/*
  	 * Use per-tuple memory context to prevent leak of memory used to read
--- 1119,1126 
  	/*
  	 * Create CopyState from FDW options.
  	 */
! 	cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, false,
! 		   NIL, options);
  
  	/*
  	 * Use per-tuple memory context to prevent leak of memory used to read
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 119,124  typedef struct CopyStateData
--- 119,125 
  	int			file_encoding;	/* file or remote side's character encoding */
  	bool		need_transcoding;	/* file encoding diff from server? */
  	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+ 	bool		allow_partial;	/* true if partial execution is allowed */
  
  	/* parameters from the COPY command */
  	Relation	rel;			/* relation to copy to or from */
***
*** 998,1004  DoCopy(ParseState *pstate, const CopyStmt *stmt,
  		PreventCommandIfParallelMode("COPY FROM");
  
  		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
! 			   NULL, stmt->attlist, stmt->options);
  		*processed = CopyFrom(cstate);	/* copy from file to database */
  		EndCopyFrom(cstate);
  	}
--- 999,1005 
  		PreventCommandIfParallelMode("COPY FROM");
  
  		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
! 			   NULL, false, stmt->attlist, stmt->options);
  		*processed = CopyFrom(cstate);	/* copy from file to database */
  		EndCopyFrom(cstate);
  	}
***
*** 1727,1737  ClosePipeToProgram(CopyState cstate)
--- 1728,1749 
  (errcode_for_file_access(),
   errmsg("could not close pipe to external command: %m")));
  	else if (pclose_rc != 0)
+ 	{
+ 		/*
+ 		 * If we were in a COPY FROM PROGRAM that was allowed to terminate
+ 		 * early and the called program terminated on SIGPIPE, assume it's OK;
+ 		 * we must have chosen to stop reading its output early.
+ 		 */
+ 		if (cstate->allow_partial &&
+ 			WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE)
+ 			return;
+ 
  		ereport(ERROR,
  (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
   errmsg("program \"%s\" failed",
  		cstate->filename),
 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+ 	}
  }
  
  /*
***
*** 3184,3189  BeginCopyFrom(ParseState *pstate,
--- 3196,3202 
  			  const char *filename,
  			  bool is_program,
  			  copy_data_source_cb data_source_cb,
+ 			  bool allow_partial,
  			  List *attnamelist,
  			  List *options)
  {
***
*** 3301,3306  BeginCopyFrom(ParseState *pstate,
--- 3314,3320 
  	cstate->volatile_defexprs = volatile_defexprs;
  	cstate->num_defaults = num_defaults;
  	cstate->is_program = is_program;
+ 	cstate->allow_partial = allow_partial;
  
  	if (data_source_cb)
  	{
*** a/src/backend/replication/logical/tablesync.c
--- b/src/backend/replication/logical/tablesync.c
***
*** 799,805  copy_table(Relation rel)
  

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-12 Thread Etsuro Fujita

(2018/11/12 18:52), Kyotaro HORIGUCHI wrote:

At Fri, 09 Nov 2018 18:27:09 +0900, Etsuro Fujita  wrote 
in<5be552ed.4040...@lab.ntt.co.jp>

Ok, I can live with that with no problem.


OK

...

I think Thomas is just saying that reading more lines can reveal
problems.  According to the current discussion, we should error
out if we had a SEGV with LIMIT 1.


Ah, I misread that.  Sorry for the noise.


That being said, the Ruby case may suggest that we should be more
tolerant of the crash-after-limit case.


The Ruby case would be sad, but I'm not sure we can handle such a case
safely in general, because core and file_fdw don't have enough knowledge
about whether a non-zero exit code returned from pclose is OK or not;
that actually depends on the called program.  One approach would be to
add an option to file_fdw telling it to ignore those exit codes from the
called program, which would be somewhat similar to what Thomas
proposed [1].



In my understanding, processes not connected to a
terminal (tty/pts) cannot receive TTIN/TTOU (unless someone sends
them artificially).  Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at startup, and I suppose they have no
means to connect to one since they are no longer in the same
session as the postmaster.


For TTIN and TTOU, we would first need to make clear the reason for
the inconsistency Thomas pointed out.  I'm wondering if we should
leave the TTIN/TTOU stuff for future work.


Inconsistency?


By "the inconsistency" I mean his words:

Why do bgwriter.c, startup.c, ... set SIGTTIN and SIGTTOU back to 
SIG_DFL, but not regular backends?



I read Thomas's messages as "TTIN/TTOU are not
needed for our business and we don't have a reason to restore them
before calling external programs, other than for superficial
consistency."  And I agree with that analysis, and I agree with
you on the point that it doesn't need consideration just now.


OK


If the consistency means the different behaviors between Perl and
Ruby (as written in another message), I'm inclined to vote for
having a bit more tolerance for errors of external programs, as in
my second patch.


Maybe my explanation was not enough, but I don't mean that.

Thanks!

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAEepm%3D0fBPiRkSiJ3v4ynm%2BaP-A-dhuHjTFBAxwo59EkE2-E5Q%40mail.gmail.com




Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-12 Thread Kyotaro HORIGUCHI
At Fri, 09 Nov 2018 18:27:09 +0900, Etsuro Fujita  
wrote in <5be552ed.4040...@lab.ntt.co.jp>
> > Ok, I can live with that with no problem.
> 
> OK
...
> > I think Thomas just saying that reading more lines can develop
> > problems. According to the current discussion, we should error
> > out if we had SEGV when limit 1.
> 
> Ah, I misread that.  Sorry for the noise.

That being said, the Ruby case may suggest that we should be more
tolerant of the crash-after-limit case.

> > In my understanding processes not connected to a
> > terminal(tty/pts) cannot receive TTIN/TTOU (unless someone sent
> > it artifically).  Since child processes are detached by setsid()
> > (on Linux), programs called in that way also won't have a
> > controlling terminal at the start time and I suppose they have no
> > means to connect to one since they are no longer on the same
> > session with postmaster.
> 
> For TTIN and TTOU, we would first need to make clear the reason for
> the inconsistency Thomas pointed out.  I'm wondering if we should
> leave the TTIN/TTOU stuff for future work.

Inconsistency? I read Thomas's messages as "TTIN/TTOU are not
needed for our business and we don't have a reason to restore them
before calling external programs, other than for superficial
consistency."  And I agree with that analysis, and I agree with
you on the point that it doesn't need consideration just now.

If the consistency means the different behaviors between Perl and
Ruby (as written in another message), I'm inclined to vote for
having a bit more tolerance for errors of external programs, as in
my second patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-12 Thread Kyotaro HORIGUCHI
At Fri, 9 Nov 2018 20:32:54 +1300, Thomas Munro  
wrote in 
> On Fri, Nov 9, 2018 at 6:39 PM Kyotaro HORIGUCHI
>  wrote:
> > Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.
> >
> > | $ perl -e 'for (my $i=0; $i< 100; $i++) { print "$i\n"; }' | head -5
> > ...
> > | 4
> > | $ echo $?
> > | 0
> 
> That's the exit code from head.  You can see python or perl's exit
> code by adding strace in front (on Linux):

Sorry, my mistake.  I understood it as: head ignores not only
SIGPIPE but SIGSEGV and any error exit status of the called
program.  I tried head with a program that ends with SEGV and saw
that head ignores it.

$ ./t
line 1
Segmentation fault (core dumped)

$ ./t | head -5  # SEGV before head closes the pipe
line 1
$

This is more tolerant than what I proposed before.  (I think it is
too tolerant for us.)

> > create foreign table ft5 (a text) server svf1 options (program 'ruby -e 
> > "for i in 1..1000 do puts i; end"');
> > select * from ft5 limit 5;
> >  a
> > ---
> >  1
> > ...
> >  5
> > (5 rows)
> > (no error)
> 
> 1000 is not enough... due to buffering, it works.  Try 100:

Ah. Understood. Thanks. (Ruby's flush doesn't work for pipes...)

> postgres=# create foreign table ft5 (a text) server svf1 options
> (program 'ruby -e "for i in 1..100 do puts i; end"');
> CREATE FOREIGN TABLE
> postgres=# select * from ft5 limit 5;
> ERROR:  program "ruby -e "for i in 1..100 do puts i; end"" failed
> DETAIL:  child process exited with exit code 1

I saw the same failure. Ruby handles SIGPIPE and converts it to
exit(1). It cannot be handled by just ignoring SIGPIPE.

Ruby seems to be a friend of my second patch:p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-09 Thread Etsuro Fujita

(2018/11/09 14:39), Kyotaro HORIGUCHI wrote:

At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita  wrote 
in<5be4318f.4040...@lab.ntt.co.jp>

(2018/11/08 10:50), Thomas Munro wrote:

I take back what I said earlier about false positives from other
pipes.  I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process.   The accepted answer here gives a good way to think about
it:

https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist


Thanks for the information!


A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
close to the intended purpose of that signal AFAICS.


Great!


In the sense of "We don't care about the reason", negligible reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could be
theoretically ignored safely.  "Theoretically" here means it is
another issue whether we rely on the output from a program which
causes SEGV (or dies for any reason other than SIGPIPE, which we caused).


For the SIGSEGV case, I think it would be better that we don't rely on
the output data, IMO, because I think there might be a possibility
that
the program have generated that data incorrectly/unexpectedly.


+1

I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users.  I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!


I think so too.


Ok, I can live with that with no problem.


OK


As the result it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=#  select * from ft limit 1;
  a
---
test1
(1 row)

limit 2 reports the error.

=# select * from ft limit 2;
ERROR:  program "/home/horiguti/work/exprog" failed
DETAIL:  child process exited with exit code 1


I think this would be contrary to users expectations: if the SELECT
command works for limit 1, they would expect that the command would
work
for limit 2 as well.  So, I think it would be better to error out that
command for limit 1 as well, as-is.


I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error.  For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe.  If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.


Maybe I'm missing something, but the non-zero non-signal exit code
means that there was something wrong with the called program, so I
think a human had better investigate that as well IMO, which would
probably be a minor problem, though.  Too restrictive?


I think Thomas is just saying that reading more lines can reveal
problems.  According to the current discussion, we should error
out if we had a SEGV with LIMIT 1.


Ah, I misread that.  Sorry for the noise.


On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
   wrote:

(2018/11/06 19:50), Thomas Munro wrote:

On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM.  The differences were:
PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored.  Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?


So, we should revert SIGUSR2 as well to default processing?


I don't think it matters in practice, but it might be nice to restore
that just for consistency.


Agreed.


I'm not sure what to think about the TTIN,
TTOU stuff; I don't understand job control well right now but I don't
think it really applies to programs run by a PostgreSQL backend, so if
we restore those it'd probably again be only for consistency.  Then
again, there may be a reason someone decided to ignore those in the
postmaster + regular backends but not the various auxiliary processes.
Anyone?


I don't have any idea about that.


In my understanding, processes not connected to a
terminal (tty/pts) cannot receive TTIN/TTOU (unless someone sends
them artificially).  Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at startup, and I suppose they have no
means to connect to one since they are no longer in the same
session as the postmaster.


For TTIN and TTOU, we would first need to make clear the reason for the 
inconsistency Thomas pointed out.  I'm wondering if we should leave the 
TTIN/TTOU stuff for future work.


Best regards,
Etsuro Fujita

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-08 Thread Thomas Munro
On Fri, Nov 9, 2018 at 6:39 PM Kyotaro HORIGUCHI
 wrote:
> Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.
>
> | $ perl -e 'for (my $i=0; $i< 100; $i++) { print "$i\n"; }' | head -5
> ...
> | 4
> | $ echo $?
> | 0

That's the exit code from head.  You can see python or perl's exit
code by adding strace in front (on Linux):

$ strace python -c "for i in range(100): print i" | head -5
...
+++ exited with 1 +++

$ strace perl -e 'for (my $i=0; $i< 100; $i++) { print "$i\n"; }' | head -5
...
+++ killed by SIGPIPE +++

> create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for 
> i in 1..1000 do puts i; end"');
> select * from ft5 limit 5;
>  a
> ---
>  1
> ...
>  5
> (5 rows)
> (no error)

1000 is not enough... due to buffering, it works.  Try 100:

postgres=# create foreign table ft5 (a text) server svf1 options
(program 'ruby -e "for i in 1..100 do puts i; end"');
CREATE FOREIGN TABLE
postgres=# select * from ft5 limit 5;
ERROR:  program "ruby -e "for i in 1..100 do puts i; end"" failed
DETAIL:  child process exited with exit code 1

-- 
Thomas Munro
http://www.enterprisedb.com



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-08 Thread Kyotaro HORIGUCHI
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita  
wrote in <5be4318f.4040...@lab.ntt.co.jp>
> (2018/11/08 10:50), Thomas Munro wrote:
> > I take back what I said earlier about false positives from other
> > pipes.  I think it's only traditional Unix programs designed for use
> > in pipelines and naive programs that let SIGPIPE terminate the
> > process.   The accepted answer here gives a good way to think about
> > it:
> >
> > https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
> 
> Thanks for the information!
>
> > A program sophisticated enough to be writing to other pipes is no
> > longer in that category and should be setting up signal dispositions
> > itself, so I agree that we should enable the default disposition and
> > ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
> > close to the intended purpose of that signal AFAICS.
> 
> Great!
>
> >>> In the sense of "We don't care the reason", negligible reasons
> >>> are necessariry restricted to SIGPIPE, evan SIGSEGV could be
> >>> theoretically ignored safely. "theoretically" here means it is
> >>> another issue whether we rely on the output from a program which
> >>> causes SEGV (or any reason other than SIGPIPE, which we caused).
> >>
> >> For the SIGSEGV case, I think it would be better that we don't rely on
> >> the output data, IMO, because I think there might be a possibility
> >> that
> >> the program have generated that data incorrectly/unexpectedly.
> >
> > +1
> >
> > I don't think we should ignore termination by signals other than
> > SIGPIPE: that could hide serious problems from users.  I want to know
> > if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
> > happens after we read enough data; there is a major problem that a
> > human needs to investigate!
> 
> I think so too.

Ok, I can live with that with no problem.

> >>> As the result it doesn't report an error for SELECT * FROM ft2
> >>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >>>
> >>> =#  select * from ft limit 1;
> >>>  a
> >>> ---
> >>>test1
> >>> (1 row)
> >>>
> >>> limit 2 reports the error.
> >>>
> >>> =# select * from ft limit 2;
> >>> ERROR:  program "/home/horiguti/work/exprog" failed
> >>> DETAIL:  child process exited with exit code 1
> >>
> >> I think this would be contrary to users expectations: if the SELECT
> >> command works for limit 1, they would expect that the command would
> >> work
> >> for limit 2 as well.  So, I think it would be better to error out that
> >> command for limit 1 as well, as-is.
> >
> > I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
> > error.  For LIMIT 1, we got all the rows we wanted, and then we closed
> > the pipe.  If we got a non-zero non-signal exit code, or a signal exit
> > code and it was SIGPIPE (not any other signal!), then we should
> > consider that to be expected.
> 
> Maybe I'm missing something, but the non-zero non-signal exit code
> means that there was something wrong with the called program, so I
> think a human had better investigate that as well IMO, which would
> probably be a minor problem, though.  Too restrictive?

I think Thomas is just saying that reading more lines can reveal
problems.  According to the current discussion, we should error
out if we had a SEGV with LIMIT 1.

> > I tried to think of a scenario where the already-received output is
> > truly invalidated by a later error that happens after we close the
> > pipe.  It could be something involving a program that uses something
> > optimistic like serializable snapshot isolation that can later decide
> > that whatever it told you earlier is not valid after all.  Suppose the
> > program is clever enough to expect EPIPE and not consider that to be
> > an error, but wants to tell us about serialization failure with a
> > non-zero exit code.  To handle that, you'd need a way to provide an
> > option to file_fdw to tell it not to ignore non-zero exit codes after
> > close.  This seems so exotic and contrived that it's not worth
> > bothering with for now, but could always be added.
> 
> Interesting!  I agree that such an option could add more flexibility
> in handling the non-zero-exit-code case.

I think the program shouldn't output a line until all possible
output is validated.  Once the data source has emitted a line, the
receiver can't do anything other than believe that it won't be withdrawn.

> > BTW just for curiosity:
> >
> > perl -e 'for (my $i=0; $i< 100; $i++) { print "$i\n"; }' | head -5
> > Exit code: terminated by SIGPIPE, like seq
> 
> Good to know!

Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.

| $ perl -e 'for (my $i=0; $i< 100; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0

> > ruby -e 'for i in 1..100 do puts i; end' | head -5
> > Exit code: 1, like Python
> 
> Sad.  Anyway, thanks a lot for these experiments in addition to the
> previous ones!

ruby reported broken pipe but exit status was 0..

create foreign tab

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-08 Thread Etsuro Fujita

(2018/11/08 10:50), Thomas Munro wrote:

On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita
  wrote:

(2018/11/07 9:22), Kyotaro HORIGUCHI wrote:

At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita   
wrote in<5be18409.2070...@lab.ntt.co.jp>

(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:

even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.


Sorry, I don't understand this.


Mmm. It looks confusing, sorry. In other words:

We don't care the reason for program's ending if we received
enough data from the program and we actively (= before receiving
EOF) stop reading. On the other hand if SIGPIPE (on another pipe)
or something fatal happened on the program before we end reading,
we receive EOF just after that and we are interested in the cause
of the program's death.

# Does the above make sense?


Yeah, thanks for the explanation!


+1

I take back what I said earlier about false positives from other
pipes.  I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process.   The accepted answer here gives a good way to think about
it:

https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist


Thanks for the information!


A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
close to the intended purpose of that signal AFAICS.


Great!


In the sense of "We don't care about the reason", negligible reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could be
theoretically ignored safely.  "Theoretically" here means it is
another issue whether we rely on the output from a program which
causes SEGV (or dies for any reason other than SIGPIPE, which we caused).


For the SIGSEGV case, I think it would be better that we don't rely on
the output data, IMO, because I think there might be a possibility that
the program have generated that data incorrectly/unexpectedly.


+1

I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users.  I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!


I think so too.


Finally in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields then if eof_reached we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
the following behavior. (I got an error without the fflush() but
it is inevitable).

=# select * from ft2 limit 1;
  a
---
test1

=# select * from ft2 limit 2;
ERROR:  program "/home/horiguti/work/exprog" failed
DETAIL:  child process was terminated by signal 13: Broken pipe

For the original case:

=# select * from ft1 limit 1;
 a
--
test
=# select * from ft1 limit 2;
 a
--
test
(1 row)


I haven't confirmed whether it is complete.


Sorry, I don't understand this fully, but the reason to add the
eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
COPY FROM PROGRAM cases?


Yes, if we have received EOF, the program ended before we
finished reading from the pipe, thus any error should be
reported.  Otherwise we don't care, at least about SIGPIPE.  In the
attached patch, all kinds of errors are ignored when the pipe is
closed from the reading end on the backend side.


OK, I understand your idea.  Thanks for the patch!


+1


As the result it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=#  select * from ft limit 1;
 a
---
   test1
(1 row)

limit 2 reports the error.

=# select * from ft limit 2;
ERROR:  program "/home/horiguti/work/exprog" failed
DETAIL:  child process exited with exit code 1


I think this would be contrary to users expectations: if the SELECT
command works for limit 1, they would expect that the command would work
for limit 2 as well.  So, I think it would be better to error out that
command for limit 1 as well, as-is.


I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error.  For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe.  If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.


Maybe I'm missing something, but the non-zero non-signal exit code means 
that there was something wrong with the called program, so I think a 
human had better investigate that as well IMO, which would probably be a 
minor problem, though.  Too restrictive?



I tried to think of a scenario where the already-received output is
truly invalidated by a later error that happens after we close the
pipe.  It could be something involving a program that uses something
optimistic like serializable snapshot isolation that can later decide
that whatever it told you earlier is not valid after all.

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-07 Thread Thomas Munro
On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita
 wrote:
> (2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
> > At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro 
> > Fujita  wrote 
> > in<5be18409.2070...@lab.ntt.co.jp>
> >> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> >>> even if it comes from another pipe, we can assume that the
> >>> SIGPIPE immediately stopped the program before returning any
> >>> garbage to us.
> >>
> >> Sorry, I don't understand this.
> >
> > Mmm. It looks confusing, sorry. In other words:
> >
> > We don't care the reason for program's ending if we received
> > enough data from the program and we actively (= before receiving
> > EOF) stop reading. On the other hand if SIGPIPE (on another pipe)
> > or something fatal happened on the program before we end reading,
> > we receive EOF just after that and we are interested in the cause
> > of the program's death.
> >
> > # Does the above make sense?
>
> Yeah, thanks for the explanation!

+1

I take back what I said earlier about false positives from other
pipes.  I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process.   The accepted answer here gives a good way to think about
it:

https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist

A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
close to the intended purpose of that signal AFAICS.

> > In the sense of "We don't care the reason", negligible reasons
> > are necessariry restricted to SIGPIPE, evan SIGSEGV could be
> > theoretically ignored safely. "theoretically" here means it is
> > another issue whether we rely on the output from a program which
> > causes SEGV (or any reason other than SIGPIPE, which we caused).
>
> For the SIGSEGV case, I think it would be better that we don't rely on
> the output data, IMO, because I think there might be a possibility that
> the program have generated that data incorrectly/unexpectedly.

+1

I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users.  I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!

> >>> Finally in the attached PoC, I set cstate->eof_reached on failure
> >>> of NextCopyFromRawFields then if eof_reached we don't ignore
> >>> SIGPIPE. For a program like
> >>> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
> >>> the following behavior. (I got an error without the fflush() but
> >>> it is inevitable).
> >>>
> >>> =# select * from ft2 limit 1;
> >>>  a
> >>> ---
> >>>test1
> >>>
> >>> =# select * from ft2 limit 2;
> >>> ERROR:  program "/home/horiguti/work/exprog" failed
> >>> DETAIL:  child process was terminated by signal 13: Broken pipe
> >>>
> >>> For the original case:
> >>>
> >>> =# select * from ft1 limit 1;
> >>> a
> >>> --
> >>>test
> >>> =# select * from ft1 limit 2;
> >>> a
> >>> --
> >>>test
> >>> (1 row)
> >>>
> >>>
> >>> I haven't confirmed whether it is complete.
> >>
> >> Sorry, I don't understand this fully, but the reason to add the
> >> eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
> >> COPY FROM PROGRAM cases?
> >
> > Yes, if we have received EOF, the program ended before we
> > finished reading from the pipe thus any error should be
> > reported.  Elsewise we don't care at least SIGPIPE. In the
> > attached patch all kind of errors are ignored when pipe is closed
> > from reading-end on backends.
>
> OK, I understand your idea.  Thanks for the patch!

+1

> > As the result it doesn't report an error for SELECT * FROM ft2
> > LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >
> > =#  select * from ft limit 1;
> > a
> > ---
> >   test1
> > (1 row)
> >
> > limit 2 reports the error.
> >
> > =# select * from ft limit 2;
> > ERROR:  program "/home/horiguti/work/exprog" failed
> > DETAIL:  child process exited with exit code 1
>
> I think this would be contrary to users expectations: if the SELECT
> command works for limit 1, they would expect that the command would work
> for limit 2 as well.  So, I think it would be better to error out that
> command for limit 1 as well, as-is.

I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error.  For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe.  If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.

I tried to think of a scenario where the already-received output is
truly invalidated by a later error that happens after we close the
pipe.  It could be something involving a program that uses something
optimistic like serializable snapshot isolation that can later decide
that whatever it told you earlier is not valid after all.

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-07 Thread Etsuro Fujita

(2018/11/07 9:22), Kyotaro HORIGUCHI wrote:

At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita  wrote 
in<5be18409.2070...@lab.ntt.co.jp>

(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:

At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro
Fujita  wrote
in<5bdc4ba0.7050...@lab.ntt.co.jp>

(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:

But iff we are explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failure.


You mean that we should ignore other failures of the called program
when we stop reading from the pipe early?


Yes, we have received sufficient data from the pipe and then closed
it successfully.  The program may want to write more, but we don't
want it to.  We rather should ignore the SIGPIPE exit code of the program

since closing our writing end of a pipe is likely to cause it, and


I think so too.


even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.


Sorry, I don't understand this.


Mmm. It looks confusing, sorry. In other words:

We don't care the reason for program's ending if we received
enough data from the program and we actively (= before receiving
EOF) stop reading. On the other hand if SIGPIPE (on another pipe)
or something fatal happened on the program before we end reading,
we receive EOF just after that and we are interested in the cause
of the program's death.

# Does the above make sense?


Yeah, thanks for the explanation!


In the sense of "We don't care about the reason", negligible reasons
are not necessarily restricted to SIGPIPE; even SIGSEGV could be
theoretically ignored safely.  "Theoretically" here means it is
another issue whether we rely on the output from a program which
causes SEGV (or dies for any reason other than SIGPIPE, which we caused).


For the SIGSEGV case, I think it would be better that we don't rely on 
the output data, IMO, because I think there might be a possibility that 
the program have generated that data incorrectly/unexpectedly.



Finally in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields then if eof_reached we don't ignore
SIGPIPE. For a program like
"puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
the following behavior. (I got an error without the fflush() but
it is inevitable).

=# select * from ft2 limit 1;
 a
---
   test1

=# select * from ft2 limit 2;
ERROR:  program "/home/horiguti/work/exprog" failed
DETAIL:  child process was terminated by signal 13: Broken pipe

For the original case:

=# select * from ft1 limit 1;
a
--
   test
=# select * from ft1 limit 2;
a
--
   test
(1 row)


I haven't confirmed whether it is complete.


Sorry, I don't understand this fully, but the reason to add the
eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
COPY FROM PROGRAM cases?


Yes, if we have received EOF, the program ended before we
finished reading from the pipe, thus any error should be
reported.  Otherwise we don't care, at least about SIGPIPE.  In the
attached patch, all kinds of errors are ignored when the pipe is
closed from the reading end on the backend side.


OK, I understand your idea.  Thanks for the patch!


As the result it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=#  select * from ft limit 1;
a
---
  test1
(1 row)

limit 2 reports the error.

=# select * from ft limit 2;
ERROR:  program "/home/horiguti/work/exprog" failed
DETAIL:  child process exited with exit code 1


I think this would be contrary to users expectations: if the SELECT 
command works for limit 1, they would expect that the command would work 
for limit 2 as well.  So, I think it would be better to error out that 
command for limit 1 as well, as-is.


Best regards,
Etsuro Fujita



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-06 Thread Etsuro Fujita

(2018/11/06 19:50), Thomas Munro wrote:

On Wed, Oct 24, 2018 at 1:21 AM Tom Lane  wrote:

I wrote:

=?utf-8?q?PG_Bug_reporting_form?=  writes:

SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed Detail: child process exited
with exit code 1
*/



Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
shows something pretty unsurprising:
sh: line 1: echo: write error: Broken pipe
So the called program is behaving in a somewhat reasonable way: it's
detecting EPIPE on its stdout (after we close the pipe), reporting that,
and doing exit(1).
Unfortunately, it's not clear what we could do about that, short of
always reading the whole program output, which is likely to be a
cure worse than the disease.  If the program were failing thanks to
SIGPIPE, we could recognize that as a case we can ignore ... but with
behavior like this, I don't see a reliable way to tell it apart from
a generic program failure, which surely we'd better report.


After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.

So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure.  The attached POC patch does that and successfully makes
the file_fdw problem go away for me.

It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream?  If not, what infrastructure do we
need to add to control that?  In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)

2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)


I'm not sure about that.  It might in theory be telling you about some
other pipe.  If you're OK with that false positive, why not ignore all
errors after you've read enough successful input and decided to close
the pipe?


It's unfortunate to have that false positive, but in my opinion we had
better error out if there is something wrong with the called program,
because in that case the data that we read from the pipe might not be
reliable.  IMO it would be the responsibility of the called program to
handle/ignore SIGPIPE properly if necessary.
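
For illustration only (this is not part of any patch in the thread), a called
program that wants the traditional pipeline behavior can restore the default
disposition itself at startup:

    #include <signal.h>
    #include <stdio.h>

    /*
     * Sketch of a pipeline-friendly producer: with SIGPIPE back at SIG_DFL,
     * a closed reading end terminates the program silently instead of
     * surfacing as a write error and a nonzero exit code.
     */
    int
    main(void)
    {
        signal(SIGPIPE, SIG_DFL);
        for (int i = 0; i < 1000000; i++)
            printf("%d\n", i);
        return 0;
    }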



3. Maybe this should be implemented at some higher level?


It won't work for some programs that ignore or catch the signal, so in
theory you might want to give users the power/responsibility to say
"ignore errors that occur after I decide to hang up".  Here are three
different behaviours I found in popular software, showing termination
by signal, custom error handling that we can't distinguish, and a
bonehead strategy:


Interesting!


$ seq 1 100 | head -5
1
2
3
4
5
... exit code indicates killed by signal

$ python -c "for i in range(100): print i" | head -5
0
1
2
3
4
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 32] Broken pipe
... exit code 1


That's sad.


$ cat Test.java
public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 100; ++i) {
            System.out.println(Integer.toString(i));
        }
    }
}
$ javac Test.java
$ java Test | head -5
0
1
2
3
4
... wait a really long time with no output, exit code 0

(Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
exception, except for PrintStreams like System.out which just eat data
after an error...)


I agree that that is a bonehead strategy, but that seems not that bad to me.


4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?


On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM.  The differences were:
PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored.  Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?


So, should we revert SIGUSR2 to default processing as well?

Thanks!

Best regards,
Etsuro Fujita



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-06 Thread Kyotaro HORIGUCHI
At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita  
wrote in <5be18409.2070...@lab.ntt.co.jp>
> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> > At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro
> > Fujita wrote
> > in<5bdc4ba0.7050...@lab.ntt.co.jp>
> >> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
> >>> At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane  wrote
> >>> in<18397.1540297...@sss.pgh.pa.us>
>  It's just a POC because there are some things that need more thought
>  than I've given them:
> 
>  1. Is it OK to revert SIGPIPE to default processing for *all* programs
>  launched through OpenPipeStream?  If not, what infrastructure do we
>  need to add to control that?  In particular, is it sane to revert
>  SIGPIPE for a pipe program that we will write to not read from?
>  (I thought probably yes, because that is the normal Unix behavior,
>  but it could be debated.)
> 
> >> ISTM that it would be OK to inherit SIG_DFL in both cases, because I
> >> think it would be the responsibility of the called program to handle
> >> SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
> >> something, though.
> >
> > So, I think we *should* (not just OK to) restore SIGPIPE to
> > SIG_DFL in any case here to prevent undetermined situation for
> > the program.
> 
> OK
> 
>  2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>  SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
>  we don't intend to terminate that early.)
> >>>
> >>> Since the SIGPIPE may come from another pipe, I think we
> >>> shouldn't generally.
> >>
> >> Agreed; if ClosePipeToProgram ignores that failure, we would fail to
> >> get a better error message in CopySendEndOfRow if the called program
> >> (inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
> >>
> >>  if (cstate->is_program)
> >>  {
> >>  if (errno == EPIPE)
> >>  {
> >>  /*
> >>   * The pipe will be closed automatically on error at
> >>   * the end of transaction, but we might get a better
> >>   * error message from the subprocess' exit code than
> >>   * just "Broken Pipe"
> >>   */
> >>  ClosePipeToProgram(cstate);
> >>
> >>  /*
> >>   * If ClosePipeToProgram() didn't throw an error, the
> >>   * program terminated normally, but closed the pipe
> >>   * first. Restore errno, and throw an error.
> >>   */
> >>  errno = EPIPE;
> >>  }
> >>  ereport(ERROR,
> >>  (errcode_for_file_access(),
> >>   errmsg("could not write to COPY program: %m")));
> >>  }
> >
> > Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
> > program's exit status. So it is irrelevant to called program's
> > SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
> > backend side.
> 
> My explanation might not be enough.  Let me explain.  If the called
> program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for
> some reason, ClosePipeToProgram *as-is* would create an error message
> from that program's exit code.  But if we modify ClosePipeToProgram
> like the original POC patch, that function would not create that
> message for that termination.  To avoid that, I think it would be
> better for ClosePipeToProgram to ignore the SIGPIPE failure only in
> the case where the caller is a COPY FROM PROGRAM that is allowed to
> terminate early. (Maybe we could specify that as a new argument for
> BeginCopyFrom.)

I understood. Thanks for the explanation. I agree to that.

> >>> But if we explicitly stop reading from
> >>> the pipe before detecting an error, it can be ignored since we
> >>> aren't interested in further failure.
> >>
> >> You mean that we should ignore other failures of the called program
> >> when we stop reading from the pipe early?
> >
> > Yes, we have received sufficient data from the pipe then closed
> > it successfully. The program may want to write more but we don't
> > want it. We should rather ignore the SIGPIPE exit code of the program
> > since closing our writing end of a pipe is likely to cause it and
> 
> I think so too.
> 
> > even if it comes from another pipe, we can assume that the
> > SIGPIPE immediately stopped the program before returning any
> > garbage to us.
> 
> Sorry, I don't understand this.

Mmm. It looks confusing, sorry. In other words:

We don't care the reason 

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-06 Thread Etsuro Fujita

(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:

At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita  wrote 
in<5bdc4ba0.7050...@lab.ntt.co.jp>

(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:

At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane  wrote
in<18397.1540297...@sss.pgh.pa.us>

It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream?  If not, what infrastructure do we
need to add to control that?  In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)



ISTM that it would be OK to inherit SIG_DFL in both cases, because I
think it would be the responsibility of the called program to handle
SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
something, though.


So, I think we *should* (not just OK to) restore SIGPIPE to
SIG_DFL in any case here to prevent undetermined situation for
the program.


OK


2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)


Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.


Agreed; if ClosePipeToProgram ignores that failure, we would fail to
get a better error message in CopySendEndOfRow if the called program
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:

 if (cstate->is_program)
 {
 if (errno == EPIPE)
 {
 /*
  * The pipe will be closed automatically on error at
  * the end of transaction, but we might get a better
  * error message from the subprocess' exit code than
  * just "Broken Pipe"
  */
 ClosePipeToProgram(cstate);

 /*
  * If ClosePipeToProgram() didn't throw an error, the
  * program terminated normally, but closed the pipe
  * first. Restore errno, and throw an error.
  */
 errno = EPIPE;
 }
 ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not write to COPY program: %m")));
 }


Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
program's exit status. So it is irrelevant to called program's
SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
backend side.


My explanation might not be enough.  Let me explain.  If the called 
program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for 
some reason, ClosePipeToProgram *as-is* would create an error message 
from that program's exit code.  But if we modify ClosePipeToProgram like 
the original POC patch, that function would not create that message for 
that termination.  To avoid that, I think it would be better for 
ClosePipeToProgram to ignore the SIGPIPE failure only in the case where 
the caller is a COPY FROM PROGRAM that is allowed to terminate early. 
(Maybe we could specify that as a new argument for BeginCopyFrom.)
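
A rough sketch of that shape, reusing the structure of the POC patch quoted
later in the thread and an illustrative explicit_close flag (one of the names
floated by Horiguchi-san; the plumbing that sets it is not shown, and the
committed fix may differ):

/*
 * Sketch only, in copy.c style: forgive a SIGPIPE death of the called
 * program, but only when the caller marked the copy as deliberately
 * stopped early (cstate->explicit_close is illustrative).
 */
static void
ClosePipeToProgram(CopyState cstate)
{
	int			pclose_rc;

	Assert(cstate->is_program);

	pclose_rc = ClosePipeStream(cstate->copy_file);
	if (pclose_rc == -1)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not close pipe to external command: %m")));
	else if (pclose_rc != 0)
	{
		/* early hang-up by us: a SIGPIPE exit of the program is expected */
		if (cstate->explicit_close &&
			WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE)
			return;

		ereport(ERROR,
				(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
				 errmsg("program \"%s\" failed",
						cstate->filename),
				 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
	}
}

COPY FROM PROGRAM would set the flag when it decides to stop reading before
EOF; plain COPY would leave it unset and keep the current strict behavior.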



But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failure.


You mean that we should ignore other failures of the called program
when we stop reading from the pipe early?


Yes, we have received sufficient data from the pipe then closed
it successfully. The program may want to write more but we don't
want it. We should rather ignore the SIGPIPE exit code of the program
since closing our writing end of a pipe is likely to cause it and


I think so too.


even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.


Sorry, I don't understand this.


4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?


All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much.


Not sure, but reverting SIGPIPE to default would be enough as a fix
for the original issue, if we go with the POC patch.


Agreed. I wonder why there's no API that resets all handlers to
SIG_DFL at once, or some flag telling exec() that it should
start with the default signal handler set.


Such an API would be an improvement, but I think that would go beyond 
the scope of this fix.
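
For illustration only, such a reset could look roughly like the sketch below.
Since popen() gives us no hook in the child between fork() and exec(), it is
not directly usable for the case at hand, and NSIG is not fully portable:

/*
 * Sketch: reset every signal action that isn't already SIG_DFL back to
 * SIG_DFL.  It would have to run in the child between fork() and exec(),
 * which popen() does not expose, so this is illustrative only.
 */
#include <signal.h>

static void
reset_all_signals_to_default(void)
{
	for (int signo = 1; signo < NSIG; signo++)
	{
		struct sigaction act;

		if (sigaction(signo, NULL, &act) != 0)
			continue;			/* signal we cannot examine; skip it */
		if (act.sa_handler != SIG_DFL)
		{
			act.sa_handler = SIG_DFL;
			act.sa_flags = 0;
			sigemptyset(&act.sa_mask);
			(void) sigaction(signo, &act, NULL);	/* SIGKILL/SIGSTOP will fail; ignore */
		}
	}
}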



Finally in the attached PoC, I set cstate->eof_reached on failur

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-06 Thread Thomas Munro
On Wed, Oct 24, 2018 at 1:21 AM Tom Lane  wrote:
> I wrote:
> > =?utf-8?q?PG_Bug_reporting_form?=  writes:
> >> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> >> /*
> >> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> >> with exit code 1
> >> */
>
> > Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
> > shows something pretty unsurprising:
> > sh: line 1: echo: write error: Broken pipe
> > So the called program is behaving in a somewhat reasonable way: it's
> > detecting EPIPE on its stdout (after we close the pipe), reporting that,
> > and doing exit(1).
> > Unfortunately, it's not clear what we could do about that, short of
> > always reading the whole program output, which is likely to be a
> > cure worse than the disease.  If the program were failing thanks to
> > SIGPIPE, we could recognize that as a case we can ignore ... but with
> > behavior like this, I don't see a reliable way to tell it apart from
> > a generic program failure, which surely we'd better report.
>
> After a bit of thought, the problem here is blindingly obvious:
> we generally run the backend with SIGPIPE handling set to SIG_IGN,
> and evidently popen() allows the called program to inherit that,
> at least on these platforms.
>
> So what we need to do is make sure the called program inherits SIG_DFL
> handling for SIGPIPE, and then special-case that result as not being
> a failure.  The attached POC patch does that and successfully makes
> the file_fdw problem go away for me.
>
> It's just a POC because there are some things that need more thought
> than I've given them:
>
> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> launched through OpenPipeStream?  If not, what infrastructure do we
> need to add to control that?  In particular, is it sane to revert
> SIGPIPE for a pipe program that we will write to not read from?
> (I thought probably yes, because that is the normal Unix behavior,
> but it could be debated.)
>
> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> we don't intend to terminate that early.)

I'm not sure about that.  It might in theory be telling you about some
other pipe.  If you're OK with that false positive, why not ignore all
errors after you've read enough successful input and decided to close
the pipe?

> 3. Maybe this should be implemented at some higher level?

It won't work for some programs that ignore or catch the signal, so in
theory you might want to give users the power/responsibility to say
"ignore errors that occur after I decide to hang up".  Here are three
different behaviours I found in popular software, showing termination
by signal, custom error handling that we can't distinguish, and a
bonehead strategy:

$ seq 1 100 | head -5
1
2
3
4
5
... exit code indicates killed by signal

$ python -c "for i in range(100): print i" | head -5
0
1
2
3
4
Traceback (most recent call last):
  File "", line 1, in 
IOError: [Errno 32] Broken pipe
... exit code 1

$ cat Test.java
public class Test {
public static void main(String[] args) {
for (int i = 0; i < 100; ++i) {
System.out.println(Integer.toString(i));
}
}
}
$ javac Test.java
$ java Test | head -5
0
1
2
3
4
... wait a really long time with no output, exit code 0

(Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
exception, except for PrintStreams like System.out which just eat data
after an error...)

> 4. Are there any other signals we ought to be reverting to default
> behavior before launching a COPY TO/FROM PROGRAM?

On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM.  The differences were:
PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored.  Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?
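
On platforms without procstat, a tiny hypothetical helper run as the
COPY ... FROM PROGRAM command could be used to show the same dispositions
(assumed code, not from this thread):

/*
 * Illustrative helper: print whether a few signals are ignored in the
 * current process.  Run as the COPY FROM PROGRAM command, it shows the
 * dispositions the called program actually inherits.
 */
#include <signal.h>
#include <stdio.h>

int
main(void)
{
	const int	sigs[] = {SIGPIPE, SIGTTIN, SIGTTOU, SIGUSR2};
	const char *names[] = {"PIPE", "TTIN", "TTOU", "USR2"};

	for (int i = 0; i < 4; i++)
	{
		struct sigaction act;

		if (sigaction(sigs[i], NULL, &act) == 0)
			printf("%s\t%s\n", names[i],
				   act.sa_handler == SIG_IGN ? "ignored" : "default");
	}
	return 0;
}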

-- 
Thomas Munro
http://www.enterprisedb.com



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-05 Thread Kyotaro HORIGUCHI
At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita  
wrote in <5bdc4ba0.7050...@lab.ntt.co.jp>
> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
> > At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane wrote
> > in<18397.1540297...@sss.pgh.pa.us>
> >> After a bit of thought, the problem here is blindingly obvious:
> >> we generally run the backend with SIGPIPE handling set to SIG_IGN,
> >> and evidently popen() allows the called program to inherit that,
> >> at least on these platforms.
> >>
> >> So what we need to do is make sure the called program inherits SIG_DFL
> >> handling for SIGPIPE, and then special-case that result as not being
> >> a failure.  The attached POC patch does that and successfully makes
> >> the file_fdw problem go away for me.
> 
> Thanks for working on this!
> 
> >> It's just a POC because there are some things that need more thought
> >> than I've given them:
> >>
> >> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> >> launched through OpenPipeStream?  If not, what infrastructure do we
> >> need to add to control that?  In particular, is it sane to revert
> >> SIGPIPE for a pipe program that we will write to not read from?
> >> (I thought probably yes, because that is the normal Unix behavior,
> >> but it could be debated.)
> >
> > Forcing the program to ignore SIGPIPE leaves it to handle EPIPE, which
> > might not be handled properly since it would be unexpected by the
> > program.
> >
> > If we don't read from the pipe (that is, we are writing to it),
> > anyway we shouldn't change the behavior since SIGPIPE can come
> > from another pipe.
> 
> I'm a bit confused here.  Horiguchi-san, are you saying that the
> called program that we will read from should inherit SIG_DFL for
> SIGPIPE, as proposed in the POC patch, but the called program that we
> will write to should inherit SIG_IGN as it currently does?

Maybe yes. Your understanding of the first paragraph looks
right. But in my second paragraph, I said that we shouldn't set
SIGPIPE to other than SIG_DFL at exec() time even if we are to
read the pipe because it can be writing to another pipe and the
change can affect the program's behavior.

> ISTM that it would be OK to inherit SIG_DFL in both cases, because I
> think it would be the responsibility of the called program to handle
> SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
> something, though.

So, I think we *should* (not just OK to) restore SIGPIPE to
SIG_DFL in any case here to prevent undetermined situation for
the program.

> >> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> >> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> >> we don't intend to terminate that early.)
> >
> > Since the SIGPIPE may come from another pipe, I think we
> > shouldn't generally.
> 
> Agreed; if ClosePipeToProgram ignores that failure, we would fail to
> get a better error message in CopySendEndOfRow if the called program
> (inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
> 
> if (cstate->is_program)
> {
> if (errno == EPIPE)
> {
> /*
>  * The pipe will be closed automatically on error at
>  * the end of transaction, but we might get a better
>  * error message from the subprocess' exit code than
>  * just "Broken Pipe"
>  */
> ClosePipeToProgram(cstate);
> 
> /*
>  * If ClosePipeToProgram() didn't throw an error, the
>  * program terminated normally, but closed the pipe
>  * first. Restore errno, and throw an error.
>  */
> errno = EPIPE;
> }
> ereport(ERROR,
> (errcode_for_file_access(),
>  errmsg("could not write to COPY program: %m")));
> }

Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
program's exit status. So it is irrelevant to called program's
SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
backend side.

> > But if we explicitly stop reading from
> > the pipe before detecting an error, it can be ignored since we
> > aren't interested in further failure.
> 
> You mean that we should ignore other failures of the called program
> when we stop reading from the pipe early?

Yes, we have received sufficient data from the pipe then closed
it successfully. The program may want to write more but we don't
want it. We should rather ignore the SIGPIPE exit code of the program
since closing our writing end of a pipe is likely to cause it and
even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.

> > Thus ClosePipeToProgram
> > 

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-02 Thread Etsuro Fujita

(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:

At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane  wrote 
in<18397.1540297...@sss.pgh.pa.us>

After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.

So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure.  The attached POC patch does that and successfully makes
the file_fdw problem go away for me.


Thanks for working on this!


It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream?  If not, what infrastructure do we
need to add to control that?  In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)


Forcing the program to ignore SIGPIPE leaves it to handle EPIPE, which
might not be handled properly since it would be unexpected by the
program.

If we don't read from the pipe (that is, we are writing to it),
anyway we shouldn't change the behavior since SIGPIPE can come
from another pipe.


I'm a bit confused here.  Horiguchi-san, are you saying that the called 
program that we will read from should inherit SIG_DFL for SIGPIPE, as 
proposed in the POC patch, but the called program that we will write to 
should inherit SIG_IGN as it currently does?


ISTM that it would be OK to inherit SIG_DFL in both cases, because I 
think it would be the responsibility of the called program to handle 
SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing 
something, though.



2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)


Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.


Agreed; if ClosePipeToProgram ignores that failure, we would fail to get 
a better error message in CopySendEndOfRow if the called program 
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:


if (cstate->is_program)
{
if (errno == EPIPE)
{
/*
 * The pipe will be closed automatically on error at
 * the end of transaction, but we might get a better
 * error message from the subprocess' exit code than
 * just "Broken Pipe"
 */
ClosePipeToProgram(cstate);

/*
 * If ClosePipeToProgram() didn't throw an error, the
 * program terminated normally, but closed the pipe
 * first. Restore errno, and throw an error.
 */
errno = EPIPE;
}
ereport(ERROR,
(errcode_for_file_access(),
 errmsg("could not write to COPY program: %m")));

}


But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failure.


You mean that we should ignore other failures of the called program when 
we stop reading from the pipe early?



Thus ClosePipeToProgram
might need an extra parameter or CopyState may need an additional
flag named, say, "explicit_close" (or
"ignore_sigpipe_on_pclose"..) in CopyState which will be set in
EndCopy.


3. Maybe this should be implemented at some higher level?


I'm not sure, but it seems to me not needed.


ISTM that it's a good idea to control the ClosePipeToProgram behavior by 
a new member for CopyState.



4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?


All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much.


Not sure, but reverting SIGPIPE to default would be enough as a fix for 
the original issue, if we go with the POC patch.


Best regards,
Etsuro Fujita



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-10-28 Thread Kyotaro HORIGUCHI
Hello.

At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane  wrote in 
<18397.1540297...@sss.pgh.pa.us>
> I wrote:
> > =?utf-8?q?PG_Bug_reporting_form?=  writes:
> >> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> >> /*
> >> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> >> with exit code 1
> >> */
> 
> > Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
> > shows something pretty unsurprising:
> > sh: line 1: echo: write error: Broken pipe
> > So the called program is behaving in a somewhat reasonable way: it's
> > detecting EPIPE on its stdout (after we close the pipe), reporting that,
> > and doing exit(1).
> > Unfortunately, it's not clear what we could do about that, short of
> > always reading the whole program output, which is likely to be a
> > cure worse than the disease.  If the program were failing thanks to
> > SIGPIPE, we could recognize that as a case we can ignore ... but with
> > behavior like this, I don't see a reliable way to tell it apart from
> > a generic program failure, which surely we'd better report.
> 
> After a bit of thought, the problem here is blindingly obvious:
> we generally run the backend with SIGPIPE handling set to SIG_IGN,
> and evidently popen() allows the called program to inherit that,
> at least on these platforms.
> 
> So what we need to do is make sure the called program inherits SIG_DFL
> handling for SIGPIPE, and then special-case that result as not being
> a failure.  The attached POC patch does that and successfully makes
> the file_fdw problem go away for me.
> 
> It's just a POC because there are some things that need more thought
> than I've given them:
> 
> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> launched through OpenPipeStream?  If not, what infrastructure do we
> need to add to control that?  In particular, is it sane to revert
> SIGPIPE for a pipe program that we will write to not read from?
> (I thought probably yes, because that is the normal Unix behavior,
> but it could be debated.)

Forcing the program to ignore SIGPIPE leaves it to handle EPIPE, which
might not be handled properly since it would be unexpected by the
program.

If we don't read from the pipe (that is, we are writing to it),
anyway we shouldn't change the behavior since SIGPIPE can come
from another pipe.

> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> we don't intend to terminate that early.)

Since the SIGPIPE may come from another pipe, I think we
shouldn't generally. But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failure. Thus ClosePipeToProgram
might need an extra parameter or CopyState may need an additional
flag named, say, "explicit_close" (or
"ignore_sigpipe_on_pclose"..) in CopyState which will be set in
EndCopy.

> 3. Maybe this should be implemented at some higher level?

I'm not sure, but it seems to me not needed.

> 4. Are there any other signals we ought to be reverting to default
> behavior before launching a COPY TO/FROM PROGRAM?

All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-10-23 Thread Tom Lane
I wrote:
> =?utf-8?q?PG_Bug_reporting_form?=  writes:
>> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
>> /*
>> [38000] ERROR: program "echo "test"" failed Detail: child process exited
>> with exit code 1
>> */

> Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
> shows something pretty unsurprising:
> sh: line 1: echo: write error: Broken pipe
> So the called program is behaving in a somewhat reasonable way: it's
> detecting EPIPE on its stdout (after we close the pipe), reporting that,
> and doing exit(1).
> Unfortunately, it's not clear what we could do about that, short of
> always reading the whole program output, which is likely to be a
> cure worse than the disease.  If the program were failing thanks to
> SIGPIPE, we could recognize that as a case we can ignore ... but with
> behavior like this, I don't see a reliable way to tell it apart from
> a generic program failure, which surely we'd better report.

After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.

So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure.  The attached POC patch does that and successfully makes
the file_fdw problem go away for me.
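
For what it's worth, the inheritance is easy to demonstrate outside the
backend; here is a minimal standalone C sketch (hypothetical, not part of
the patch) that ignores SIGPIPE the way the backend does and then hangs up
on a popen()'d program early:

/*
 * Hypothetical standalone demo, not PostgreSQL code.  The parent ignores
 * SIGPIPE (as the backend does), runs a program through popen(), reads one
 * line, and hangs up.  Because SIG_IGN is inherited across popen()'s
 * fork/exec, the child sees EPIPE and exits with an ordinary error code,
 * indistinguishable from a real failure.  Exact codes depend on the shell
 * and the program.
 */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

int
main(void)
{
	char		buf[64];
	FILE	   *fp;
	int			status;

	signal(SIGPIPE, SIG_IGN);		/* what the backend normally does */

	fp = popen("seq 1 1000000", "r");
	if (fp == NULL)
		return 1;
	if (fgets(buf, sizeof(buf), fp) == NULL)	/* read just one line */
	{
		pclose(fp);
		return 1;
	}
	status = pclose(fp);			/* then hang up early */

	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE)
		printf("child killed by SIGPIPE -- could be treated as benign\n");
	else if (WIFEXITED(status))
		printf("child exited with code %d -- looks like a real failure\n",
			   WEXITSTATUS(status));
	return 0;
}

With the first signal() call changed to SIG_DFL, the child dies on SIGPIPE
instead, which is the distinguishable case the patch below keys on.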

It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream?  If not, what infrastructure do we
need to add to control that?  In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)

2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)

3. Maybe this should be implemented at some higher level?

4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?

regards, tom lane

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b58a74f4e3db9d8e8f30d35e08ca84814742faa3..c4c6c875ec8102df2756d8ea1ba79791d63fed25 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** ClosePipeToProgram(CopyState cstate)
*** 1727,1737 
--- 1727,1746 
  (errcode_for_file_access(),
   errmsg("could not close pipe to external command: %m")));
  	else if (pclose_rc != 0)
+ 	{
+ 		/*
+ 		 * If the called program terminated on SIGPIPE, assume it's OK; we
+ 		 * must have chosen to stop reading its output early.
+ 		 */
+ 		if (WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE)
+ 			return;
+ 
  		ereport(ERROR,
  (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
   errmsg("program \"%s\" failed",
  		cstate->filename),
   errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+ 	}
  }
  
  /*
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f176749c5d1b2adc9993b5a9a3571ee566c..6cec85e2c1dc7dea23c8b941e5080714ea65e57a 100644
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
*** OpenTransientFilePerm(const char *fileNa
*** 2430,2440 
--- 2430,2445 
   * Routines that want to initiate a pipe stream should use OpenPipeStream
   * rather than plain popen().  This lets fd.c deal with freeing FDs if
   * necessary.  When done, call ClosePipeStream rather than pclose.
+  *
+  * This function also ensures that the popen'd program is run with default
+  * SIGPIPE processing, rather than the SIG_IGN setting the backend normally
+  * uses.  This ensures desirable response to, eg, closing a read pipe early.
   */
  FILE *
  OpenPipeStream(const char *command, const char *mode)
  {
  	FILE	   *file;
+ 	int			save_errno;
  
  	DO_DB(elog(LOG, "OpenPipeStream: Allocated %d (%s)",
  			   numAllocatedDescs, command));
*** OpenPipeStream(const char *command, cons
*** 2452,2459 
  TryAgain:
  	fflush(stdout);
  	fflush(stderr);
  	errno = 0;
! 	if ((file = popen(command, mode)) != NULL)
  	{
  		AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
  
--- 2457,2469 
  TryAgain:
  	fflush(stdout);
  	fflush(stderr);
+ 	pqsignal(SIGPIPE, SIG_DFL);
  	errno = 0;
! 	file = popen(command, mode);
! 	save_errno = errno;
! 	pqsignal(SIGPIPE, SIG_IGN);
! 	errno = save_errno;
! 	if (file != NULL)
  	{
  		AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
  
*** TryAgain:
*** 2466,2473 
  
  	if (errno == EMFILE || errno == ENFILE)
  	{
- 		int			save_errno = errno;
- 
  		ereport(LOG,
  (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
   errmsg("out of file descriptors: %m; release and