Re: [HACKERS] SQL/MED - file_fdw

Noah Misch Mon, 07 Feb 2011 04:17:28 -0800

On Mon, Feb 07, 2011 at 03:39:39PM +0900, Itagaki Takahiro wrote:
> On Sun, Feb 6, 2011 at 09:01, Noah Misch <[email protected]> wrote:
> > Most "parse analysis"-type bits of DoCopy() move to BeginCopy().
> 
> It would be possible to move more FROM-only or TO-only members in BeginCopy()
> into their BeginCopyFrom/To(), but it is just a code refactoring requires
> more code rearrangement. It should be done by another try even if needed.


Agreed.

> > CopyState.processed is gone, with the row count now bubbling up
> > through return values.
> 
> I removed it from CopyState because it represents "number of inserted rows"
> in COPY FROM. However, for the viewpoint of API, the numbers of lines in
> the input file is more suitable. To avoid the confusion, I moved it
> from a common member to COPY FROM-only variable.

Perhaps I'm missing something.  The new API does not expose a "processed" count
at all; that count is used for the command tag of a top-level COPY.  This part
of the patch is just changing how we structure the code to maintain that tally,
and it has no implications for observers outside copy.c.  Right?

> > For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo(). ??I'm
> > skeptical about this one. ??It's not required for correctness, and the 
> > relkind
> > checks for COPY FROM remain in CopyFrom().
> 
> I think error checks should be done in the early phase, so I moved the check
> to BeginCopyTo(). However, relkind check in COPY FROM is needed only for
> COPY FROM command because the relation is the inserted target. For APIs,
> the relation is used as a template for input file. So, we cannot perform
> the check in BeginCopyFrom().

The choice of where to put them does not affect anything outside of copy.c, and
placement in DoCopyTo() would make the symmetry between the TO and FROM code
paths easier to follow.  Not a major concern, though.

> > file_fdw uses CopyFromErrorCallback() to give errors the proper context. 
> > ??The
> > function uses template strings like "COPY %s, line %d", where %s is the 
> > name of
> > the relation being copied. ??Presumably file_fdw and other features using 
> > this
> > API would wish to customize that error message prefix, and the relation name
> > might not be apropos at all. ??How about another argument to BeginCopyFrom,
> > specifying an error prefix to be stashed in the CopyState?
> 
> I changed "COPY %s, .." to "relation %s, ..." because the first string is
> the relation name anyway. We could have another prefix argument, but I think
> it has little information for errors.

That's perhaps an improvement for file_fdw, but not for regular COPY.

My comment originated with a faulty idea that file_fdw's internal use of COPY
was an implementation detail that users should not need to see.  Looking now,
the file_fdw documentation clearly explains the tie to COPY, even referring
users to the COPY documentation.  I no longer see a need to hide the fact that
the "foreign" source is a PostgreSQL COPY command.  The error messages are fine
as they were.

Some future client of the new API may wish to hide its internal COPY use, but
there's no need to design for that now.

> We also have many "COPY" in other messages, but they won't be used actually
> because the are messages for invalid arguments and file_fdw will have own
> validater function. All invalid arguments will be filtered in CREATE commands.

Agreed; "could not read from COPY file: %m" appears to be the primary one liable
to happen in practice.  The greater failure with that one is that, given a query
reading from multiple file_fdw tables, you can't tell which file had a problem.

This issue runs broader than the patch at hand; I will start another thread to
address it.  Let's proceed with this patch, not changing any error messages.  If
other discussion concludes that the desired behavior requires an enhancement to
this new API, a followup commit can implement that.

> >> ! ?? ?? ?? ?? ?? ?? /* Reset the per-tuple exprcontext */
> >> ! ?? ?? ?? ?? ?? ?? ResetPerTupleExprContext(estate);
> >> !
> >> ! ?? ?? ?? ?? ?? ?? /* Switch into its memory context */
> >> ! ?? ?? ?? ?? ?? ?? 
> >> MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
> >
> > Shouldn't a switch to this context happen inside NextCopyFrom(), then again 
> > for
> > the calls to heap_form_tuple/HeapTupleSetOid? ??I found previous discussion 
> > on
> > this matter, but no conclusion.
> 
> In my understanding, NextCopyFrom() should use CurrentMemoryContext provided
> by the caller. The file_fdw will use executor's per-tuple context for it.

In a direct call to COPY FROM, all of NextCopyFrom() uses the per-tuple context
of CopyState->estate.  We reset that context before each call to NextCopyFrom().
This is true before (ignoring code movement) and after the patch.

A file_fdw NextCopyFrom() call will use the per-tuple context of the executor
performing a foreign scan.  Allocations will arise primarily in type input
functions.  ExecEvalExpr(), used to acquire default values, will still use the
per-tuple context of CopyState->estate.  That per-tuple context will never get
reset explicitly, so default value computations leak until EndCopyFrom().

I see memory context use was discussed already, but I don't see the
aforementioned specific details addressed:
http://archives.postgresql.org/message-id/[email protected]

The use of ExecEvalExpr() in NextCopyFrom() also seems contrary to the
execnodes.h comment you quoted in that thread ("CurrentMemoryContext should be
set to ecxt_per_tuple_memory before calling ExecEvalExpr() --- see
ExecEvalExprSwitchContext().").  NextCopyFrom() passes CopyState->estate to
ExecEvalExpr, but the current context may be the per-tuple context of a
different executor state.

> Another issue in the automatic context switch is to discard the previous
> values in the next call while the caller is unaware.

True; we would need to document that pointers stored in the "values" argument of
NextCopyData() are valid only until the next call.  Does that already work with
file_fdw's usage pattern, or would file_fdw need to make copies?


Three hunk-specific comments (diff is between previously-reviewed copy.c and
current master plus today's patch):

> *** a/src/backend/commands/copy.c
> --- b/src/backend/commands/copy.c
> ***************
> *** 175,193 **** typedef struct CopyStateData
>        * The definition of input functions and default expressions are stored
>        * in these variables.
>        */
>       EState     *estate;
>       AttrNumber      num_defaults;
>       bool            file_has_oids;
>       FmgrInfo        oid_in_function;
>       Oid                     oid_typioparam;
> !     FmgrInfo   *in_functions;
> !     Oid                *typioparams;
> !     int                *defmap;
>       ExprState **defexprs;           /* array of default att expressions */
>   } CopyStateData;
>   
>   /* DestReceiver for COPY (SELECT) TO */
>   typedef struct
>   {
>       DestReceiver pub;                       /* publicly-known function 
> pointers */
>       CopyState       cstate;                 /* CopyStateData for the 
> command */
> --- 175,193 ----
>        * The definition of input functions and default expressions are stored
>        * in these variables.
>        */

This block comment remains roughly half correct.  Again, I think a small comment
on every field below should replace it.

>       EState     *estate;
>       AttrNumber      num_defaults;
>       bool            file_has_oids;
>       FmgrInfo        oid_in_function;
>       Oid                     oid_typioparam;
> !     FmgrInfo   *in_functions;       /* array of input functions for each 
> attrs */
> !     Oid                *typioparams;        /* array of element types for 
> in_functions */
> !     int                *defmap;                     /* array of default att 
> numbers */
>       ExprState **defexprs;           /* array of default att expressions */
>   } CopyStateData;
>   
>   /* DestReceiver for COPY (SELECT) TO */
>   typedef struct
>   {
>       DestReceiver pub;                       /* publicly-known function 
> pointers */
>       CopyState       cstate;                 /* CopyStateData for the 
> command */
> ***************
> *** 1268,1283 **** BeginCopy(bool is_from,
> --- 1268,1289 ----
>   }
>   
>   /*
>    * Release resources allocated in a cstate.
>    */
>   static void
>   EndCopy(CopyState cstate)
>   {
> +     if (cstate->filename != NULL && FreeFile(cstate->copy_file))
> +             ereport(ERROR,
> +                             (errcode_for_file_access(),
> +                              errmsg("could not close file \"%s\": %m",
> +                                             cstate->filename)));

In the write case, this is not an improvement -- failure in fclose(3) almost
always arises from the attempt to flush buffered writes.  Note how most other
call sites checking the return value of FreeFile() use an error message like the
original one.  The change to the read case seems fine.

> + 
>       MemoryContextDelete(cstate->copycontext);
>       pfree(cstate);
>   }
>   
>   /*
>    * Setup CopyState to read tuples from a table or a query for COPY TO.
>    */
>   static CopyState
> ***************
> *** 1400,1424 **** DoCopyTo(CopyState cstate)
>   }
>   
>   /*
>    * Clean up storage and release resources for COPY TO.
>    */
>   static void
>   EndCopyTo(CopyState cstate)
>   {
> -     if (cstate->filename != NULL && FreeFile(cstate->copy_file))
> -             ereport(ERROR,
> -                             (errcode_for_file_access(),
> -                              errmsg("could not close file \"%s\": %m",
> -                                             cstate->filename)));
> - 
> -     /*
> -      * Close the relation or query. We can release the AccessShareLock we 
> got.
> -      */
>       if (cstate->queryDesc != NULL)
>       {
>               /* Close down the query and free resources. */
>               ExecutorEnd(cstate->queryDesc);
>               FreeQueryDesc(cstate->queryDesc);
>               PopActiveSnapshot();
>       }
>   
> --- 1406,1421 ----
> ***************
> *** 1681,1746 **** void
>   CopyFromErrorCallback(void *arg)
>   {
>       CopyState       cstate = (CopyState) arg;
>   
>       if (cstate->binary)
>       {
>               /* can't usefully display the data */
>               if (cstate->cur_attname)
> !                     errcontext("COPY %s, line %d, column %s",
>                                          cstate->cur_relname, 
> cstate->cur_lineno,
>                                          cstate->cur_attname);
>               else
> !                     errcontext("COPY %s, line %d",
>                                          cstate->cur_relname, 
> cstate->cur_lineno);
>       }
>       else
>       {
>               if (cstate->cur_attname && cstate->cur_attval)
>               {
>                       /* error is relevant to a particular column */
>                       char       *attval;
>   
>                       attval = limit_printout_length(cstate->cur_attval);
> !                     errcontext("COPY %s, line %d, column %s: \"%s\"",
>                                          cstate->cur_relname, 
> cstate->cur_lineno,
>                                          cstate->cur_attname, attval);
>                       pfree(attval);
>               }
>               else if (cstate->cur_attname)
>               {
>                       /* error is relevant to a particular column, value is 
> NULL */
> !                     errcontext("COPY %s, line %d, column %s: null input",
>                                          cstate->cur_relname, 
> cstate->cur_lineno,
>                                          cstate->cur_attname);
>               }
>               else
>               {
>                       /* error is relevant to a particular line */
>                       if (cstate->line_buf_converted || 
> !cstate->need_transcoding)
>                       {
>                               char       *lineval;
>   
>                               lineval = 
> limit_printout_length(cstate->line_buf.data);
> !                             errcontext("COPY %s, line %d: \"%s\"",
>                                                  cstate->cur_relname, 
> cstate->cur_lineno, lineval);
>                               pfree(lineval);
>                       }
>                       else
>                       {
>                               /*
>                                * Here, the line buffer is still in a foreign 
> encoding, and
>                                * indeed it's quite likely that the error is 
> precisely a
>                                * failure to do encoding conversion (ie, bad 
> data).  We dare
>                                * not try to convert it, and at present 
> there's no way to
>                                * regurgitate it without conversion.  So we 
> have to punt and
>                                * just report the line number.
>                                */
> !                             errcontext("COPY %s, line %d",
>                                                  cstate->cur_relname, 
> cstate->cur_lineno);
>                       }
>               }
>       }
>   }
>   
>   /*
>    * Make sure we don't print an unreasonable amount of COPY data in a 
> message.
> --- 1678,1743 ----
>   CopyFromErrorCallback(void *arg)
>   {
>       CopyState       cstate = (CopyState) arg;
>   
>       if (cstate->binary)
>       {
>               /* can't usefully display the data */
>               if (cstate->cur_attname)
> !                     errcontext("relation %s, line %d, column %s",
>                                          cstate->cur_relname, 
> cstate->cur_lineno,
>                                          cstate->cur_attname);
>               else
> !                     errcontext("relation %s, line %d",
>                                          cstate->cur_relname, 
> cstate->cur_lineno);
>       }
>       else
>       {
>               if (cstate->cur_attname && cstate->cur_attval)
>               {
>                       /* error is relevant to a particular column */
>                       char       *attval;
>   
>                       attval = limit_printout_length(cstate->cur_attval);
> !                     errcontext("relation %s, line %d, column %s: \"%s\"",
>                                          cstate->cur_relname, 
> cstate->cur_lineno,
>                                          cstate->cur_attname, attval);
>                       pfree(attval);
>               }
>               else if (cstate->cur_attname)
>               {
>                       /* error is relevant to a particular column, value is 
> NULL */
> !                     errcontext("relation %s, line %d, column %s: null 
> input",
>                                          cstate->cur_relname, 
> cstate->cur_lineno,
>                                          cstate->cur_attname);
>               }
>               else
>               {
>                       /* error is relevant to a particular line */
>                       if (cstate->line_buf_converted || 
> !cstate->need_transcoding)
>                       {
>                               char       *lineval;
>   
>                               lineval = 
> limit_printout_length(cstate->line_buf.data);
> !                             errcontext("relation %s, line %d: \"%s\"",
>                                                  cstate->cur_relname, 
> cstate->cur_lineno, lineval);
>                               pfree(lineval);
>                       }
>                       else
>                       {
>                               /*
>                                * Here, the line buffer is still in a foreign 
> encoding, and
>                                * indeed it's quite likely that the error is 
> precisely a
>                                * failure to do encoding conversion (ie, bad 
> data).  We dare
>                                * not try to convert it, and at present 
> there's no way to
>                                * regurgitate it without conversion.  So we 
> have to punt and
>                                * just report the line number.
>                                */
> !                             errcontext("relation %s, line %d",
>                                                  cstate->cur_relname, 
> cstate->cur_lineno);
>                       }
>               }
>       }
>   }
>   
>   /*
>    * Make sure we don't print an unreasonable amount of COPY data in a 
> message.
> ***************
> *** 1782,1798 **** limit_printout_length(const char *str)
>    */
>   static uint64
>   CopyFrom(CopyState cstate)
>   {
>       HeapTuple       tuple;
>       TupleDesc       tupDesc;
>       Datum      *values;
>       bool       *nulls;
> -     bool            done = false;
>       ResultRelInfo *resultRelInfo;
>       EState     *estate = cstate->estate; /* for ExecConstraints() */
>       TupleTableSlot *slot;
>       MemoryContext oldcontext = CurrentMemoryContext;
>       ErrorContextCallback errcontext;
>       CommandId       mycid = GetCurrentCommandId(true);
>       int                     hi_options = 0; /* start with default 
> heap_insert options */
>       BulkInsertState bistate;
> --- 1779,1794 ----
> ***************
> *** 1906,1936 **** CopyFrom(CopyState cstate)
>       bistate = GetBulkInsertState();
>   
>       /* Set up callback to identify error line number */
>       errcontext.callback = CopyFromErrorCallback;
>       errcontext.arg = (void *) cstate;
>       errcontext.previous = error_context_stack;
>       error_context_stack = &errcontext;
>   
> !     while (!done)
>       {
>               bool            skip_tuple;
>               Oid                     loaded_oid = InvalidOid;
>   
>               CHECK_FOR_INTERRUPTS();
>   
>               /* Reset the per-tuple exprcontext */
>               ResetPerTupleExprContext(estate);
>   
>               /* Switch into its memory context */
>               MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>   
> !             done = !NextCopyFrom(cstate, values, nulls, &loaded_oid);
> !             if (done)
>                       break;
>   
>               /* And now we can form the input tuple. */
>               tuple = heap_form_tuple(tupDesc, values, nulls);
>   
>               if (loaded_oid != InvalidOid)
>                       HeapTupleSetOid(tuple, loaded_oid);
>   
> --- 1902,1931 ----
>       bistate = GetBulkInsertState();
>   
>       /* Set up callback to identify error line number */
>       errcontext.callback = CopyFromErrorCallback;
>       errcontext.arg = (void *) cstate;
>       errcontext.previous = error_context_stack;
>       error_context_stack = &errcontext;
>   
> !     for (;;)
>       {
>               bool            skip_tuple;
>               Oid                     loaded_oid = InvalidOid;
>   
>               CHECK_FOR_INTERRUPTS();
>   
>               /* Reset the per-tuple exprcontext */
>               ResetPerTupleExprContext(estate);
>   
>               /* Switch into its memory context */
>               MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>   
> !             if (!NextCopyFrom(cstate, values, nulls, &loaded_oid))
>                       break;
>   
>               /* And now we can form the input tuple. */
>               tuple = heap_form_tuple(tupDesc, values, nulls);
>   
>               if (loaded_oid != InvalidOid)
>                       HeapTupleSetOid(tuple, loaded_oid);
>   
> ***************
> *** 2015,2031 **** CopyFrom(CopyState cstate)
>        */
>       if (hi_options & HEAP_INSERT_SKIP_WAL)
>               heap_sync(cstate->rel);
>   
>       return processed;
>   }
>   
>   /*
> !  * Setup CopyState to read tuples from a file for COPY FROM.
>    */
>   CopyState
>   BeginCopyFrom(Relation rel,
>                         const char *filename,
>                         List *attnamelist,
>                         List *options)
>   {
>       CopyState       cstate;
> --- 2010,2032 ----
>        */
>       if (hi_options & HEAP_INSERT_SKIP_WAL)
>               heap_sync(cstate->rel);
>   
>       return processed;
>   }
>   
>   /*
> !  * CopyGetAttnums - build an integer list of attnums to be copied
> !  *
> !  * The input attnamelist is either the user-specified column list,
> !  * or NIL if there was none (in which case we want all the non-dropped
> !  * columns).
> !  *
> !  * rel can be NULL ... it's only used for error reports.
>    */
>   CopyState
>   BeginCopyFrom(Relation rel,

This is just a verbatim copy of the CopyGetAttnums() header comment.  (The
middle paragraph happens to be true, though.)

>                         const char *filename,
>                         List *attnamelist,
>                         List *options)
>   {
>       CopyState       cstate;
> ***************
> *** 2208,2224 **** BeginCopyFrom(Relation rel,
>       MemoryContextSwitchTo(oldcontext);
>   
>       return cstate;
>   }
>   
>   /*
>    * Read next tuple from file for COPY FROM. Return false if no more tuples.
>    *
> !  * valus and nulls arrays must be the same length as columns of the
>    * relation passed to BeginCopyFrom. Oid of the tuple is returned with
>    * tupleOid separately.
>    */
>   bool
>   NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *tupleOid)
>   {
>       TupleDesc       tupDesc;
>       Form_pg_attribute *attr;
> --- 2209,2225 ----
>       MemoryContextSwitchTo(oldcontext);
>   
>       return cstate;
>   }
>   
>   /*
>    * Read next tuple from file for COPY FROM. Return false if no more tuples.
>    *
> !  * values and nulls arrays must be the same length as columns of the
>    * relation passed to BeginCopyFrom. Oid of the tuple is returned with
>    * tupleOid separately.
>    */
>   bool
>   NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *tupleOid)
>   {
>       TupleDesc       tupDesc;
>       Form_pg_attribute *attr;
> ***************
> *** 2453,2474 **** NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, 
> Oid *tupleOid)
>   /*
>    * Clean up storage and release resources for COPY FROM.
>    */
>   void
>   EndCopyFrom(CopyState cstate)
>   {
>       FreeExecutorState(cstate->estate);
>   
> -     if (cstate->filename != NULL && FreeFile(cstate->copy_file))
> -             ereport(ERROR,
> -                             (errcode_for_file_access(),
> -                              errmsg("could not close file \"%s\": %m",
> -                                             cstate->filename)));
> - 
>       /* Clean up storage */
>       EndCopy(cstate);
>   }
>   
>   
>   /*
>    * Read the next input line and stash it in line_buf, with conversion to
>    * server encoding.
> --- 2454,2469 ----

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SQL/MED - file_fdw

Reply via email to