On Mon, Feb 07, 2011 at 03:39:39PM +0900, Itagaki Takahiro wrote: > On Sun, Feb 6, 2011 at 09:01, Noah Misch <n...@leadboat.com> wrote: > > Most "parse analysis"-type bits of DoCopy() move to BeginCopy(). > > It would be possible to move more FROM-only or TO-only members in BeginCopy() > into their BeginCopyFrom/To(), but it is just a code refactoring requires > more code rearrangement. It should be done by another try even if needed.
Agreed. > > CopyState.processed is gone, with the row count now bubbling up > > through return values. > > I removed it from CopyState because it represents "number of inserted rows" > in COPY FROM. However, for the viewpoint of API, the numbers of lines in > the input file is more suitable. To avoid the confusion, I moved it > from a common member to COPY FROM-only variable. Perhaps I'm missing something. The new API does not expose a "processed" count at all; that count is used for the command tag of a top-level COPY. This part of the patch is just changing how we structure the code to maintain that tally, and it has no implications for observers outside copy.c. Right? > > For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo(). ??I'm > > skeptical about this one. ??It's not required for correctness, and the > > relkind > > checks for COPY FROM remain in CopyFrom(). > > I think error checks should be done in the early phase, so I moved the check > to BeginCopyTo(). However, relkind check in COPY FROM is needed only for > COPY FROM command because the relation is the inserted target. For APIs, > the relation is used as a template for input file. So, we cannot perform > the check in BeginCopyFrom(). The choice of where to put them does not affect anything outside of copy.c, and placement in DoCopyTo() would make the symmetry between the TO and FROM code paths easier to follow. Not a major concern, though. > > file_fdw uses CopyFromErrorCallback() to give errors the proper context. > > ??The > > function uses template strings like "COPY %s, line %d", where %s is the > > name of > > the relation being copied. ??Presumably file_fdw and other features using > > this > > API would wish to customize that error message prefix, and the relation name > > might not be apropos at all. ??How about another argument to BeginCopyFrom, > > specifying an error prefix to be stashed in the CopyState? > > I changed "COPY %s, .." to "relation %s, ..." because the first string is > the relation name anyway. We could have another prefix argument, but I think > it has little information for errors. That's perhaps an improvement for file_fdw, but not for regular COPY. My comment originated with a faulty idea that file_fdw's internal use of COPY was an implementation detail that users should not need to see. Looking now, the file_fdw documentation clearly explains the tie to COPY, even referring users to the COPY documentation. I no longer see a need to hide the fact that the "foreign" source is a PostgreSQL COPY command. The error messages are fine as they were. Some future client of the new API may wish to hide its internal COPY use, but there's no need to design for that now. > We also have many "COPY" in other messages, but they won't be used actually > because the are messages for invalid arguments and file_fdw will have own > validater function. All invalid arguments will be filtered in CREATE commands. Agreed; "could not read from COPY file: %m" appears to be the primary one liable to happen in practice. The greater failure with that one is that, given a query reading from multiple file_fdw tables, you can't tell which file had a problem. This issue runs broader than the patch at hand; I will start another thread to address it. Let's proceed with this patch, not changing any error messages. If other discussion concludes that the desired behavior requires an enhancement to this new API, a followup commit can implement that. > >> ! ?? ?? ?? ?? ?? ?? /* Reset the per-tuple exprcontext */ > >> ! ?? ?? ?? ?? ?? ?? ResetPerTupleExprContext(estate); > >> ! > >> ! ?? ?? ?? ?? ?? ?? /* Switch into its memory context */ > >> ! ?? ?? ?? ?? ?? ?? > >> MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > > > > Shouldn't a switch to this context happen inside NextCopyFrom(), then again > > for > > the calls to heap_form_tuple/HeapTupleSetOid? ??I found previous discussion > > on > > this matter, but no conclusion. > > In my understanding, NextCopyFrom() should use CurrentMemoryContext provided > by the caller. The file_fdw will use executor's per-tuple context for it. In a direct call to COPY FROM, all of NextCopyFrom() uses the per-tuple context of CopyState->estate. We reset that context before each call to NextCopyFrom(). This is true before (ignoring code movement) and after the patch. A file_fdw NextCopyFrom() call will use the per-tuple context of the executor performing a foreign scan. Allocations will arise primarily in type input functions. ExecEvalExpr(), used to acquire default values, will still use the per-tuple context of CopyState->estate. That per-tuple context will never get reset explicitly, so default value computations leak until EndCopyFrom(). I see memory context use was discussed already, but I don't see the aforementioned specific details addressed: http://archives.postgresql.org/message-id/AANLkTikfiM1qknr9=tl+xemblaj+yjolhag3xsh7m...@mail.gmail.com The use of ExecEvalExpr() in NextCopyFrom() also seems contrary to the execnodes.h comment you quoted in that thread ("CurrentMemoryContext should be set to ecxt_per_tuple_memory before calling ExecEvalExpr() --- see ExecEvalExprSwitchContext()."). NextCopyFrom() passes CopyState->estate to ExecEvalExpr, but the current context may be the per-tuple context of a different executor state. > Another issue in the automatic context switch is to discard the previous > values in the next call while the caller is unaware. True; we would need to document that pointers stored in the "values" argument of NextCopyData() are valid only until the next call. Does that already work with file_fdw's usage pattern, or would file_fdw need to make copies? Three hunk-specific comments (diff is between previously-reviewed copy.c and current master plus today's patch): > *** a/src/backend/commands/copy.c > --- b/src/backend/commands/copy.c > *************** > *** 175,193 **** typedef struct CopyStateData > * The definition of input functions and default expressions are stored > * in these variables. > */ > EState *estate; > AttrNumber num_defaults; > bool file_has_oids; > FmgrInfo oid_in_function; > Oid oid_typioparam; > ! FmgrInfo *in_functions; > ! Oid *typioparams; > ! int *defmap; > ExprState **defexprs; /* array of default att expressions */ > } CopyStateData; > > /* DestReceiver for COPY (SELECT) TO */ > typedef struct > { > DestReceiver pub; /* publicly-known function > pointers */ > CopyState cstate; /* CopyStateData for the > command */ > --- 175,193 ---- > * The definition of input functions and default expressions are stored > * in these variables. > */ This block comment remains roughly half correct. Again, I think a small comment on every field below should replace it. > EState *estate; > AttrNumber num_defaults; > bool file_has_oids; > FmgrInfo oid_in_function; > Oid oid_typioparam; > ! FmgrInfo *in_functions; /* array of input functions for each > attrs */ > ! Oid *typioparams; /* array of element types for > in_functions */ > ! int *defmap; /* array of default att > numbers */ > ExprState **defexprs; /* array of default att expressions */ > } CopyStateData; > > /* DestReceiver for COPY (SELECT) TO */ > typedef struct > { > DestReceiver pub; /* publicly-known function > pointers */ > CopyState cstate; /* CopyStateData for the > command */ > *************** > *** 1268,1283 **** BeginCopy(bool is_from, > --- 1268,1289 ---- > } > > /* > * Release resources allocated in a cstate. > */ > static void > EndCopy(CopyState cstate) > { > + if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not close file \"%s\": %m", > + cstate->filename))); In the write case, this is not an improvement -- failure in fclose(3) almost always arises from the attempt to flush buffered writes. Note how most other call sites checking the return value of FreeFile() use an error message like the original one. The change to the read case seems fine. > + > MemoryContextDelete(cstate->copycontext); > pfree(cstate); > } > > /* > * Setup CopyState to read tuples from a table or a query for COPY TO. > */ > static CopyState > *************** > *** 1400,1424 **** DoCopyTo(CopyState cstate) > } > > /* > * Clean up storage and release resources for COPY TO. > */ > static void > EndCopyTo(CopyState cstate) > { > - if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > - ereport(ERROR, > - (errcode_for_file_access(), > - errmsg("could not close file \"%s\": %m", > - cstate->filename))); > - > - /* > - * Close the relation or query. We can release the AccessShareLock we > got. > - */ > if (cstate->queryDesc != NULL) > { > /* Close down the query and free resources. */ > ExecutorEnd(cstate->queryDesc); > FreeQueryDesc(cstate->queryDesc); > PopActiveSnapshot(); > } > > --- 1406,1421 ---- > *************** > *** 1681,1746 **** void > CopyFromErrorCallback(void *arg) > { > CopyState cstate = (CopyState) arg; > > if (cstate->binary) > { > /* can't usefully display the data */ > if (cstate->cur_attname) > ! errcontext("COPY %s, line %d, column %s", > cstate->cur_relname, > cstate->cur_lineno, > cstate->cur_attname); > else > ! errcontext("COPY %s, line %d", > cstate->cur_relname, > cstate->cur_lineno); > } > else > { > if (cstate->cur_attname && cstate->cur_attval) > { > /* error is relevant to a particular column */ > char *attval; > > attval = limit_printout_length(cstate->cur_attval); > ! errcontext("COPY %s, line %d, column %s: \"%s\"", > cstate->cur_relname, > cstate->cur_lineno, > cstate->cur_attname, attval); > pfree(attval); > } > else if (cstate->cur_attname) > { > /* error is relevant to a particular column, value is > NULL */ > ! errcontext("COPY %s, line %d, column %s: null input", > cstate->cur_relname, > cstate->cur_lineno, > cstate->cur_attname); > } > else > { > /* error is relevant to a particular line */ > if (cstate->line_buf_converted || > !cstate->need_transcoding) > { > char *lineval; > > lineval = > limit_printout_length(cstate->line_buf.data); > ! errcontext("COPY %s, line %d: \"%s\"", > cstate->cur_relname, > cstate->cur_lineno, lineval); > pfree(lineval); > } > else > { > /* > * Here, the line buffer is still in a foreign > encoding, and > * indeed it's quite likely that the error is > precisely a > * failure to do encoding conversion (ie, bad > data). We dare > * not try to convert it, and at present > there's no way to > * regurgitate it without conversion. So we > have to punt and > * just report the line number. > */ > ! errcontext("COPY %s, line %d", > cstate->cur_relname, > cstate->cur_lineno); > } > } > } > } > > /* > * Make sure we don't print an unreasonable amount of COPY data in a > message. > --- 1678,1743 ---- > CopyFromErrorCallback(void *arg) > { > CopyState cstate = (CopyState) arg; > > if (cstate->binary) > { > /* can't usefully display the data */ > if (cstate->cur_attname) > ! errcontext("relation %s, line %d, column %s", > cstate->cur_relname, > cstate->cur_lineno, > cstate->cur_attname); > else > ! errcontext("relation %s, line %d", > cstate->cur_relname, > cstate->cur_lineno); > } > else > { > if (cstate->cur_attname && cstate->cur_attval) > { > /* error is relevant to a particular column */ > char *attval; > > attval = limit_printout_length(cstate->cur_attval); > ! errcontext("relation %s, line %d, column %s: \"%s\"", > cstate->cur_relname, > cstate->cur_lineno, > cstate->cur_attname, attval); > pfree(attval); > } > else if (cstate->cur_attname) > { > /* error is relevant to a particular column, value is > NULL */ > ! errcontext("relation %s, line %d, column %s: null > input", > cstate->cur_relname, > cstate->cur_lineno, > cstate->cur_attname); > } > else > { > /* error is relevant to a particular line */ > if (cstate->line_buf_converted || > !cstate->need_transcoding) > { > char *lineval; > > lineval = > limit_printout_length(cstate->line_buf.data); > ! errcontext("relation %s, line %d: \"%s\"", > cstate->cur_relname, > cstate->cur_lineno, lineval); > pfree(lineval); > } > else > { > /* > * Here, the line buffer is still in a foreign > encoding, and > * indeed it's quite likely that the error is > precisely a > * failure to do encoding conversion (ie, bad > data). We dare > * not try to convert it, and at present > there's no way to > * regurgitate it without conversion. So we > have to punt and > * just report the line number. > */ > ! errcontext("relation %s, line %d", > cstate->cur_relname, > cstate->cur_lineno); > } > } > } > } > > /* > * Make sure we don't print an unreasonable amount of COPY data in a > message. > *************** > *** 1782,1798 **** limit_printout_length(const char *str) > */ > static uint64 > CopyFrom(CopyState cstate) > { > HeapTuple tuple; > TupleDesc tupDesc; > Datum *values; > bool *nulls; > - bool done = false; > ResultRelInfo *resultRelInfo; > EState *estate = cstate->estate; /* for ExecConstraints() */ > TupleTableSlot *slot; > MemoryContext oldcontext = CurrentMemoryContext; > ErrorContextCallback errcontext; > CommandId mycid = GetCurrentCommandId(true); > int hi_options = 0; /* start with default > heap_insert options */ > BulkInsertState bistate; > --- 1779,1794 ---- > *************** > *** 1906,1936 **** CopyFrom(CopyState cstate) > bistate = GetBulkInsertState(); > > /* Set up callback to identify error line number */ > errcontext.callback = CopyFromErrorCallback; > errcontext.arg = (void *) cstate; > errcontext.previous = error_context_stack; > error_context_stack = &errcontext; > > ! while (!done) > { > bool skip_tuple; > Oid loaded_oid = InvalidOid; > > CHECK_FOR_INTERRUPTS(); > > /* Reset the per-tuple exprcontext */ > ResetPerTupleExprContext(estate); > > /* Switch into its memory context */ > MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > > ! done = !NextCopyFrom(cstate, values, nulls, &loaded_oid); > ! if (done) > break; > > /* And now we can form the input tuple. */ > tuple = heap_form_tuple(tupDesc, values, nulls); > > if (loaded_oid != InvalidOid) > HeapTupleSetOid(tuple, loaded_oid); > > --- 1902,1931 ---- > bistate = GetBulkInsertState(); > > /* Set up callback to identify error line number */ > errcontext.callback = CopyFromErrorCallback; > errcontext.arg = (void *) cstate; > errcontext.previous = error_context_stack; > error_context_stack = &errcontext; > > ! for (;;) > { > bool skip_tuple; > Oid loaded_oid = InvalidOid; > > CHECK_FOR_INTERRUPTS(); > > /* Reset the per-tuple exprcontext */ > ResetPerTupleExprContext(estate); > > /* Switch into its memory context */ > MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > > ! if (!NextCopyFrom(cstate, values, nulls, &loaded_oid)) > break; > > /* And now we can form the input tuple. */ > tuple = heap_form_tuple(tupDesc, values, nulls); > > if (loaded_oid != InvalidOid) > HeapTupleSetOid(tuple, loaded_oid); > > *************** > *** 2015,2031 **** CopyFrom(CopyState cstate) > */ > if (hi_options & HEAP_INSERT_SKIP_WAL) > heap_sync(cstate->rel); > > return processed; > } > > /* > ! * Setup CopyState to read tuples from a file for COPY FROM. > */ > CopyState > BeginCopyFrom(Relation rel, > const char *filename, > List *attnamelist, > List *options) > { > CopyState cstate; > --- 2010,2032 ---- > */ > if (hi_options & HEAP_INSERT_SKIP_WAL) > heap_sync(cstate->rel); > > return processed; > } > > /* > ! * CopyGetAttnums - build an integer list of attnums to be copied > ! * > ! * The input attnamelist is either the user-specified column list, > ! * or NIL if there was none (in which case we want all the non-dropped > ! * columns). > ! * > ! * rel can be NULL ... it's only used for error reports. > */ > CopyState > BeginCopyFrom(Relation rel, This is just a verbatim copy of the CopyGetAttnums() header comment. (The middle paragraph happens to be true, though.) > const char *filename, > List *attnamelist, > List *options) > { > CopyState cstate; > *************** > *** 2208,2224 **** BeginCopyFrom(Relation rel, > MemoryContextSwitchTo(oldcontext); > > return cstate; > } > > /* > * Read next tuple from file for COPY FROM. Return false if no more tuples. > * > ! * valus and nulls arrays must be the same length as columns of the > * relation passed to BeginCopyFrom. Oid of the tuple is returned with > * tupleOid separately. > */ > bool > NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *tupleOid) > { > TupleDesc tupDesc; > Form_pg_attribute *attr; > --- 2209,2225 ---- > MemoryContextSwitchTo(oldcontext); > > return cstate; > } > > /* > * Read next tuple from file for COPY FROM. Return false if no more tuples. > * > ! * values and nulls arrays must be the same length as columns of the > * relation passed to BeginCopyFrom. Oid of the tuple is returned with > * tupleOid separately. > */ > bool > NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *tupleOid) > { > TupleDesc tupDesc; > Form_pg_attribute *attr; > *************** > *** 2453,2474 **** NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, > Oid *tupleOid) > /* > * Clean up storage and release resources for COPY FROM. > */ > void > EndCopyFrom(CopyState cstate) > { > FreeExecutorState(cstate->estate); > > - if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > - ereport(ERROR, > - (errcode_for_file_access(), > - errmsg("could not close file \"%s\": %m", > - cstate->filename))); > - > /* Clean up storage */ > EndCopy(cstate); > } > > > /* > * Read the next input line and stash it in line_buf, with conversion to > * server encoding. > --- 2454,2469 ---- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers