Re: Make COPY format extendable: Extract COPY TO format implementations

Sutou Kouhei Mon, 05 Feb 2024 01:05:47 -0800

Hi,

In <zcckwaefrloqp...@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 5 Feb 2024 16:14:08 +0900,
  Michael Paquier <mich...@paquier.xyz> wrote:


> So, I've looked at all that today, and finished by applying two
> patches as of 2889fd23be56 and 95fb5b49024a to get some of the
> weirdness with the workhorse routines out of the way.

Thanks!

> As this is called within the OneRow routine, I can live with that.  If
> there is an opposition to that, we could just attach it within the
> routines.

I don't object the approach.

> I am attaching a v12 which is close to what I want it to be, with
> much more documentation and comments.  There are two things that I've
> changed compared to the previous versions though:
> 1) I have added a callback to set up the input and output functions
> rather than attach that in the Start callback.

I'm OK with this. I just don't use them in Apache Arrow COPY
FORMAT extension.

> - No need for plugins to think about how to allocate this data.  v11
> and other versions were doing things the wrong way by allocating this
> stuff in the wrong memory context as we switch to the COPY context
> when we are in the Start routines.

Oh, sorry. I missed it when I moved them.

> 2) I have backpedaled on the postpare callback, which did not bring
> much in clarity IMO while being a CSV-only callback.  Note that we
> have in copyfromparse.c more paths that are only for CSV but the past
> versions of the patch never cared about that.  This makes the text and
> CSV implementations much closer to each other, as a result.

Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
CopyReadLineText(). The postpare callback is for
optimization. If it doesn't improve performance, we don't
need to introduce it.

We may want to try eliminating cstate->opts.csv_mode in
CopyReadLineText() for performance. But we don't need to
do this in introducing CopyFromRoutine. We can defer it.

So I don't object removing the postpare callback.

>                                              Let me know if you have
> comments about all that.

Here are some comments for the patch:

+       /*
+        * Called when COPY FROM is started to set up the input functions
+        * associated to the relation's attributes writing to.  `fmgr_info` can 
be

fmgr_info ->
finfo

+        * optionally filled to provide the catalog information of the input
+        * function.  `typioparam` can be optinally filled to define the OID of

optinally ->
optionally

+        * the type to pass to the input function.  `atttypid` is the OID of 
data
+        * type used by the relation's attribute.
+        */
+       void            (*CopyFromInFunc) (Oid atttypid, FmgrInfo *finfo,
+                                                                  Oid 
*typioparam);

How about passing CopyFromState cstate too like other
callbacks for consistency?

+       /*
+        * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+        *
+        * 'econtext' is used to evaluate default expression for each column 
that
+        * is either not read from the file or is using the DEFAULT option of 
COPY

or is ->
or

(I'm not sure...)

+        * FROM.  It is NULL if no default values are used.
+        *
+        * Returns false if there are no more tuples to copy.
+        */
+       bool            (*CopyFromOneRow) (CopyFromState cstate, ExprContext 
*econtext,
+                                                                  Datum 
*values, bool *nulls);

+typedef struct CopyToRoutine
+{
+       /*
+        * Called when COPY TO is started to set up the output functions
+        * associated to the relation's attributes reading from.  `fmgr_info` 
can

fmgr_info ->
finfo

+        * be optionally filled. `atttypid` is the OID of data type used by the
+        * relation's attribute.
+        */
+       void            (*CopyToOutFunc) (Oid atttypid, FmgrInfo *finfo);

How about passing CopyToState cstate too like other
callbacks for consistency?


@@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 extern int     CopyReadAttributesCSV(CopyFromState cstate);
 extern int     CopyReadAttributesText(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->OneRow */

CopyFromRoutine->OneRow ->
CopyFromRoutine->CopyFromOneRow

+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                                                          Datum *values, bool 
*nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                                                Datum *values, 
bool *nulls);
+
 #endif                                                 /* COPYFROM_INTERNAL_H 
*/

+/*
+ * CopyFromTextStart

CopyFromTextStart ->
CopyFromBinaryStart

+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+       /* Read and verify binary header */
+       ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromTextEnd

CopyFromTextEnd ->
CopyFromBinaryEnd

+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+       /* nothing to do */
+}


diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 91433d439b..d02a7773e3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -473,6 +473,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
@@ -482,6 +483,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost

Wow! I didn't know that we need to update typedefs.list when
I add a "typedef struct".


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

Reply via email to