On 2022-08-15 22:23, Damir Belyalov wrote:
I expected I could see 9999 rows after COPY, but just saw 999 rows.
Also I implemented your case and it worked correctly.

Thanks for the new patch!

Here are some comments on it.

+           if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE)
+           {
+ valid_row = NextCopyFrom(cstate, econtext, values, nulls);
+               if (valid_row)
+               {
+                   /* Fill replay_buffer in oldcontext*/
+                   MemoryContextSwitchTo(safecstate->oldcontext);
+ safecstate->replay_buffer[safecstate->saved_tuples++] = heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);

+ /* Buffer was filled, commit subtransaction and prepare to replay */
+               ReleaseCurrentSubTransaction();

What is actually being committed by this ReleaseCurrentSubTransaction()?
It seems to me that just safecstate->replay_buffer is fulfilled before this commit. As a test, I rewrote this ReleaseCurrentSubTransaction() to RollbackAndReleaseCurrentSubTransaction() and COPYed over 1000 rows of data, but same data were loaded.

+#define            REPLAY_BUFFER_SIZE 1000

I feel it might be better to have it as a parameter rather than fixed at 1000.

+/*
+ * Analog of NextCopyFrom() but ignore rows with errors while copying.
+ */
+static bool
+safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)

NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in copyfrom.c. Since safeNextCopyFrom() is analog of NextCopyFrom() as commented, would it be natural to put them in the same file?

188 +               safecstate->errors++;
189 +               if (safecstate->errors <= 100)
190 +                   ereport(WARNING,
191 +                           (errcode(errdata->sqlerrcode),
192 +                           errmsg("%s", errdata->context)));

It would be better to state in the manual that errors exceeding 100 are not displayed.
Or, it might be acceptable to output all errors that exceed 100.

+typedef struct SafeCopyFromState
+{
+#define            REPLAY_BUFFER_SIZE 1000
+ HeapTuple replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates tuples for replaying it after an error */ + int saved_tuples; /* # of tuples in replay_buffer */ + int replayed_tuples; /* # of tuples was replayed from buffer */
+   int             errors;                     /* total # of errors */
+   bool            replay_is_active;
+   bool            begin_subtransaction;
+ bool processed_remaining_tuples; /* for case of replaying last tuples */
+   bool            skip_row;

It would be helpful to add comments about skip_row, etc.

```
$ git apply ../patch/0003-COPY_IGNORE_ERRORS.patch

../patch/0003-COPY_IGNORE_ERRORS.patch:86: indent with spaces.
                                                 Datum *values, bool *nulls);
warning: 1 line adds whitespace errors.
```

There was a warning when applying the patch.


```
=# copy test from '/tmp/10000.data' with (ignore_errors);
WARNING:  FIND 0 ERRORS
COPY 1003
```

When there were no errors, this WARNING seems not necessary.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION


Reply via email to