Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-16 Thread Jim Jones



On 16.02.24 21:31, David G. Johnston wrote:
> Yes.  In particular not all columns in the table need be specified in
> the copy command so while the parsed input data is all nulls the
> record itself may not be.

Yeah, you have a point there.
I guess if users want to avoid it to happen they can rely on NOT NULL
constraints.

Thanks

-- 
Jim





Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-16 Thread David G. Johnston
On Fri, Feb 16, 2024 at 1:16 PM Jim Jones  wrote:

> In case all columns of a record have been set to null due to data type
> incompatibility, should we insert it at all?


Yes.  In particular not all columns in the table need be specified in the
copy command so while the parsed input data is all nulls the record itself
may not be.

The system should allow the user to exclude rows with incomplete data by
ignoring a not null constraint violation.

In short we shouldn't judge non-usefulness and instead give tools to the
user to decide for themselves.

David J.


Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-16 Thread Jim Jones
Hi!

On 12.02.24 01:00, jian he wrote:
> attached v2.
> syntax: `on_error set_to_null`
> based on upthread discussion, now if you specified `on_error
> set_to_null` and your column has `not
> null` constraint, we convert the error field to null, so it may error
> while bulk inserting for violating NOT NULL constraint.
That's a very nice feature. Thanks for implementing it!

v2 applies cleanly and works as described.

\pset null '(NULL)'

CREATE TEMPORARY TABLE t1 (a int, b int);
COPY t1 (a,b) FROM STDIN;
1    a
2    1
3    2
4    b
a    c
\.
SELECT * FROM t1;

CONTEXT:  COPY t1, line 1, column b: "a"
 a | b
---+---
(0 rows)


CREATE TEMPORARY TABLE t2 (a int, b int);
COPY t2 (a,b) FROM STDIN WITH (on_error set_to_null);
1    a
2    1
3    2
4    b
a    c
\.
SELECT * FROM t2;

psql:test-copy-on_error-2.sql:12: NOTICE:  some columns of 3 rows, value
were converted to NULL due to data type incompatibility
COPY 5
   a    |   b    
+
  1 | (NULL)
  2 |  1
  3 |  2
  4 | (NULL)
 (NULL) | (NULL)
(5 rows)


I have one question though:

In case all columns of a record have been set to null due to data type
incompatibility, should we insert it at all? See t2 example above.
I'm not sure if these records would be of any use in the table. What do
you think?

Since the parameter is already called "set_to_null", maybe it is not
necessary to mention in the NOTICE message that the values have been set
to null.
Perhaps something like "XX records were only partially copied due to
data type incompatibility"


-- 
Jim





Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-11 Thread jian he
attached v2.
syntax: `on_error set_to_null`
based on upthread discussion, now if you specified `on_error
set_to_null` and your column has `not
null` constraint, we convert the error field to null, so it may error
while bulk inserting for violating NOT NULL constraint.
From c95bb7b7c072f510b9a60695714be21345f21591 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sat, 10 Feb 2024 15:08:41 +0800
Subject: [PATCH v2 1/1] on_error set_to_null

any data type conversion errors while COPY FROM will set that column value to be NULL.
discussion: https://www.postgresql.org/message-id/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=bp3d1_asfe...@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml   |  1 +
 src/backend/commands/copy.c  |  2 ++
 src/backend/commands/copyfrom.c  | 30 ++--
 src/backend/commands/copyfromparse.c | 28 --
 src/include/commands/copy.h  |  1 +
 src/test/regress/expected/copy2.out  | 22 
 src/test/regress/sql/copy2.sql   | 23 +
 7 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 55764fc1..d8b609b6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -390,6 +390,7 @@ COPY { table_name [ ( error_action value of
   stop means fail the command, while
   ignore means discard the input row and continue with the next one.
+  set_to_null means the input value will set to null and continue with the next field.
   The default is stop.
  
  
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6..9c7d6ebd 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -422,6 +422,8 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 41f6bc43..2a87bcf3 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1005,6 +1005,7 @@ CopyFrom(CopyFromState cstate)
 			 * information according to ON_ERROR.
 			 */
 			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			{
 
 /*
  * Just make ErrorSaveContext ready for the next NextCopyFrom.
@@ -1013,11 +1014,18 @@ CopyFrom(CopyFromState cstate)
  */
 cstate->escontext->error_occurred = false;
 
-			/* Report that this tuple was skipped by the ON_ERROR clause */
-			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
-		 ++skipped);
+/* Report that this tuple was skipped by the ON_ERROR clause */
+pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+			++skipped);
 
-			continue;
+continue;
+			}
+			/*
+			 * Just make ErrorSaveContext ready for the next NextCopyFrom.
+			 *
+			*/
+			if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+cstate->escontext->error_occurred = false;
 		}
 
 		ExecStoreVirtualTuple(myslot);
@@ -1312,7 +1320,7 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+	if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
 		cstate->num_errors > 0)
 		ereport(NOTICE,
 errmsg_plural("%llu row was skipped due to data type incompatibility",
@@ -1320,6 +1328,14 @@ CopyFrom(CopyFromState cstate)
 			  (unsigned long long) cstate->num_errors,
 			  (unsigned long long) cstate->num_errors));
 
+	if (cstate->opts.on_error == COPY_ON_ERROR_NULL &&
+		cstate->num_errors > 0)
+		ereport(NOTICE,
+errmsg_plural("some columns of %llu rows, value was converted to NULL due to data type incompatibility",
+			  "some columns of %llu rows, value were converted to NULL due to data type incompatibility",
+			  (unsigned long long) cstate->num_errors,
+			  (unsigned long long) cstate->num_errors));
+
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
 
@@ -1463,11 +1479,13 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL. We'll add other
 		 * options later
 		 */
 		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 			cstate->escontext->details_wanted = false;
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			cstate->escontext->details_wanted = false;
 	}
 	else
 		cstate->escontext = NULL;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 90675636..9d77c3d1 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -873,6 +873,7 @@ 

Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-06 Thread Yugo NAGATA
On Mon, 5 Feb 2024 14:26:46 +0800
jian he  wrote:

> On Mon, Feb 5, 2024 at 10:29 AM torikoshia  wrote:
> >
> > Hi,
> >
> > On 2024-02-03 15:22, jian he wrote:
> > > The idea of on_error is to tolerate errors, I think.
> > > if a column has a not null constraint, let it cannot be used with
> > > (on_error 'null')
> >
> > > +   /*
> > > +* we can specify on_error 'null', but it can only apply to
> > > columns
> > > +* don't have not null constraint.
> > > +   */
> > > +   if (att->attnotnull && cstate->opts.on_error ==
> > > COPY_ON_ERROR_NULL)
> > > +   ereport(ERROR,
> > > +   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> > > +errmsg("copy on_error 'null' cannot be used with
> > > not null constraint column")));
> >
> > This means we cannot use ON_ERROR 'null' even when there is one column
> > which have NOT NULL constraint, i.e. primary key, right?
> > IMHO this is strong constraint and will decrease the opportunity to use
> > this feature.
> 
> I don't want to fail in the middle of bulk inserts,
> so I thought immediately erroring out would be a great idea.
> Let's see what other people think.

I also think this restriction is too strong because it is very
common that a table has a primary key, unless there is some way
to specify columns that can be set to NULL. Even when ON_ERROR
is specified, any constraint violation errors cannot be generally
ignored, so we cannot elimiate the posibility COPY FROM fails in
the middle due to invalid data, anyway.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-06 Thread jian he
On Tue, Feb 6, 2024 at 3:46 PM Yugo NAGATA  wrote:
>
> On Tue, 06 Feb 2024 09:39:09 +0900 (JST)
> Kyotaro Horiguchi  wrote:
>
> > At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA  wrote 
> > in
> > > On Mon, 05 Feb 2024 11:28:59 +0900
> > > torikoshia  wrote:
> > >
> > > > > Based on this, I've made a patch.
> > > > > based on COPY Synopsis: ON_ERROR 'error_action'
> > > > > on_error 'null', the  keyword NULL should be single quoted.
> > > >
> > > > As you mentioned, single quotation seems a little odd..
> > > >
> > > > I'm not sure what is the best name and syntax for this feature, but
> > > > since current error_action are verbs('stop' and 'ignore'), I feel 'null'
> > > > might not be appropriate.
> > >
> > > I am not in favour of using 'null' either, so I suggested to use
> > > "set_to_null" or more generic syntax like  "set_to (col, val)" in my
> > > previous post[1], although I'm not convinced what is the best either.
> > >
> > > [1] 
> > > https://www.postgresql.org/message-id/20240129172858.ccb6c77c3be95a295e2b2b44%40sraoss.co.jp
> >
> > Tom sugggested using a separate option, and I agree with the
> > suggestion. Taking this into consideration, I imagined something like
> > the following, for example.  Although I'm not sure we are actually
> > going to do whole-tuple replacement, the action name in this example
> > has the suffix '-column'.
> >
> > COPY (on_error 'replace-colomn', replacement 'null') ..
>
> Thank you for your information. I've found a post[1] you mentioned,
> where adding a separate option for error log destination was suggested.
>
> Considering consistency with other options, adding a separate option
> would be better if we want to specify a value to replace the invalid
> value, without introducing a complex syntax that allows options with
> more than one parameters. Maybe, if we allow to use values for the
> replacement other than NULL, we have to also add a option to specify
> a column (or a type)  for each replacement value. Or,  we may add a
> option to specify a list of replacement values as many as the number of
> columns, each of whose default is NULL.
>
> Anyway, I prefer 'replace" (or 'set_to') to just 'null' as the option
> value.
>

Let's say tabe t column (a,b,c)
if we support set_to_null(a,b), what should we do if column c has an error.
should we ignore this row or error out immediately?
also I am not sure it's doable to just extract columnList from the
function defGetCopyOnErrorChoice.

to make `COPY x from stdin (on_error set_to_null(a,b);` work,
we may need to refactor to gram.y, in a similar way we do force null

i am ok with
COPY x from stdin (on_error set_to_null);




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-05 Thread Yugo NAGATA
On Tue, 06 Feb 2024 09:39:09 +0900 (JST)
Kyotaro Horiguchi  wrote:

> At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA  wrote in 
> > On Mon, 05 Feb 2024 11:28:59 +0900
> > torikoshia  wrote:
> > 
> > > > Based on this, I've made a patch.
> > > > based on COPY Synopsis: ON_ERROR 'error_action'
> > > > on_error 'null', the  keyword NULL should be single quoted.
> > > 
> > > As you mentioned, single quotation seems a little odd..
> > > 
> > > I'm not sure what is the best name and syntax for this feature, but 
> > > since current error_action are verbs('stop' and 'ignore'), I feel 'null' 
> > > might not be appropriate.
> > 
> > I am not in favour of using 'null' either, so I suggested to use
> > "set_to_null" or more generic syntax like  "set_to (col, val)" in my
> > previous post[1], although I'm not convinced what is the best either.
> > 
> > [1] 
> > https://www.postgresql.org/message-id/20240129172858.ccb6c77c3be95a295e2b2b44%40sraoss.co.jp
> 
> Tom sugggested using a separate option, and I agree with the
> suggestion. Taking this into consideration, I imagined something like
> the following, for example.  Although I'm not sure we are actually
> going to do whole-tuple replacement, the action name in this example
> has the suffix '-column'.
> 
> COPY (on_error 'replace-colomn', replacement 'null') ..

Thank you for your information. I've found a post[1] you mentioned, 
where adding a separate option for error log destination was suggested. 

Considering consistency with other options, adding a separate option
would be better if we want to specify a value to replace the invalid
value, without introducing a complex syntax that allows options with
more than one parameters. Maybe, if we allow to use values for the
replacement other than NULL, we have to also add a option to specify
a column (or a type)  for each replacement value. Or,  we may add a
option to specify a list of replacement values as many as the number of
columns, each of whose default is NULL.

Anyway, I prefer 'replace" (or 'set_to') to just 'null' as the option
value.

[1] https://www.postgresql.org/message-id/2070915.1705527477%40sss.pgh.pa.us

Regards,
Yugo Nagata


> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center


-- 
Yugo NAGATA 




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-05 Thread Kyotaro Horiguchi
At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA  wrote in 
> On Mon, 05 Feb 2024 11:28:59 +0900
> torikoshia  wrote:
> 
> > > Based on this, I've made a patch.
> > > based on COPY Synopsis: ON_ERROR 'error_action'
> > > on_error 'null', the  keyword NULL should be single quoted.
> > 
> > As you mentioned, single quotation seems a little odd..
> > 
> > I'm not sure what is the best name and syntax for this feature, but 
> > since current error_action are verbs('stop' and 'ignore'), I feel 'null' 
> > might not be appropriate.
> 
> I am not in favour of using 'null' either, so I suggested to use
> "set_to_null" or more generic syntax like  "set_to (col, val)" in my
> previous post[1], although I'm not convinced what is the best either.
> 
> [1] 
> https://www.postgresql.org/message-id/20240129172858.ccb6c77c3be95a295e2b2b44%40sraoss.co.jp

Tom sugggested using a separate option, and I agree with the
suggestion. Taking this into consideration, I imagined something like
the following, for example.  Although I'm not sure we are actually
going to do whole-tuple replacement, the action name in this example
has the suffix '-column'.

COPY (on_error 'replace-colomn', replacement 'null') ..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-05 Thread Yugo NAGATA
On Mon, 05 Feb 2024 11:28:59 +0900
torikoshia  wrote:

> > Based on this, I've made a patch.
> > based on COPY Synopsis: ON_ERROR 'error_action'
> > on_error 'null', the  keyword NULL should be single quoted.
> 
> As you mentioned, single quotation seems a little odd..
> 
> I'm not sure what is the best name and syntax for this feature, but 
> since current error_action are verbs('stop' and 'ignore'), I feel 'null' 
> might not be appropriate.

I am not in favour of using 'null' either, so I suggested to use
"set_to_null" or more generic syntax like  "set_to (col, val)" in my
previous post[1], although I'm not convinced what is the best either.

[1] 
https://www.postgresql.org/message-id/20240129172858.ccb6c77c3be95a295e2b2b44%40sraoss.co.jp

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-04 Thread jian he
On Mon, Feb 5, 2024 at 10:29 AM torikoshia  wrote:
>
> Hi,
>
> On 2024-02-03 15:22, jian he wrote:
> > The idea of on_error is to tolerate errors, I think.
> > if a column has a not null constraint, let it cannot be used with
> > (on_error 'null')
>
> > +   /*
> > +* we can specify on_error 'null', but it can only apply to
> > columns
> > +* don't have not null constraint.
> > +   */
> > +   if (att->attnotnull && cstate->opts.on_error ==
> > COPY_ON_ERROR_NULL)
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> > +errmsg("copy on_error 'null' cannot be used with
> > not null constraint column")));
>
> This means we cannot use ON_ERROR 'null' even when there is one column
> which have NOT NULL constraint, i.e. primary key, right?
> IMHO this is strong constraint and will decrease the opportunity to use
> this feature.

I don't want to fail in the middle of bulk inserts,
so I thought immediately erroring out would be a great idea.
Let's see what other people think.




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-04 Thread torikoshia

Hi,

On 2024-02-03 15:22, jian he wrote:

The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')



+   /*
+* we can specify on_error 'null', but it can only apply to 
columns

+* don't have not null constraint.
+   */
+   if (att->attnotnull && cstate->opts.on_error == 
COPY_ON_ERROR_NULL)

+   ereport(ERROR,
+   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+errmsg("copy on_error 'null' cannot be used with 
not null constraint column")));


This means we cannot use ON_ERROR 'null' even when there is one column 
which have NOT NULL constraint, i.e. primary key, right?
IMHO this is strong constraint and will decrease the opportunity to use 
this feature.


It might be better to allow error_action 'null' for tables which have 
NOT NULL constraint columns, and when facing soft errors for those rows, 
skip that row or stop COPY.



Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the  keyword NULL should be single quoted.


As you mentioned, single quotation seems a little odd..

I'm not sure what is the best name and syntax for this feature, but 
since current error_action are verbs('stop' and 'ignore'), I feel 'null' 
might not be appropriate.



demo:
COPY check_ign_err FROM STDIN WITH (on_error 'null');
1 {1} a
2 {2} 1
3 {3} 2
4 {4} b
a {5} c
\.

\pset null NULL

SELECT * FROM check_ign_err;
  n   |  m  |  k
--+-+--
1 | {1} | NULL
2 | {2} |1
3 | {3} |2
4 | {4} | NULL
 NULL | {5} | NULL


Since we notice the number of ignored rows when ON_ERROR is 'ignore', 
users may want to know the number of rows which was changed to NULL when 
using ON_ERROR 'null'.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-02 Thread jian he
The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')

Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the  keyword NULL should be single quoted.

demo:
COPY check_ign_err FROM STDIN WITH (on_error 'null');
1 {1} a
2 {2} 1
3 {3} 2
4 {4} b
a {5} c
\.

\pset null NULL

SELECT * FROM check_ign_err;
  n   |  m  |  k
--+-+--
1 | {1} | NULL
2 | {2} |1
3 | {3} |2
4 | {4} | NULL
 NULL | {5} | NULL
From 19afa942af22fd3d2ed2436c6bc7ce02f00bb570 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sat, 3 Feb 2024 14:04:08 +0800
Subject: [PATCH v1 1/1] introduce copy on_error 'null' option

on_error 'null', null needs single quoted.
any data type conversion error will treat that column value to be NULL.
it will not work with column have not null constraint,
we check this at BeginCopyFrom.

discussion: https://www.postgresql.org/message-id/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=bp3d1_asfe...@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml   |  1 +
 src/backend/commands/copy.c  |  2 ++
 src/backend/commands/copyfrom.c  | 28 
 src/backend/commands/copyfromparse.c | 27 +--
 src/include/commands/copy.h  |  1 +
 src/test/regress/expected/copy2.out  | 26 ++
 src/test/regress/sql/copy2.sql   | 28 
 7 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 55764fc1..d3c4ebdc 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -390,6 +390,7 @@ COPY { table_name [ ( error_action value of
   stop means fail the command, while
   ignore means discard the input row and continue with the next one.
+  null means the input value will be null and continue with the next one.
   The default is stop.
  
  
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6..01ce47b0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -422,6 +422,8 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b91..f4c5704e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1005,6 +1005,7 @@ CopyFrom(CopyFromState cstate)
 			 * information according to ON_ERROR.
 			 */
 			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			{
 
 /*
  * Just make ErrorSaveContext ready for the next NextCopyFrom.
@@ -1013,11 +1014,18 @@ CopyFrom(CopyFromState cstate)
  */
 cstate->escontext->error_occurred = false;
 
-			/* Report that this tuple was skipped by the ON_ERROR clause */
-			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
-		 ++skipped);
+/* Report that this tuple was skipped by the ON_ERROR clause */
+pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+			++skipped);
 
-			continue;
+continue;
+			}
+			/*
+			 * Just make ErrorSaveContext ready for the next NextCopyFrom.
+			 *
+			*/
+			if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+cstate->escontext->error_occurred = false;
 		}
 
 		ExecStoreVirtualTuple(myslot);
@@ -1313,6 +1321,7 @@ CopyFrom(CopyFromState cstate)
 	error_context_stack = errcallback.previous;
 
 	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+		cstate->opts.on_error != COPY_ON_ERROR_NULL &&
 		cstate->num_errors > 0)
 		ereport(NOTICE,
 errmsg_plural("%llu row was skipped due to data type incompatibility",
@@ -1468,6 +1477,8 @@ BeginCopyFrom(ParseState *pstate,
 		 */
 		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 			cstate->escontext->details_wanted = false;
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			cstate->escontext->details_wanted = false;
 	}
 	else
 		cstate->escontext = NULL;
@@ -1621,6 +1632,15 @@ BeginCopyFrom(ParseState *pstate,
 		if (att->attisdropped)
 			continue;
 
+		/*
+		 * we can specify on_error 'null', but it can only apply to columns
+		 * don't have not null constraint.
+		*/
+		if (att->attnotnull && cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(ERROR,
+	(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+	 errmsg("copy on_error 'null' cannot be used with not null constraint column")));
+
 		/* Fetch the input function and typioparam info */
 		if (cstate->opts.binary)
 			getTypeBinaryInputInfo(att->atttypid,
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b7..9475a9dc 100644
--- 

Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-01-29 Thread Yugo NAGATA
On Fri, 26 Jan 2024 08:08:29 -0700
"David G. Johnston"  wrote:

> Hi,
> 
> The option choice of "ignore" in the COPY ON_ERROR clause seems overly
> generic.  There would seem to be two relevant ways to ignore bad column
> input data - drop the entire row or just set the column value to null.  I
> can see us wanting to provide the set to null option and in any case having
> the option name be explicit that it ignores the row seems like a good idea.

I am not in favour of renaming the option name "ignore", instead we can
use another style of name for the option to set the column value to NULL,
for example, "set_to_null".

(Maybe, we can make a more generic option "set_to (col, val)" that can set
the value of column specified by "col" value to the specified value "val" 
(e.g. 'N/A') on a soft error, although the syntax would be a bit complex...) 

IMO, it is more simple to define "ignore" as to skip the entire row rather
than having variety of "ignore". Once defined it so, the option to set the
column value to NULL should not be called "ignore" because values in other
columns will be inserted.

Regards,
Yugo Nagata

> 
> David J.


-- 
Yugo NAGATA 




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-01-28 Thread David G. Johnston
On Sun, Jan 28, 2024 at 4:51 PM jian he  wrote:

> On Fri, Jan 26, 2024 at 11:09 PM David G. Johnston
>  wrote:
> >
> > Hi,
> >
> > The option choice of "ignore" in the COPY ON_ERROR clause seems overly
> generic.  There would seem to be two relevant ways to ignore bad column
> input data - drop the entire row or just set the column value to null.  I
> can see us wanting to provide the set to null option and in any case having
> the option name be explicit that it ignores the row seems like a good idea.
>
> two issue I found out while playing around with it;
> create table x1(a int not null, b int not null );
>
> another issue:
> COPY x1 from stdin (on_error null);
>
> when we already have `not null` top level constraint for table x1.
> Do we need an error immediately?
> "on_error null" seems to conflict with `not null` constraint (assume
> refers to the same column).
> it may fail while doing bulk inserts while on_error is set to null
> because of violating a not null constraint.
>

You should not error immediately since whether or not there is a problem is
table and data dependent.  I would not check for the case of all columns
being defined not null and just let the mismatch happen.

That said, maybe with this being a string we can accept something like:
'null, ignore'

And so if attempting to place any one null fails, assuming we can make that
a soft error too, we would then ignore the entire row.

David J.


Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-01-28 Thread jian he
On Fri, Jan 26, 2024 at 11:09 PM David G. Johnston
 wrote:
>
> Hi,
>
> The option choice of "ignore" in the COPY ON_ERROR clause seems overly 
> generic.  There would seem to be two relevant ways to ignore bad column input 
> data - drop the entire row or just set the column value to null.  I can see 
> us wanting to provide the set to null option and in any case having the 
> option name be explicit that it ignores the row seems like a good idea.

two issue I found out while playing around with it;
create table x1(a int not null, b int not null );

you can only do:
COPY x1 from stdin (on_error 'null');
but you cannot do
COPY x1 from stdin (on_error null);
we need to hack the gram.y to escape the "null". I don't know how to
make it work.
related post I found:
https://stackoverflow.com/questions/31786611/how-to-escape-flex-keyword

another issue:
COPY x1 from stdin (on_error null);

when we already have `not null` top level constraint for table x1.
Do we need an error immediately?
"on_error null" seems to conflict with `not null` constraint (assume
refers to the same column).
it may fail while doing bulk inserts while on_error is set to null
because of violating a not null constraint.