On 22/01/26 11:45, jian he wrote:
Should FORCE_NOT_NULL be allowed to be used with ON_ERROR set_null? It
seems to me that ON_ERROR set_null overwrite the FORCE_NOT_NULL
behaviour:

postgres=# create table t4(a int, b varchar(5));
CREATE TABLE

postgres=# copy t4 from 'data.csv' with (FORCE_NOT_NULL(b), format csv, 
delimiter ',', NULL 'NULL', ON_ERROR set_null);
NOTICE:  invalid values in 2 rows were replaced with null due to data type 
incompatibility
COPY 5

postgres=# \pset null 'NULL'
Null display is "NULL".
postgres=# select * from t4;
  a |  b
---+------
  1 | aaaa
  2 | bbbb
  2 | NULL
  2 | NULL
  5 | NULL
(5 rows)

Note that only the ccccc rows on .csv file was inserted with a NULL
value on b column. The 5,NULL row was inserted with a "NULL" string as a
value:

postgres=# select * from t4 where b is null;
  a |  b
---+------
  2 | NULL
  2 | NULL
(2 rows)

The contents of data.csv:
     1,aaaa
     2,bbbb
     2,ccccc
     2,ccccc
     5,NULL

Perhaps we should block the usage of FORCE_NOT_NULL with ON_ERROR
SET_NULL?

FORCE_NOT_NULL is related to how we handle NULL string in column value.

We first process cstate->opts.force_notnull_flags, cstate->opts.force_null_flags
then InputFunctionCallSafe.
see copyfromparse.c, CopyFromTextLikeOneRow ``if (is_csv)``loop.

I think these two are unrelated things, FORCE_NOT_NULL should be fine with
ON_ERROR SET_NULL.
you can see related tests in
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/copy2.sql#n330

Am I missing something?

Yeah, after some more thinking it seems ok to use both options together. I just found a bit strange when using integer columns. Consider this example:

cat data.csv
1,11
2,22
3,
4,44

postgres=# create table t(a int not null, b int);
CREATE TABLE

postgres=# copy t from '/Users/matheus/dev/pgdev/copy-on-error-set-null/data.csv' with (FORCE_NOT_NULL(b), format csv, delimiter ',', ON_ERROR set_null);
NOTICE:  1 row was replaced with null due to data type incompatibility
COPY 4

postgres=# select * from t where b is null;
 a | b
---+---
 3 |
(1 row)

We are requiring a not null value on column b but we are still generating rows with null values on b.

The reasoning on this is that the row 3 would generate a "invalid input syntax for type integer" error and the ON_ERROR set_null fix this by inserting a NULL value. It make sense I think but I'm wondering if it could cause any confusion?


On monitoring.sgml we have the following for pg_stat_progress_copy
tuples_skipped:
        Number of tuples skipped because they contain malformed data.
        This counter only advances when a value other than
        <literal>stop</literal> is specified to the <literal>ON_ERROR</literal>

IIUC we are not updating this view if we set a column to NULL due to an
error, perhaps this documentation should be updated to mention that it
will not be updated with ON_ERROR set_null?


IMHO, we don't need to mention ON_ERROR set_null, since we do not support it.
change to the following should be ok, i think.

       <para>
        Number of tuples skipped because they contain malformed data.
        This counter only advances when
        <literal>ignore</literal> is specified to the 
<literal>ON_ERROR</literal>
        option.
</para></entry>

It looks good, I was thinking in something like this.


I may have missing something, but we are still considering implementing
the REJECT_LIMIT + ON_ERROR set_null?
Possibly as a separate patch later.
Ok, good, thanks.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com


Reply via email to