Daniel Gustafsson <dan...@yesql.se> writes:
>> On 8 Nov 2023, at 19:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I think an actually usable feature of this sort would involve
>> copying all the failed lines to some alternate output medium,
>> perhaps a second table with a TEXT column to receive the original
>> data line.  (Or maybe an array of text that could receive the
>> broken-down field values?)  Maybe we could dump the message info,
>> line number, field name etc into additional columns.

> I agree that the errors should be easily visible to the user in some way.  The
> feature is for sure interesting, especially in data warehouse type jobs where
> dirty data is often ingested.

I agree it's interesting, but we need to get it right the first time.

Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like

        ERRORS TO other_table_name (item [, item [, ...]])

where the "items" are keywords identifying the information item
we will insert into each successive column of the target table.
This design allows the user to decide which items are of use
to them.  I envision items like

LINENO  bigint          COPY line number, counting from 1
LINE    text            raw text of line (after encoding conversion)
FIELDS  text[]          separated, de-escaped string fields (the data
                        that was or would be fed to input functions)
FIELD   text            name of troublesome field, if field-specific
MESSAGE text            error message text
DETAIL  text            error message detail, if any
SQLSTATE text           error SQLSTATE code

Some of these would have to be populated as NULL if we didn't get
that far in processing the line.  In the worst case, which is
encoding conversion failure, I think we couldn't populate any of
the data items except LINENO.
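
To make that concrete, a usage sketch might look about like this
(the error-table name, data file, and exact clause placement are
just placeholders, not part of the proposal):

        -- hypothetical table to receive the error info; column types
        -- follow the item list above
        CREATE TABLE copy_errors (
            lineno   bigint,
            line     text,
            message  text,
            sqlstate text
        );

        -- ERRORS TO is the proposed syntax, nothing that exists today
        COPY target_tab FROM '/tmp/data.csv' WITH (FORMAT csv)
            ERRORS TO copy_errors (LINENO, LINE, MESSAGE, SQLSTATE);

The error-table columns would be filled positionally from the item
list, so the user captures only the items they care about.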

Not sure if we need to insist that the target table columns be
exactly the data types I show above.  It'd be nice to allow
the LINENO target to be plain int, perhaps.  OTOH, do we really
want to have to deal with issues like conversion failures while
trying to report an error?

> As a data point, Greenplum has this feature with additional SQL syntax to
> control it:
>       COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS;
> LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT
> LIMIT xyz ROWS sets the limit of how many rows can be faulty before the
> operation errors out.  I'm not at all advocating that we should mimic this,
> just wanted to add a reference to a postgres derivative where this has been
> implemented.

Hm.  A "reject limit" might be a useful add-on, but I wouldn't advocate
including it in the initial patch.
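
If someone does pursue it later, it could presumably ride on the
same clause, say something like

        -- purely speculative; REJECT LIMIT here is made up for
        -- illustration, not proposed syntax
        COPY target_tab FROM '/tmp/data.csv' WITH (FORMAT csv)
            ERRORS TO copy_errors (LINENO, MESSAGE)
            REJECT LIMIT 100;

but that's a question for a follow-on patch.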

                        regards, tom lane

