subject:"Re\: POC PATCH\: copy from ... exceptions to\: \(was Re\: VLDB Features\)"

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-04-17 Thread Masahiko Sawada

On Wed, Apr 17, 2024 at 4:28 PM torikoshia  wrote:
>
> On 2024-04-16 13:16, Masahiko Sawada wrote:
> > On Tue, Apr 2, 2024 at 7:34 PM torikoshia 
> > wrote:
> >>
> >> On 2024-04-01 11:31, Masahiko Sawada wrote:
> >> > On Fri, Mar 29, 2024 at 11:54 AM torikoshia
> >> >  wrote:
> >> >>
> >> >> On 2024-03-28 21:54, Masahiko Sawada wrote:
> >> >> > On Thu, Mar 28, 2024 at 9:38 PM torikoshia 
> >> >> > 
> >> >> > wrote:
> >> >> >>
> >> >> >> On 2024-03-28 10:20, Masahiko Sawada wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
> >> >> >> > 
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
> >> >> >> >>  wrote:
> >> >> >> >> >
> >> >> >> >> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia 
> >> >> >> >> >  wrote:
> >> >> >> >> > > On 2024-01-18 10:10, jian he wrote:
> >> >> >> >> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> >> >> >> >> > > > 
> >> >> >> >> > > > wrote:
> >> >> >> >> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane 
> >> >> >> >> > > >>  wrote:
> >> >> >> >> > > >> > Kyotaro-san's suggestion isn't bad, though I might 
> >> >> >> >> > > >> > shorten it to
> >> >> >> >> > > >> > error_action {error|ignore|log} (or perhaps "stop" 
> >> >> >> >> > > >> > instead of "error")?
> >> >> >> >> > > >> > You will need a separate parameter anyway to specify the 
> >> >> >> >> > > >> > destination
> >> >> >> >> > > >> > of "log", unless "none" became an illegal table name 
> >> >> >> >> > > >> > when I wasn't
> >> >> >> >> > > >> > looking.  I don't buy that one parameter that has some 
> >> >> >> >> > > >> > special values
> >> >> >> >> > > >> > while other values could be names will be a good design. 
> >> >> >> >> > > >> >  Moreover,
> >> >> >> >> > > >> > what if we want to support (say) log-to-file along with 
> >> >> >> >> > > >> > log-to-table?
> >> >> >> >> > > >> > Trying to distinguish a file name from a table name 
> >> >> >> >> > > >> > without any other
> >> >> >> >> > > >> > context seems impossible.
> >> >> >> >> > > >>
> >> >> >> >> > > >> I've been thinking we can add more values to this option 
> >> >> >> >> > > >> to log errors
> >> >> >> >> > > >> not only to the server logs but also to the error table 
> >> >> >> >> > > >> (not sure
> >> >> >> >> > > >> details but I imagined an error table is created for each 
> >> >> >> >> > > >> table on
> >> >> >> >> > > >> error), without an additional option for the destination 
> >> >> >> >> > > >> name. The
> >> >> >> >> > > >> values would be like error_action 
> >> >> >> >> > > >> {error|ignore|save-logs|save-table}.
> >> >> >> >> > > >>
> >> >> >> >> > > >
> >> >> >> >> > > > another idea:
> >> >> >> >> > > > on_error {error|ignore|other_future_option}
> >> >> >> >> > > > if not specified then by default ERROR.
> >> >> >> >> > > > You can also specify ERROR or IGNORE for now.
> >> >> >> >> > > >
> >> >> >> >> > > > I agree, the parameter "error_action" is better than 
> >> >> >> >> > > > "location".
> >> >> >> >> > >
> >> >> >> >> > > I'm not sure whether error_action or on_error is better, but 
> >> >> >> >> > > either way
> >> >> >> >> > > "error_action error" and "on_error error" seems a bit odd to 
> >> >> >> >> > > me.
> >> >> >> >> > > I feel "stop" is better for both cases as Tom suggested.
> >> >> >> >> >
> >> >> >> >> > OK.  What about this?
> >> >> >> >> > on_error {stop|ignore|other_future_option}
> >> >> >> >> > where other_future_option might be compound like "file 
> >> >> >> >> > 'copy.log'" or
> >> >> >> >> > "table 'copy_log'".
> >> >> >> >>
> >> >> >> >> +1
> >> >> >> >>
> >> >> >> >
> >> >> >> > I realized that ON_ERROR syntax synoposis in the documentation is 
> >> >> >> > not
> >> >> >> > correct. The option doesn't require the value to be quoted and the
> >> >> >> > value can be omitted. The attached patch fixes it.
> >> >> >> >
> >> >> >> > Regards,
> >> >> >>
> >> >> >> Thanks!
> >> >> >>
> >> >> >> Attached patch fixes the doc, but I'm wondering perhaps it might be
> >> >> >> better to modify the codes to prohibit abbreviation of the value.
> >> >> >>
> >> >> >> When seeing the query which abbreviates ON_ERROR value, I feel it's
> >> >> >> not
> >> >> >> obvious what happens compared to other options which tolerates
> >> >> >> abbreviation of the value such as FREEZE or HEADER.
> >> >> >>
> >> >> >>COPY t1 FROM stdin WITH (ON_ERROR);
> >> >> >>
> >> >> >> What do you think?
> >> >> >
> >> >> > Indeed. Looking at options of other commands such as VACUUM and
> >> >> > EXPLAIN, I can see that we can omit a boolean value, but non-boolean
> >> >> > parameters require its value. The HEADER option is not a pure boolean
> >> >> > parameter but we can omit the value. It seems to be for backward
> >> >> > compatibility; it used to be a boolean parameter. I agree that the
> >> >> > above example would confuse users.
> >> >> >
> >> >> > Regards,
> >> >>
> >> >> Thanks for your comment!
> >> >>
> >> >> Attached a patch

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-04-17 Thread torikoshia


On 2024-04-16 13:16, Masahiko Sawada wrote:
On Tue, Apr 2, 2024 at 7:34 PM torikoshia  
wrote:


On 2024-04-01 11:31, Masahiko Sawada wrote:
> On Fri, Mar 29, 2024 at 11:54 AM torikoshia
>  wrote:
>>
>> On 2024-03-28 21:54, Masahiko Sawada wrote:
>> > On Thu, Mar 28, 2024 at 9:38 PM torikoshia 
>> > wrote:
>> >>
>> >> On 2024-03-28 10:20, Masahiko Sawada wrote:
>> >> > Hi,
>> >> >
>> >> > On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
>> >> > wrote:
>> >> >>
>> >> >> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
>> >> >>  wrote:
>> >> >> >
>> >> >> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia 
 wrote:
>> >> >> > > On 2024-01-18 10:10, jian he wrote:
>> >> >> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 

>> >> >> > > > wrote:
>> >> >> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  
wrote:
>> >> >> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it 
to
>> >> >> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead of 
"error")?
>> >> >> > > >> > You will need a separate parameter anyway to specify the 
destination
>> >> >> > > >> > of "log", unless "none" became an illegal table name when I 
wasn't
>> >> >> > > >> > looking.  I don't buy that one parameter that has some 
special values
>> >> >> > > >> > while other values could be names will be a good design.  
Moreover,
>> >> >> > > >> > what if we want to support (say) log-to-file along with 
log-to-table?
>> >> >> > > >> > Trying to distinguish a file name from a table name without 
any other
>> >> >> > > >> > context seems impossible.
>> >> >> > > >>
>> >> >> > > >> I've been thinking we can add more values to this option to log 
errors
>> >> >> > > >> not only to the server logs but also to the error table (not 
sure
>> >> >> > > >> details but I imagined an error table is created for each table 
on
>> >> >> > > >> error), without an additional option for the destination name. 
The
>> >> >> > > >> values would be like error_action 
{error|ignore|save-logs|save-table}.
>> >> >> > > >>
>> >> >> > > >
>> >> >> > > > another idea:
>> >> >> > > > on_error {error|ignore|other_future_option}
>> >> >> > > > if not specified then by default ERROR.
>> >> >> > > > You can also specify ERROR or IGNORE for now.
>> >> >> > > >
>> >> >> > > > I agree, the parameter "error_action" is better than "location".
>> >> >> > >
>> >> >> > > I'm not sure whether error_action or on_error is better, but 
either way
>> >> >> > > "error_action error" and "on_error error" seems a bit odd to me.
>> >> >> > > I feel "stop" is better for both cases as Tom suggested.
>> >> >> >
>> >> >> > OK.  What about this?
>> >> >> > on_error {stop|ignore|other_future_option}
>> >> >> > where other_future_option might be compound like "file 'copy.log'" or
>> >> >> > "table 'copy_log'".
>> >> >>
>> >> >> +1
>> >> >>
>> >> >
>> >> > I realized that ON_ERROR syntax synoposis in the documentation is not
>> >> > correct. The option doesn't require the value to be quoted and the
>> >> > value can be omitted. The attached patch fixes it.
>> >> >
>> >> > Regards,
>> >>
>> >> Thanks!
>> >>
>> >> Attached patch fixes the doc, but I'm wondering perhaps it might be
>> >> better to modify the codes to prohibit abbreviation of the value.
>> >>
>> >> When seeing the query which abbreviates ON_ERROR value, I feel it's
>> >> not
>> >> obvious what happens compared to other options which tolerates
>> >> abbreviation of the value such as FREEZE or HEADER.
>> >>
>> >>COPY t1 FROM stdin WITH (ON_ERROR);
>> >>
>> >> What do you think?
>> >
>> > Indeed. Looking at options of other commands such as VACUUM and
>> > EXPLAIN, I can see that we can omit a boolean value, but non-boolean
>> > parameters require its value. The HEADER option is not a pure boolean
>> > parameter but we can omit the value. It seems to be for backward
>> > compatibility; it used to be a boolean parameter. I agree that the
>> > above example would confuse users.
>> >
>> > Regards,
>>
>> Thanks for your comment!
>>
>> Attached a patch which modifies the code to prohibit omission of its
>> value.
>>
>> I was a little unsure about adding a regression test for this, but I
>> have not added it since other COPY option doesn't test the omission of
>> its value.
>
> Probably should we change the doc as well since ON_ERROR value doesn't
> necessarily need to be single-quoted?

Agreed.
Since it seems this issue is independent from the omission of ON_ERROR
option value, attached a separate patch.



Thank you for the patches! These patches look good to me. I'll push
them, barring any objections.

Regards,


Thanks for your review and apply!

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-04-15 Thread Masahiko Sawada

On Tue, Apr 2, 2024 at 7:34 PM torikoshia  wrote:
>
> On 2024-04-01 11:31, Masahiko Sawada wrote:
> > On Fri, Mar 29, 2024 at 11:54 AM torikoshia
> >  wrote:
> >>
> >> On 2024-03-28 21:54, Masahiko Sawada wrote:
> >> > On Thu, Mar 28, 2024 at 9:38 PM torikoshia 
> >> > wrote:
> >> >>
> >> >> On 2024-03-28 10:20, Masahiko Sawada wrote:
> >> >> > Hi,
> >> >> >
> >> >> > On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
> >> >> > 
> >> >> > wrote:
> >> >> >>
> >> >> >> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia 
> >> >> >> >  wrote:
> >> >> >> > > On 2024-01-18 10:10, jian he wrote:
> >> >> >> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> >> >> >> > > > 
> >> >> >> > > > wrote:
> >> >> >> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  
> >> >> >> > > >> wrote:
> >> >> >> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten 
> >> >> >> > > >> > it to
> >> >> >> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead 
> >> >> >> > > >> > of "error")?
> >> >> >> > > >> > You will need a separate parameter anyway to specify the 
> >> >> >> > > >> > destination
> >> >> >> > > >> > of "log", unless "none" became an illegal table name when I 
> >> >> >> > > >> > wasn't
> >> >> >> > > >> > looking.  I don't buy that one parameter that has some 
> >> >> >> > > >> > special values
> >> >> >> > > >> > while other values could be names will be a good design.  
> >> >> >> > > >> > Moreover,
> >> >> >> > > >> > what if we want to support (say) log-to-file along with 
> >> >> >> > > >> > log-to-table?
> >> >> >> > > >> > Trying to distinguish a file name from a table name without 
> >> >> >> > > >> > any other
> >> >> >> > > >> > context seems impossible.
> >> >> >> > > >>
> >> >> >> > > >> I've been thinking we can add more values to this option to 
> >> >> >> > > >> log errors
> >> >> >> > > >> not only to the server logs but also to the error table (not 
> >> >> >> > > >> sure
> >> >> >> > > >> details but I imagined an error table is created for each 
> >> >> >> > > >> table on
> >> >> >> > > >> error), without an additional option for the destination 
> >> >> >> > > >> name. The
> >> >> >> > > >> values would be like error_action 
> >> >> >> > > >> {error|ignore|save-logs|save-table}.
> >> >> >> > > >>
> >> >> >> > > >
> >> >> >> > > > another idea:
> >> >> >> > > > on_error {error|ignore|other_future_option}
> >> >> >> > > > if not specified then by default ERROR.
> >> >> >> > > > You can also specify ERROR or IGNORE for now.
> >> >> >> > > >
> >> >> >> > > > I agree, the parameter "error_action" is better than 
> >> >> >> > > > "location".
> >> >> >> > >
> >> >> >> > > I'm not sure whether error_action or on_error is better, but 
> >> >> >> > > either way
> >> >> >> > > "error_action error" and "on_error error" seems a bit odd to me.
> >> >> >> > > I feel "stop" is better for both cases as Tom suggested.
> >> >> >> >
> >> >> >> > OK.  What about this?
> >> >> >> > on_error {stop|ignore|other_future_option}
> >> >> >> > where other_future_option might be compound like "file 'copy.log'" 
> >> >> >> > or
> >> >> >> > "table 'copy_log'".
> >> >> >>
> >> >> >> +1
> >> >> >>
> >> >> >
> >> >> > I realized that ON_ERROR syntax synoposis in the documentation is not
> >> >> > correct. The option doesn't require the value to be quoted and the
> >> >> > value can be omitted. The attached patch fixes it.
> >> >> >
> >> >> > Regards,
> >> >>
> >> >> Thanks!
> >> >>
> >> >> Attached patch fixes the doc, but I'm wondering perhaps it might be
> >> >> better to modify the codes to prohibit abbreviation of the value.
> >> >>
> >> >> When seeing the query which abbreviates ON_ERROR value, I feel it's
> >> >> not
> >> >> obvious what happens compared to other options which tolerates
> >> >> abbreviation of the value such as FREEZE or HEADER.
> >> >>
> >> >>COPY t1 FROM stdin WITH (ON_ERROR);
> >> >>
> >> >> What do you think?
> >> >
> >> > Indeed. Looking at options of other commands such as VACUUM and
> >> > EXPLAIN, I can see that we can omit a boolean value, but non-boolean
> >> > parameters require its value. The HEADER option is not a pure boolean
> >> > parameter but we can omit the value. It seems to be for backward
> >> > compatibility; it used to be a boolean parameter. I agree that the
> >> > above example would confuse users.
> >> >
> >> > Regards,
> >>
> >> Thanks for your comment!
> >>
> >> Attached a patch which modifies the code to prohibit omission of its
> >> value.
> >>
> >> I was a little unsure about adding a regression test for this, but I
> >> have not added it since other COPY option doesn't test the omission of
> >> its value.
> >
> > Probably should we change the doc as well since ON_ERROR value doesn't
> > necessarily need to be single-quoted?
>
> Agreed.
> Since it seems this issue is independent from the omission of ON_ERROR
> option value, attached a separate patch.
>

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-04-02 Thread torikoshia


On 2024-04-01 11:31, Masahiko Sawada wrote:
On Fri, Mar 29, 2024 at 11:54 AM torikoshia 
 wrote:


On 2024-03-28 21:54, Masahiko Sawada wrote:
> On Thu, Mar 28, 2024 at 9:38 PM torikoshia 
> wrote:
>>
>> On 2024-03-28 10:20, Masahiko Sawada wrote:
>> > Hi,
>> >
>> > On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
>> > wrote:
>> >>
>> >> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
>> >>  wrote:
>> >> >
>> >> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia  
wrote:
>> >> > > On 2024-01-18 10:10, jian he wrote:
>> >> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 

>> >> > > > wrote:
>> >> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  
wrote:
>> >> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
>> >> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead of 
"error")?
>> >> > > >> > You will need a separate parameter anyway to specify the 
destination
>> >> > > >> > of "log", unless "none" became an illegal table name when I 
wasn't
>> >> > > >> > looking.  I don't buy that one parameter that has some special 
values
>> >> > > >> > while other values could be names will be a good design.  
Moreover,
>> >> > > >> > what if we want to support (say) log-to-file along with 
log-to-table?
>> >> > > >> > Trying to distinguish a file name from a table name without any 
other
>> >> > > >> > context seems impossible.
>> >> > > >>
>> >> > > >> I've been thinking we can add more values to this option to log 
errors
>> >> > > >> not only to the server logs but also to the error table (not sure
>> >> > > >> details but I imagined an error table is created for each table on
>> >> > > >> error), without an additional option for the destination name. The
>> >> > > >> values would be like error_action 
{error|ignore|save-logs|save-table}.
>> >> > > >>
>> >> > > >
>> >> > > > another idea:
>> >> > > > on_error {error|ignore|other_future_option}
>> >> > > > if not specified then by default ERROR.
>> >> > > > You can also specify ERROR or IGNORE for now.
>> >> > > >
>> >> > > > I agree, the parameter "error_action" is better than "location".
>> >> > >
>> >> > > I'm not sure whether error_action or on_error is better, but either 
way
>> >> > > "error_action error" and "on_error error" seems a bit odd to me.
>> >> > > I feel "stop" is better for both cases as Tom suggested.
>> >> >
>> >> > OK.  What about this?
>> >> > on_error {stop|ignore|other_future_option}
>> >> > where other_future_option might be compound like "file 'copy.log'" or
>> >> > "table 'copy_log'".
>> >>
>> >> +1
>> >>
>> >
>> > I realized that ON_ERROR syntax synoposis in the documentation is not
>> > correct. The option doesn't require the value to be quoted and the
>> > value can be omitted. The attached patch fixes it.
>> >
>> > Regards,
>>
>> Thanks!
>>
>> Attached patch fixes the doc, but I'm wondering perhaps it might be
>> better to modify the codes to prohibit abbreviation of the value.
>>
>> When seeing the query which abbreviates ON_ERROR value, I feel it's
>> not
>> obvious what happens compared to other options which tolerates
>> abbreviation of the value such as FREEZE or HEADER.
>>
>>COPY t1 FROM stdin WITH (ON_ERROR);
>>
>> What do you think?
>
> Indeed. Looking at options of other commands such as VACUUM and
> EXPLAIN, I can see that we can omit a boolean value, but non-boolean
> parameters require its value. The HEADER option is not a pure boolean
> parameter but we can omit the value. It seems to be for backward
> compatibility; it used to be a boolean parameter. I agree that the
> above example would confuse users.
>
> Regards,

Thanks for your comment!

Attached a patch which modifies the code to prohibit omission of its
value.

I was a little unsure about adding a regression test for this, but I
have not added it since other COPY option doesn't test the omission of
its value.


Probably should we change the doc as well since ON_ERROR value doesn't
necessarily need to be single-quoted?


Agreed.
Since it seems this issue is independent from the omission of ON_ERROR 
option value, attached a separate patch.



The rest looks good to me.

Alexander, what do you think about this change as you're the committer
of this feature?



--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom 1b4bec3c2223246ec59ffb9eb7de2f1de27315f7 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 29 Mar 2024 11:36:12 +0900
Subject: [PATCH v1] Disallow ON_ERROR option without value

Currently ON_ERROR option of COPY allows to omit its value,
but the syntax synopsis in the documentation requires it.

Since it seems non-boolean parameters usually require its value
and it's not obvious what happens when value of ON_ERROR is
omitted, this patch disallows ON_ERROR without its value.
---
 src/backend/commands/copy.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28cf8b040a..2719bf28b7 100644
---

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-03-31 Thread Masahiko Sawada

On Fri, Mar 29, 2024 at 11:54 AM torikoshia  wrote:
>
> On 2024-03-28 21:54, Masahiko Sawada wrote:
> > On Thu, Mar 28, 2024 at 9:38 PM torikoshia 
> > wrote:
> >>
> >> On 2024-03-28 10:20, Masahiko Sawada wrote:
> >> > Hi,
> >> >
> >> > On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
> >> > wrote:
> >> >>
> >> >> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
> >> >>  wrote:
> >> >> >
> >> >> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia 
> >> >> >  wrote:
> >> >> > > On 2024-01-18 10:10, jian he wrote:
> >> >> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> >> >> > > > 
> >> >> > > > wrote:
> >> >> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  
> >> >> > > >> wrote:
> >> >> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it 
> >> >> > > >> > to
> >> >> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead of 
> >> >> > > >> > "error")?
> >> >> > > >> > You will need a separate parameter anyway to specify the 
> >> >> > > >> > destination
> >> >> > > >> > of "log", unless "none" became an illegal table name when I 
> >> >> > > >> > wasn't
> >> >> > > >> > looking.  I don't buy that one parameter that has some special 
> >> >> > > >> > values
> >> >> > > >> > while other values could be names will be a good design.  
> >> >> > > >> > Moreover,
> >> >> > > >> > what if we want to support (say) log-to-file along with 
> >> >> > > >> > log-to-table?
> >> >> > > >> > Trying to distinguish a file name from a table name without 
> >> >> > > >> > any other
> >> >> > > >> > context seems impossible.
> >> >> > > >>
> >> >> > > >> I've been thinking we can add more values to this option to log 
> >> >> > > >> errors
> >> >> > > >> not only to the server logs but also to the error table (not sure
> >> >> > > >> details but I imagined an error table is created for each table 
> >> >> > > >> on
> >> >> > > >> error), without an additional option for the destination name. 
> >> >> > > >> The
> >> >> > > >> values would be like error_action 
> >> >> > > >> {error|ignore|save-logs|save-table}.
> >> >> > > >>
> >> >> > > >
> >> >> > > > another idea:
> >> >> > > > on_error {error|ignore|other_future_option}
> >> >> > > > if not specified then by default ERROR.
> >> >> > > > You can also specify ERROR or IGNORE for now.
> >> >> > > >
> >> >> > > > I agree, the parameter "error_action" is better than "location".
> >> >> > >
> >> >> > > I'm not sure whether error_action or on_error is better, but either 
> >> >> > > way
> >> >> > > "error_action error" and "on_error error" seems a bit odd to me.
> >> >> > > I feel "stop" is better for both cases as Tom suggested.
> >> >> >
> >> >> > OK.  What about this?
> >> >> > on_error {stop|ignore|other_future_option}
> >> >> > where other_future_option might be compound like "file 'copy.log'" or
> >> >> > "table 'copy_log'".
> >> >>
> >> >> +1
> >> >>
> >> >
> >> > I realized that ON_ERROR syntax synoposis in the documentation is not
> >> > correct. The option doesn't require the value to be quoted and the
> >> > value can be omitted. The attached patch fixes it.
> >> >
> >> > Regards,
> >>
> >> Thanks!
> >>
> >> Attached patch fixes the doc, but I'm wondering perhaps it might be
> >> better to modify the codes to prohibit abbreviation of the value.
> >>
> >> When seeing the query which abbreviates ON_ERROR value, I feel it's
> >> not
> >> obvious what happens compared to other options which tolerates
> >> abbreviation of the value such as FREEZE or HEADER.
> >>
> >>COPY t1 FROM stdin WITH (ON_ERROR);
> >>
> >> What do you think?
> >
> > Indeed. Looking at options of other commands such as VACUUM and
> > EXPLAIN, I can see that we can omit a boolean value, but non-boolean
> > parameters require its value. The HEADER option is not a pure boolean
> > parameter but we can omit the value. It seems to be for backward
> > compatibility; it used to be a boolean parameter. I agree that the
> > above example would confuse users.
> >
> > Regards,
>
> Thanks for your comment!
>
> Attached a patch which modifies the code to prohibit omission of its
> value.
>
> I was a little unsure about adding a regression test for this, but I
> have not added it since other COPY option doesn't test the omission of
> its value.

Probably should we change the doc as well since ON_ERROR value doesn't
necessarily need to be single-quoted?

The rest looks good to me.

Alexander, what do you think about this change as you're the committer
of this feature?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-03-28 Thread torikoshia


On 2024-03-28 21:54, Masahiko Sawada wrote:
On Thu, Mar 28, 2024 at 9:38 PM torikoshia  
wrote:


On 2024-03-28 10:20, Masahiko Sawada wrote:
> Hi,
>
> On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
> wrote:
>>
>> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
>>  wrote:
>> >
>> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia  
wrote:
>> > > On 2024-01-18 10:10, jian he wrote:
>> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
>> > > > wrote:
>> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
>> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
>> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead of 
"error")?
>> > > >> > You will need a separate parameter anyway to specify the destination
>> > > >> > of "log", unless "none" became an illegal table name when I wasn't
>> > > >> > looking.  I don't buy that one parameter that has some special 
values
>> > > >> > while other values could be names will be a good design.  Moreover,
>> > > >> > what if we want to support (say) log-to-file along with 
log-to-table?
>> > > >> > Trying to distinguish a file name from a table name without any 
other
>> > > >> > context seems impossible.
>> > > >>
>> > > >> I've been thinking we can add more values to this option to log errors
>> > > >> not only to the server logs but also to the error table (not sure
>> > > >> details but I imagined an error table is created for each table on
>> > > >> error), without an additional option for the destination name. The
>> > > >> values would be like error_action {error|ignore|save-logs|save-table}.
>> > > >>
>> > > >
>> > > > another idea:
>> > > > on_error {error|ignore|other_future_option}
>> > > > if not specified then by default ERROR.
>> > > > You can also specify ERROR or IGNORE for now.
>> > > >
>> > > > I agree, the parameter "error_action" is better than "location".
>> > >
>> > > I'm not sure whether error_action or on_error is better, but either way
>> > > "error_action error" and "on_error error" seems a bit odd to me.
>> > > I feel "stop" is better for both cases as Tom suggested.
>> >
>> > OK.  What about this?
>> > on_error {stop|ignore|other_future_option}
>> > where other_future_option might be compound like "file 'copy.log'" or
>> > "table 'copy_log'".
>>
>> +1
>>
>
> I realized that ON_ERROR syntax synoposis in the documentation is not
> correct. The option doesn't require the value to be quoted and the
> value can be omitted. The attached patch fixes it.
>
> Regards,

Thanks!

Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.

When seeing the query which abbreviates ON_ERROR value, I feel it's 
not

obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.

   COPY t1 FROM stdin WITH (ON_ERROR);

What do you think?


Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.

Regards,


Thanks for your comment!

Attached a patch which modifies the code to prohibit omission of its 
value.


I was a little unsure about adding a regression test for this, but I 
have not added it since other COPY option doesn't test the omission of 
its value.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom 1b4bec3c2223246ec59ffb9eb7de2f1de27315f7 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 29 Mar 2024 11:36:12 +0900
Subject: [PATCH v1] Disallow ON_ERROR option without value

Currently ON_ERROR option of COPY allows to omit its value,
but the syntax synopsis in the documentation requires it.

Since it seems non-boolean parameters usually require its value
and it's not obvious what happens when value of ON_ERROR is
omitted, this patch disallows ON_ERROR without its value.
---
 src/backend/commands/copy.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28cf8b040a..2719bf28b7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -392,7 +392,7 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
 static CopyOnErrorChoice
 defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 {
-	char	   *sval;
+	char	   *sval = defGetString(def);
 
 	if (!is_from)
 		ereport(ERROR,
@@ -400,16 +400,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
  errmsg("COPY ON_ERROR cannot be used with COPY TO"),
  parser_errposition(pstate, def->location)));
 
-	/*
-	 * If no parameter value given, assume the default value.
-	 */
-	if (def->arg == NULL)
-		return COPY_ON_ERROR_STOP;
-
 	/*
 	 * Allow "stop", or

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-03-28 Thread Masahiko Sawada

On Thu, Mar 28, 2024 at 9:38 PM torikoshia  wrote:
>
> On 2024-03-28 10:20, Masahiko Sawada wrote:
> > Hi,
> >
> > On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
> > wrote:
> >>
> >> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
> >>  wrote:
> >> >
> >> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia  
> >> > wrote:
> >> > > On 2024-01-18 10:10, jian he wrote:
> >> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> >> > > > 
> >> > > > wrote:
> >> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
> >> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
> >> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead of 
> >> > > >> > "error")?
> >> > > >> > You will need a separate parameter anyway to specify the 
> >> > > >> > destination
> >> > > >> > of "log", unless "none" became an illegal table name when I wasn't
> >> > > >> > looking.  I don't buy that one parameter that has some special 
> >> > > >> > values
> >> > > >> > while other values could be names will be a good design.  
> >> > > >> > Moreover,
> >> > > >> > what if we want to support (say) log-to-file along with 
> >> > > >> > log-to-table?
> >> > > >> > Trying to distinguish a file name from a table name without any 
> >> > > >> > other
> >> > > >> > context seems impossible.
> >> > > >>
> >> > > >> I've been thinking we can add more values to this option to log 
> >> > > >> errors
> >> > > >> not only to the server logs but also to the error table (not sure
> >> > > >> details but I imagined an error table is created for each table on
> >> > > >> error), without an additional option for the destination name. The
> >> > > >> values would be like error_action 
> >> > > >> {error|ignore|save-logs|save-table}.
> >> > > >>
> >> > > >
> >> > > > another idea:
> >> > > > on_error {error|ignore|other_future_option}
> >> > > > if not specified then by default ERROR.
> >> > > > You can also specify ERROR or IGNORE for now.
> >> > > >
> >> > > > I agree, the parameter "error_action" is better than "location".
> >> > >
> >> > > I'm not sure whether error_action or on_error is better, but either way
> >> > > "error_action error" and "on_error error" seems a bit odd to me.
> >> > > I feel "stop" is better for both cases as Tom suggested.
> >> >
> >> > OK.  What about this?
> >> > on_error {stop|ignore|other_future_option}
> >> > where other_future_option might be compound like "file 'copy.log'" or
> >> > "table 'copy_log'".
> >>
> >> +1
> >>
> >
> > I realized that ON_ERROR syntax synoposis in the documentation is not
> > correct. The option doesn't require the value to be quoted and the
> > value can be omitted. The attached patch fixes it.
> >
> > Regards,
>
> Thanks!
>
> Attached patch fixes the doc, but I'm wondering perhaps it might be
> better to modify the codes to prohibit abbreviation of the value.
>
> When seeing the query which abbreviates ON_ERROR value, I feel it's not
> obvious what happens compared to other options which tolerates
> abbreviation of the value such as FREEZE or HEADER.
>
>COPY t1 FROM stdin WITH (ON_ERROR);
>
> What do you think?

Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-03-28 Thread torikoshia


On 2024-03-28 10:20, Masahiko Sawada wrote:

Hi,

On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada  
wrote:


On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov 
 wrote:

>
> On Thu, Jan 18, 2024 at 4:16 AM torikoshia  wrote:
> > On 2024-01-18 10:10, jian he wrote:
> > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> > > wrote:
> > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
> > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
> > >> > error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
> > >> > You will need a separate parameter anyway to specify the destination
> > >> > of "log", unless "none" became an illegal table name when I wasn't
> > >> > looking.  I don't buy that one parameter that has some special values
> > >> > while other values could be names will be a good design.  Moreover,
> > >> > what if we want to support (say) log-to-file along with log-to-table?
> > >> > Trying to distinguish a file name from a table name without any other
> > >> > context seems impossible.
> > >>
> > >> I've been thinking we can add more values to this option to log errors
> > >> not only to the server logs but also to the error table (not sure
> > >> details but I imagined an error table is created for each table on
> > >> error), without an additional option for the destination name. The
> > >> values would be like error_action {error|ignore|save-logs|save-table}.
> > >>
> > >
> > > another idea:
> > > on_error {error|ignore|other_future_option}
> > > if not specified then by default ERROR.
> > > You can also specify ERROR or IGNORE for now.
> > >
> > > I agree, the parameter "error_action" is better than "location".
> >
> > I'm not sure whether error_action or on_error is better, but either way
> > "error_action error" and "on_error error" seems a bit odd to me.
> > I feel "stop" is better for both cases as Tom suggested.
>
> OK.  What about this?
> on_error {stop|ignore|other_future_option}
> where other_future_option might be compound like "file 'copy.log'" or
> "table 'copy_log'".

+1



I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.

Regards,


Thanks!

Attached patch fixes the doc, but I'm wondering perhaps it might be 
better to modify the codes to prohibit abbreviation of the value.


When seeing the query which abbreviates ON_ERROR value, I feel it's not 
obvious what happens compared to other options which tolerates 
abbreviation of the value such as FREEZE or HEADER.


  COPY t1 FROM stdin WITH (ON_ERROR);

What do you think?

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-03-27 Thread Masahiko Sawada

Hi,

On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada  wrote:
>
> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov  
> wrote:
> >
> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia  
> > wrote:
> > > On 2024-01-18 10:10, jian he wrote:
> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> > > > wrote:
> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead of 
> > > >> > "error")?
> > > >> > You will need a separate parameter anyway to specify the destination
> > > >> > of "log", unless "none" became an illegal table name when I wasn't
> > > >> > looking.  I don't buy that one parameter that has some special values
> > > >> > while other values could be names will be a good design.  Moreover,
> > > >> > what if we want to support (say) log-to-file along with log-to-table?
> > > >> > Trying to distinguish a file name from a table name without any other
> > > >> > context seems impossible.
> > > >>
> > > >> I've been thinking we can add more values to this option to log errors
> > > >> not only to the server logs but also to the error table (not sure
> > > >> details but I imagined an error table is created for each table on
> > > >> error), without an additional option for the destination name. The
> > > >> values would be like error_action {error|ignore|save-logs|save-table}.
> > > >>
> > > >
> > > > another idea:
> > > > on_error {error|ignore|other_future_option}
> > > > if not specified then by default ERROR.
> > > > You can also specify ERROR or IGNORE for now.
> > > >
> > > > I agree, the parameter "error_action" is better than "location".
> > >
> > > I'm not sure whether error_action or on_error is better, but either way
> > > "error_action error" and "on_error error" seems a bit odd to me.
> > > I feel "stop" is better for both cases as Tom suggested.
> >
> > OK.  What about this?
> > on_error {stop|ignore|other_future_option}
> > where other_future_option might be compound like "file 'copy.log'" or
> > "table 'copy_log'".
>
> +1
>

I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


0001-doc-Fix-COPY-ON_ERROR-option-syntax-synopsis.patch
Description: Binary data

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-19 Thread torikoshia

On 2024-01-19 22:27, Alexander Korotkov wrote:

Hi!

On Fri, Jan 19, 2024 at 2:37 PM torikoshia  
wrote:

Thanks for making the patch!

The patch is pushed!  The proposed changes are incorporated excluding 
this.

> -   /* If SAVE_ERROR_TO is specified, skip rows
> with soft errors */
> +   /* If ON_ERROR is specified with IGNORE, skip
> rows with soft errors */

This is correct now, but considering future works which add other
options like "file 'copy.log'" and
"table 'copy_log'", it may be better not to limit the case to 
'IGNORE'.

How about something like this?

   If ON_ERROR is specified and the value is not STOP, skip rows with
soft errors

I think when we have more options, then we wouldn't just skip rows
with soft errors but rather save them.  So, I left this comment as is
for now.

Agreed.
Thanks for the notification!

--
Regards,
Alexander Korotkov

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-19 Thread Alexander Korotkov

Hi!

On Fri, Jan 19, 2024 at 2:37 PM torikoshia  wrote:
> Thanks for making the patch!

The patch is pushed!  The proposed changes are incorporated excluding this.

> > -   /* If SAVE_ERROR_TO is specified, skip rows
> > with soft errors */
> > +   /* If ON_ERROR is specified with IGNORE, skip
> > rows with soft errors */
>
> This is correct now, but considering future works which add other
> options like "file 'copy.log'" and
> "table 'copy_log'", it may be better not to limit the case to 'IGNORE'.
> How about something like this?
>
>If ON_ERROR is specified and the value is not STOP, skip rows with
> soft errors

I think when we have more options, then we wouldn't just skip rows
with soft errors but rather save them.  So, I left this comment as is
for now.

--
Regards,
Alexander Korotkov

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-19 Thread torikoshia


On 2024-01-18 23:59, jian he wrote:

Hi.
patch refactored based on "on_error {stop|ignore}"
doc changes:

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL { ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
-SAVE_ERROR_TO 'class="parameter">location'
+ON_ERROR 'class="parameter">error_action'
 ENCODING 'class="parameter">encoding_name'

 
  
@@ -375,20 +375,20 @@ COPY { table_name [ ( 


-SAVE_ERROR_TO
+ON_ERROR
 
  
-  Specifies to save error information to class="parameter">
-  location when there is malformed data in the 
input.

-  Currently, only error (default) and
none
+  Specifies which 
+  error_action to perform when there is malformed
data in the input.
+  Currently, only stop (default) and
ignore
   values are supported.
-  If the error value is specified,
+  If the stop value is specified,
   COPY stops operation at the first error.
-  If the none value is specified,
+  If the ignore value is specified,
   COPY skips malformed data and continues 
copying data.

   The option is allowed only in COPY FROM.
-  The none value is allowed only when
-  not using binary format.
+  Only stop value is allowed only when
+  using binary format.
  


Thanks for making the patch!

Here are some comments:


-  The none value is allowed only when
-  not using binary format.
+  Only stop value is allowed only when
+  using binary format.


The second 'only' may be unnecessary.

-   /* If SAVE_ERROR_TO is specified, skip rows 
with soft errors */
+   /* If ON_ERROR is specified with IGNORE, skip 
rows with soft errors */


This is correct now, but considering future works which add other 
options like "file 'copy.log'" and

"table 'copy_log'", it may be better not to limit the case to 'IGNORE'.
How about something like this?

  If ON_ERROR is specified and the value is not STOP, skip rows with 
soft errors



-COPY x from stdin (format BINARY, save_error_to none);
-COPY x to stdin (save_error_to none);
+COPY x from stdin (format BINARY, ON_ERROR ignore);
+COPY x from stdin (ON_ERROR unsupported);
 COPY x to stdin (format TEXT, force_quote(a));
 COPY x from stdin (format CSV, force_quote(a));


In the existing test for copy2.sql, the COPY options are written in 
lower case(e.g. 'format')　and option value(e.g. 'BINARY') are written in 
upper case.

It would be more consistent to align them.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-18 Thread jian he

Hi.
patch refactored based on "on_error {stop|ignore}"
doc changes:

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL { ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
-SAVE_ERROR_TO 'location'
+ON_ERROR 'error_action'
 ENCODING 'encoding_name'
 
  
@@ -375,20 +375,20 @@ COPY { table_name [ ( 


-SAVE_ERROR_TO
+ON_ERROR
 
  
-  Specifies to save error information to 
-  location when there is malformed data in the input.
-  Currently, only error (default) and
none
+  Specifies which 
+  error_action to perform when there is malformed
data in the input.
+  Currently, only stop (default) and
ignore
   values are supported.
-  If the error value is specified,
+  If the stop value is specified,
   COPY stops operation at the first error.
-  If the none value is specified,
+  If the ignore value is specified,
   COPY skips malformed data and continues copying data.
   The option is allowed only in COPY FROM.
-  The none value is allowed only when
-  not using binary format.
+  Only stop value is allowed only when
+  using binary format.
  
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 85881ca0..c30baec1 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL { ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
-SAVE_ERROR_TO 'location'
+ON_ERROR 'error_action'
 ENCODING 'encoding_name'
 
  
@@ -375,20 +375,20 @@ COPY { table_name [ ( 
 

-SAVE_ERROR_TO
+ON_ERROR
 
  
-  Specifies to save error information to 
-  location when there is malformed data in the input.
-  Currently, only error (default) and none
+  Specifies which 
+  error_action to perform when there is malformed data in the input.
+  Currently, only stop (default) and ignore
   values are supported.
-  If the error value is specified,
+  If the stop value is specified,
   COPY stops operation at the first error.
-  If the none value is specified,
+  If the ignore value is specified,
   COPY skips malformed data and continues copying data.
   The option is allowed only in COPY FROM.
-  The none value is allowed only when
-  not using binary format.
+  Only stop value is allowed only when
+  using binary format.
  
 

@@ -577,7 +577,7 @@ COPY count
 

 COPY stops operation at the first error when
-SAVE_ERROR_TO is not specified. This
+ON_ERROR is not specified. This
 should not lead to problems in the event of a COPY
 TO, but the target table will already have received
 earlier rows in a COPY FROM. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c36d7f1d..cc0786c6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -395,39 +395,39 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
 }
 
 /*
- * Extract a CopySaveErrorToChoice value from a DefElem.
+ * Extract a CopyOnErrorChoice value from a DefElem.
  */
-static CopySaveErrorToChoice
-defGetCopySaveErrorToChoice(DefElem *def, ParseState *pstate, bool is_from)
+static CopyOnErrorChoice
+defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 {
 	char	   *sval;
 
 	if (!is_from)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY SAVE_ERROR_TO cannot be used with COPY TO"),
+ errmsg("COPY ON_ERROR cannot be used with COPY TO"),
  parser_errposition(pstate, def->location)));
 
 	/*
 	 * If no parameter value given, assume the default value.
 	 */
 	if (def->arg == NULL)
-		return COPY_SAVE_ERROR_TO_ERROR;
+		return COPY_ON_ERROR_STOP;
 
 	/*
-	 * Allow "error", or "none" values.
+	 * Allow "stop", or "ignore" values.
 	 */
 	sval = defGetString(def);
-	if (pg_strcasecmp(sval, "error") == 0)
-		return COPY_SAVE_ERROR_TO_ERROR;
-	if (pg_strcasecmp(sval, "none") == 0)
-		return COPY_SAVE_ERROR_TO_NONE;
+	if (pg_strcasecmp(sval, "stop") == 0)
+		return COPY_ON_ERROR_STOP;
+	if (pg_strcasecmp(sval, "ignore") == 0)
+		return COPY_ON_ERROR_IGNORE;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-			 errmsg("COPY save_error_to \"%s\" not recognized", sval),
+			 errmsg("COPY ON_ERROR \"%s\" not recognized", sval),
 			 parser_errposition(pstate, def->location)));
-	return COPY_SAVE_ERROR_TO_ERROR;	/* keep compiler quiet */
+	return COPY_ON_ERROR_STOP;	/* keep compiler quiet */
 }
 
 /*
@@ -455,7 +455,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
-	bool		save_error_to_specified = false;
+	bool		on_error_specified = false;

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-18 Thread torikoshia


On 2024-01-18 16:59, Alexander Korotkov wrote:
On Thu, Jan 18, 2024 at 4:16 AM torikoshia  
wrote:

On 2024-01-18 10:10, jian he wrote:
> On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> wrote:
>> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
>> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
>> > error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
>> > You will need a separate parameter anyway to specify the destination
>> > of "log", unless "none" became an illegal table name when I wasn't
>> > looking.  I don't buy that one parameter that has some special values
>> > while other values could be names will be a good design.  Moreover,
>> > what if we want to support (say) log-to-file along with log-to-table?
>> > Trying to distinguish a file name from a table name without any other
>> > context seems impossible.
>>
>> I've been thinking we can add more values to this option to log errors
>> not only to the server logs but also to the error table (not sure
>> details but I imagined an error table is created for each table on
>> error), without an additional option for the destination name. The
>> values would be like error_action {error|ignore|save-logs|save-table}.
>>
>
> another idea:
> on_error {error|ignore|other_future_option}
> if not specified then by default ERROR.
> You can also specify ERROR or IGNORE for now.
>
> I agree, the parameter "error_action" is better than "location".

I'm not sure whether error_action or on_error is better, but either 
way

"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.


OK.  What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".


Thanks, also +1 from me.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-18 Thread Masahiko Sawada

On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov  wrote:
>
> On Thu, Jan 18, 2024 at 4:16 AM torikoshia  wrote:
> > On 2024-01-18 10:10, jian he wrote:
> > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> > > wrote:
> > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
> > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
> > >> > error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
> > >> > You will need a separate parameter anyway to specify the destination
> > >> > of "log", unless "none" became an illegal table name when I wasn't
> > >> > looking.  I don't buy that one parameter that has some special values
> > >> > while other values could be names will be a good design.  Moreover,
> > >> > what if we want to support (say) log-to-file along with log-to-table?
> > >> > Trying to distinguish a file name from a table name without any other
> > >> > context seems impossible.
> > >>
> > >> I've been thinking we can add more values to this option to log errors
> > >> not only to the server logs but also to the error table (not sure
> > >> details but I imagined an error table is created for each table on
> > >> error), without an additional option for the destination name. The
> > >> values would be like error_action {error|ignore|save-logs|save-table}.
> > >>
> > >
> > > another idea:
> > > on_error {error|ignore|other_future_option}
> > > if not specified then by default ERROR.
> > > You can also specify ERROR or IGNORE for now.
> > >
> > > I agree, the parameter "error_action" is better than "location".
> >
> > I'm not sure whether error_action or on_error is better, but either way
> > "error_action error" and "on_error error" seems a bit odd to me.
> > I feel "stop" is better for both cases as Tom suggested.
>
> OK.  What about this?
> on_error {stop|ignore|other_future_option}
> where other_future_option might be compound like "file 'copy.log'" or
> "table 'copy_log'".

+1

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-18 Thread Pavel Stehule

čt 18. 1. 2024 v 8:59 odesílatel Alexander Korotkov 
napsal:

> On Thu, Jan 18, 2024 at 4:16 AM torikoshia 
> wrote:
> > On 2024-01-18 10:10, jian he wrote:
> > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada  >
> > > wrote:
> > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
> > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
> > >> > error_action {error|ignore|log} (or perhaps "stop" instead of
> "error")?
> > >> > You will need a separate parameter anyway to specify the destination
> > >> > of "log", unless "none" became an illegal table name when I wasn't
> > >> > looking.  I don't buy that one parameter that has some special
> values
> > >> > while other values could be names will be a good design.  Moreover,
> > >> > what if we want to support (say) log-to-file along with
> log-to-table?
> > >> > Trying to distinguish a file name from a table name without any
> other
> > >> > context seems impossible.
> > >>
> > >> I've been thinking we can add more values to this option to log errors
> > >> not only to the server logs but also to the error table (not sure
> > >> details but I imagined an error table is created for each table on
> > >> error), without an additional option for the destination name. The
> > >> values would be like error_action {error|ignore|save-logs|save-table}.
> > >>
> > >
> > > another idea:
> > > on_error {error|ignore|other_future_option}
> > > if not specified then by default ERROR.
> > > You can also specify ERROR or IGNORE for now.
> > >
> > > I agree, the parameter "error_action" is better than "location".
> >
> > I'm not sure whether error_action or on_error is better, but either way
> > "error_action error" and "on_error error" seems a bit odd to me.
> > I feel "stop" is better for both cases as Tom suggested.
>
> OK.  What about this?
> on_error {stop|ignore|other_future_option}
> where other_future_option might be compound like "file 'copy.log'" or
> "table 'copy_log'".
>

+1

it is consistent with psql

Regards

Pavel

>
> --
> Regards,
> Alexander Korotkov
>
>
>

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread Alexander Korotkov

On Thu, Jan 18, 2024 at 4:16 AM torikoshia  wrote:
> On 2024-01-18 10:10, jian he wrote:
> > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> > wrote:
> >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
> >> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
> >> > error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
> >> > You will need a separate parameter anyway to specify the destination
> >> > of "log", unless "none" became an illegal table name when I wasn't
> >> > looking.  I don't buy that one parameter that has some special values
> >> > while other values could be names will be a good design.  Moreover,
> >> > what if we want to support (say) log-to-file along with log-to-table?
> >> > Trying to distinguish a file name from a table name without any other
> >> > context seems impossible.
> >>
> >> I've been thinking we can add more values to this option to log errors
> >> not only to the server logs but also to the error table (not sure
> >> details but I imagined an error table is created for each table on
> >> error), without an additional option for the destination name. The
> >> values would be like error_action {error|ignore|save-logs|save-table}.
> >>
> >
> > another idea:
> > on_error {error|ignore|other_future_option}
> > if not specified then by default ERROR.
> > You can also specify ERROR or IGNORE for now.
> >
> > I agree, the parameter "error_action" is better than "location".
>
> I'm not sure whether error_action or on_error is better, but either way
> "error_action error" and "on_error error" seems a bit odd to me.
> I feel "stop" is better for both cases as Tom suggested.

OK.  What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".

--
Regards,
Alexander Korotkov

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread torikoshia


On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada  
wrote:


On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
>
> Alexander Korotkov  writes:
> > On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
> >  wrote:
> >> On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
> >> indicate "immediately error out" and 'just ignore the failure'
> >> respectively, but these options hardly seem to denote a 'location',
> >> and appear more like an 'action'. I somewhat suspect that this
> >> parameter name intially conceived with the assupmtion that it would
> >> take file names or similar parameters. I'm not sure if others will
> >> agree, but I think the parameter name might not be the best
> >> choice. For instance, considering the addition of the third value
> >> 'log', something like on_error_action (error, ignore, log) would be
> >> more intuitively understandable. What do you think?
>
> > Probably, but I'm not sure about that.  The name SAVE_ERROR_TO assumes
> > the next word will be location, not action.  With some stretch we can
> > assume 'error' to be location.  I think it would be even more stretchy
> > to think that SAVE_ERROR_TO is followed by action.
>
> The other problem with this terminology is that with 'none', what it
> is doing is the exact opposite of "saving" the errors.  I agree we
> need a better name.

Agreed.

>
> Kyotaro-san's suggestion isn't bad, though I might shorten it to
> error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
> You will need a separate parameter anyway to specify the destination
> of "log", unless "none" became an illegal table name when I wasn't
> looking.  I don't buy that one parameter that has some special values
> while other values could be names will be a good design.  Moreover,
> what if we want to support (say) log-to-file along with log-to-table?
> Trying to distinguish a file name from a table name without any other
> context seems impossible.

I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.



another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.

I agree, the parameter "error_action" is better than "location".


I'm not sure whether error_action or on_error is better, but either way 
"error_action error" and "on_error error" seems a bit odd to me.

I feel "stop" is better for both cases as Tom suggested.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread jian he

On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada  wrote:
>
> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
> >
> > Alexander Korotkov  writes:
> > > On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
> > >  wrote:
> > >> On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
> > >> indicate "immediately error out" and 'just ignore the failure'
> > >> respectively, but these options hardly seem to denote a 'location',
> > >> and appear more like an 'action'. I somewhat suspect that this
> > >> parameter name intially conceived with the assupmtion that it would
> > >> take file names or similar parameters. I'm not sure if others will
> > >> agree, but I think the parameter name might not be the best
> > >> choice. For instance, considering the addition of the third value
> > >> 'log', something like on_error_action (error, ignore, log) would be
> > >> more intuitively understandable. What do you think?
> >
> > > Probably, but I'm not sure about that.  The name SAVE_ERROR_TO assumes
> > > the next word will be location, not action.  With some stretch we can
> > > assume 'error' to be location.  I think it would be even more stretchy
> > > to think that SAVE_ERROR_TO is followed by action.
> >
> > The other problem with this terminology is that with 'none', what it
> > is doing is the exact opposite of "saving" the errors.  I agree we
> > need a better name.
>
> Agreed.
>
> >
> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
> > error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
> > You will need a separate parameter anyway to specify the destination
> > of "log", unless "none" became an illegal table name when I wasn't
> > looking.  I don't buy that one parameter that has some special values
> > while other values could be names will be a good design.  Moreover,
> > what if we want to support (say) log-to-file along with log-to-table?
> > Trying to distinguish a file name from a table name without any other
> > context seems impossible.
>
> I've been thinking we can add more values to this option to log errors
> not only to the server logs but also to the error table (not sure
> details but I imagined an error table is created for each table on
> error), without an additional option for the destination name. The
> values would be like error_action {error|ignore|save-logs|save-table}.
>

another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.

I agree, the parameter "error_action" is better than "location".

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread Masahiko Sawada

On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
>
> Alexander Korotkov  writes:
> > On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
> >  wrote:
> >> On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
> >> indicate "immediately error out" and 'just ignore the failure'
> >> respectively, but these options hardly seem to denote a 'location',
> >> and appear more like an 'action'. I somewhat suspect that this
> >> parameter name intially conceived with the assupmtion that it would
> >> take file names or similar parameters. I'm not sure if others will
> >> agree, but I think the parameter name might not be the best
> >> choice. For instance, considering the addition of the third value
> >> 'log', something like on_error_action (error, ignore, log) would be
> >> more intuitively understandable. What do you think?
>
> > Probably, but I'm not sure about that.  The name SAVE_ERROR_TO assumes
> > the next word will be location, not action.  With some stretch we can
> > assume 'error' to be location.  I think it would be even more stretchy
> > to think that SAVE_ERROR_TO is followed by action.
>
> The other problem with this terminology is that with 'none', what it
> is doing is the exact opposite of "saving" the errors.  I agree we
> need a better name.

Agreed.

>
> Kyotaro-san's suggestion isn't bad, though I might shorten it to
> error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
> You will need a separate parameter anyway to specify the destination
> of "log", unless "none" became an illegal table name when I wasn't
> looking.  I don't buy that one parameter that has some special values
> while other values could be names will be a good design.  Moreover,
> what if we want to support (say) log-to-file along with log-to-table?
> Trying to distinguish a file name from a table name without any other
> context seems impossible.

I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread Tom Lane

Alexander Korotkov  writes:
> On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
>  wrote:
>> On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
>> indicate "immediately error out" and 'just ignore the failure'
>> respectively, but these options hardly seem to denote a 'location',
>> and appear more like an 'action'. I somewhat suspect that this
>> parameter name intially conceived with the assupmtion that it would
>> take file names or similar parameters. I'm not sure if others will
>> agree, but I think the parameter name might not be the best
>> choice. For instance, considering the addition of the third value
>> 'log', something like on_error_action (error, ignore, log) would be
>> more intuitively understandable. What do you think?

> Probably, but I'm not sure about that.  The name SAVE_ERROR_TO assumes
> the next word will be location, not action.  With some stretch we can
> assume 'error' to be location.  I think it would be even more stretchy
> to think that SAVE_ERROR_TO is followed by action.

The other problem with this terminology is that with 'none', what it
is doing is the exact opposite of "saving" the errors.  I agree we
need a better name.

Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking.  I don't buy that one parameter that has some special values
while other values could be names will be a good design.  Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.

regards, tom lane

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread Alexander Korotkov

On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
 wrote:
> At Wed, 17 Jan 2024 14:38:54 +0900, torikoshia  
> wrote in
> > Hi,
> >
> > Thanks for applying!
> >
> > > + errmsg_plural("%zd row were skipped due to data type
> > > incompatibility",
> >
> > Sorry, I just noticed it, but 'were' should be 'was' here?
> >
> > >> BTW I'm thinking we should add a column to pg_stat_progress_copy that
> > >> counts soft errors. I'll suggest this in another thread.
> > > Please do!
> >
> > I've started it here:
> >
> > https://www.postgresql.org/message-id/d12fd8c99adcae2744212cb23feff...@oss.nttdata.com
>
> Switching topics, this commit (9e2d870119) adds the following help message:
>
>
> >   "COPY { %s [ ( %s [, ...] ) ] | ( %s ) }\n"
> >   "TO { '%s' | PROGRAM '%s' | STDOUT }\n"
> > ...
> >   "SAVE_ERROR_TO '%s'\n"
> > ...
> >   _("location"),
>
> On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
> indicate "immediately error out" and 'just ignore the failure'
> respectively, but these options hardly seem to denote a 'location',
> and appear more like an 'action'. I somewhat suspect that this
> parameter name intially conceived with the assupmtion that it would
> take file names or similar parameters. I'm not sure if others will
> agree, but I think the parameter name might not be the best
> choice. For instance, considering the addition of the third value
> 'log', something like on_error_action (error, ignore, log) would be
> more intuitively understandable. What do you think?

Probably, but I'm not sure about that.  The name SAVE_ERROR_TO assumes
the next word will be location, not action.  With some stretch we can
assume 'error' to be location.  I think it would be even more stretchy
to think that SAVE_ERROR_TO is followed by action.  Probably, we can
replace SAVE_ERROR_TO with another name which could be naturally
followed by action, but I don't have something appropriate in mind.
However, I'm not native english speaker and certainly could miss
something.

--
Regards,
Alexander Korotkov

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread Alexander Korotkov

On Wed, Jan 17, 2024 at 7:38 AM torikoshia  wrote:
>
> Hi,
>
> Thanks for applying!
>
> > +   errmsg_plural("%zd row were skipped due
> > to data type incompatibility",
>
> Sorry, I just noticed it, but 'were' should be 'was' here?

Sure, the fix is pushed.

--
Regards,
Alexander Korotkov

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-16 Thread Kyotaro Horiguchi

At Wed, 17 Jan 2024 14:38:54 +0900, torikoshia  
wrote in 
> Hi,
> 
> Thanks for applying!
> 
> > + errmsg_plural("%zd row were skipped due to data type
> > incompatibility",
> 
> Sorry, I just noticed it, but 'were' should be 'was' here?
> 
> >> BTW I'm thinking we should add a column to pg_stat_progress_copy that
> >> counts soft errors. I'll suggest this in another thread.
> > Please do!
> 
> I've started it here:
> 
> https://www.postgresql.org/message-id/d12fd8c99adcae2744212cb23feff...@oss.nttdata.com

Switching topics, this commit (9e2d870119) adds the following help message:


>   "COPY { %s [ ( %s [, ...] ) ] | ( %s ) }\n"
>   "TO { '%s' | PROGRAM '%s' | STDOUT }\n"
> ...
>   "SAVE_ERROR_TO '%s'\n"
> ...
>   _("location"),

On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name intially conceived with the assupmtion that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-16 Thread torikoshia


Hi,

Thanks for applying!

+   errmsg_plural("%zd row were skipped due 
to data type incompatibility",


Sorry, I just noticed it, but 'were' should be 'was' here?


BTW I'm thinking we should add a column to pg_stat_progress_copy that
counts soft errors. I'll suggest this in another thread.

Please do!


I've started it here:

https://www.postgresql.org/message-id/d12fd8c99adcae2744212cb23feff...@oss.nttdata.com


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-16 Thread Alexander Korotkov

Hi,

> Thanks for updating the patch!

You're welcome!

> Here is a minor comment:
>
> > +/*
> > + * Extract a defGetCopySaveErrorToChoice value from a DefElem.
> > + */
>
> Should be Extract a "CopySaveErrorToChoice"?

Fixed.

> BTW I'm thinking we should add a column to pg_stat_progress_copy that
> counts soft errors. I'll suggest this in another thread.

Please do!

--
Regards,
Alexander Korotkov


0001-Add-new-COPY-option-SAVE_ERROR_TO-v6.patch
Description: Binary data

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-15 Thread torikoshia


On 2024-01-16 00:17, Alexander Korotkov wrote:
On Mon, Jan 15, 2024 at 8:44 AM Masahiko Sawada  
wrote:


On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov 
 wrote:

>
> On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada  
wrote:
> > Thank you for updating the patch. Here are two comments:
> >
> > ---
> > +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> > +   cstate->num_errors > 0)
> > +   ereport(WARNING,
> > +   errmsg("%zd rows were skipped due to data type 
incompatibility",
> > +  cstate->num_errors));
> > +
> > /* Done, clean up */
> > error_context_stack = errcallback.previous;
> >
> > If a malformed input is not the last data, the context message seems odd:
> >
> > postgres(1:1769258)=# create table test (a int);
> > CREATE TABLE
> > postgres(1:1769258)=# copy test from stdin (save_error_to none);
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF signal.
> > >> a
> > >> 1
> > >>
> > 2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
> > due to data type incompatibility
> > 2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
> > COPY 1
> >
> > I think it's better to report the WARNING after resetting the
> > error_context_stack. Or is a WARNING really appropriate here? The
> > v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
> > the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
> > WARNING without explanation.
>
> Thank you for noticing this.  I think NOTICE is more appropriate here.
> There is nothing to "worry" about: the user asked to ignore the errors
> and we did.  And yes, it doesn't make sense to use the last line as
> the context.  Fixed.
>
> > ---
> > +-- test missing data: should fail
> > +COPY check_ign_err FROM STDIN WITH (save_error_to none);
> > +1  {1}
> > +\.
> >
> > We might want to cover the extra data cases too.
>
> Agreed, the relevant test is added.

Thank you for updating the patch. I have one minor point:

+   if (cstate->opts.save_error_to != 
COPY_SAVE_ERROR_TO_UNSPECIFIED &&

+   cstate->num_errors > 0)
+   ereport(NOTICE,
+   errmsg("%zd rows were skipped due to
data type incompatibility",
+  cstate->num_errors));
+

We can use errmsg_plural() instead.


Makes sense.  Fixed.


I have a question about the option values; do you think we need to
have another value of SAVE_ERROR_TO option to explicitly specify the
current default behavior, i.e. not accept any error? With the v4
patch, the user needs to omit SAVE_ERROR_TO option to accept errors
during COPY FROM. If we change the default behavior in the future,
many users will be affected and probably end up changing their
applications to keep the current default behavior.


Valid point.  I've implemented the handling of CopySaveErrorToChoice
in a similar way to CopyHeaderChoice.

Please, check the revised patch attached.


Thanks for updating the patch!

Here is a minor comment:


+/*
+ * Extract a defGetCopySaveErrorToChoice value from a DefElem.
+ */


Should be Extract a "CopySaveErrorToChoice"?


BTW I'm thinking we should add a column to pg_stat_progress_copy that 
counts soft errors. I'll suggest this in another thread.



--
Regards,
Alexander Korotkov


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-15 Thread Alexander Korotkov

On Mon, Jan 15, 2024 at 8:44 AM Masahiko Sawada  wrote:
>
> On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov  
> wrote:
> >
> > On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada  
> > wrote:
> > > Thank you for updating the patch. Here are two comments:
> > >
> > > ---
> > > +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> > > +   cstate->num_errors > 0)
> > > +   ereport(WARNING,
> > > +   errmsg("%zd rows were skipped due to data type 
> > > incompatibility",
> > > +  cstate->num_errors));
> > > +
> > > /* Done, clean up */
> > > error_context_stack = errcallback.previous;
> > >
> > > If a malformed input is not the last data, the context message seems odd:
> > >
> > > postgres(1:1769258)=# create table test (a int);
> > > CREATE TABLE
> > > postgres(1:1769258)=# copy test from stdin (save_error_to none);
> > > Enter data to be copied followed by a newline.
> > > End with a backslash and a period on a line by itself, or an EOF signal.
> > > >> a
> > > >> 1
> > > >>
> > > 2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
> > > due to data type incompatibility
> > > 2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
> > > COPY 1
> > >
> > > I think it's better to report the WARNING after resetting the
> > > error_context_stack. Or is a WARNING really appropriate here? The
> > > v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
> > > the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
> > > WARNING without explanation.
> >
> > Thank you for noticing this.  I think NOTICE is more appropriate here.
> > There is nothing to "worry" about: the user asked to ignore the errors
> > and we did.  And yes, it doesn't make sense to use the last line as
> > the context.  Fixed.
> >
> > > ---
> > > +-- test missing data: should fail
> > > +COPY check_ign_err FROM STDIN WITH (save_error_to none);
> > > +1  {1}
> > > +\.
> > >
> > > We might want to cover the extra data cases too.
> >
> > Agreed, the relevant test is added.
>
> Thank you for updating the patch. I have one minor point:
>
> +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> +   cstate->num_errors > 0)
> +   ereport(NOTICE,
> +   errmsg("%zd rows were skipped due to
> data type incompatibility",
> +  cstate->num_errors));
> +
>
> We can use errmsg_plural() instead.

Makes sense.  Fixed.

> I have a question about the option values; do you think we need to
> have another value of SAVE_ERROR_TO option to explicitly specify the
> current default behavior, i.e. not accept any error? With the v4
> patch, the user needs to omit SAVE_ERROR_TO option to accept errors
> during COPY FROM. If we change the default behavior in the future,
> many users will be affected and probably end up changing their
> applications to keep the current default behavior.

Valid point.  I've implemented the handling of CopySaveErrorToChoice
in a similar way to CopyHeaderChoice.

Please, check the revised patch attached.

--
Regards,
Alexander Korotkov


0001-Add-new-COPY-option-SAVE_ERROR_TO-v5.patch
Description: Binary data

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-14 Thread Masahiko Sawada

On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov  wrote:
>
> On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada  
> wrote:
> > Thank you for updating the patch. Here are two comments:
> >
> > ---
> > +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> > +   cstate->num_errors > 0)
> > +   ereport(WARNING,
> > +   errmsg("%zd rows were skipped due to data type 
> > incompatibility",
> > +  cstate->num_errors));
> > +
> > /* Done, clean up */
> > error_context_stack = errcallback.previous;
> >
> > If a malformed input is not the last data, the context message seems odd:
> >
> > postgres(1:1769258)=# create table test (a int);
> > CREATE TABLE
> > postgres(1:1769258)=# copy test from stdin (save_error_to none);
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF signal.
> > >> a
> > >> 1
> > >>
> > 2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
> > due to data type incompatibility
> > 2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
> > COPY 1
> >
> > I think it's better to report the WARNING after resetting the
> > error_context_stack. Or is a WARNING really appropriate here? The
> > v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
> > the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
> > WARNING without explanation.
>
> Thank you for noticing this.  I think NOTICE is more appropriate here.
> There is nothing to "worry" about: the user asked to ignore the errors
> and we did.  And yes, it doesn't make sense to use the last line as
> the context.  Fixed.
>
> > ---
> > +-- test missing data: should fail
> > +COPY check_ign_err FROM STDIN WITH (save_error_to none);
> > +1  {1}
> > +\.
> >
> > We might want to cover the extra data cases too.
>
> Agreed, the relevant test is added.

Thank you for updating the patch. I have one minor point:

+   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+   cstate->num_errors > 0)
+   ereport(NOTICE,
+   errmsg("%zd rows were skipped due to
data type incompatibility",
+  cstate->num_errors));
+

We can use errmsg_plural() instead.

I have a question about the option values; do you think we need to
have another value of SAVE_ERROR_TO option to explicitly specify the
current default behavior, i.e. not accept any error? With the v4
patch, the user needs to omit SAVE_ERROR_TO option to accept errors
during COPY FROM. If we change the default behavior in the future,
many users will be affected and probably end up changing their
applications to keep the current default behavior.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-14 Thread Alexander Korotkov

On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada  wrote:
> Thank you for updating the patch. Here are two comments:
>
> ---
> +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> +   cstate->num_errors > 0)
> +   ereport(WARNING,
> +   errmsg("%zd rows were skipped due to data type 
> incompatibility",
> +  cstate->num_errors));
> +
> /* Done, clean up */
> error_context_stack = errcallback.previous;
>
> If a malformed input is not the last data, the context message seems odd:
>
> postgres(1:1769258)=# create table test (a int);
> CREATE TABLE
> postgres(1:1769258)=# copy test from stdin (save_error_to none);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> a
> >> 1
> >>
> 2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
> due to data type incompatibility
> 2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
> COPY 1
>
> I think it's better to report the WARNING after resetting the
> error_context_stack. Or is a WARNING really appropriate here? The
> v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
> the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
> WARNING without explanation.

Thank you for noticing this.  I think NOTICE is more appropriate here.
There is nothing to "worry" about: the user asked to ignore the errors
and we did.  And yes, it doesn't make sense to use the last line as
the context.  Fixed.

> ---
> +-- test missing data: should fail
> +COPY check_ign_err FROM STDIN WITH (save_error_to none);
> +1  {1}
> +\.
>
> We might want to cover the extra data cases too.

Agreed, the relevant test is added.

--
Regards,
Alexander Korotkov


0001-Add-new-COPY-option-SAVE_ERROR_TO-v4.patch
Description: Binary data

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-14 Thread Masahiko Sawada

On Sun, Jan 14, 2024 at 10:30 AM Alexander Korotkov
 wrote:
>
> Hi!
>
> I think this is a demanding and long-waited feature.  The thread is
> pretty long, but mostly it was disputes about how to save the errors.
> The present patch includes basic infrastructure and ability to ignore
> errors, thus it's pretty simple.
>
> On Sat, Jan 13, 2024 at 4:20 PM jian he  wrote:
> > On Fri, Jan 12, 2024 at 10:59 AM torikoshia  
> > wrote:
> > >
> > >
> > > Thanks for reviewing!
> > >
> > > Updated the patch merging your suggestions except below points:
> > >
> > > > +   cstate->num_errors = 0;
> > >
> > > Since cstate is already initialized in below lines, this may be
> > > redundant.
> > >
> > > | /* Allocate workspace and zero all fields */
> > > | cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
> > >
> > >
> > > >  +   Assert(!cstate->escontext->details_wanted);
> > >
> > > I'm not sure this is necessary, considering we're going to add other
> > > options like 'table' and 'log', which need details_wanted soon.
> > >
> > >
> > > --
> > > Regards,
> >
> > make save_error_to option cannot be used with COPY TO.
> > add redundant test, save_error_to with COPY TO test.
>
> I've incorporated these changes.  Also, I've changed
> CopyFormatOptions.save_error_to to enum and made some edits in
> comments and the commit message.  I'm going to push this if there are
> no objections.

Thank you for updating the patch. Here are two comments:

---
+   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+   cstate->num_errors > 0)
+   ereport(WARNING,
+   errmsg("%zd rows were skipped due to data type incompatibility",
+  cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;

If a malformed input is not the last data, the context message seems odd:

postgres(1:1769258)=# create table test (a int);
CREATE TABLE
postgres(1:1769258)=# copy test from stdin (save_error_to none);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> a
>> 1
>>
2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
due to data type incompatibility
2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
COPY 1

I think it's better to report the WARNING after resetting the
error_context_stack. Or is a WARNING really appropriate here? The
v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
WARNING without explanation.

---
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1  {1}
+\.

We might want to cover the extra data cases too.

Regards,

[1] 
https://www.postgresql.org/message-id/CACJufxEkkqnozdnvNMGxVAA94KZaCPkYw_Cx4JKG9ueNaZma_A%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/3d0b349ddbd4ae5f605f77b491697158%40oss.nttdata.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-13 Thread Alexander Korotkov

Hi!

I think this is a demanding and long-waited feature.  The thread is
pretty long, but mostly it was disputes about how to save the errors.
The present patch includes basic infrastructure and ability to ignore
errors, thus it's pretty simple.

On Sat, Jan 13, 2024 at 4:20 PM jian he  wrote:
> On Fri, Jan 12, 2024 at 10:59 AM torikoshia  
> wrote:
> >
> >
> > Thanks for reviewing!
> >
> > Updated the patch merging your suggestions except below points:
> >
> > > +   cstate->num_errors = 0;
> >
> > Since cstate is already initialized in below lines, this may be
> > redundant.
> >
> > | /* Allocate workspace and zero all fields */
> > | cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
> >
> >
> > >  +   Assert(!cstate->escontext->details_wanted);
> >
> > I'm not sure this is necessary, considering we're going to add other
> > options like 'table' and 'log', which need details_wanted soon.
> >
> >
> > --
> > Regards,
>
> make save_error_to option cannot be used with COPY TO.
> add redundant test, save_error_to with COPY TO test.

I've incorporated these changes.  Also, I've changed
CopyFormatOptions.save_error_to to enum and made some edits in
comments and the commit message.  I'm going to push this if there are
no objections.

--
Regards,
Alexander Korotkov

0001-Add-new-COPY-option-SAVE_ERROR_TO-v3.patch
Description: Binary data

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-13 Thread jian he

On Fri, Jan 12, 2024 at 10:59 AM torikoshia  wrote:
>
>
> Thanks for reviewing!
>
> Updated the patch merging your suggestions except below points:
>
> > +   cstate->num_errors = 0;
>
> Since cstate is already initialized in below lines, this may be
> redundant.
>
> | /* Allocate workspace and zero all fields */
> | cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
>
>
> >  +   Assert(!cstate->escontext->details_wanted);
>
> I'm not sure this is necessary, considering we're going to add other
> options like 'table' and 'log', which need details_wanted soon.
>
>
> --
> Regards,

make save_error_to option cannot be used with COPY TO.
add redundant test, save_error_to with COPY TO test.


v2-0001-minor-refactor.no-cfbot
Description: Binary data

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-11 Thread torikoshia

On Wed, Jan 10, 2024 at 4:42 PM Masahiko Sawada  
wrote:

Yeah, I'm still thinking it's better to implement this feature
incrementally. Given we're closing to feature freeze, I think it's
unlikely to get the whole feature into PG17 since there are still many
design discussions we need in addition to what Torikoshi-san pointed
out. The feature like "ignore errors" or "logging errors" would have
higher possibilities. Even if we get only these parts of the whole
"error table" feature into PG17, it will make it much easier to

implement "error tables" feature.

+1.
I'm also going to make patch for "logging errors", since this 
functionality is isolated from v7 patch.

Seems promising. I'll look at the patch.

Thanks a lot!
Sorry to attach v2 if you already reviewed v1..

On 2024-01-11 12:13, jian he wrote:
On Tue, Jan 9, 2024 at 10:36 PM torikoshia  
wrote:

On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada 

wrote:
> If we want only such a feature we need to implement it together (the
> patch could be split, though). But if some parts of the feature are
> useful for users as well, I'd recommend implementing it incrementally.
> That way, the patches can get small and it would be easy for reviewers
> and committers to review/commit them.

Jian, how do you think this comment?

Looking back at the discussion so far, it seems that not everyone 
thinks
saving table information is the best idea[1] and some people think 
just

skipping error data is useful.[2]

Since there are issues to be considered from the design such as
physical/logical replication treatment, putting error information to
table is likely to take time for consensus building and development.

Wouldn't it be better to follow the following advice and develop the
functionality incrementally?

On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
 wrote:
> So I'm thinking we may be able to implement this
> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.

Attached a patch for this "first step" with reference to v7 patch, 
which

logged errors and simpler than latest one.
- This patch adds new option SAVE_ERROR_TO, but currently only 
supports

'none', which means just skips error data. It is expected to support
'log' and 'table'.
- This patch Skips just soft errors and don't handle other errors such
as missing column data.

Hi.
I made the following change based on your patch
(v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch)

* when specified SAVE_ERROR_TO, move the initialization of
ErrorSaveContext to the function BeginCopyFrom.
I think that's the right place to initialize struct CopyFromState 
field.

* I think your patch when there are N rows have malformed data, then it
will initialize N ErrorSaveContext.
In the struct CopyFromStateData, I changed it to ErrorSaveContext 
*escontext.

So if an error occurred, you can just set the escontext accordingly.
* doc: mention "If this option is omitted, COPY
stops operation at the first error."
* Since we only support 'none' for now, 'none' means we don't want
ErrorSaveContext metadata,
 so we should set cstate->escontext->details_wanted to false.

BTW I have question and comment about v15 patch:

> +   {
> +   /*
> +   *
> +   * InputFunctionCall is more faster than
> InputFunctionCallSafe.
> +   *
> +   */

Have you measured this?
When I tested it in an older patch, there were no big difference[3].

Thanks for pointing it out, I probably was over thinking.

  > -   SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P 
SECURITY

SELECT
  > +   SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH 
SECOND_P

SECURITY SELECT

There was a comment that we shouldn't add new keyword for this[4].

Thanks for pointing it out.

Thanks for reviewing!

Updated the patch merging your suggestions except below points:

+   cstate->num_errors = 0;

Since cstate is already initialized in below lines, this may be 
redundant.

| /* Allocate workspace and zero all fields */
| cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));

 +   Assert(!cstate->escontext->details_wanted);

I'm not sure this is necessary, considering we're going to add other 
options like 'table' and 'log', which need details_wanted soon.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom a3f14a0e7e9a7b5fb961ad6b6b7b163cf6534a26 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 12 Jan 2024 11:32:00 +0900
Subject: [PATCH v2] Add new COPY option SAVE_ERROR_TO

Currently when source data contains unexpected data regarding data type or
range, entire COPY fails. However, in some cases such data can be ignored and
just copying normal data is preferable.

This patch adds a new option

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-10 Thread jian he

On Tue, Jan 9, 2024 at 10:36 PM torikoshia  wrote:
>
> On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada 
> wrote:
> > If we want only such a feature we need to implement it together (the
> > patch could be split, though). But if some parts of the feature are
> > useful for users as well, I'd recommend implementing it incrementally.
> > That way, the patches can get small and it would be easy for reviewers
> > and committers to review/commit them.
>
> Jian, how do you think this comment?
>
> Looking back at the discussion so far, it seems that not everyone thinks
> saving table information is the best idea[1] and some people think just
> skipping error data is useful.[2]
>
> Since there are issues to be considered from the design such as
> physical/logical replication treatment, putting error information to
> table is likely to take time for consensus building and development.
>
> Wouldn't it be better to follow the following advice and develop the
> functionality incrementally?
>
> On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
>  wrote:
> > So I'm thinking we may be able to implement this
> > feature incrementally. The first step would be something like an
> > option to ignore all errors or an option to specify the maximum number
> > of errors to tolerate before raising an ERROR. The second step would
> > be to support logging destinations such as server logs and tables.
>
>
> Attached a patch for this "first step" with reference to v7 patch, which
> logged errors and simpler than latest one.
> - This patch adds new option SAVE_ERROR_TO, but currently only supports
> 'none', which means just skips error data. It is expected to support
> 'log' and 'table'.
> - This patch Skips just soft errors and don't handle other errors such
> as missing column data.

Hi.
I made the following change based on your patch
(v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch)

* when specified SAVE_ERROR_TO, move the initialization of
ErrorSaveContext to the function BeginCopyFrom.
I think that's the right place to initialize struct CopyFromState field.
* I think your patch when there are N rows have malformed data, then it
will initialize N ErrorSaveContext.
In the struct CopyFromStateData, I changed it to ErrorSaveContext *escontext.
So if an error occurred, you can just set the escontext accordingly.
* doc: mention "If this option is omitted, COPY
stops operation at the first error."
* Since we only support 'none' for now, 'none' means we don't want
ErrorSaveContext metadata,
 so we should set cstate->escontext->details_wanted to false.

> BTW I have question and comment about v15 patch:
>
> > +   {
> > +   /*
> > +   *
> > +   * InputFunctionCall is more faster than
> > InputFunctionCallSafe.
> > +   *
> > +   */
>
> Have you measured this?
> When I tested it in an older patch, there were no big difference[3].
Thanks for pointing it out, I probably was over thinking.

>   > -   SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY
> SELECT
>   > +   SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P
> SECURITY SELECT
>
> There was a comment that we shouldn't add new keyword for this[4].
>
Thanks for pointing it out.


v1-0001-minor-refactor.no-cfbot
Description: Binary data

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-09 Thread Masahiko Sawada

On Tue, Jan 9, 2024 at 11:36 PM torikoshia  wrote:
>
> On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada 
> wrote:
> > If we want only such a feature we need to implement it together (the
> > patch could be split, though). But if some parts of the feature are
> > useful for users as well, I'd recommend implementing it incrementally.
> > That way, the patches can get small and it would be easy for reviewers
> > and committers to review/commit them.
>
> Jian, how do you think this comment?
>
> Looking back at the discussion so far, it seems that not everyone thinks
> saving table information is the best idea[1] and some people think just
> skipping error data is useful.[2]
>
> Since there are issues to be considered from the design such as
> physical/logical replication treatment, putting error information to
> table is likely to take time for consensus building and development.
>
> Wouldn't it be better to follow the following advice and develop the
> functionality incrementally?

Yeah, I'm still thinking it's better to implement this feature
incrementally. Given we're closing to feature freeze, I think it's
unlikely to get the whole feature into PG17 since there are still many
design discussions we need in addition to what Torikoshi-san pointed
out. The feature like "ignore errors" or "logging errors" would have
higher possibilities. Even if we get only these parts of the whole
"error table" feature into PG17, it will make it much easier to
implement "error tables" feature.

>
> On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
>  wrote:
> > So I'm thinking we may be able to implement this
> > feature incrementally. The first step would be something like an
> > option to ignore all errors or an option to specify the maximum number
> > of errors to tolerate before raising an ERROR. The second step would
> > be to support logging destinations such as server logs and tables.
>
>
> Attached a patch for this "first step" with reference to v7 patch, which
> logged errors and simpler than latest one.
> - This patch adds new option SAVE_ERROR_TO, but currently only supports
> 'none', which means just skips error data. It is expected to support
> 'log' and 'table'.
> - This patch Skips just soft errors and don't handle other errors such
> as missing column data.

Seems promising. I'll look at the patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-09 Thread torikoshia

On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada  
wrote:

If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.

Jian, how do you think this comment?

Looking back at the discussion so far, it seems that not everyone thinks 
saving table information is the best idea[1] and some people think just 
skipping error data is useful.[2]

Since there are issues to be considered from the design such as 
physical/logical replication treatment, putting error information to 
table is likely to take time for consensus building and development.

Wouldn't it be better to follow the following advice and develop the 
functionality incrementally?

On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada 
 wrote:

So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.

Attached a patch for this "first step" with reference to v7 patch, which 
logged errors and simpler than latest one.
- This patch adds new option SAVE_ERROR_TO, but currently only supports 
'none', which means just skips error data. It is expected to support 
'log' and 'table'.
- This patch Skips just soft errors and don't handle other errors such 
as missing column data.

BTW I have question and comment about v15 patch:

+   {
+   /*
+   *
+   * InputFunctionCall is more faster than 
InputFunctionCallSafe.

+   *
+   */

Have you measured this?
When I tested it in an older patch, there were no big difference[3].

 > -   SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY 
SELECT
 > +   SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P 
SECURITY SELECT

There was a comment that we shouldn't add new keyword for this[4].

I left as it was in v7 patch regarding these points.

[1] 
https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de
[2] 
https://www.postgresql.org/message-id/CAD21AoCeEOBN49fu43e6tBTynnswugA3oZ5AZvLeyDCpxpCXPg%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/19551e8c2717c24689913083f841ddb5%40oss.nttdata.com
[4] 
https://www.postgresql.org/message-id/20230322175000.qbdctk7bnmifh5an%40awork3.anarazel.de

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom 675b8b8408e23f22940a99b40cb7ec3e1b36cac3 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 9 Jan 2024 23:10:14 +0900
Subject: [PATCH v1] Add new COPY option SAVE_ERROR_TO

Currently when source data contains unexpected data regarding data type or
range, entire COPY fails. However, in some cases such data can be ignored and
just copying normal data is preferable.

This patch adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying data.

Currently SAVE_ERROR_TO only supports 'none'. This indicates error information
is not saved and COPY just skips the unexpected data and continues running.

Later works are expected to add more choices, such as 'log' and 'table'.

Author: Damir Belyalov, Atsushi Torikoshi, referenced with jian he's patch.
---
 doc/src/sgml/ref/copy.sgml   | 20 +-
 src/backend/commands/copy.c  | 19 ++
 src/backend/commands/copyfrom.c  | 33 
 src/backend/commands/copyfromparse.c | 16 +---
 src/bin/psql/tab-complete.c  |  7 -
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 +++
 src/test/regress/expected/copy2.out  | 28 
 src/test/regress/sql/copy2.sql   | 27 +++
 9 files changed, 148 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c33..87f2b3e7a2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL { ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
+SAVE_ERROR_TO 'location'
 ENCODING 'encoding_name'

@@ -373,6 +374,22 @@ COPY { table_name [ ( 

+   
+SAVE_ERROR_TO
+
+ 
+  Specifies where to save error information when there are malformed data in
+  the input. If this option is specified, COPY skips
+  malformed data and continues copying data.
+  Currently only none is supported.
+  This option is allowed only in COPY FROM, and only when
+

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-05 Thread jian he

On Fri, Jan 5, 2024 at 4:37 PM jian he  wrote:
>
> > > > be reused for a different user.
> > > >
> > >
> > > You are right.
> > > so I changed, now the schema owner will be the error table owner.
> > > every error table tuple inserts,
> > > I switch to schema owner, do the insert, then switch back to the
> > > COPY_FROM operation user.
> > > now everyone (except superuser) will need explicit grant to access the
> > > error table.
> >
> > There are some compilation issues reported at [1] for the patch:
> > [04:04:26.288] copyfromparse.c: In function ‘NextCopyFrom’:
> > [04:04:26.288] copyfromparse.c:1126:25: error: ‘copy_errors_tupDesc’
> > may be used uninitialized in this function
> > [-Werror=maybe-uninitialized]
> > [04:04:26.288] 1126 | copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
> > [04:04:26.288] | ^~~~
> > [04:04:26.288] 1127 | t_values,
> > [04:04:26.288] | ~
> > [04:04:26.288] 1128 | t_isnull);
> > [04:04:26.288] | ~
> > [04:04:26.288] copyfromparse.c:1160:4: error: ‘copy_errorsrel’ may be
> > used uninitialized in this function [-Werror=maybe-uninitialized]
> > [04:04:26.288] 1160 | table_close(copy_errorsrel, RowExclusiveLock);
> > [04:04:26.288] | ^
> >
> > [1] - https://cirrus-ci.com/task/4785221183209472
> >
>
> I fixed this issue, and also improved the doc.
> Other implementations have not changed.

bother again.
This time, I used the ci test it again.
now there should be no warning.
From f033ef4025dbe2012007434dacd4821718443571 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sat, 6 Jan 2024 02:34:22 +0800
Subject: [PATCH v14 1/1] Make COPY FROM more error tolerant

At present, when processing the source file, COPY FROM may encounter three types of data type conversion errors.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.

Instead of throwing errors while copying, save_error (boolean) specifier will
save errors to the table copy_errors for all the copy from operation happend in the same schema.

 We check the existence of table copy_errors,
 we also check the data definition of copy_errors via compare column names and data types.
 If copy_errors already exists and meets the criteria then errors metadata will save to it.
 If copy_errors does not exist, then create it.
 If copy_errors exist, cannot use for saving error, then raise an error.

 the table copy_errors is per schema-wise, it's owned by the copy from
 operation destination schema's owner.
 The table owner has full privilege on copy_errors,
 other non-superuser need gain privilege to access it.
---
 doc/src/sgml/ref/copy.sgml   | 120 -
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 133 +-
 src/backend/commands/copyfromparse.c | 217 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   1 +
 src/include/commands/copyfrom_internal.h |   6 +
 src/include/parser/kwlist.h  |   1 +
 src/test/regress/expected/copy2.out  | 137 ++
 src/test/regress/sql/copy2.sql   | 123 +
 11 files changed, 745 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..f6cdf0cf 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
 ENCODING 'encoding_name'
+SAVE_ERROR [ boolean ]
 
  
 
@@ -411,6 +412,18 @@ WHERE condition
 

 
+   
+SAVE_ERROR
+
+ 
+  Specifies that any data conversion errors while copying will automatically saved in table COPY_ERRORS and the COPY FROM operation will not be interrupted by conversion errors.
+  This option is not allowed when using binary format. This option
+  is only supported for COPY FROM syntax.
+  If this option is omitted, any data type conversion errors will be raised immediately.
+ 
+
+   
+
   
  
 
@@ -564,6 +577,7 @@ COPY count
 amount to a considerable amount of wasted disk space if the failure
 happened well into a large copy operation. You might wish to invoke
 VACUUM to recover the wasted space.
+To continue copying while skip conversion errors in a COPY FROM, you might wish to specify SAVE_ERROR.

 

@@ -572,6 +586,18 @@ COPY count
 null strings to null values and unquoted null strings to empty strings.

 
+   
+If the SAVE_ERROR option is specified and conversion errors occur while copying,
+PostgreSQL will first check for the existence of the table COPY_ERRORS, then save the conversion error information to it.
+If it does exist, but the table definition cannot use it to save the error, an error is raised, COPY FROM

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-05 Thread jian he

On Fri, Jan 5, 2024 at 12:05 AM vignesh C  wrote:
>
> On Thu, 28 Dec 2023 at 09:27, jian he  wrote:
> >
> > On Wed, Dec 20, 2023 at 8:27 PM Masahiko Sawada  
> > wrote:
> > >
> > >
> > > Why do we need to use SPI? I think we can form heap tuples and insert
> > > them to the error table. Creating the error table also doesn't need to
> > > use SPI.
> > >
> > Thanks for pointing it out. I figured out how to form heap tuples and
> > insert them to the error table.
> > but I don't know how to create the error table without using SPI.
> > Please pointer it out.
> >
> > > >
> > > > copy_errors one per schema.
> > > > foo.copy_errors will be owned by the schema: foo owner.
> > >
> > > It seems that the error table is created when the SAVE_ERROR is used
> > > for the first time. It probably blocks concurrent COPY FROM commands
> > > with SAVE_ERROR option to different tables if the error table is not
> > > created yet.
> > >
> > I don't know how to solve this problem Maybe we can document this.
> > but it will block the COPY FROM immediately.
> >
> > > >
> > > > if you can insert to a table in that specific schema let's say foo,
> > > > then you will get privilege to INSERT/DELETE/SELECT
> > > > to foo.copy_errors.
> > > > If you are not a superuser, you are only allowed to do
> > > > INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
> > > > current_user::regrole::oid.
> > > > This is done via row level security.
> > >
> > > I don't think it works. If the user is dropped, the user's oid could
> > > be reused for a different user.
> > >
> >
> > You are right.
> > so I changed, now the schema owner will be the error table owner.
> > every error table tuple inserts,
> > I switch to schema owner, do the insert, then switch back to the
> > COPY_FROM operation user.
> > now everyone (except superuser) will need explicit grant to access the
> > error table.
>
> There are some compilation issues reported at [1] for the patch:
> [04:04:26.288] copyfromparse.c: In function ‘NextCopyFrom’:
> [04:04:26.288] copyfromparse.c:1126:25: error: ‘copy_errors_tupDesc’
> may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
> [04:04:26.288] 1126 | copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
> [04:04:26.288] | ^~~~
> [04:04:26.288] 1127 | t_values,
> [04:04:26.288] | ~
> [04:04:26.288] 1128 | t_isnull);
> [04:04:26.288] | ~
> [04:04:26.288] copyfromparse.c:1160:4: error: ‘copy_errorsrel’ may be
> used uninitialized in this function [-Werror=maybe-uninitialized]
> [04:04:26.288] 1160 | table_close(copy_errorsrel, RowExclusiveLock);
> [04:04:26.288] | ^
>
> [1] - https://cirrus-ci.com/task/4785221183209472
>

I fixed this issue, and also improved the doc.
Other implementations have not changed.
From 99ebe03caa9d50b2cd3cdcd05becccd4b61684e1 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 5 Jan 2024 16:29:38 +0800
Subject: [PATCH v14 1/1] Make COPY FROM more error tolerant

At present, when processing the source file, COPY FROM may encounter three types of data type conversion errors.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.

Instead of throwing errors while copying, save_error (boolean) specifier will
save errors to the table copy_errors for all the copy from operation happend in the same schema.

 We check the existence of table copy_errors,
 we also check the data definition of copy_errors via compare column names and data types.
 If copy_errors already exists and meets the criteria then errors metadata will save to it.
 If copy_errors does not exist, then create it.
 If copy_errors exist, cannot use for saving error, then raise an error.

 the table copy_errors is per schema-wise, it's owned by the copy from
 operation destination schema's owner.
 The table owner has full privilege on copy_errors,
 other non-superuser need gain privilege to access it.
---
 doc/src/sgml/ref/copy.sgml   | 120 -
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 133 +-
 src/backend/commands/copyfromparse.c | 217 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   1 +
 src/include/commands/copyfrom_internal.h |   6 +
 src/include/parser/kwlist.h  |   1 +
 src/test/regress/expected/copy2.out  | 137 ++
 src/test/regress/sql/copy2.sql   | 123 +
 11 files changed, 745 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..f6cdf0cf 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
 ENCODING 'encoding_name'
+

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-04 Thread vignesh C

On Thu, 28 Dec 2023 at 09:27, jian he  wrote:
>
> On Wed, Dec 20, 2023 at 8:27 PM Masahiko Sawada  wrote:
> >
> >
> > Why do we need to use SPI? I think we can form heap tuples and insert
> > them to the error table. Creating the error table also doesn't need to
> > use SPI.
> >
> Thanks for pointing it out. I figured out how to form heap tuples and
> insert them to the error table.
> but I don't know how to create the error table without using SPI.
> Please pointer it out.
>
> > >
> > > copy_errors one per schema.
> > > foo.copy_errors will be owned by the schema: foo owner.
> >
> > It seems that the error table is created when the SAVE_ERROR is used
> > for the first time. It probably blocks concurrent COPY FROM commands
> > with SAVE_ERROR option to different tables if the error table is not
> > created yet.
> >
> I don't know how to solve this problem Maybe we can document this.
> but it will block the COPY FROM immediately.
>
> > >
> > > if you can insert to a table in that specific schema let's say foo,
> > > then you will get privilege to INSERT/DELETE/SELECT
> > > to foo.copy_errors.
> > > If you are not a superuser, you are only allowed to do
> > > INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
> > > current_user::regrole::oid.
> > > This is done via row level security.
> >
> > I don't think it works. If the user is dropped, the user's oid could
> > be reused for a different user.
> >
>
> You are right.
> so I changed, now the schema owner will be the error table owner.
> every error table tuple inserts,
> I switch to schema owner, do the insert, then switch back to the
> COPY_FROM operation user.
> now everyone (except superuser) will need explicit grant to access the
> error table.

There are some compilation issues reported at [1] for the patch:
[04:04:26.288] copyfromparse.c: In function ‘NextCopyFrom’:
[04:04:26.288] copyfromparse.c:1126:25: error: ‘copy_errors_tupDesc’
may be used uninitialized in this function
[-Werror=maybe-uninitialized]
[04:04:26.288] 1126 | copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
[04:04:26.288] | ^~~~
[04:04:26.288] 1127 | t_values,
[04:04:26.288] | ~
[04:04:26.288] 1128 | t_isnull);
[04:04:26.288] | ~
[04:04:26.288] copyfromparse.c:1160:4: error: ‘copy_errorsrel’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
[04:04:26.288] 1160 | table_close(copy_errorsrel, RowExclusiveLock);
[04:04:26.288] | ^

[1] - https://cirrus-ci.com/task/4785221183209472

Regards,
Vignesh

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-27 Thread jian he

On Wed, Dec 20, 2023 at 8:27 PM Masahiko Sawada  wrote:
>
>
> Why do we need to use SPI? I think we can form heap tuples and insert
> them to the error table. Creating the error table also doesn't need to
> use SPI.
>
Thanks for pointing it out. I figured out how to form heap tuples and
insert them to the error table.
but I don't know how to create the error table without using SPI.
Please pointer it out.

> >
> > copy_errors one per schema.
> > foo.copy_errors will be owned by the schema: foo owner.
>
> It seems that the error table is created when the SAVE_ERROR is used
> for the first time. It probably blocks concurrent COPY FROM commands
> with SAVE_ERROR option to different tables if the error table is not
> created yet.
>
I don't know how to solve this problem Maybe we can document this.
but it will block the COPY FROM immediately.

> >
> > if you can insert to a table in that specific schema let's say foo,
> > then you will get privilege to INSERT/DELETE/SELECT
> > to foo.copy_errors.
> > If you are not a superuser, you are only allowed to do
> > INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
> > current_user::regrole::oid.
> > This is done via row level security.
>
> I don't think it works. If the user is dropped, the user's oid could
> be reused for a different user.
>

You are right.
so I changed, now the schema owner will be the error table owner.
every error table tuple inserts,
I switch to schema owner, do the insert, then switch back to the
COPY_FROM operation user.
now everyone (except superuser) will need explicit grant to access the
error table.
From 8c8c266f1dc809ffa0ec9f4262bdd912ed6b758a Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Wed, 27 Dec 2023 20:15:24 +0800
Subject: [PATCH v13 1/1] Make COPY FROM more error tolerant
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.

Instead of throwing errors while copying, save_error specifier will
save errors to table copy_errors for all the copy from operation in the same schema.

We check the existing copy_errors table definition by column name and column data type.
if table already exists and meets the criteria then errors metadata will save to copy_errors.
if the table does not exist, then create one.

table copy_errors is per schema-wise, it's owned by the copy from
operation destination schema's owner.
The table owner has full privilege on copy_errors,
other non-superuser need gain privilege to access it.

Only works for COPY FROM, non-BINARY mode.
---
 doc/src/sgml/ref/copy.sgml   | 121 -
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 133 +-
 src/backend/commands/copyfromparse.c | 217 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   1 +
 src/include/commands/copyfrom_internal.h |   6 +
 src/include/parser/kwlist.h  |   1 +
 src/test/regress/expected/copy2.out  | 137 ++
 src/test/regress/sql/copy2.sql   | 123 +
 11 files changed, 746 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..1d0ff0b6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
 ENCODING 'encoding_name'
+SAVE_ERROR [ boolean ]
 
  
 
@@ -411,6 +412,18 @@ WHERE condition
 

 
+   
+SAVE_ERROR
+
+ 
+  Specifies that any data conversion errors while copying will automatically saved in table COPY_ERRORS and the COPY FROM operation will not be interrupted by conversion errors.
+  This option is not allowed when using binary format. Note that this
+  is only supported in current COPY FROM syntax.
+  If this option is omitted, any data type conversion errors will be raised immediately.
+ 
+
+   
+
   
  
 
@@ -564,6 +577,7 @@ COPY count
 amount to a considerable amount of wasted disk space if the failure
 happened well into a large copy operation. You might wish to invoke
 VACUUM to recover the wasted space.
+To continue copying while skip conversion errors in a COPY FROM, you might wish to specify SAVE_ERROR.

 

@@ -572,6 +586,18 @@ COPY count
 null strings to null values and unquoted null strings to empty strings.

 
+   
+If the SAVE_ERROR option is specified and conversion errors occur while copying,
+PostgreSQL will first check the table COPY_ERRORS existence, then save the conversion error related information to it.
+If it does exist, but the actual table definition cannot use it to save the

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-20 Thread Masahiko Sawada

On Wed, Dec 20, 2023 at 1:07 PM jian he  wrote:
>
> On Tue, Dec 19, 2023 at 9:14 AM Masahiko Sawada  wrote:
> >
> >
> > The error table hub idea is still unclear to me. I assume that there
> > are error tables at least on each database. And an error table can
> > have error data that happened during COPY FROM, including malformed
> > lines. Do the error tables grow without bounds and the users have to
> > delete rows at some point? If so, who can do that? How can we achieve
> > that the users can see only errored rows they generated? And the issue
> > with logical replication also needs to be resolved. Anyway, if we go
> > this direction, we need to discuss the overall design.
> >
> > Regards,
> >
> > --
> > Masahiko Sawada
> > Amazon Web Services: https://aws.amazon.com
>
> Please check my latest attached POC.
> Main content is to build spi query, execute the spi query, regress
> test and regress output.

Why do we need to use SPI? I think we can form heap tuples and insert
them to the error table. Creating the error table also doesn't need to
use SPI.

>
> copy_errors one per schema.
> foo.copy_errors will be owned by the schema: foo owner.

It seems that the error table is created when the SAVE_ERROR is used
for the first time. It probably blocks concurrent COPY FROM commands
with SAVE_ERROR option to different tables if the error table is not
created yet.

>
> if you can insert to a table in that specific schema let's say foo,
> then you will get privilege to INSERT/DELETE/SELECT
> to foo.copy_errors.
> If you are not a superuser, you are only allowed to do
> INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
> current_user::regrole::oid.
> This is done via row level security.

I don't think it works. If the user is dropped, the user's oid could
be reused for a different user.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-19 Thread jian he

On Tue, Dec 19, 2023 at 9:14 AM Masahiko Sawada  wrote:
>
>
> The error table hub idea is still unclear to me. I assume that there
> are error tables at least on each database. And an error table can
> have error data that happened during COPY FROM, including malformed
> lines. Do the error tables grow without bounds and the users have to
> delete rows at some point? If so, who can do that? How can we achieve
> that the users can see only errored rows they generated? And the issue
> with logical replication also needs to be resolved. Anyway, if we go
> this direction, we need to discuss the overall design.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

Please check my latest attached POC.
Main content is to build spi query, execute the spi query, regress
test and regress output.

copy_errors one per schema.
foo.copy_errors will be owned by the schema: foo owner.

if you can insert to a table in that specific schema let's say foo,
then you will get privilege to INSERT/DELETE/SELECT
to foo.copy_errors.
If you are not a superuser, you are only allowed to do
INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
current_user::regrole::oid.
This is done via row level security.

Since foo.copy_errors is mainly INSERT operations, if copy_errors grow
too much, that means your source file has many errors, it will take a
very long time to finish the whole COPY. maybe we can capture how many
errors encountered in another client.

I don't know how to deal with logic replication. looking for ideas.
From 9affaf6d94eb4afe26fc7181e38e53eed14e0216 Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Wed, 20 Dec 2023 11:26:25 +0800
Subject: [PATCH v12 1/1] Make COPY FROM more error tolerant

Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.

Instead of throwing errors while copying, save_error specifier will
save errors to table copy_errors for all the copy from operation in the same schema.

We check the existing copy_error table definition by column name and column data type.
if table already exists and meets the criteria then errors will to it
if the table does not exist, then create one.

copy_errors is per schema, it's owned by schema's owner.
for non-superusers, if you can do insert in that schema, then you can insert to copy_errors,
but you are only allowed to select/delet your own rows, which is judged by current_user
with copy_error's userid column. Priviledge restirction is implmented via ROW LEVEL SECURITY.

Only works for COPY FROM, non-BINARY mode.

While copying, if error never happened, error saving  table will be dropped at the ending of COPY FROM.
If the error saving table exists, meaning at least once COPY FROM errors has happened,
then all the future errors will be saved to that table.
We save the error related meta info to error saving table using SPI,
that is construct a query string, then execute the query.
---
 contrib/file_fdw/file_fdw.c  |   4 +-
 doc/src/sgml/ref/copy.sgml   | 122 ++-
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 168 -
 src/backend/commands/copyfromparse.c | 179 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   3 +-
 src/include/commands/copyfrom_internal.h |   5 +
 src/include/parser/kwlist.h  |   1 +
 src/test/regress/expected/copy2.out  | 160 
 src/test/regress/sql/copy2.sql   | 142 ++
 12 files changed, 787 insertions(+), 20 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2189be8a..2d3eb34f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -751,7 +751,7 @@ fileIterateForeignScan(ForeignScanState *node)
 	 */
 	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	found = NextCopyFrom(festate->cstate, econtext,
-		 slot->tts_values, slot->tts_isnull);
+		 slot->tts_values, slot->tts_isnull, NULL);
 	if (found)
 		ExecStoreVirtualTuple(slot);

@@ -1183,7 +1183,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 		MemoryContextReset(tupcontext);
 		MemoryContextSwitchTo(tupcontext);

-		found = NextCopyFrom(cstate, NULL, values, nulls);
+		found = NextCopyFrom(cstate, NULL, values, nulls, NULL);

 		MemoryContextSwitchTo(oldcontext);

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..3dbf70ee 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
 ENCODING 'encoding_name'
+SAVE_ERROR [ boolean ]

@@ -411,6 +412,18 @@ WHERE condition

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-18 Thread Masahiko Sawada

On Mon, Dec 18, 2023 at 4:41 PM jian he  wrote:
>
> On Mon, Dec 18, 2023 at 1:09 PM torikoshia  wrote:
> >
> > Hi,
> >
> > > save the error metadata to  system catalogs would be more expensive,
> > > please see below explanation.
> > > I have no knowledge of publications.
> > > but i feel there is a feature request: publication FOR ALL TABLES
> > > exclude regex_pattern.
> > > Anyway, that would be another topic.
> >
> > I think saving error metadata to system catalog is not a good idea, too.
> > And I believe Sawada-san just pointed out missing features and did not
> > suggested that we use system catalog.
> >
> > > I don't think "specify the maximum number  of errors to tolerate
> > > before raising an ERROR." is very useful
> >
> > That may be so.
> > I imagine it's useful in some use case since some loading tools have
> > such options.
> > Anyway I agree it's not necessary for initial patch as mentioned in [1].
> >
> > > I suppose we can specify an ERRORFILE directory. similar
> > > implementation [2], demo in [3]
> > > it will generate 2 files, one file shows the malform line content as
> > > is, another file shows the error info.
> >
> > That may be a good option when considering "(2) logging errors to
> > somewhere".
> >
> > What do you think about the proposal to develop these features in
> > incrementally?
> >
>
> I am more with  tom's idea [1], that is when errors happen (data type
> conversion only), do not fail, AND we save the error to a table.  I
> guess we can implement this logic together, only with a new COPY
> option.

If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.

>
> imagine a case (it's not that contrived, imho), while conversion from
> text to table's int, postgres isspace is different from the source
> text file's isspace logic.
> then all the lines are malformed. If we just say on error continue and
> not save error meta info, the user is still confused which field has
> the wrong data, then the user will probably try to incrementally test
> which field contains malformed data.
>
> Since we need to save the error somewhere.
> Everyone has the privilege to INSERT can do COPY.
> I think we also need to handle the access privilege also.
> So like I mentioned above, one copy_error error table hub, then
> everyone can view/select their own copy failure record.

The error table hub idea is still unclear to me. I assume that there
are error tables at least on each database. And an error table can
have error data that happened during COPY FROM, including malformed
lines. Do the error tables grow without bounds and the users have to
delete rows at some point? If so, who can do that? How can we achieve
that the users can see only errored rows they generated? And the issue
with logical replication also needs to be resolved. Anyway, if we go
this direction, we need to discuss the overall design.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-18 Thread Masahiko Sawada

On Mon, Dec 18, 2023 at 9:16 AM jian he  wrote:
>
> On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada  wrote:
> >
> > Hi,
> >
> > I've read this thread and the latest patch. IIUC with SAVE_ERROR
> > option, COPY FROM creates an error table for the target table and
> > writes error information there.
> >
> > While I agree that the final shape of this feature would be something
> > like that design, I'm concerned some features are missing in order to
> > make this feature useful in practice. For instance, error logs are
> > inserted to error tables without bounds, meaning that users who want
> > to tolerate errors during COPY FROM  will have to truncate or drop the
> > error tables periodically, or the database will grow with error logs
> > without limit. Ideally such maintenance work should be done by the
> > database. There might be some users who want to log such conversion
> > errors in server logs to avoid such maintenance work. I think we
> > should provide an option for where to write, at least. Also, since the
> > error tables are normal user tables internally, error logs are also
> > replicated to subscribers if there is a publication FOR ALL TABLES,
> > unlike system catalogs. I think some users would not like such
> > behavior.
>
> save the error metadata to  system catalogs would be more expensive,
> please see below explanation.
> I have no knowledge of publications.
> but i feel there is a feature request: publication FOR ALL TABLES
> exclude regex_pattern.
> Anyway, that would be another topic.

I don't think the new regex idea would be a good solution for the
existing users who are using FOR ALL TABLES publication. It's not
desirable that they have to change the publication because of this
feature. With the current patch, a logical replication using FOR ALL
TABLES publication will stop immediately after an error information is
inserted into a new error table unless the same error table is created
on subscribers.

>
> > Looking at SAVE_ERROR feature closely, I think it consists of two
> > separate features. That is, it enables COPY FROM to load data while
> > (1) tolerating errors and (2) logging errors to somewhere (i.e., an
> > error table). If we implement only (1), it would be like COPY FROM
> > tolerate errors infinitely and log errors to /dev/null. The user
> > cannot see the error details but I guess it could still help some
> > cases as Andres mentioned[1] (it might be a good idea to send the
> > number of rows successfully loaded in a NOTICE message if some rows
> > could not be loaded). Then with (2), COPY FROM can log error
> > information to somewhere such as tables and server logs and the user
> > can select it. So I'm thinking we may be able to implement this
> > feature incrementally. The first step would be something like an
> > option to ignore all errors or an option to specify the maximum number
> > of errors to tolerate before raising an ERROR. The second step would
> > be to support logging destinations such as server logs and tables.
> >
> > Regards,
> >
> > [1] 
> > https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de
> >
> > --
> > Masahiko Sawada
> > Amazon Web Services: https://aws.amazon.com
>
> > feature incrementally. The first step would be something like an
> > option to ignore all errors or an option to specify the maximum number
> > of errors to tolerate before raising an ERROR. The second step would
>
> I don't think "specify the maximum number  of errors to tolerate
> before raising an ERROR." is very useful
>
> QUOTE from [1]
> MAXERROR [AS] error_count
> If the load returns the error_count number of errors or greater, the
> load fails. If the load returns fewer errors, it continues and returns
> an INFO message that states the number of rows that could not be
> loaded. Use this parameter to allow loads to continue when certain
> rows fail to load into the table because of formatting errors or other
> inconsistencies in the data.
> Set this value to 0 or 1 if you want the load to fail as soon as the
> first error occurs. The AS keyword is optional. The MAXERROR default
> value is 0 and the limit is 10.
> The actual number of errors reported might be greater than the
> specified MAXERROR because of the parallel nature of Amazon Redshift.
> If any node in the Amazon Redshift cluster detects that MAXERROR has
> been exceeded, each node reports all of the errors it has encountered.
> END OF QUOTE
>
> option MAXERROR error_count. iiuc, it fails while validating line
> error_count + 1, else it raises a notice, tells you how many rows have
> errors.
>
> * case when error_count is small, and the copy fails, it only tells
> you that at least the error_count line has malformed data. but what if
> the actual malformed rows are very big. In this case, this failure
> error message is not that helpful.
> * case when error_count is very big, and the copy does not fail. then
> the actual malformed data rows are very big (still

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread jian he

On Mon, Dec 18, 2023 at 1:09 PM torikoshia  wrote:
>
> Hi,
>
> > save the error metadata to  system catalogs would be more expensive,
> > please see below explanation.
> > I have no knowledge of publications.
> > but i feel there is a feature request: publication FOR ALL TABLES
> > exclude regex_pattern.
> > Anyway, that would be another topic.
>
> I think saving error metadata to system catalog is not a good idea, too.
> And I believe Sawada-san just pointed out missing features and did not
> suggested that we use system catalog.
>
> > I don't think "specify the maximum number  of errors to tolerate
> > before raising an ERROR." is very useful
>
> That may be so.
> I imagine it's useful in some use case since some loading tools have
> such options.
> Anyway I agree it's not necessary for initial patch as mentioned in [1].
>
> > I suppose we can specify an ERRORFILE directory. similar
> > implementation [2], demo in [3]
> > it will generate 2 files, one file shows the malform line content as
> > is, another file shows the error info.
>
> That may be a good option when considering "(2) logging errors to
> somewhere".
>
> What do you think about the proposal to develop these features in
> incrementally?
>

I am more with  tom's idea [1], that is when errors happen (data type
conversion only), do not fail, AND we save the error to a table.  I
guess we can implement this logic together, only with a new COPY
option.

imagine a case (it's not that contrived, imho), while conversion from
text to table's int, postgres isspace is different from the source
text file's isspace logic.
then all the lines are malformed. If we just say on error continue and
not save error meta info, the user is still confused which field has
the wrong data, then the user will probably try to incrementally test
which field contains malformed data.

Since we need to save the error somewhere.
Everyone has the privilege to INSERT can do COPY.
I think we also need to handle the access privilege also.
So like I mentioned above, one copy_error error table hub, then
everyone can view/select their own copy failure record.

but save to a server text file/directory, not easy for an INSERT
privilege user to see these files, I think.
similarly not easy to see these failed records in log for limited privilege.

if someone wants to fail at maxerror rows, they can do it, since we
will count how many rows failed.
even though I didn't get it.

[1] https://www.postgresql.org/message-id/900123.1699488001%40sss.pgh.pa.us

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread torikoshia


Hi,


save the error metadata to  system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.


I think saving error metadata to system catalog is not a good idea, too.
And I believe Sawada-san just pointed out missing features and did not 
suggested that we use system catalog.



I don't think "specify the maximum number  of errors to tolerate
before raising an ERROR." is very useful


That may be so.
I imagine it's useful in some use case since some loading tools have 
such options.

Anyway I agree it's not necessary for initial patch as mentioned in [1].


I suppose we can specify an ERRORFILE directory. similar
implementation [2], demo in [3]
it will generate 2 files, one file shows the malform line content as
is, another file shows the error info.


That may be a good option when considering "(2) logging errors to 
somewhere".


What do you think about the proposal to develop these features in 
incrementally?


On 2023-12-15 05:48, Masahiko Sawada wrote:

So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.


[1] 
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread torikoshia

On 2023-12-15 05:48, Masahiko Sawada wrote:

Thanks for joining this discussion!

I've read this thread and the latest patch. IIUC with SAVE_ERROR
option, COPY FROM creates an error table for the target table and
writes error information there.

While I agree that the final shape of this feature would be something
like that design, I'm concerned some features are missing in order to
make this feature useful in practice. For instance, error logs are
inserted to error tables without bounds, meaning that users who want
to tolerate errors during COPY FROM will have to truncate or drop the
error tables periodically, or the database will grow with error logs
without limit. Ideally such maintenance work should be done by the
database. There might be some users who want to log such conversion
errors in server logs to avoid such maintenance work. I think we
should provide an option for where to write, at least. Also, since the
error tables are normal user tables internally, error logs are also
replicated to subscribers if there is a publication FOR ALL TABLES,
unlike system catalogs. I think some users would not like such
behavior.

Looking at SAVE_ERROR feature closely, I think it consists of two
separate features. That is, it enables COPY FROM to load data while
(1) tolerating errors and (2) logging errors to somewhere (i.e., an
error table). If we implement only (1), it would be like COPY FROM
tolerate errors infinitely and log errors to /dev/null. The user
cannot see the error details but I guess it could still help some
cases as Andres mentioned[1] (it might be a good idea to send the
number of rows successfully loaded in a NOTICE message if some rows
could not be loaded). Then with (2), COPY FROM can log error
information to somewhere such as tables and server logs and the user
can select it.

+1.
I may be biased since I wrote some ~v6 patches which just output the
soft errors and number of skipped rows to log, but I think just (1)
would be worth implementing as you pointed out and I like if users could
choose where to log output.

I think there would be situations where it is preferable to save errors
to server log even considering problems which were pointed out in [1],
i.e. manually loading data.

[1]
https://www.postgresql.org/message-id/739953.1699467519%40sss.pgh.pa.us

feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.

Regards,

[1]
https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-17 Thread jian he

On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada  wrote:
>
> Hi,
>
> I've read this thread and the latest patch. IIUC with SAVE_ERROR
> option, COPY FROM creates an error table for the target table and
> writes error information there.
>
> While I agree that the final shape of this feature would be something
> like that design, I'm concerned some features are missing in order to
> make this feature useful in practice. For instance, error logs are
> inserted to error tables without bounds, meaning that users who want
> to tolerate errors during COPY FROM  will have to truncate or drop the
> error tables periodically, or the database will grow with error logs
> without limit. Ideally such maintenance work should be done by the
> database. There might be some users who want to log such conversion
> errors in server logs to avoid such maintenance work. I think we
> should provide an option for where to write, at least. Also, since the
> error tables are normal user tables internally, error logs are also
> replicated to subscribers if there is a publication FOR ALL TABLES,
> unlike system catalogs. I think some users would not like such
> behavior.

save the error metadata to  system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.

> Looking at SAVE_ERROR feature closely, I think it consists of two
> separate features. That is, it enables COPY FROM to load data while
> (1) tolerating errors and (2) logging errors to somewhere (i.e., an
> error table). If we implement only (1), it would be like COPY FROM
> tolerate errors infinitely and log errors to /dev/null. The user
> cannot see the error details but I guess it could still help some
> cases as Andres mentioned[1] (it might be a good idea to send the
> number of rows successfully loaded in a NOTICE message if some rows
> could not be loaded). Then with (2), COPY FROM can log error
> information to somewhere such as tables and server logs and the user
> can select it. So I'm thinking we may be able to implement this
> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.
>
> Regards,
>
> [1] 
> https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would

I don't think "specify the maximum number  of errors to tolerate
before raising an ERROR." is very useful

QUOTE from [1]
MAXERROR [AS] error_count
If the load returns the error_count number of errors or greater, the
load fails. If the load returns fewer errors, it continues and returns
an INFO message that states the number of rows that could not be
loaded. Use this parameter to allow loads to continue when certain
rows fail to load into the table because of formatting errors or other
inconsistencies in the data.
Set this value to 0 or 1 if you want the load to fail as soon as the
first error occurs. The AS keyword is optional. The MAXERROR default
value is 0 and the limit is 10.
The actual number of errors reported might be greater than the
specified MAXERROR because of the parallel nature of Amazon Redshift.
If any node in the Amazon Redshift cluster detects that MAXERROR has
been exceeded, each node reports all of the errors it has encountered.
END OF QUOTE

option MAXERROR error_count. iiuc, it fails while validating line
error_count + 1, else it raises a notice, tells you how many rows have
errors.

* case when error_count is small, and the copy fails, it only tells
you that at least the error_count line has malformed data. but what if
the actual malformed rows are very big. In this case, this failure
error message is not that helpful.
* case when error_count is very big, and the copy does not fail. then
the actual malformed data rows are very big (still less than
error_count). but there is no error report, you don't know which line
has an error.

Either way, if the file has a large portion of malformed rows, then
the MAXERROR option does not make sense.
so maybe we don't need a threshold for tolerating errors.

however, we can have an option, not actually copy to the table, but
only validate, similar to NOLOAD in [1]

why we save the error:
* if only a small portion of malformed rows then saving the error
metadata would be cheap.
* if a large portion of malformed rows then  copy will be slow but we
saved the error metadata. Now you can fix it based on this error
metadata.

I think saving

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-14 Thread Masahiko Sawada

Hi,

On Tue, Dec 12, 2023 at 10:04 PM jian he  wrote:
>
> On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina
>  wrote:
> >
> > Hi! Thank you for your work. Your patch looks better!
> > Yes, thank you! It works fine, and I see that the regression tests have 
> > been passed. 
> > However, when I ran 'copy from with save_error' operation with simple csv 
> > files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created 
> > it, I described below):
> >
> > postgres=# create table test (x int primary key, y int not null);
> > postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
> >   FOREIGN KEY(x)
> >   REFERENCES test(x));
> >
> > I did not find a table with saved errors after operation, although I 
> > received a log about it:
> >
> > postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
> > save_error
> > NOTICE:  2 rows were skipped because of error. skipped row saved to table 
> > public.test_error
> > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > DETAIL:  Key (x)=(2) already exists.
> > CONTEXT:  COPY test, line 3
> >
> > postgres=# select * from public.test_error;
> > ERROR:  relation "public.test_error" does not exist
> > LINE 1: select * from public.test_error;
> >
> > postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
> > save_error
> > NOTICE:  2 rows were skipped because of error. skipped row saved to table 
> > public.test1_error
> > ERROR:  insert or update on table "test1" violates foreign key constraint 
> > "fk_x"
> > DETAIL:  Key (x)=(2) is not present in table "test".
> >
> > postgres=# select * from public.test1_error;
> > ERROR:  relation "public.test1_error" does not exist
> > LINE 1: select * from public.test1_error;
> >
> > Two lines were written correctly in the csv files, therefore they should 
> > have been added to the tables, but they were not added to the tables test 
> > and test1.
> >
> > If I leave only the correct rows, everything works fine and the rows are 
> > added to the tables.
> >
> > in copy_test.csv:
> >
> > 2,0
> >
> > 1,1
> >
> > in copy_test1.csv:
> >
> > 2,0
> >
> > 2,1
> >
> > 1,1
> >
> > postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
> > COPY 2
> > postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
> > save_error
> > NOTICE:  No error happened.Error holding table public.test1_error will be 
> > droped
> > COPY 3
> >
> > Maybe I'm launching it the wrong way. If so, let me know about it.
>
> looks like the above is about constraints violation while copying.
> constraints violation while copying not in the scope of this patch.
>
> Since COPY FROM is very like the INSERT command,
> you do want all the valid constraints to check all the copied rows?
>
> but the notice raised by the patch is not right.
> So I place the drop error saving table or raise notice logic above
> `ExecResetTupleTable(estate->es_tupleTable, false)` in the function
> CopyFrom.
>
> >
> > I also notice interesting behavior if the table was previously created by 
> > the user. When I was creating an error_table before the 'copy from' 
> > operation,
> > I received a message saying that it is impossible to create a table with 
> > the same name (it is shown below) during the 'copy from' operation.
> > I think you should add information about this in the documentation, since 
> > this seems to be normal behavior to me.
> >
>
> doc changed. you may check it.

I've read this thread and the latest patch. IIUC with SAVE_ERROR
option, COPY FROM creates an error table for the target table and
writes error information there.

While I agree that the final shape of this feature would be something
like that design, I'm concerned some features are missing in order to
make this feature useful in practice. For instance, error logs are
inserted to error tables without bounds, meaning that users who want
to tolerate errors during COPY FROM  will have to truncate or drop the
error tables periodically, or the database will grow with error logs
without limit. Ideally such maintenance work should be done by the
database. There might be some users who want to log such conversion
errors in server logs to avoid such maintenance work. I think we
should provide an option for where to write, at least. Also, since the
error tables are normal user tables internally, error logs are also
replicated to subscribers if there is a publication FOR ALL TABLES,
unlike system catalogs. I think some users would not like such
behavior.

Looking at SAVE_ERROR feature closely, I think it consists of two
separate features. That is, it enables COPY FROM to load data while
(1) tolerating errors and (2) logging errors to somewhere (i.e., an
error table). If we implement only (1), it would be like COPY FROM
tolerate errors infinitely and log errors to /dev/null. The user
cannot see the error details but I guess it could still help some
cases as Andres mentioned[1] (it might be a good idea to

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-14 Thread Alena Rybakina


On 12.12.2023 16:04, jian he wrote:

On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina
  wrote:

Hi! Thank you for your work. Your patch looks better!
Yes, thank you! It works fine, and I see that the regression tests have been 
passed. 
However, when I ran 'copy from with save_error' operation with simple csv files 
(copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I 
described below):

postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
   FOREIGN KEY(x)
   REFERENCES test(x));

I did not find a table with saved errors after operation, although I received a 
log about it:

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to table 
public.test_error
ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (x)=(2) already exists.
CONTEXT:  COPY test, line 3

postgres=# select * from public.test_error;
ERROR:  relation "public.test_error" does not exist
LINE 1: select * from public.test_error;

postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to table 
public.test1_error
ERROR:  insert or update on table "test1" violates foreign key constraint "fk_x"
DETAIL:  Key (x)=(2) is not present in table "test".

postgres=# select * from public.test1_error;
ERROR:  relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;

Two lines were written correctly in the csv files, therefore they should have 
been added to the tables, but they were not added to the tables test and test1.

If I leave only the correct rows, everything works fine and the rows are added 
to the tables.

in copy_test.csv:

2,0

1,1

in copy_test1.csv:

2,0

2,1

1,1

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
save_error
NOTICE:  No error happened.Error holding table public.test1_error will be droped
COPY 3

Maybe I'm launching it the wrong way. If so, let me know about it.

looks like the above is about constraints violation while copying.
constraints violation while copying not in the scope of this patch.

Since COPY FROM is very like the INSERT command,
you do want all the valid constraints to check all the copied rows?

No, I think it will be too much.

but the notice raised by the patch is not right.
So I place the drop error saving table or raise notice logic above
`ExecResetTupleTable(estate->es_tupleTable, false)` in the function
CopyFrom.

Yes, I see it and agree with you.

I also notice interesting behavior if the table was previously created by the 
user. When I was creating an error_table before the 'copy from' operation,
I received a message saying that it is impossible to create a table with the 
same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation, since this 
seems to be normal behavior to me.


doc changed. you may check it.

Yes, I saw it. Thank you.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-12 Thread jian he

On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina
 wrote:
>
> Hi! Thank you for your work. Your patch looks better!
> Yes, thank you! It works fine, and I see that the regression tests have been 
> passed. 
> However, when I ran 'copy from with save_error' operation with simple csv 
> files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created 
> it, I described below):
>
> postgres=# create table test (x int primary key, y int not null);
> postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
>   FOREIGN KEY(x)
>   REFERENCES test(x));
>
> I did not find a table with saved errors after operation, although I received 
> a log about it:
>
> postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
> save_error
> NOTICE:  2 rows were skipped because of error. skipped row saved to table 
> public.test_error
> ERROR:  duplicate key value violates unique constraint "test_pkey"
> DETAIL:  Key (x)=(2) already exists.
> CONTEXT:  COPY test, line 3
>
> postgres=# select * from public.test_error;
> ERROR:  relation "public.test_error" does not exist
> LINE 1: select * from public.test_error;
>
> postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
> save_error
> NOTICE:  2 rows were skipped because of error. skipped row saved to table 
> public.test1_error
> ERROR:  insert or update on table "test1" violates foreign key constraint 
> "fk_x"
> DETAIL:  Key (x)=(2) is not present in table "test".
>
> postgres=# select * from public.test1_error;
> ERROR:  relation "public.test1_error" does not exist
> LINE 1: select * from public.test1_error;
>
> Two lines were written correctly in the csv files, therefore they should have 
> been added to the tables, but they were not added to the tables test and 
> test1.
>
> If I leave only the correct rows, everything works fine and the rows are 
> added to the tables.
>
> in copy_test.csv:
>
> 2,0
>
> 1,1
>
> in copy_test1.csv:
>
> 2,0
>
> 2,1
>
> 1,1
>
> postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
> COPY 2
> postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
> save_error
> NOTICE:  No error happened.Error holding table public.test1_error will be 
> droped
> COPY 3
>
> Maybe I'm launching it the wrong way. If so, let me know about it.

looks like the above is about constraints violation while copying.
constraints violation while copying not in the scope of this patch.

Since COPY FROM is very like the INSERT command,
you do want all the valid constraints to check all the copied rows?

but the notice raised by the patch is not right.
So I place the drop error saving table or raise notice logic above
`ExecResetTupleTable(estate->es_tupleTable, false)` in the function
CopyFrom.

>
> I also notice interesting behavior if the table was previously created by the 
> user. When I was creating an error_table before the 'copy from' operation,
> I received a message saying that it is impossible to create a table with the 
> same name (it is shown below) during the 'copy from' operation.
> I think you should add information about this in the documentation, since 
> this seems to be normal behavior to me.
>

doc changed. you may check it.
From 3024bf3b727b728c58dfef41c62d7a93c083b887 Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Tue, 12 Dec 2023 20:58:45 +0800
Subject: [PATCH v11 1/1] Make COPY FROM more error tolerant

Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.

Instead of throwing errors while copying, save_error specifier will
save errors to a error saving table automatically.

We check the error saving table definition by column name and column data type.
if table already exists and meets the criteria then errors will save to that table.
if the table does not exist, then create one.

Only works for COPY FROM, non-BINARY mode.

While copying, if error never happened, error saving  table will be dropped at the ending of COPY FROM.
If the error saving table exists, meaning at least once COPY FROM errors has happened,
then all the future errors will be saved to that table.
We save the error related meta info to error saving table using SPI,
that is construct a query string, then execute the query.
---
 contrib/file_fdw/file_fdw.c  |   4 +-
 doc/src/sgml/ref/copy.sgml   | 100 +-
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 146 +++-
 src/backend/commands/copyfromparse.c | 169 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   3 +-
 src/include/commands/copyfrom_internal.h |   7 +
 src/include/parser/kwlist.h  |   1 +
 src/test/regress/expected/copy2.out  | 135 ++

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-11 Thread Alena Rybakina


Hi! Thank you for your work. Your patch looks better!

On 10.12.2023 13:32, jian he wrote:

On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina  wrote:

Thank you for your work. Unfortunately, your code contained errors during the 
make installation:

'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2

I have ubuntu 22.04 operation system.

On 06.12.2023 13:47, jian he wrote:

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina  wrote:

Hi!

Thank you for your contribution to this thread.


I reviewed it and have a few questions.

1. I have seen that you delete a table before creating it, to which you want to add 
errors due to a failed "copy from" operation. I think this is wrong because 
this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can just add 
some number at the end of the name, such as name_table1, name_table_2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.

To be honest, first of all, I misunderstood this part of the code. Now I see 
that it works the way you mentioned.

However, I didn't see if you dealt with cases where we already had a table with 
the same name as the table error.
I mean, when is he trying to create for the first time, or will we never be 
able to face such a problem?

Can you demo the expected behavior?

Unfortunately, I was unable to launch it due to a build issue.


Hopefully attached will work.


Yes, thank you! It works fine, and I see that the regression tests have 
been passed. 



However, when I ran 'copy from with save_error' operation with simple 
csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I 
created it, I described below):


postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
  FOREIGN KEY(x)
  REFERENCES test(x));

I did not find a table with saved errors after operation, although I 
received a log about it:


postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to 
table public.test_error

ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (x)=(2) already exists.
CONTEXT:  COPY test, line 3

postgres=# select * from public.test_error;
ERROR:  relation "public.test_error" does not exist
LINE 1: select * from public.test_error;

postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' 
CSV save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to 
table public.test1_error
ERROR:  insert or update on table "test1" violates foreign key 
constraint "fk_x"

DETAIL:  Key (x)=(2) is not present in table "test".

postgres=# select * from public.test1_error;
ERROR:  relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;

Two lines were written correctly in the csv files, therefore they should 
have been added to the tables, but they were not added to the tables 
test and test1.


If I leave only the correct rows, everything works fine and the rows are 
added to the tables.


in copy_test.csv:

2,0

1,1

in copy_test1.csv:

2,0

2,1

1,1

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' 
CSV save_error
NOTICE:  No error happened.Error holding table public.test1_error will 
be droped

COPY 3

Maybe I'm launching it the wrong way. If so, let me know about it.


I also notice interesting behavior if the table was previously created 
by the user. When I was creating an error_table before the 'copy from' 
operation,
I received a message saying that it is impossible to create a table with 
the same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation, 
since this seems to be normal behavior to me.


postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
CREATE TABLE
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
save_error
ERROR:  Error save table public.test_error already exists. Cannot use it 
for COPY FROM error saving



2. I noticed that you are forming a table name using the type of errors that 
prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used 
while 'copy from' was

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-10 Thread jian he

On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina  wrote:
>
> Thank you for your work. Unfortunately, your code contained errors during the 
> make installation:
>
> 'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
> 'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
> make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
> make[1]: *** [Makefile:131: parser/gram.h] Error 2
> make[1]: *** Waiting for unfinished jobs
> make: *** [src/Makefile.global:383: submake-generated-headers] Error 2
>
> I have ubuntu 22.04 operation system.
>
> On 06.12.2023 13:47, jian he wrote:
>
> On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina  
> wrote:
>
> Hi!
>
> Thank you for your contribution to this thread.
>
>
> I reviewed it and have a few questions.
>
> 1. I have seen that you delete a table before creating it, to which you want 
> to add errors due to a failed "copy from" operation. I think this is wrong 
> because this table can save useful data for the user.
> At a minimum, we should warn the user about this, but I think we can just add 
> some number at the end of the name, such as name_table1, name_table_2.
>
> Sorry. I don't understand this part.
> Currently, if the error table name already exists, then the copy will
> fail, an error will be reported.
> I try to first create a table, if no error then the error table will be 
> dropped.
>
> To be honest, first of all, I misunderstood this part of the code. Now I see 
> that it works the way you mentioned.
>
> However, I didn't see if you dealt with cases where we already had a table 
> with the same name as the table error.
> I mean, when is he trying to create for the first time, or will we never be 
> able to face such a problem?
>
> Can you demo the expected behavior?
>
> Unfortunately, I was unable to launch it due to a build issue.
>

Hopefully attached will work.

> 2. I noticed that you are forming a table name using the type of errors that 
> prevent rows from being added during 'copy from' operation.
> I think it would be better to use the name of the source file that was used 
> while 'copy from' was running.
> In addition, there may be several such files, it is also worth considering.
>
> Another column added.
> now it looks like:
>
> SELECT * FROM save_error_csv_error;
>  filename | lineno |line
>  | field | source | err_message |
> err_detail | errorcode
> --+++---++-++---
>  STDIN|  1 | 2002232 40  50  60  70
> 80 | NULL  | NULL   | extra data after last expected column   |
> NULL   | 22P04
>  STDIN|  1 | 2000230 23
>  | d | NULL   | missing data for column "d" | NULL
>   | 22P04
>  STDIN|  1 | z,,""
>  | a | z  | invalid input syntax for type integer: "z"  | NULL
>   | 22P02
>  STDIN|  2 | \0,,
>  | a | \0 | invalid input syntax for type integer: "\0" | NULL
>   | 22P02
>
> Yes, I see the "filename" column, and this will solve the problem, but 
> "STDIN" is unclear to me.

please see comment in struct CopyFromStateData:
char*filename; /* filename, or NULL for STDIN */


>  */
>
> Maybe we can rewrite it like this:
>
> /* Check, the err_nsp.error_rel table has already existed
> * and if it is, check its column name and data types.
>
refactored.
From 2510dc2e2b13c60a5a7e184bf8e55325601d97e0 Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Sun, 10 Dec 2023 09:51:42 +0800
Subject: [PATCH v10 1/1] Make COPY FROM more error tolerant

Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.

Instead of throwing errors while copying, save_error will save errors to a table automatically.
We check the table definition via column name and column data type.
if table already exists and meets the criteria then errors will save to that table.
if the table does not exist, then create one.

Only works for COPY FROM, non-BINARY mode.

While copying, if error never happened, error save table will be dropped at the ending of COPY FROM.
If the error saving table already exists, meaning at least once COPY FROM errors has happened,
then all the future errors will be saved to that table.
we save the error to error saving table using SPI, construct a query, then execute the query.
---
 contrib/file_fdw/file_fdw.c  |   4 +-
 doc/src/sgml/ref/copy.sgml   |  93 +
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 146 +++-
 src/backend/commands/copyfromparse.c | 169 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-07 Thread Alena Rybakina

Thank you for your work. Unfortunately, your code contained errors 
during the make installation:


'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2

I have ubuntu 22.04 operation system.

On 06.12.2023 13:47, jian he wrote:

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina  wrote:

Hi!

Thank you for your contribution to this thread.


I reviewed it and have a few questions.

1. I have seen that you delete a table before creating it, to which you want to add 
errors due to a failed "copy from" operation. I think this is wrong because 
this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can just add 
some number at the end of the name, such as name_table1, name_table_2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.
To be honest, first of all, I misunderstood this part of the code. Now I 
see that it works the way you mentioned.


However, I didn't see if you dealt with cases where we already had a 
table with the same name as the table error.
I mean, when is he trying to create for the first time, or will we never 
be able to face such a problem?

Can you demo the expected behavior?

Unfortunately, I was unable to launch it due to a build issue.



2. I noticed that you are forming a table name using the type of errors that 
prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used 
while 'copy from' was running.
In addition, there may be several such files, it is also worth considering.


Another column added.
now it looks like:

SELECT * FROM save_error_csv_error;
  filename | lineno |line
  | field | source | err_message |
err_detail | errorcode
--+++---++-++---
  STDIN|  1 | 2002232 40  50  60  70
80 | NULL  | NULL   | extra data after last expected column   |
NULL   | 22P04
  STDIN|  1 | 2000230 23
  | d | NULL   | missing data for column "d" | NULL
   | 22P04
  STDIN|  1 | z,,""
  | a | z  | invalid input syntax for type integer: "z"  | NULL
   | 22P02
  STDIN|  2 | \0,,
  | a | \0 | invalid input syntax for type integer: "\0" | NULL
   | 22P02

Yes, I see the "filename" column, and this will solve the problem, but 
"STDIN" is unclear to me.

3. I found spelling:

/* no err_nsp.error_rel table then crete one. for holding error. */


fixed.


4. Maybe rewrite this comment

these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary, no error will lead to the deletion of the 
err_sp.error_rel table.


fixed.

Thank you.

5. Is this part of the comment needed? I think it duplicates the information 
below when we form the query.

  * . column list(order by attnum, begin from ctid) =
  *{ctid, lineno,line,field,source,err_message,err_detail,errorcode}
  * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}

I'm not sure if we need to order the rows by number. It might be easier to work 
with these lines in the order they appear.


Simplified the comment. "order by attnum" is to make sure that if
there is a table already existing, and the column name is like X and
the data type like Y, then we consider this table is good for holding
potential error info.

COPY FROM, main entry point is NextCopyFrom.
Now for non-binary mode, if you specified save_error then it will not
fail at NextCopyFrom.
all these three errors will be tolerated: extra data after last
expected column, missing data for column, data type conversion.

It looks clearer and better, thanks!

Comments in the format of questions are unusual for me, I perceive them 
to think about it, for example, as here (contrib/bloom/blinsert.c:312):


/*

 * Didn't find place to insert in notFullPage array.  Allocate new page.
 * (XXX is it good to do this while holding ex-lock on the metapage??)
 */

Maybe we can rewrite it like this:

/* Check, the err_nsp.error_rel table has already existed
* and if it is, check its column name and data types.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-06 Thread jian he

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina  wrote:
>
> Hi!
>
> Thank you for your contribution to this thread.
>
>
> I reviewed it and have a few questions.
>
> 1. I have seen that you delete a table before creating it, to which you want 
> to add errors due to a failed "copy from" operation. I think this is wrong 
> because this table can save useful data for the user.
> At a minimum, we should warn the user about this, but I think we can just add 
> some number at the end of the name, such as name_table1, name_table_2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.
Can you demo the expected behavior?

> 2. I noticed that you are forming a table name using the type of errors that 
> prevent rows from being added during 'copy from' operation.
> I think it would be better to use the name of the source file that was used 
> while 'copy from' was running.
> In addition, there may be several such files, it is also worth considering.
>

Another column added.
now it looks like:

SELECT * FROM save_error_csv_error;
 filename | lineno |line
 | field | source | err_message |
err_detail | errorcode
--+++---++-++---
 STDIN|  1 | 2002232 40  50  60  70
80 | NULL  | NULL   | extra data after last expected column   |
NULL   | 22P04
 STDIN|  1 | 2000230 23
 | d | NULL   | missing data for column "d" | NULL
  | 22P04
 STDIN|  1 | z,,""
 | a | z  | invalid input syntax for type integer: "z"  | NULL
  | 22P02
 STDIN|  2 | \0,,
 | a | \0 | invalid input syntax for type integer: "\0" | NULL
  | 22P02


> 3. I found spelling:
>
> /* no err_nsp.error_rel table then crete one. for holding error. */
>

fixed.

> 4. Maybe rewrite this comment
>
> these info need, no error will drop err_nsp.error_rel table
> to:
> this information is necessary, no error will lead to the deletion of the 
> err_sp.error_rel table.
>

fixed.

> 5. Is this part of the comment needed? I think it duplicates the information 
> below when we form the query.
>
>  * . column list(order by attnum, begin from ctid) =
>  *{ctid, lineno,line,field,source,err_message,err_detail,errorcode}
>  * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}
>
> I'm not sure if we need to order the rows by number. It might be easier to 
> work with these lines in the order they appear.
>
Simplified the comment. "order by attnum" is to make sure that if
there is a table already existing, and the column name is like X and
the data type like Y, then we consider this table is good for holding
potential error info.

COPY FROM, main entry point is NextCopyFrom.
Now for non-binary mode, if you specified save_error then it will not
fail at NextCopyFrom.
all these three errors will be tolerated: extra data after last
expected column, missing data for column, data type conversion.
From 990e1e0f5130431cf32069963bb980bb0692ce0b Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Wed, 6 Dec 2023 18:26:32 +0800
Subject: [PATCH v9 1/1] Make COPY FROM more error tolerant

Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.

Instead of throwing errors while copying, save_error will save errors to a table automatically.
We check the table definition via column name and column data type.
if table already exists and meets the criteria then errors will save to that table.
if the table does not exist, then create one.

Only works for COPY FROM, non-BINARY mode.

While copying, if error never happened, error save table will be dropped at the ending of COPY FROM.
If the error saving table already exists, meaning at least once COPY FROM errors has happened,
then all the future errors will be saved to that table.
we save the error to error saving table using SPI, construct a query, then execute the query.
---
 contrib/file_fdw/file_fdw.c  |   4 +-
 doc/src/sgml/ref/copy.sgml   |  93 
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 147 ++-
 src/backend/commands/copyfromparse.c | 171 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   3 +-
 src/include/commands/copyfrom_internal.h |   7 +
 src/include/parser/kwlist.h  |   1 +
 src/test/regress/expected/copy2.out  | 135 ++
 src/test/regress/sql/copy2.sql   | 108

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-05 Thread Alena Rybakina


Hi!

Thank you for your contribution to this thread.

On 04.12.2023 05:23, jian he wrote:

hi.
here is my implementation based on previous discussions

add a new COPY FROM flag save_error.
save_error only works with non-BINARY flags.
save_error is easier for me to implement, if using "save error" I
worry, 2 words, gram.y will not work.
save_error also works other flag like {csv mode, force_null, force_not_null}

overall logic is:
if  save_error is specified then
if error_holding table not exists then create one
if error_holding table exists set error_firsttime to false.
if  save_error is not specified then work as master branch.

if errors happen then insert error info to error_holding table.
if errors do not exist and error_firsttime is true then drop the table.
if errors do not exist and error_firsttime is false then raise a
notice: All the past error holding saved at %s.%s

error holding table:
schema will be the same as COPY destination table.
the table name will be: COPY destination name concatenate with "_error".

error_holding table definition:
CREATE TABLE err_nsp.error_rel (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);

the following field is not implemented.
FIELDS  text[], separated, de-escaped string fields (the data that was
or would be fed to input functions)

because imagine following case:
create type test as (a int, b text);
create table copy_comp (c1 int, c2 test default '(11,test)', c3 date);
copy copy_comp from stdin with (default '\D');
1 \D '2022-07-04'
\.
table copy_comp;

I feel it's hard from textual '\D'  to get text[] `(11,test)` via SPI.
--
demo:

create table copy_default_error_save (
id integer,
text_value text not null default 'test',
ts_value timestamp without time zone not null default '2022-07-05'
);
copy copy_default_error_save from stdin with (save_error, default '\D');
k value '2022-07-04'
z \D '2022-07-03ASKL'
s \D \D
\.

NOTICE:  3 rows were skipped because of error. skipped row saved to
table public.copy_default_error_save_error
select  * from copy_default_error_save_error;
  lineno |   line   |  field   |  source
   | err_message |
err_detail | errorcode
+--+--+--+-++---
   1 | k   value   '2022-07-04' | id   | k
   | invalid input syntax for type integer: "k"  |
   | 22P02
   2 | z   \D  '2022-07-03ASKL' | id   | z
   | invalid input syntax for type integer: "z"  |
   | 22P02
   2 | z   \D  '2022-07-03ASKL' | ts_value |
'2022-07-03ASKL' | invalid input syntax for type timestamp:
"'2022-07-03ASKL'" || 22007
   3 | s   \D  \D   | id   | s
   | invalid input syntax for type integer: "s"  |
   | 22P02
(4 rows)

The doc is not so good.

COPY FROM (save_error),  it will not be as fast as COPY FROM (save_error false).
With save_error, we can only use InputFunctionCallSafe, which I
believe is not as fast as InputFunctionCall.
If any conversion error happens, we need to call the SPI interface,
that would add more overhead. also we can only insert error cases row
by row. (maybe we can insert to error_save values(error1), (error2).
(I will try later)...

The main code is about constructing SPI query, and test and test output.

I reviewed it and have a few questions.

1. I have seen that you delete a table before creating it, to which you 
want to add errors due to a failed "copy from" operation. I think this 
is wrong because this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can 
just add some number at the end of the name, such as name_table1, 
name_table_2.


2. I noticed that you are forming a table name using the type of errors 
that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was 
used while 'copy from' was running.

In addition, there may be several such files, it is also worth considering.

3. I found spelling:

/* no err_nsp.error_rel table then crete one. for holding error. */

4. Maybe rewrite this comment

these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary, no error will lead to the deletion of the 
err_sp.error_rel table.


5. Is this part of the comment needed? I think it duplicates the 
information below when we form the query.


 * . column list(order by attnum, begin from ctid) =
 *    {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
 * . data types (from attnum = -1) ={tid, 
int8,text,text,text,text,text,text}


I'm not sure if we need to order the rows by number. It might be easier 
to work with

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-03 Thread jian he

hi.
here is my implementation based on previous discussions

add a new COPY FROM flag save_error.
save_error only works with non-BINARY flags.
save_error is easier for me to implement, if using "save error" I
worry, 2 words, gram.y will not work.
save_error also works other flag like {csv mode, force_null, force_not_null}

overall logic is:
if  save_error is specified then
   if error_holding table not exists then create one
   if error_holding table exists set error_firsttime to false.
if  save_error is not specified then work as master branch.

if errors happen then insert error info to error_holding table.
if errors do not exist and error_firsttime is true then drop the table.
if errors do not exist and error_firsttime is false then raise a
notice: All the past error holding saved at %s.%s

error holding table:
schema will be the same as COPY destination table.
the table name will be: COPY destination name concatenate with "_error".

error_holding table definition:
CREATE TABLE err_nsp.error_rel (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);

the following field is not implemented.
FIELDS  text[], separated, de-escaped string fields (the data that was
or would be fed to input functions)

because imagine following case:
create type test as (a int, b text);
create table copy_comp (c1 int, c2 test default '(11,test)', c3 date);
copy copy_comp from stdin with (default '\D');
1 \D '2022-07-04'
\.
table copy_comp;

I feel it's hard from textual '\D'  to get text[] `(11,test)` via SPI.
--
demo:

create table copy_default_error_save (
id integer,
text_value text not null default 'test',
ts_value timestamp without time zone not null default '2022-07-05'
);
copy copy_default_error_save from stdin with (save_error, default '\D');
k value '2022-07-04'
z \D '2022-07-03ASKL'
s \D \D
\.

NOTICE:  3 rows were skipped because of error. skipped row saved to
table public.copy_default_error_save_error
select  * from copy_default_error_save_error;
 lineno |   line   |  field   |  source
  | err_message |
err_detail | errorcode
+--+--+--+-++---
  1 | k   value   '2022-07-04' | id   | k
  | invalid input syntax for type integer: "k"  |
  | 22P02
  2 | z   \D  '2022-07-03ASKL' | id   | z
  | invalid input syntax for type integer: "z"  |
  | 22P02
  2 | z   \D  '2022-07-03ASKL' | ts_value |
'2022-07-03ASKL' | invalid input syntax for type timestamp:
"'2022-07-03ASKL'" || 22007
  3 | s   \D  \D   | id   | s
  | invalid input syntax for type integer: "s"  |
  | 22P02
(4 rows)

The doc is not so good.

COPY FROM (save_error),  it will not be as fast as COPY FROM (save_error false).
With save_error, we can only use InputFunctionCallSafe, which I
believe is not as fast as InputFunctionCall.
If any conversion error happens, we need to call the SPI interface,
that would add more overhead. also we can only insert error cases row
by row. (maybe we can insert to error_save values(error1), (error2).
(I will try later)...

The main code is about constructing SPI query, and test and test output.
From 7aeb55cb0c8b1b36fd5c468fee0b07d4c13d1a7d Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Sun, 3 Dec 2023 22:58:40 +0800
Subject: [PATCH v8 1/1] Add a new COPY option: SAVE_ERROR. Only works for COPY
 FROM, non-BINARY mode.

Currently NextCopyFrom can have 3 errors reported.
* extra data after last expected column
* missing data for column \"%s\"
* main function InputFunctionCall inside error.

Currently, we only deal with InputFunctionCall errors only.
instead of throw error while copying, save_error will save errors to a table automatically.
We check the table definition via column name and column data type.
if table already exists and meets the condition then errors will save to that table.
While copying, if error never happened, error save table will be dropped at the ending of COPY.

If the error saving table already exists,
meaning at least once COPY FROM errors had happened,
then all the future error will save to that table.
---
 contrib/file_fdw/file_fdw.c  |   4 +-
 doc/src/sgml/ref/copy.sgml   |  88 +
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 151 ++-
 src/backend/commands/copyfromparse.c |  89 -
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   3 +-
 src/include/commands/copyfrom_internal.h |   7 ++
 src/include/parser/kwlist.h  |   1 +

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-23 Thread Andrei Lepikhov


On 14/11/2023 17:10, Damir Belyalov wrote:

  Here is a very straw-man-level sketch of what I think might work.
  The option to COPY FROM looks something like

       ERRORS TO other_table_name (item [, item [, ...]])


I tried to implement the patch using a table and came across a number of 
questions.


Which table should we implement for this feature: a system catalog table 
or store this table as a file or create a new table?


In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another user. 
It is also not clear how access rights to this table are inherited and 
be given.


Previous reviews have given helpful ideas about storing errors in the 
new table.
It should be trivial code - use the current table name + 'err' + suffix 
as we already do in the case of conflicting auto-generated index names.
The 'errors table' must inherit any right policies from the table, to 
which we do the copy.


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-15 Thread jian he

On Thu, Nov 9, 2023 at 4:12 AM Tom Lane  wrote:
>
> Daniel Gustafsson  writes:
> >> On 8 Nov 2023, at 19:18, Tom Lane  wrote:
> >> I think an actually usable feature of this sort would involve
> >> copying all the failed lines to some alternate output medium,
> >> perhaps a second table with a TEXT column to receive the original
> >> data line.  (Or maybe an array of text that could receive the
> >> broken-down field values?)  Maybe we could dump the message info,
> >> line number, field name etc into additional columns.
>
> > I agree that the errors should be easily visible to the user in some way.  
> > The
> > feature is for sure interesting, especially in data warehouse type jobs 
> > where
> > dirty data is often ingested.
>
> I agree it's interesting, but we need to get it right the first time.
>
> Here is a very straw-man-level sketch of what I think might work.
> The option to COPY FROM looks something like
>
> ERRORS TO other_table_name (item [, item [, ...]])
>
> where the "items" are keywords identifying the information item
> we will insert into each successive column of the target table.
> This design allows the user to decide which items are of use
> to them.  I envision items like
>
> LINENO  bigint  COPY line number, counting from 1
> LINEtextraw text of line (after encoding conversion)
> FIELDS  text[]  separated, de-escaped string fields (the data
> that was or would be fed to input functions)
> FIELD   textname of troublesome field, if field-specific
> MESSAGE texterror message text
> DETAIL  texterror message detail, if any
> SQLSTATE text   error SQLSTATE code
>

just
SAVE ERRORS

automatically create a table to hold the error. (validate
auto-generated table name uniqueness, validate create privilege).
and the table will have the above related info. if no error then table
gets dropped.

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-14 Thread zhihuifan1213

Damir Belyalov  writes:

>   Here is a very straw-man-level sketch of what I think might work.
>   The option to COPY FROM looks something like
>
>ERRORS TO other_table_name (item [, item [, ...]])
>
> I tried to implement the patch using a table and came across a number of 
> questions.
>
> Which table should we implement for this feature: a system catalog table or 
> store this table as a file or create a new
> table?

I think system catalog should not be a option at the first place since
it requires more extra workload to do.  see the calls of
IsCatalogRelation in heapam.c.

I prefer to create a new normal heap relation rather than a file since
heap realtion probabaly have better APIs. 

> In these cases, security and user rights management issues arise.
> It is better for other users not to see error lines from another
> user. It is also not clear how access rights to this 
> table are inherited and be given.

How about creating the table just allowing the current user to
read/write or just same as the relation we are copying to? 

-- 
Best Regards
Andy Fan

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-14 Thread Alena Rybakina


Hi!

On 14.11.2023 13:10, Damir Belyalov wrote:


 Here is a very straw-man-level sketch of what I think might work.
 The option to COPY FROM looks something like

      ERRORS TO other_table_name (item [, item [, ...]])


I tried to implement the patch using a table and came across a number 
of questions.


Which table should we implement for this feature: a system catalog 
table or store this table as a file or create a new table?


In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another user. 
It is also not clear how access rights to this table are inherited and 
be given.



Maybe we can add a guc or a parameter to output such errors during the 
execution of the copy function with errors and check whether the user 
has enough rights to set such a parameter?


That is, I propose to give the user a choice to run copy with and 
without saving errors and at the same time immediately check whether the 
option with error output is possible for him in principle?


--
Regards,
Alena Rybakina

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-14 Thread Damir Belyalov

>  Here is a very straw-man-level sketch of what I think might work.
>  The option to COPY FROM looks something like
>
>   ERRORS TO other_table_name (item [, item [, ...]])
>

I tried to implement the patch using a table and came across a number of
questions.

Which table should we implement for this feature: a system catalog table or
store this table as a file or create a new table?

In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another user. It
is also not clear how access rights to this table are inherited and be
given.


--
Regards,
Damir Belyalov
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread zhihuifan1213



Tom Lane  writes:

> Daniel Gustafsson  writes:
>>> On 8 Nov 2023, at 19:18, Tom Lane  wrote:
>>> I think an actually usable feature of this sort would involve
>>> copying all the failed lines to some alternate output medium,
>>> perhaps a second table with a TEXT column to receive the original
>>> data line.  (Or maybe an array of text that could receive the
>>> broken-down field values?)  Maybe we could dump the message info,
>>> line number, field name etc into additional columns.
>
>> I agree that the errors should be easily visible to the user in some way.  
>> The
>> feature is for sure interesting, especially in data warehouse type jobs where
>> dirty data is often ingested.
>
> I agree it's interesting, but we need to get it right the first time.
>
> Here is a very straw-man-level sketch of what I think might work.
> The option to COPY FROM looks something like
>
>   ERRORS TO other_table_name (item [, item [, ...]])
>
> where the "items" are keywords identifying the information item
> we will insert into each successive column of the target table.
> This design allows the user to decide which items are of use
> to them.  I envision items like

While I'm pretty happy with the overall design, which is 'ERRORS to
other_table_name' specially. I'm a bit confused why do we need to
write the codes for (item [, item [, ...]]), not only because it
requires more coding but also requires user to make more decisions.
will it be anything wrong to make all of them as default? 

> LINENObigint  COPY line number, counting from 1
> LINE  textraw text of line (after encoding conversion)
> FIELDStext[]  separated, de-escaped string fields (the data
>   that was or would be fed to input functions)
> FIELD textname of troublesome field, if field-specific
> MESSAGE   texterror message text
> DETAILtexterror message detail, if any
> SQLSTATE text error SQLSTATE code
>


-- 
Best Regards
Andy Fan

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread Damir Belyalov

Hello everyone!
Thanks for turning back to this patch.


I had already thought about storing errors in the table / separate file /
logfile and it seems to me that the best way is to output errors in
logfile. As for user it is more convenient to look for errors in the place
where they are usually generated - in logfile and if he wants to intercept
them he could easily do that by few commands.


The analogues of this feature in other DBSM usually had additional files
for storing errors, but their features had too many options (see attached
files).
I also think that the best way is to simplify this feature for the first
version and don't use redundant adjustments such as additional files and
other options.
IMHO for more complicated operations with loading tables files pgloader
exists: https://github.com/dimitri/pgloader


Links of analogues of COPY IGNORE_DATATYPE_ERRORS
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html

Regards,
Damir Belyalov
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread Andres Freund

Hi,

On 2023-11-08 19:00:01 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-11-08 13:18:39 -0500, Tom Lane wrote:
> >> I think an actually usable feature of this sort would involve
> >> copying all the failed lines to some alternate output medium,
> >> perhaps a second table with a TEXT column to receive the original
> >> data line.
> 
> > If we go in that direction, we should make it possible to *not* use such a
> > table as well, for some uses it'd be pointless.
> 
> Why?  You can always just drop the errors table if you don't want it.

I think it'll often just end up littering the database, particularly if the
callers don't care about a few errors.

> But I fail to see the use-case for ignoring errors altogether.

My experience is that there's often a few errors due to bad encoding, missing
escaping etc that you don't care sufficiently about when importing large
quantities of data.

Greetings,

Andres Freund

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread Tom Lane

Andres Freund  writes:
> On 2023-11-08 13:18:39 -0500, Tom Lane wrote:
>> I think an actually usable feature of this sort would involve
>> copying all the failed lines to some alternate output medium,
>> perhaps a second table with a TEXT column to receive the original
>> data line.

> If we go in that direction, we should make it possible to *not* use such a
> table as well, for some uses it'd be pointless.

Why?  You can always just drop the errors table if you don't want it.
But I fail to see the use-case for ignoring errors altogether.

> Another way of reporting errors could be for copy to return invalid input back
> to the client, via the copy protocol.

Color me skeptical.  There are approximately zero clients in the
world today that could handle simultaneous return of data during
a COPY.  Certainly neither libpq nor psql are within hailing
distance of being able to support that.  Maybe in some far
future it could be made to work --- but if you want it in the v1
patch, you just moved the goalposts into the next county.

regards, tom lane

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread Andres Freund

Hi,

On 2023-11-08 13:18:39 -0500, Tom Lane wrote:
> Damir  writes:
> > [ v7-0002-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch ]
> 
> Sorry for being so late to the party, but ... I don't think this
> is a well-designed feature as it stands.  Simply dropping failed rows
> seems like an unusable definition for any application that has
> pretensions of robustness.

Not everything needs to be a robust application though. I've definitely cursed
at postgres for lacking this.

> I think an actually usable feature of this sort would involve
> copying all the failed lines to some alternate output medium,
> perhaps a second table with a TEXT column to receive the original
> data line.  (Or maybe an array of text that could receive the
> broken-down field values?)  Maybe we could dump the message info,
> line number, field name etc into additional columns.

If we go in that direction, we should make it possible to *not* use such a
table as well, for some uses it'd be pointless.

Another way of reporting errors could be for copy to return invalid input back
to the client, via the copy protocol. That would allow the client to handle
failing rows and also to abort if the number of errors or the type of errors
gets to be too big.

Greetings,

Andres Freund

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread Tom Lane

Daniel Gustafsson  writes:
>> On 8 Nov 2023, at 19:18, Tom Lane  wrote:
>> I think an actually usable feature of this sort would involve
>> copying all the failed lines to some alternate output medium,
>> perhaps a second table with a TEXT column to receive the original
>> data line.  (Or maybe an array of text that could receive the
>> broken-down field values?)  Maybe we could dump the message info,
>> line number, field name etc into additional columns.

> I agree that the errors should be easily visible to the user in some way.  The
> feature is for sure interesting, especially in data warehouse type jobs where
> dirty data is often ingested.

I agree it's interesting, but we need to get it right the first time.

Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like

ERRORS TO other_table_name (item [, item [, ...]])

where the "items" are keywords identifying the information item
we will insert into each successive column of the target table.
This design allows the user to decide which items are of use
to them.  I envision items like

LINENO  bigint  COPY line number, counting from 1
LINEtextraw text of line (after encoding conversion)
FIELDS  text[]  separated, de-escaped string fields (the data
that was or would be fed to input functions)
FIELD   textname of troublesome field, if field-specific
MESSAGE texterror message text
DETAIL  texterror message detail, if any
SQLSTATE text   error SQLSTATE code

Some of these would have to be populated as NULL if we didn't get
that far in processing the line.  In the worst case, which is
encoding conversion failure, I think we couldn't populate any of
the data items except LINENO.

Not sure if we need to insist that the target table columns be
exactly the data types I show above.  It'd be nice to allow
the LINENO target to be plain int, perhaps.  OTOH, do we really
want to have to deal with issues like conversion failures while
trying to report an error?

> As a data point, Greenplum has this feature with additional SQL syntax to
> control it:
>   COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS;
> LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT
> LIMIT xyz ROWS sets the limit of how many rows can be faulty before the
> operation errors out.  I'm not at all advocating that we should mimic this,
> just wanted to add a reference to postgres derivative where this has been
> implemented.

Hm.  A "reject limit" might be a useful add-on, but I wouldn't advocate
including it in the initial patch.

regards, tom lane

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread Daniel Gustafsson

> On 8 Nov 2023, at 19:18, Tom Lane  wrote:

> I think an actually usable feature of this sort would involve
> copying all the failed lines to some alternate output medium,
> perhaps a second table with a TEXT column to receive the original
> data line.  (Or maybe an array of text that could receive the
> broken-down field values?)  Maybe we could dump the message info,
> line number, field name etc into additional columns.

I agree that the errors should be easily visible to the user in some way.  The
feature is for sure interesting, especially in data warehouse type jobs where
dirty data is often ingested.

As a data point, Greenplum has this feature with additional SQL syntax to
control it:

COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS;

LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT
LIMIT xyz ROWS sets the limit of how many rows can be faulty before the
operation errors out.  I'm not at all advocating that we should mimic this,
just wanted to add a reference to postgres derivative where this has been
implemented.

--
Daniel Gustafsson

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-08 Thread Tom Lane

Damir  writes:
> [ v7-0002-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch ]

Sorry for being so late to the party, but ... I don't think this
is a well-designed feature as it stands.  Simply dropping failed rows
seems like an unusable definition for any application that has
pretensions of robustness.  "But", you say, "we're emitting WARNING
messages about it".  That's *useless*.  For most applications WARNING
messages just go into the bit bucket, or worse they cause memory leaks
(because the app never reads them).  An app that tried to read them
would have to cope with all sorts of fun such as translated messages.
Furthermore, as best I can tell from the provided test cases, the
messages completely lack basic context such as which field or line
the problem occurred in.  An app trying to use this to understand
which input lines had failed would not get far.

I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line.  (Or maybe an array of text that could receive the
broken-down field values?)  Maybe we could dump the message info,
line number, field name etc into additional columns.

Also it'd be a good idea to have a vision of how the feature
could be extended to cope with lower-level errors, such as
lines that have the wrong number of columns or other problems
with line-level syntax.  I don't say we need to cope with that
immediately, but it's going to be something people will want
to add, I think.

regards, tom lane

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-09-20 Thread Damir

Although v7 patch doesn't have commit messages on the patch, I think 
leave commit message is good for reviewers.


Sure, didn't notice it. Added the commit message to the updated patch.



  * Note: DON'T convert this error to "soft" style (errsave/ereturn). We
  * want this data type to stay permanently in the hard-error world 
so that

  * it can be used for testing that such cases still work reasonably.


From this point of view, I think this is a supposed way of using widget.


I agree, it's a good approach for checking datatype errors, because 
that's what was intended.



OTOH widget is declared in create_type.sql and I'm not sure it's ok to 
use it in another test copy2.sql.


I think that other regress tests with 'widget' type that will be created 
in the future can be not only in the create_type.sql. So it's not a 
problem that some type or function is taken from another regress test. 
For example, the table 'onek' is used in many regress tests.



Regards,

Damir Belyalov

Postgres Professional
From 0e1193e00bb5ee810a015a2baaf7c79e395a54c7 Mon Sep 17 00:00:00 2001
From: Damir Belyalov 
Date: Fri, 15 Sep 2023 11:14:57 +0300
Subject: [PATCH v7] Add new COPY option IGNORE_DATATYPE_ERRORS

Currently entire COPY fails even when there is one unexpected data
regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors and skips them and COPY
data which don't contain problem.

This patch uses the soft error handling infrastructure, which is
introduced by d9f7f5d32f20.

Author: Damir Belyalov, Atsushi Torikoshi

---
 doc/src/sgml/ref/copy.sgml   | 13 +
 src/backend/commands/copy.c  | 13 +
 src/backend/commands/copyfrom.c  | 37 
 src/backend/commands/copyfromparse.c | 20 ++---
 src/bin/psql/tab-complete.c  |  3 +-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 ++
 src/test/regress/expected/copy2.out  | 28 ++
 src/test/regress/sql/copy2.sql   | 26 +
 9 files changed, 139 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
+IGNORE_DATATYPE_ERRORS [ boolean ]
 ENCODING 'encoding_name'
 
  
@@ -370,6 +371,18 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  with columns where the data type's input-function raises an error.
+  This option is not allowed when using binary format.  Note that this
+  is only supported in current COPY syntax.
+ 
+
+   
+

 ENCODING
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
 (errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("cannot specify DEFAULT in BINARY mode")));
 
+	if (opts_out->binary && opts_out->ignore_datatype_errors)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..b18aea6376 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+	if (cstate->opts.ignore_datatype_errors)
+	{
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
+		escontext.details_wanted = true;
+		cstate->escontext = escontext;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-09-19 Thread torikoshia


On 2023-09-15 19:02, Damir Belyalov wrote:

Since v5 patch failed applying anymore, updated the patch.


Thank you for updating the patch . I made a little review on it where
corrected some formatting.



Thanks for your review and update!
I don't have objections the modification of the codes and comments.
Although v7 patch doesn't have commit messages on the patch, I think 
leave commit message is good for reviewers.



- COPY with a datatype error that can't be handled as a soft error


I didn't know proper way to test this, but I've found data type
widget's
input function widget_in() defined to occur hard-error in regress.c,
attached patch added a test using it.


This test seems to be weird a bit, because of the "widget" type. The
hard error is thrown by the previous test with missing data. Also
it'll be interesting for me to list all cases when a hard error can be
thrown.


Although missing data error is hard error, the suggestion from Andres 
was adding `dataype` error:



- COPY with a datatype error that can't be handled as a soft error


As described in widghet_in(), widget is intentionally left emitting hard 
error for testing purpose:


  * Note: DON'T convert this error to "soft" style (errsave/ereturn).  
We
  * want this data type to stay permanently in the hard-error world so 
that

  * it can be used for testing that such cases still work reasonably.


From this point of view, I think this is a supposed way of using widget.
OTOH widget is declared in create_type.sql and I'm not sure it's ok to 
use it in another test copy2.sql.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-09-15 Thread Damir Belyalov

> Since v5 patch failed applying anymore, updated the patch.
Thank you for updating the patch . I made a little review on it where
corrected some formatting.


> - COPY with a datatype error that can't be handled as a soft error
>
> I didn't know proper way to test this, but I've found data type widget's
> input function widget_in() defined to occur hard-error in regress.c,
> attached patch added a test using it.
>

This test seems to be weird a bit, because of the "widget" type. The hard
error is thrown by the previous test with missing data. Also it'll be
interesting for me to list all cases when a hard error can be thrown.

Regards,
Damir Belyalov
Postgres Professional
From 0e1193e00bb5ee810a015a2baaf7c79e395a54c7 Mon Sep 17 00:00:00 2001
From: Damir Belyalov 
Date: Fri, 15 Sep 2023 11:14:57 +0300
Subject: [PATCH] ignore errors

---
 doc/src/sgml/ref/copy.sgml   | 13 +
 src/backend/commands/copy.c  | 13 +
 src/backend/commands/copyfrom.c  | 37 
 src/backend/commands/copyfromparse.c | 20 ++---
 src/bin/psql/tab-complete.c  |  3 +-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 ++
 src/test/regress/expected/copy2.out  | 28 ++
 src/test/regress/sql/copy2.sql   | 26 +
 9 files changed, 139 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
+IGNORE_DATATYPE_ERRORS [ boolean ]
 ENCODING 'encoding_name'
 
  
@@ -370,6 +371,18 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  with columns where the data type's input-function raises an error.
+  This option is not allowed when using binary format.  Note that this
+  is only supported in current COPY syntax.
+ 
+
+   
+

 ENCODING
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
 (errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("cannot specify DEFAULT in BINARY mode")));
 
+	if (opts_out->binary && opts_out->ignore_datatype_errors)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..b18aea6376 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+	if (cstate->opts.ignore_datatype_errors)
+	{
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
+		escontext.details_wanted = true;
+		cstate->escontext = escontext;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -987,7 +995,36 @@ CopyFrom(CopyFromState cstate)
 
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+		{
+			if (cstate->opts.ignore_datatype_errors &&
+cstate->ignored_errors_count > 0)
+ereport(WARNING,
+		errmsg("%zd rows were skipped due to data type incompatibility",
+			   cstate->ignored_errors_count));
 			break;
+		}
+
+		/* Soft error occured, skip this tuple and log the reason */
+		if (cstate->escontext.error_occurred)
+		{
+			ErrorSaveContext new_escontext = {T_ErrorSaveContext};
+
+			/* Adjust elevel so we don't jump out */
+

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-08-21 Thread torikoshia


Since v5 patch failed applying anymore, updated the patch.

On 2023-03-23 02:50, Andres Freund wrote:

I suggest adding a few more tests:

- COPY with a datatype error that can't be handled as a soft error


I didn't know proper way to test this, but I've found data type widget's
input function widget_in() defined to occur hard-error in regress.c,
attached patch added a test using it.

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom de00c1555e0ee4a61565346946f4f3a4e851252c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 21 Aug 2023 20:30:29 +0900
Subject: [PATCH v6] Add new COPY option IGNORE_DATATYPE_ERRORS

Currently entire COPY fails even when there is one unexpected data
regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors and skips them and COPY
data which don't contain problem.

This patch uses the soft error handling infrastructure, which is
introduced by d9f7f5d32f20.

Author: Damir Belyalov, Atsushi Torikoshi
---
 doc/src/sgml/ref/copy.sgml   | 13 +
 src/backend/commands/copy.c  | 13 +
 src/backend/commands/copyfrom.c  | 37 
 src/backend/commands/copyfromparse.c | 19 +---
 src/bin/psql/tab-complete.c  |  3 +-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 ++
 src/test/regress/expected/copy2.out  | 28 ++
 src/test/regress/sql/copy2.sql   | 27 +
 9 files changed, 139 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
+IGNORE_DATATYPE_ERRORS [ boolean ]
 ENCODING 'encoding_name'
 
  
@@ -370,6 +371,18 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  with columns where the data type's input-function raises an error.
+  This option is not allowed when using binary format.  Note that this
+  is only supported in current COPY syntax.
+ 
+
+   
+

 ENCODING
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
 (errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("cannot specify DEFAULT in BINARY mode")));
 
+	if (opts_out->binary && opts_out->ignore_datatype_errors)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index b47cb5c66d..853adb8414 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+	if (cstate->opts.ignore_datatype_errors)
+	{
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
+		escontext.details_wanted = true;
+		cstate->escontext = escontext;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -987,7 +995,36 @@ CopyFrom(CopyFromState cstate)
 
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+		{
+			if (cstate->opts.ignore_datatype_errors &&
+cstate->ignored_error_count > 0)
+ereport(WARNING,
+		errmsg("%zd rows were skipped due to data type incompatibility",
+			cstate->ignored_error_count));
 			break;
+		}
+
+		/* Soft error occured, skip this tuple and log the reason */
+		if (cstate->escontext.error_occurred)
+

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-05-21 Thread Alena Rybakina


I'm sorry I was unable to respond right away.

On 09.05.2023 17:23, torikoshia wrote:
You may already understand it, but these variable names are given in 
imitation of FREEZE and BINARY cases:


  --- a/src/include/commands/copy.h
  +++ b/src/include/commands/copy.h
  @@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
   * -1 if not specified */
  bool    binary; /* binary format? */
  bool    freeze; /* freeze rows on loading? */
  +   bool    ignore_datatype_errors;  /* ignore rows with 
datatype errors */


  --- a/src/backend/commands/copy.c
  +++ b/src/backend/commands/copy.c
  @@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
  bool    format_specified = false;
  bool    freeze_specified = false;
  bool    header_specified = false;
  +   bool    ignore_datatype_errors_specified = false;

I think it would be sane to align the names with the FREEZE and BINARY 
options.


I agree with the name is too long and we once used the name 
'ignore_errors'.
However, current implementation does not ignore all errors but just 
data type error, so I renamed it.

There may be a better name, but I haven't come up with one.


Yes, you are right, I saw it.



As far as I take a quick look at on PostgreSQL source code, there're 
few variable name with "_counter". It seems to be used for function names.

Something like "ignored_errors_count" might be better.
I noticed that many variables are named with the "_counter" postfix, and 
most of them are used as a counter. For example, PgStat_StatTabEntry or 
JitInstrumentation structures consisted of many such variables. Despite 
this, I agree with your suggested name, because I found many similar 
variables that are used in the program as a counter, but it seems to me 
that the most of them are still used by local variables in the function.

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-05-09 Thread torikoshia


On 2023-05-07 05:05, Alena Rybakina wrote:
Thanks for your reviewing and comments!


I noticed that you used _ignore_datatype_errors_specified_ variable in
_copy.c_ , but guc has a short name _ignore_datatype_errors_. Also you
used the short variable name in _CopyFormatOptions_ structure.


You may already understand it, but these variable names are given in 
imitation of FREEZE and BINARY cases:


  --- a/src/include/commands/copy.h
  +++ b/src/include/commands/copy.h
  @@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
   * -1 if not specified */
  boolbinary; /* binary format? */
  boolfreeze; /* freeze rows on loading? */
  +   boolignore_datatype_errors;  /* ignore rows with datatype 
errors */


  --- a/src/backend/commands/copy.c
  +++ b/src/backend/commands/copy.c
  @@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
  boolformat_specified = false;
  boolfreeze_specified = false;
  boolheader_specified = false;
  +   boolignore_datatype_errors_specified = false;



Name used _ignore_datatype_errors_specified _is seemed very long to
me, may be use a short version of it (_ignore_datatype_errors_) in
_copy.c_ too?


I think it would be sane to align the names with the FREEZE and BINARY 
options.


I agree with the name is too long and we once used the name 
'ignore_errors'.
However, current implementation does not ignore all errors but just data 
type error, so I renamed it.

There may be a better name, but I haven't come up with one.


Besides, I noticed that you used _ignored_errors_ variable in
_CopyFromStateData_ structure and it's name is strikingly similar to
name (_ignore_datatype_error__s_), but they have different meanings.
Maybe it will be better to rename it as _ignored_errors_counter_?


As far as I take a quick look at on PostgreSQL source code, there're few 
variable name with "_counter". It seems to be used for function names.

Something like "ignored_errors_count" might be better.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-05-06 Thread Alena Rybakina


Hi!

Thank you, Damir, for your patch. It is very interesting to review it!

It seemed to me that the names of variables are not the same everywhere.

I noticed that you used /ignore_datatype_errors_specified/ variable in 
/copy.c/ , but guc has a short name /ignore_datatype_errors/. Also you 
used the short variable name in /CopyFormatOptions/ structure.
Name used /ignore_datatype_errors_specified /is seemed very long to me, 
may be use a short version of it (/ignore_datatype_errors/) in /copy.c/ too?


Besides, I noticed that you used /ignored_errors/ variable in 
/CopyFromStateData/ structure and it's name is strikingly similar to 
name (/ignore_datatype_error//s/), but they have different meanings.

Maybe it will be better to rename it as /ignored_errors_counter/?


I tested last version 
/v5-0001-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch/ with /bytea/ 
data type and transaction cases. Eventually, I didn't find any problem 
there.

I described my steps in more detail, if I missed something.

*First of all, I ran copy function with IGNORE_DATATYPE_ERRORS parameter 
being in transaction block.*

/
//File t2.csv exists:/

|id,b
769,\
1,\6e
2,\x5
5,\x|

/Test:/
CREATE TABLE t (id INT , b  BYTEA) ;
postgres=# BEGIN;
copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 'csv', 
IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

SAVEPOINT my_savepoint;
BEGIN
WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
SAVEPOINT
postgres=*# copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
postgres=*# ROLLBACK TO my_savepoint;
ROLLBACK
postgres=*# select * from t;
 id | b
+
  5 | \x
(1 row)

postgres=*# copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
postgres=*# select * from t;
 id | b
+
  5 | \x
  5 | \x
(2 rows)

postgres=*# commit;
COMMIT

*I tried to use the similar test and moved transaction block in function:*
CREATE FUNCTION public.log2()
 RETURNS void
 LANGUAGE plpgsql
 SECURITY DEFINER
AS $function$
BEGIN;
copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 'csv', 
IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

SAVEPOINT my_savepoint;
END;
$function$;

postgres=# delete from t;

postgres=# select 1 as t from log2();
WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
 t
---
 1
(1 row)

*Secondly I checked function copy with bytea datatype. *
/t1.csv consists:/
id,b
769,\x2d
1,\x6e
2,\x5c
5,\x

/And I ran it:/

postgres=# delete from t;
DELETE 4
postgres=# copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
postgres=# select * from t;
 id | b
+
  5 | \x
(1 row)

--
---
Alena Rybakina
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-28 Thread Damir Belyalov

>
> I might misunderstand something, but I believe the v5 patch uses
> copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a
> keyword.
> It modifies neither kwlist.h nor gram.y.
>

Sorry, didn't notice that. I think everything is alright now.

Regards,
Damir Belyalov
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-28 Thread torikoshia


On 2023-03-27 23:28, Damir Belyalov wrote:

Hi!

I made the specified changes and my patch turned out the same as
yours. The performance measurements were the same too.


Thanks for your review and measurements.


The only thing left to do is how not to add IGNORE_DATATYPE_ERRORS as
a keyword. See how this is done for parameters such as FORCE_NOT_NULL,
FORCE_NULL, FORCE_QUOTE. They are not in kwlist.h and are not as
keywords in gram.y.


I might misunderstand something, but I believe the v5 patch uses 
copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a 
keyword.

It modifies neither kwlist.h nor gram.y.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-27 Thread Damir Belyalov

Hi!

I made the specified changes and my patch turned out the same as yours. The
performance measurements were the same too.

The only thing left to do is how not to add IGNORE_DATATYPE_ERRORS as a
keyword. See how this is done for parameters such as FORCE_NOT_NULL,
FORCE_NULL, FORCE_QUOTE. They are not in kwlist.h and are not as keywords
in gram.y.

Regards,
Damir Belyalov
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-27 Thread torikoshia

On 2023-03-23 02:50, Andres Freund wrote:
Thanks again for your review.
Attached v5 patch.

Have you measured whether this has negative performance effects when 
*NOT*

using the new option?

I loaded 1000 rows of pgbench_accounts on my laptop and compared the 
elapsed time.
GUCs changed from the default are logging_collector = on, 
log_error_verbosity = verbose.

Three tests were run under each condition and the middle of them is 
listed below:

- patch NOT applied(36f40ce2dc66): 35299ms
- patch applied, without IGNORE_DATATYPE_ERRORS: 34409ms
- patch applied, with IGNORE_DATATYPE_ERRORS: 35510ms

It seems there are no significant degradation.

Also tested the elapsed time when loading data which has some datatype 
error with IGNORE_DATATYPE_ERRORS:

- data has 100 rows of error: 35269ms
- data has 1000 rows of error: 34577ms
- data has 500 rows of error: 48925ms

500 rows of error consumes much time, but it seems to be influenced 
by logging time.
Here are test results under log_min_messages and client_min_messages are 
'error':

- data has 500 data type error: 23972ms
- data has 0 data type error: 34320ms

Now conversely, when there were many data type errors, it consumes less 
time.
This seems like a reasonable result since the amount of skipped data is 
increasing.

As-is this does not work with FORMAT BINARY - and converting the binary 
input
functions to support soft errors won't happen for 16. So I think you 
need to

raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.

Added the option check.

On 2023-03-22 22:34:20 +0900, torikoshia wrote:
> @@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
>
>   ExecClearTuple(myslot);
>
> + if (cstate->opts.ignore_datatype_errors)
> + {
> + escontext.details_wanted = true;
> + cstate->escontext = escontext;
> + }

I think it might be worth pulling this out of the loop. That does mean 
you'd
have to reset escontext.error_occurred after an error, but that doesn't 
seem

too bad, you need to do other cleanup anyway.

Pull this out of the loop and added process for resetting escontext.

> @@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext 
*econtext,
>   values[m] = ExecEvalExpr(defexprs[m], econtext, 
[m]);
>   }
>   else
> - values[m] = InputFunctionCall(_functions[m],
> - 
  string,
> - 
  typioparams[m],
> -
   att->atttypmod);
> + /* If IGNORE_DATATYPE_ERRORS is enabled skip 
rows with datatype errors */
> + if (!InputFunctionCallSafe(_functions[m],
> + 
   string,
> + 
   typioparams[m],
> +
att->atttypmod,
> +
(Node *) >escontext,
> +
[m]))
> + {
> + cstate->ignored_errors++;
> +
> + ereport(WARNING,
> + errmsg("%s", 
cstate->escontext.error_data->message));

That isn't right - you loose all the details of the message. As is 
you'd also

leak the error context.

I think the best bet for now is to do something like
/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);

As I mentioned in one previous email, added above codes for now.

I wonder if we ought to provide a wrapper for this? It could e.g. know 
to

mention the original elevel and such?

I don't think NextCopyFrom() is the right place to emit this warning - 
it
e.g. is also called from file_fdw.c, which might want to do something 
else

with the error. From a layering POV it seems cleaner to do this in
CopyFrom(). You already have a check for escontext.error_occurred there
anyway.

Agreed.

> @@ -3378,6 +3378,10 @@ copy_opt_item:
>   {
>   $$ = makeDefElem("freeze", (Node *) 
makeBoolean(true), @1);
>   }
> + | IGNORE_DATATYPE_ERRORS
> + {
> + $$ =

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-24 Thread torikoshia


On 2023-03-23 02:50, Andres Freund wrote:

Hi,

Tom, see below - I wonder if should provide one more piece of 
infrastructure

around the saved error stuff...


Have you measured whether this has negative performance effects when 
*NOT*

using the new option?


As-is this does not work with FORMAT BINARY - and converting the binary 
input
functions to support soft errors won't happen for 16. So I think you 
need to

raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.


On 2023-03-22 22:34:20 +0900, torikoshia wrote:

@@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)

ExecClearTuple(myslot);

+   if (cstate->opts.ignore_datatype_errors)
+   {
+   escontext.details_wanted = true;
+   cstate->escontext = escontext;
+   }


I think it might be worth pulling this out of the loop. That does mean 
you'd
have to reset escontext.error_occurred after an error, but that doesn't 
seem

too bad, you need to do other cleanup anyway.


@@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext 
*econtext,

values[m] = ExecEvalExpr(defexprs[m], econtext, 
[m]);
}
else
-   values[m] = InputFunctionCall(_functions[m],
-   
  string,
-   
  typioparams[m],
-  
   att->atttypmod);
+/* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype 
errors */

+   if (!InputFunctionCallSafe(_functions[m],
+   
   string,
+   
   typioparams[m],
+  
att->atttypmod,
+  
(Node *) >escontext,
+  
[m]))
+   {
+   cstate->ignored_errors++;
+
+   ereport(WARNING,
+   errmsg("%s", 
cstate->escontext.error_data->message));


That isn't right - you loose all the details of the message. As is 
you'd also

leak the error context.

I think the best bet for now is to do something like
/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);


Thanks for your reviewing!
I'll try to fix it this way for the time being.

I wonder if we ought to provide a wrapper for this? It could e.g. know 
to

mention the original elevel and such?


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-22 Thread Andres Freund

Hi,

Tom, see below - I wonder if should provide one more piece of infrastructure
around the saved error stuff...

Have you measured whether this has negative performance effects when *NOT*
using the new option?

As-is this does not work with FORMAT BINARY - and converting the binary input
functions to support soft errors won't happen for 16. So I think you need to
raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.

On 2023-03-22 22:34:20 +0900, torikoshia wrote:
> @@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
>  
>   ExecClearTuple(myslot);
>  
> + if (cstate->opts.ignore_datatype_errors)
> + {
> + escontext.details_wanted = true;
> + cstate->escontext = escontext;
> + }

I think it might be worth pulling this out of the loop. That does mean you'd
have to reset escontext.error_occurred after an error, but that doesn't seem
too bad, you need to do other cleanup anyway.

> @@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext 
> *econtext,
>   values[m] = ExecEvalExpr(defexprs[m], econtext, 
> [m]);
>   }
>   else
> - values[m] = InputFunctionCall(_functions[m],
> - 
>   string,
> - 
>   typioparams[m],
> - 
>   att->atttypmod);
> + /* If IGNORE_DATATYPE_ERRORS is enabled skip 
> rows with datatype errors */
> + if (!InputFunctionCallSafe(_functions[m],
> + 
>string,
> + 
>typioparams[m],
> + 
>att->atttypmod,
> + 
>(Node *) >escontext,
> + 
>[m]))
> + {
> + cstate->ignored_errors++;
> +
> + ereport(WARNING,
> + errmsg("%s", 
> cstate->escontext.error_data->message));

That isn't right - you loose all the details of the message. As is you'd also
leak the error context.

I think the best bet for now is to do something like
/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);

I wonder if we ought to provide a wrapper for this? It could e.g. know to
mention the original elevel and such?

I don't think NextCopyFrom() is the right place to emit this warning - it
e.g. is also called from file_fdw.c, which might want to do something else
with the error. From a layering POV it seems cleaner to do this in
CopyFrom(). You already have a check for escontext.error_occurred there
anyway.

> @@ -3378,6 +3378,10 @@ copy_opt_item:
>   {
>   $$ = makeDefElem("freeze", (Node *) 
> makeBoolean(true), @1);
>   }
> + | IGNORE_DATATYPE_ERRORS
> + {
> + $$ = 
> makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
> + }
>   | DELIMITER opt_as Sconst
>   {
>   $$ = makeDefElem("delimiter", (Node *) 
> makeString($3), @1);

I think we shouldn't add a new keyword for this, but only support this via
/* new COPY option syntax */
copy_generic_opt_list:
copy_generic_opt_elem

Further increasing the size of the grammar with random keywords when we have
more generic ways to represent them seems unnecessary.

> +-- tests for IGNORE_DATATYPE_ERRORS option
> +CREATE TABLE check_ign_err (n int, m int[], k int);
> +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
> +1{1} 1
> +a{2} 2
> +3{3} 33
> +4{a, 4}  4
> +
> +5{5} 5
> +\.
> +SELECT * FROM check_ign_err;
> +

I suggest adding a few more tests:

- COPY with a datatype error that can't be handled as a soft error
- test documenting that COPY FORMAT BINARY is incompatible with 
IGNORE_DATATYPE_ERRORS
- a soft error showing the error context - although that will require some
  care to avoid the function name + line in the output

Greetings,

Andres Freund

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-22 Thread torikoshia


On 2023-03-17 21:23, torikoshia wrote:

On 2023-03-07 18:09, Daniel Gustafsson wrote:

On 7 Mar 2023, at 09:35, Damir Belyalov  wrote:


I felt just logging "Error: %ld" would make people wonder the meaning 
of

the %ld. Logging something like ""Error: %ld data type errors were
found" might be clearer.

Thanks. For more clearance change the message to: "Errors were found: 
%".


I'm not convinced that this adds enough clarity to assist the user.  
We also
shouldn't use "error" in a WARNING log since the user has explicitly 
asked to

skip rows on error, so it's not an error per se.

+1


How about something like:

  ereport(WARNING,
  (errmsg("%ld rows were skipped due to data type
incompatibility", cstate->ignored_errors),
   errhint("Skipped rows can be inspected in the database log
for reprocessing.")));

Since skipped rows cannot be inspected in the log when
log_error_verbosity is set to terse,
it might be better without this errhint.


Removed errhint.

Modified some codes since v3 couldn't be applied HEAD anymore.

Also modified v3 patch as below:


65 +   if (cstate->opts.ignore_datatype_errors)
66 +   cstate->ignored_errors = 0;
67 +


It seems not necessary since cstate is initialized by palloc0() in 
BeginCopyFrom().



134 +   ereport(LOG,
135 +   errmsg("%s", 
cstate->escontext.error_data->message));

136 +
137 +   return true;


Since LOG means 'Reports information of interest to administrators'
according to the manual[1], datatype error should not be logged as
LOG. I put it back in WARNING.

[1] 
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS



--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 6764d7e0f21ca266d7426cb922fd00e5138ec857 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 22 Mar 2023 22:00:15 +0900
Subject: [PATCH v4] Add new COPY option IGNORE_DATATYPE_ERRORS

Add new COPY option IGNORE_DATATYPE_ERRORS.

Currently entire COPY fails even when there is one unexpected
data regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors and skips them and
COPY data which don't contain problem.

This patch uses the soft error handling infrastructure, which
is introduced by d9f7f5d32f20.

Author: Damir Belyalov, Atsushi Torikoshi

---
 doc/src/sgml/ref/copy.sgml   | 12 
 src/backend/commands/copy.c  |  8 
 src/backend/commands/copyfrom.c  | 20 
 src/backend/commands/copyfromparse.c | 19 +++
 src/backend/parser/gram.y|  8 +++-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 +++
 src/include/parser/kwlist.h  |  1 +
 src/test/regress/expected/copy2.out  | 15 +++
 src/test/regress/sql/copy2.sql   | 12 
 10 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5e591ed2e6..168b1c05d9 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { table_name [ ( format_name
 FREEZE [ boolean ]
+IGNORE_DATATYPE_ERRORS [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 HEADER [ boolean | MATCH ]
@@ -234,6 +235,17 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  with columns where the data type's input-function raises an error.
+  Outputs warnings about rows with incorrect data to system logfile.
+ 
+
+   
+

 DELIMITER
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..02d911abbe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 80bca79cd0..85c47f54b2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -953,6 +953,7 @@ CopyFrom(CopyFromState cstate)
 	{
 		TupleTableSlot *myslot;
 		bool		skip_tuple;
+		ErrorSaveContext

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-17 Thread torikoshia


On 2023-03-07 18:09, Daniel Gustafsson wrote:

On 7 Mar 2023, at 09:35, Damir Belyalov  wrote:


I felt just logging "Error: %ld" would make people wonder the meaning 
of

the %ld. Logging something like ""Error: %ld data type errors were
found" might be clearer.

Thanks. For more clearance change the message to: "Errors were found: 
%".


I'm not convinced that this adds enough clarity to assist the user.  We 
also
shouldn't use "error" in a WARNING log since the user has explicitly 
asked to

skip rows on error, so it's not an error per se.

+1


How about something like:

  ereport(WARNING,
  (errmsg("%ld rows were skipped due to data type
incompatibility", cstate->ignored_errors),
   errhint("Skipped rows can be inspected in the database log
for reprocessing.")));
Since skipped rows cannot be inspected in the log when 
log_error_verbosity is set to terse,

it might be better without this errhint.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-07 Thread Daniel Gustafsson

> On 7 Mar 2023, at 09:35, Damir Belyalov  wrote:

> I felt just logging "Error: %ld" would make people wonder the meaning of 
> the %ld. Logging something like ""Error: %ld data type errors were 
> found" might be clearer.
>  
> Thanks. For more clearance change the message to: "Errors were found: %". 

I'm not convinced that this adds enough clarity to assist the user.  We also
shouldn't use "error" in a WARNING log since the user has explicitly asked to
skip rows on error, so it's not an error per se. How about something like:

  ereport(WARNING,
  (errmsg("%ld rows were skipped due to data type incompatibility", 
cstate->ignored_errors),
   errhint("Skipped rows can be inspected in the database log for 
reprocessing.")));

--
Daniel Gustafsson

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-07 Thread Damir Belyalov

>
> FWIW, Greenplum has a similar construct (but which also logs the errors
> in the
> db) where data type errors are skipped as long as the number of errors
> don't
> exceed a reject limit.  If the reject limit is reached then the COPY
> fails:
> >
> >   LOG ERRORS [ SEGMENT REJECT LIMIT  [ ROWS | PERCENT ]]
> >
> IIRC the gist of this was to catch then the user copies the wrong input
> data or
> plain has a broken file.  Rather than finding out after copying n rows
> which
> are likely to be garbage the process can be restarted.
>

I think this is a matter for discussion. The same question is: "Where to
log errors to separate files or to the system logfile?".
IMO it's better for users to log short-detailed error message to system
logfile and not output errors to the terminal.


This version of the patch has a compiler error in the error message:
>
Yes, corrected it. Changed "ignored_errors" to int64 because "processed"
(used for counting copy rows) is int64.


I felt just logging "Error: %ld" would make people wonder the meaning of
> the %ld. Logging something like ""Error: %ld data type errors were
> found" might be clearer.
>

Thanks. For more clearance change the message to: "Errors were found: %".

Regards, Damir Belyalov
Postgres Professional
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..706b929947 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { table_name [ ( format_name
 FREEZE [ boolean ]
+IGNORE_DATATYPE_ERRORS [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 HEADER [ boolean | MATCH ]
@@ -233,6 +234,17 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  with columns where the data type's input-function raises an error.
+  Outputs warnings about rows with incorrect data to system logfile.
+ 
+
+   
+

 DELIMITER
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..0334894014 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified= false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 29cd1cf4a6..facfc44def 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -949,10 +949,14 @@ CopyFrom(CopyFromState cstate)
 	errcallback.previous = error_context_stack;
 	error_context_stack = 
 
+	if (cstate->opts.ignore_datatype_errors)
+		cstate->ignored_errors = 0;
+
 	for (;;)
 	{
 		TupleTableSlot *myslot;
 		bool		skip_tuple;
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -985,9 +989,26 @@ CopyFrom(CopyFromState cstate)
 
 		ExecClearTuple(myslot);
 
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}
+
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+		{
+			if (cstate->opts.ignore_datatype_errors && cstate->ignored_errors > 0)
+ereport(WARNING, errmsg("Errors were found: %lld", (long long) cstate->ignored_errors));
 			break;
+		}
+
+		/* Soft error occured, skip this tuple */
+		if (cstate->escontext.error_occurred)
+		{
+			ExecClearTuple(myslot);
+			continue;
+		}
 
 		ExecStoreVirtualTuple(myslot);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 91b564c2bc..9c36b0dc8b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/miscnodes.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
@@ -938,10 +939,23 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 			cstate->cur_attname = NameStr(att->attname);
 			cstate->cur_attval = string;
-			values[m] = InputFunctionCall(_functions[m],
-		  string,
-		  typioparams[m],
-		  att->atttypmod);
+
+			/* If IGNORE_DATATYPE_ERRORS is enabled skip

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-06 Thread torikoshia


On 2023-03-06 23:03, Daniel Gustafsson wrote:

On 28 Feb 2023, at 15:28, Damir Belyalov  wrote:


Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION. 
As expected it works.

Also added a description to copy.sgml and made a review on patch.

Thanks for your tests and improvements!

I added 'ignored_errors' integer parameter that should be output after 
the option is finished.
All errors were added to the system logfile with full detailed 
context. Maybe it's better to log only error message.

Certainly.

FWIW, Greenplum has a similar construct (but which also logs the errors 
in the
db) where data type errors are skipped as long as the number of errors 
don't
exceed a reject limit.  If the reject limit is reached then the COPY 
fails:


LOG ERRORS [ SEGMENT REJECT LIMIT  [ ROWS | PERCENT ]]

IIRC the gist of this was to catch then the user copies the wrong input 
data or
plain has a broken file.  Rather than finding out after copying n rows 
which

are likely to be garbage the process can be restarted.

This version of the patch has a compiler error in the error message:

copyfrom.c: In function ‘CopyFrom’:
copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long
int’, but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’}
[-Werror=format=]
1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
 |  ^ ~~
 |  |
 |  uint64 {aka long
long unsigned int}


On that note though, it seems to me that this error message leaves a 
bit to be

desired with regards to the level of detail.

+1.
I felt just logging "Error: %ld" would make people wonder the meaning of 
the %ld. Logging something like ""Error: %ld data type errors were 
found" might be clearer.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-06 Thread Daniel Gustafsson

> On 28 Feb 2023, at 15:28, Damir Belyalov  wrote:

> Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION. As 
> expected it works.
> Also added a description to copy.sgml and made a review on patch.
> 
> I added 'ignored_errors' integer parameter that should be output after the 
> option is finished.
> All errors were added to the system logfile with full detailed context. Maybe 
> it's better to log only error message.

FWIW, Greenplum has a similar construct (but which also logs the errors in the
db) where data type errors are skipped as long as the number of errors don't
exceed a reject limit.  If the reject limit is reached then the COPY fails:

LOG ERRORS [ SEGMENT REJECT LIMIT  [ ROWS | PERCENT ]]

IIRC the gist of this was to catch then the user copies the wrong input data or
plain has a broken file.  Rather than finding out after copying n rows which
are likely to be garbage the process can be restarted.

This version of the patch has a compiler error in the error message:

copyfrom.c: In function ‘CopyFrom’:
copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long int’, 
but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’} 
[-Werror=format=]
1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
 |  ^ ~~
 |  |
 |  uint64 {aka long long 
unsigned int}

On that note though, it seems to me that this error message leaves a bit to be
desired with regards to the level of detail.

--
Daniel Gustafsson

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-28 Thread Damir Belyalov

Hello

Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION. As
expected it works.
Also added a description to copy.sgml and made a review on patch.

I added 'ignored_errors' integer parameter that should be output after the
option is finished.
All errors were added to the system logfile with full detailed context.
Maybe it's better to log only error message.
file:///home/abc13/Documents/todo_copy/postgres/v2-0001-Add-COPY-option-IGNORE_DATATYPE_ERRORS.patch



Regards, Damir Belyalov
Postgres Professional
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..706b929947 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { table_name [ ( format_name
 FREEZE [ boolean ]
+IGNORE_DATATYPE_ERRORS [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 HEADER [ boolean | MATCH ]
@@ -233,6 +234,17 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  with columns where the data type's input-function raises an error.
+  Outputs warnings about rows with incorrect data to system logfile.
+ 
+
+   
+

 DELIMITER
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..0334894014 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified= false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..ecaa750568 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -955,10 +955,14 @@ CopyFrom(CopyFromState cstate)
 	errcallback.previous = error_context_stack;
 	error_context_stack = 
 
+	if (cstate->opts.ignore_datatype_errors)
+		cstate->ignored_errors = 0;
+
 	for (;;)
 	{
 		TupleTableSlot *myslot;
 		bool		skip_tuple;
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -991,9 +995,26 @@ CopyFrom(CopyFromState cstate)
 
 		ExecClearTuple(myslot);
 
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}
+
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+		{
+			if (cstate->opts.ignore_datatype_errors && cstate->ignored_errors > 0)
+ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
 			break;
+		}
+
+		/* Soft error occured, skip this tuple */
+		if (cstate->escontext.error_occurred)
+		{
+			ExecClearTuple(myslot);
+			continue;
+		}
 
 		ExecStoreVirtualTuple(myslot);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 91b564c2bc..9c36b0dc8b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/miscnodes.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
@@ -938,10 +939,23 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 			cstate->cur_attname = NameStr(att->attname);
 			cstate->cur_attval = string;
-			values[m] = InputFunctionCall(_functions[m],
-		  string,
-		  typioparams[m],
-		  att->atttypmod);
+
+			/* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+			if (!InputFunctionCallSafe(_functions[m],
+	   string,
+	   typioparams[m],
+	   att->atttypmod,
+	   (Node *) >escontext,
+	   [m]))
+			{
+cstate->ignored_errors++;
+
+ereport(LOG,
+		errmsg("%s", cstate->escontext.error_data->message));
+
+return true;
+			}
+
 			if (string != NULL)
 nulls[m] = false;
 			cstate->cur_attname = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..d79d293c0d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -701,7 +701,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
-	IDENTITY_P IF_P ILIKE

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-26 Thread torikoshia


On 2023-02-06 15:00, Tom Lane wrote:

Andres Freund  writes:
On February 5, 2023 9:12:17 PM PST, Tom Lane  
wrote:

Damir Belyalov  writes:
InputFunctionCallSafe() is good for detecting errors from 
input-functions
but there are such errors from NextCopyFrom () that can not be 
detected
with InputFunctionCallSafe(), e.g. "wrong number of columns in 
row''.


If you want to deal with those, then there's more work to be done to 
make
those bits non-error-throwing.  But there's a very finite amount of 
code

involved and no obvious reason why it couldn't be done.



I'm not even sure it makes sense to avoid that kind of error. And
invalid column count or such is something quite different than failing
some data type input routine, or falling a constraint.


I think it could be reasonable to put COPY's overall-line-format
requirements on the same level as datatype input format violations.
I agree that trying to trap every kind of error is a bad idea,
for largely the same reason that the soft-input-errors patches
only trap certain kinds of errors: it's too hard to tell whether
an error is an "internal" error that it's scary to continue past.


Is it a bad idea to limit the scope of allowing errors to 'soft' errors 
in InputFunctionCallSafe()?


I think it could be still useful for some usecases.

  diff --git a/src/test/regress/sql/copy2.sql 
b/src/test/regress/sql/copy2.sql


  +-- tests for IGNORE_DATATYPE_ERRORS option
  +CREATE TABLE check_ign_err (n int, m int[], k int);
  +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
  +1  {1} 1
  +a  {2} 2
  +3  {3} 33
  +4  {a, 4}  4
  +
  +5  {5} 5
  +\.
  +SELECT * FROM check_ign_err;

  diff --git a/src/test/regress/expected/copy2.out 
b/src/test/regress/expected/copy2.out

  index 090ef6c7a8..08e8056fc1 100644

  +-- tests for IGNORE_DATATYPE_ERRORS option
  +CREATE TABLE check_ign_err (n int, m int[], k int);
  +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
  +WARNING:  invalid input syntax for type integer: "a"
  +WARNING:  value "33" is out of range for type integer
  +WARNING:  invalid input syntax for type integer: "a"
  +WARNING:  invalid input syntax for type integer: ""
  +SELECT * FROM check_ign_err;
  + n |  m  | k
  +---+-+---
  + 1 | {1} | 1
  + 5 | {5} | 5
  +(2 rows)

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 16877d4cdd64db5f85bed9cd559e618d8211e598 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 27 Feb 2023 12:02:16 +0900
Subject: [PATCH v1] Add COPY option IGNORE_DATATYPE_ERRORS

---
 src/backend/commands/copy.c  |  8 
 src/backend/commands/copyfrom.c  | 11 +++
 src/backend/commands/copyfromparse.c | 12 ++--
 src/backend/parser/gram.y|  8 +++-
 src/bin/psql/tab-complete.c  |  3 ++-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  2 ++
 src/include/parser/kwlist.h  |  1 +
 src/test/regress/expected/copy2.out  | 14 ++
 src/test/regress/sql/copy2.sql   | 12 
 10 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..2f1cfb3f4d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified= false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified= true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..24eec6a27d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -959,6 +959,7 @@ CopyFrom(CopyFromState cstate)
 	{
 		TupleTableSlot *myslot;
 		bool		skip_tuple;
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -991,10 +992,20 @@ CopyFrom(CopyFromState cstate)
 
 		ExecClearTuple(myslot);
 
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}
+
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
 			break;
 
+		/* Soft error occured, skip this tuple */
+		if(cstate->escontext.error_occurred)
+			continue;
+

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-05 Thread Tom Lane

Andres Freund  writes:
> On February 5, 2023 9:12:17 PM PST, Tom Lane  wrote:
>> Damir Belyalov  writes:
>>> InputFunctionCallSafe() is good for detecting errors from input-functions
>>> but there are such errors from NextCopyFrom () that can not be detected
>>> with InputFunctionCallSafe(), e.g. "wrong number of columns in row''.

>> If you want to deal with those, then there's more work to be done to make
>> those bits non-error-throwing.  But there's a very finite amount of code
>> involved and no obvious reason why it couldn't be done.

> I'm not even sure it makes sense to avoid that kind of error. And
> invalid column count or such is something quite different than failing
> some data type input routine, or falling a constraint.

I think it could be reasonable to put COPY's overall-line-format
requirements on the same level as datatype input format violations.
I agree that trying to trap every kind of error is a bad idea,
for largely the same reason that the soft-input-errors patches
only trap certain kinds of errors: it's too hard to tell whether
an error is an "internal" error that it's scary to continue past.

regards, tom lane

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-05 Thread Andres Freund

Hi, 

On February 5, 2023 9:12:17 PM PST, Tom Lane  wrote:
>Damir Belyalov  writes:
>>> I don't think this is the right approach. Creating a subtransaction for
>>> each row will cause substantial performance issues.
>
>> Subtransactions aren't created for each row. The block of rows in one
>> subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed.
>
>I think that at this point, any patch that involves adding subtransactions
>to COPY is dead on arrival; whether it's batched or not is irrelevant.
>(It's not like batching has no downsides.)

Indeed.

>> InputFunctionCallSafe() is good for detecting errors from input-functions
>> but there are such errors from NextCopyFrom () that can not be detected
>> with InputFunctionCallSafe(), e.g. "wrong number of columns in row''.
>
>If you want to deal with those, then there's more work to be done to make
>those bits non-error-throwing.  But there's a very finite amount of code
>involved and no obvious reason why it couldn't be done.  The major problem
>here has always been the indefinite amount of code implicated by calling
>datatype input functions, and we have now created a plausible answer to
>that problem.

I'm not even sure it makes sense to avoid that kind of error. And invalid 
column count or such is something quite different than failing some data type 
input routine, or falling a constraint. 



-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-05 Thread Tom Lane

Damir Belyalov  writes:
>> I don't think this is the right approach. Creating a subtransaction for
>> each row will cause substantial performance issues.

> Subtransactions aren't created for each row. The block of rows in one
> subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed.

I think that at this point, any patch that involves adding subtransactions
to COPY is dead on arrival; whether it's batched or not is irrelevant.
(It's not like batching has no downsides.)

> InputFunctionCallSafe() is good for detecting errors from input-functions
> but there are such errors from NextCopyFrom () that can not be detected
> with InputFunctionCallSafe(), e.g. "wrong number of columns in row''.

If you want to deal with those, then there's more work to be done to make
those bits non-error-throwing.  But there's a very finite amount of code
involved and no obvious reason why it couldn't be done.  The major problem
here has always been the indefinite amount of code implicated by calling
datatype input functions, and we have now created a plausible answer to
that problem.

regards, tom lane

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-05 Thread Damir Belyalov

Hi, Andres!

Thank you for reviewing.


> I don't think this is the right approach. Creating a subtransaction for
> each row will cause substantial performance issues.
>

Subtransactions aren't created for each row. The block of rows in one
subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed. There is also
a constraint for the number of bytes MAX_SAFE_BUFFER_BYTES in safe_buffer:
 while (sfcstate->saved_tuples < SAFE_BUFFER_SIZE &&
 sfcstate->safeBufferBytes < MAX_SAFE_BUFFER_BYTES)



We now can call data type input functions without throwing errors, see
> InputFunctionCallSafe(). Use that to avoid throwing an error instead of
> catching it.
>
InputFunctionCallSafe() is good for detecting errors from input-functions
but there are such errors from NextCopyFrom () that can not be detected
with InputFunctionCallSafe(), e.g. "wrong number of columns in row''. Do
you offer to process input-function errors separately from other errors?
Now all errors are processed in one "switch" loop in PG_CATCH, so this
change can complicate code.



Regards,
Damir Belyalov
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-05 Thread Andres Freund

H,

On 2023-02-03 13:27:24 +0300, Damir Belyalov wrote:
> @@ -625,6 +628,173 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, 
> ResultRelInfo *rri,
>   miinfo->bufferedBytes += tuplen;
>  }
>  
> +/*
> + * Safely reads source data, converts to a tuple and fills tuple buffer.
> + * Skips some data in the case of failed conversion if data source for
> + * a next tuple can be surely read without a danger.
> + */
> +static bool
> +SafeCopying(CopyFromState cstate, ExprContext *econtext, TupleTableSlot 
> *myslot)

> + BeginInternalSubTransaction(NULL);
> + CurrentResourceOwner = sfcstate->oldowner;

I don't think this is the right approach. Creating a subtransaction for
each row will cause substantial performance issues.

We now can call data type input functions without throwing errors, see
InputFunctionCallSafe(). Use that to avoid throwing an error instead of
catching it.

Greetings,

Andres Freund

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-03 Thread Damir Belyalov

Hi, Danil and Nikita!
Thank you for reviewing.

Why is there no static keyword in the definition of the SafeCopying()
> function, but it is in the function declaration.
>
Correct this.

675: MemoryContext cxt =
> MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
> 676:
> 677: valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values,
> myslot->tts_isnull);
> 678: tuple_is_valid = valid_row;
> 679:
> 680: if (valid_row)
> 681: sfcstate->safeBufferBytes += cstate->line_buf.len;
> 682:
> 683: CurrentMemoryContext = cxt;
>
> Why are you using a direct assignment to CurrentMemoryContext instead of
> using the MemoryContextSwitchTo function in the SafeCopying() routine?
>

"CurrentMemoryContext = cxt" is the same as "MemoryContextSwitchTo(cxt)",
you can see it in the implementation of MemoryContextSwitchTo(). Also
correct this.


When you initialize the cstate->sfcstate structure, you create a
> cstate->sfcstate->safe_cxt memory context that inherits from oldcontext.
> Was it intended to use cstate->copycontext as the parent context here?
>

Good remark, correct this.



Thanks Nikita Malakhov for advice to create file with errors. But I decided
to to log errors in the system logfile and don't print them to the
terminal. The error's output in logfile is rather simple - only error
context logs (maybe it's better to log all error information?).
There are 2 points why logging errors in logfile is better than logging
errors in another file (e.g. PGDATA/copy_ignore_errors.txt). The user is
used to looking for errors in logfile. Creating another file entails
problems like: 'what file name to create?', 'do we need to make file
rotation?', 'where does this file create?' (we can't create it in PGDATA
cause of memory constraints)



Regards,
Damir Belyalov
Postgres Professional
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..50151aec54 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { table_name [ ( format_name
 FREEZE [ boolean ]
+IGNORE_ERRORS [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 HEADER [ boolean | MATCH ]
@@ -233,6 +234,18 @@ COPY { table_name [ ( 

 
+   
+IGNORE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  containing syntax errors in data, rows with too many or too few columns,
+  rows containing columns where the data type's input function raises an error.
+  Logs errors to system logfile and outputs the total number of errors.
+ 
+
+   
+

 DELIMITER
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..e741ce3e5a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -407,6 +407,7 @@ ProcessCopyOptions(ParseState *pstate,
    bool is_from,
    List *options)
 {
+	bool		ignore_errors_specified = false;
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_errors") == 0)
+		{
+			if (ignore_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_errors_specified = true;
+			opts_out->ignore_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..657fa44e0b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -107,6 +107,9 @@ static char *limit_printout_length(const char *str);
 
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+static bool SafeCopying(CopyFromState cstate, ExprContext *econtext,
+		TupleTableSlot *myslot);
+
 /*
  * error context callback for COPY FROM
  *
@@ -625,6 +628,173 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	miinfo->bufferedBytes += tuplen;
 }
 
+/*
+ * Safely reads source data, converts to a tuple and fills tuple buffer.
+ * Skips some data in the case of failed conversion if data source for
+ * a next tuple can be surely read without a danger.
+ */
+static bool
+SafeCopying(CopyFromState cstate, ExprContext *econtext, TupleTableSlot *myslot)
+{
+	SafeCopyFromState  *sfcstate = cstate->sfcstate;
+	bool valid_row = true;
+
+	/* Standard COPY if IGNORE_ERRORS is disabled */
+	if (!cstate->sfcstate)
+		/* Directly stores the values/nulls array in the slot */
+		return NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+
+	if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+	{
+		Assert(sfcstate->saved_tuples > 0);
+
+		/* Prepare to replay the tuple */
+		heap_deform_tuple(sfcstate->safe_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel),
+

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-12-09 Thread Danil Anisimow

Hi!

I have looked at your patch and have a few questions.

110: static bool SafeCopying(CopyFromState cstate, ExprContext *econtext,
111: TupleTableSlot *myslot);
---
636: bool
637: SafeCopying(CopyFromState cstate, ExprContext *econtext,
TupleTableSlot *myslot)

Why is there no static keyword in the definition of the SafeCopying()
function, but it is in the function declaration.

675: MemoryContext cxt =
MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
676:
677: valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values,
myslot->tts_isnull);
678: tuple_is_valid = valid_row;
679:
680: if (valid_row)
681: sfcstate->safeBufferBytes += cstate->line_buf.len;
682:
683: CurrentMemoryContext = cxt;

Why are you using a direct assignment to CurrentMemoryContext instead of
using the MemoryContextSwitchTo function in the SafeCopying() routine?

1160: /* Standard copying with option "safe copying" enabled by
IGNORE_ERRORS. */
1161: if (!SafeCopying(cstate, econtext, myslot))
1162: break;

I checked with GDB that the CurrentMemoryContext changes when SafeCopying
returns. And the target context may be different each time you do a COPY in
psql.

1879: cstate->sfcstate->safe_cxt = AllocSetContextCreate(oldcontext,
1880: "Safe_context",
1881: ALLOCSET_DEFAULT_SIZES);

When you initialize the cstate->sfcstate structure, you create a
cstate->sfcstate->safe_cxt memory context that inherits from oldcontext.
Was it intended to use cstate->copycontext as the parent context here?

On Wed, Nov 2, 2022 at 11:46 AM Damir Belyalov  wrote:

> Updated the patch:
> - Optimized and simplified logic of IGNORE_ERRORS
> - Changed variable names to more understandable ones
> - Added an analogue of MAX_BUFFERED_BYTES for safe buffer
>
>
> Regards,
> Damir Belyalov
> Postgres Professional
>

Regards,
Daniil Anisimov
Postgres Professional

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-12-07 Thread Nikita Malakhov

Hi Damir!

Your work looks like a very promising feature for production systems,
where data often needs to be loaded from external sources.

I've looked over the discussion and want to make a proposal -
when we load a bunch of records in database it does not make sense
to output errors to command output, and does not make sense to limit
error output to any number at all, because if we decided to load data
anyway - we would want to have a list (a file) with all records that were
discarded because of errors, with related error information, to, say,
deal with errors and process these records later. It looks like a reasonable
addition to your patch.

As a command output some limited number of error messages has much
less meaning than overall stats - records processed, records loaded,
records discarded, total number of errors.

For example you can look the Oracle SQL Loader feature, I hope this could
give some ideas for further improvements.

On Wed, Nov 2, 2022 at 11:46 AM Damir Belyalov  wrote:

> Updated the patch:
> - Optimized and simplified logic of IGNORE_ERRORS
> - Changed variable names to more understandable ones
> - Added an analogue of MAX_BUFFERED_BYTES for safe buffer
>
>
> Regards,
> Damir Belyalov
> Postgres Professional
>
>>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

1 2 >

1 - 100 of 115 matches

Mail list logo