Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-10 Thread Peter J. Holzer
On 2023-04-10 14:35:38 +0200, Karsten Hilbert wrote:
> All I really wanted to hint at is that "incoming call
> timestamp" may work pretty well in given settings but does
> not _always_ make for a "unique enough" key.

This is true for all natural primary keys: Any attribute of an entity
which is unique for a given application may not be unique for other
applications.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-10 Thread Karsten Hilbert
Am Mon, Apr 10, 2023 at 01:33:41PM +0200 schrieb Francisco Olarte:

> > > > This the part that's always eluded me: How does the client, the
> > > > UPSERTer, come to hold an id and not know whether or not it's already in
> > > > the database.
> > >
> > > This is extremely easy to do if you have natural instead of surrogate 
> > > keys.
> > >
> > > I work in telephony, upserting the last incoming call timestamp for a
> > > phone number will be exactly that.
> >
> > timezones ?
> > DST ?
>
> A timestamp is a point in the time line, this is what I insert, just a
> real number marking a line, timezones and dst are presentation stuff.

Indeed, as is the assumption which time line the numbers are
referring to. Hence the incoming call timestamp is usable as
a (natural) PK with respect to a given time line only, right?

> > spoofing ?
>
> ¿ Of what ?

The time stamp. But then I assume that is obtained on the
logging system.

All I really wanted to hint at is that "incoming call
timestamp" may work pretty well in given settings but does
not _always_ make for a "unique enough" key.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-10 Thread Francisco Olarte
Hi karsten:

On Mon, 10 Apr 2023 at 11:40, Karsten Hilbert  wrote:
>
> Am Mon, Apr 10, 2023 at 09:41:15AM +0200 schrieb Francisco Olarte:
>
> > On Mon, 10 Apr 2023 at 04:16, Rob Sargent  wrote:
> > > > An UPSERT checks whether a row exists, if so, it does an update, if not 
> > > > it does an insert. This is the literal definition.
> > > This the part that's always eluded me: How does the client, the
> > > UPSERTer, come to hold an id and not know whether or not it's already in
> > > the database.
> >
> > This is extremely easy to do if you have natural instead of surrogate keys.
> >
> > I work in telephony, upserting the last incoming call timestamp for a
> > phone number will be exactly that.
>
> timezones ?
> DST ?

A timestamp is a point in the time line, this is what I insert, just a
real number marking a line, timezones and dst are presentation stuff.

> spoofing ?

¿ Of what ? I do it for a phone number, not for a customer, it does
not matter to me if the number came from a legit customer or from a
spoofer, I want it for the phone number.

Francisco Olarte.




Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-10 Thread Karsten Hilbert
Am Mon, Apr 10, 2023 at 09:41:15AM +0200 schrieb Francisco Olarte:

> On Mon, 10 Apr 2023 at 04:16, Rob Sargent  wrote:
> > > An UPSERT checks whether a row exists, if so, it does an update, if not 
> > > it does an insert. This is the literal definition.
> > This the part that's always eluded me: How does the client, the
> > UPSERTer, come to hold an id and not know whether or not it's already in
> > the database.
>
> This is extremely easy to do if you have natural instead of surrogate keys.
>
> I work in telephony, upserting the last incoming call timestamp for a
> phone number will be exactly that.

timezones ?

DST ?

spoofing ?

...

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-10 Thread Francisco Olarte
Hi Rob:

On Mon, 10 Apr 2023 at 04:16, Rob Sargent  wrote:
> > An UPSERT checks whether a row exists, if so, it does an update, if not it 
> > does an insert. This is the literal definition.
> This the part that's always eluded me: How does the client, the
> UPSERTer, come to hold an id and not know whether or not it's already in
> the database.

This is extremely easy to do if you have natural instead of surrogate keys.

I work in telephony, upserting the last incoming call timestamp for a
phone number will be exactly that.

Francisco Olarte.




Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-10 Thread Francisco Olarte
Hi Louis:

On Mon, 10 Apr 2023 at 03:05, Louis Tian  wrote:
> I think we need to make a distinction between an "operation" and a 
> "statement".
OK

> The concept of idempotency applies to an "operation" not an entire statement.
I think I'll need a definition of both to say anything on this.

> Like how HTTP "PUT" method is defined as "idempotent", you don't say actual 
> HTTP PUT request is idempotent.
Well, in HTTP a request is half defined.

> With the "current_datetime" and "access_count+1", you are effectively 
> changing the value passing to the UPSERT operator.
I can agree with current_datetime ( I doubt I used that... checks the
BOTTOM QUOTE ... effectively I did not ) which can be thought as a
fancy macro for passing a value, but access_count+1 is an expression,
I do not and could not pass it, the engine has to read the row(s)
affected by the insert to use it.

Anyway, what I was trying to point is that UPSERT has no standard
definition, and that the usual implementations of the concept can be
used to build idempotent "requests", but they are many times used for
non-idempotent ones, like timestamp logging or counter updates.

As postgres does not have, AFAIK, an "UPSERT" statement I think you
will need to define it before further discussion, something like
mapping it to an insert on conflict or similar.

> Just like how you changed the payload of a PUT, then obviously there is no 
> reason to expect the state of the database to remain the same.

I did not change the payload of a put, I assume you refer to the
database request, but I'm too old to go hunting for the exact thing
you refer to on a response which quotes even my signatures and the
virus scanner lines.

You can repeat a PUT request or not repeat it. HTTP request cover body
too, changing body is like changing URI, or changing method, it is a
different request. Idempotent in HTTP is same request, body included,
same response. Not similar request, similar response.

Francisco Olarte.




Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-09 Thread Adrian Klaver

On 4/9/23 19:16, Rob Sargent wrote:

On 4/9/23 19:55, Louis Tian wrote:

Hi Alban,

"I am not expecting an error here", by "here" I means when doing a 
TRUE UPSERT (an upsert current does not exist in Postgres).
I am NOT referring to an "Insert on conflict do update" (which despite 
its intention and wide acceptance is not fully equivalent to a true 
upsert).
I understand the error I am getting now is due to not null constraint 
given how "insert on conflict" works.


An UPSERT checks whether a row exists, if so, it does an update, if 
not it does an insert. This is the literal definition.


This the part that's always eluded me: How does the client, the 
UPSERTer, come to hold an id and not know whether or not it's already in 
the database.


My use case is for bulk loading data into a table I know has data that 
will create a PK/Unique violation with the inserted data. It's a quick 
and dirty way to avoid queries that look for potential violations ahead 
of time, basically 'Ask forgiveness' vs 'Get permission'.


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-09 Thread Rob Sargent

On 4/9/23 19:55, Louis Tian wrote:

Hi Alban,

"I am not expecting an error here", by "here" I means when doing a TRUE UPSERT 
(an upsert current does not exist in Postgres).
I am NOT referring to an "Insert on conflict do update" (which despite its 
intention and wide acceptance is not fully equivalent to a true upsert).
I understand the error I am getting now is due to not null constraint given how 
"insert on conflict" works.

An UPSERT checks whether a row exists, if so, it does an update, if not it does 
an insert. This is the literal definition.


This the part that's always eluded me: How does the client, the 
UPSERTer, come to hold an id and not know whether or not it's already in 
the database.






RE: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-09 Thread Louis Tian
Hi Alban, 

"I am not expecting an error here", by "here" I means when doing a TRUE UPSERT 
(an upsert current does not exist in Postgres). 
I am NOT referring to an "Insert on conflict do update" (which despite its 
intention and wide acceptance is not fully equivalent to a true upsert).
I understand the error I am getting now is due to not null constraint given how 
"insert on conflict" works. 

An UPSERT checks whether a row exists, if so, it does an update, if not it does 
an insert. This is the literal definition.
An UPSERT is NOT defined as try do an INSERT first, if violate uniqueness 
constraint, do update. This is what on conflict do update is doing. 
We cannot define UPSERT with what "ON CONFLICT DO UPDATE" is doing. That is a 
logical fallacy.
 UPSERT is a higher-level abstract concept. ON CONFLICT DO UPDATE is an 
implementation of UPSERT. not the other way around.

When doing a true UPSERT, if a row already exists, there is no need provide an 
(not null) column, since I am doing an update. 

With `UPSERT person (id, is_active)` VALUES (0, true). Is it necessary to 
provide the not null "name" column here logically? 
Not really, I already specified the row with the `id` column, then I specify 
the column I want to update `is_active`. 
* the id does exist; the update can be executed without any issue or assumptions
* the id does not exit; then I am expecting a violate not null constraint. 

On contrast, with `INSERT person (id, is_active) value (0, true) ON CONFLICT DO 
UPDATE set is_active=true`,
this statement will always fail regardless of whether the id exists or not. 
So the behavior of ON CONFLICT DO UPDATE differs from a UPSERT with the 
presence of a not null column. 
Hence why ON CONFLICT DO UPDATE is not complete equivalent to a TRUE upsert.

Cheers,
Louis Tian


-Original Message-
From: Alban Hertroys  
Sent: Sunday, April 9, 2023 7:26 PM
To: Louis Tian 
Cc: Peter Geoghegan ; pgsql-general@lists.postgresql.org
Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

CAUTION: This email originated from outside of Envirada. Do not click links or 
open attachments unless you recognize the sender and know the content is safe.


> On 7 Apr 2023, at 2:49, Louis Tian  wrote:

(…)

> I am not expecting an error here. The problem is with no conflict it always 
> go down the insert path first and results in a not null constraint error.
> While I am expecting the insert is never executed in the first place when 
> that row already exist (as identified by it primary key). So the update 
> execute without error.
> I hope the pesudo code above is enough to clarify the difference?

Your assumption on what the problem is, is not correct. The problem is not with 
the conflict resolution, it is with your statement violating a not null 
constraint.

It doesn’t matter whether you insert first or update first, either operation is 
going to violate that constraint. You’re specifying a NULL value for a column 
that doesn’t accept that because it has a NOT NULL constraint. That is your 
problem.

Alban Hertroys
--
There is always an exception to always.




--
Message  protected by MailGuard: e-mail anti-virus, anti-spam and content 
filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/25ayougcIv/4FIia1zrtWT2nnuHlesEOS/1.8



RE: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-09 Thread Louis Tian
Hi Francisco, 

I think we need to make a distinction between an "operation" and a "statement".
The concept of idempotency applies to an "operation" not an entire statement. 
Like how HTTP "PUT" method is defined as "idempotent", you don't say actual 
HTTP PUT request is idempotent. 
With the "current_datetime" and "access_count+1", you are effectively changing 
the value passing to the UPSERT operator. 
Just like how you changed the payload of a PUT, then obviously there is no 
reason to expect the state of the database to remain the same. 

Cheers,
Louis Tian


-Original Message-
From: Francisco Olarte  
Sent: Sunday, April 9, 2023 7:32 PM
To: Louis Tian 
Cc: Peter Geoghegan ; pgsql-general@lists.postgresql.org
Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

CAUTION: This email originated from outside of Envirada. Do not click links or 
open attachments unless you recognize the sender and know the content is safe.


(not the OP on idempotency)

On Sat, 8 Apr 2023 at 18:33, Louis Tian  wrote:
> > In general UPSERT (or any definition of it that I can think of) does 
> > not imply idempotency.
> "Idempotence is the property of certain operations in mathematics and 
> computer science whereby they can be applied multiple times without changing 
> the result beyond the initial application." from Wikipedia.
> the concept of Idempotence when applies to HTTP is consistent with the above. 
> https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or are you 
> going by a different defintion that I am not aware of?
> If you execute the same upsert multiple times, the state of the database 
> remains the same as if only execute once.
> If a row already exists, the first statement will update the row so does any 
> subsequent statements. executing the same update multiple time is the same as 
> executing it only once.
> If the row doesn't exist, the first statement will insert that row and any 
> subsequent will try to update, but the update has no real effect since it the 
> value is exactly the same as the insert.
> So by defintion, upsert is idempotent.

Only on a narrow definition of upsert.

You are thinking on a narrow ( but very frequent ) use of "upsert"
statements, something like:

insert on users(id,name) values (1,'x') on conflict(id) update set name='x'

But upsert can be used for things like:

   insert into last_access(id,cuando) values (1,current_timestamp) on
conflict(id) set cuando=current_timestamp

   insert into access_count(id, access_count) values (1,1) on
conflict(id) set access_count=access_count+1

Which are not idempotent ( and also frequent, I use both variants )

Francisco Olarte.
--
Message  protected by MailGuard: e-mail anti-virus, anti-spam and content 
filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/25ayvBFOMd/7rJjrYVDtTx03A1wSEIfeL/0.1



Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-09 Thread Louis Tian
Hi Adrian, 

> No it is not as Israel Brewster pointed out.
I think we need make a distinction between an operation and a statement here
The examples Israel presented and yours are both non-idempotent statements 
because the different value being upserted each time the statement is executed. 
Not because upsert as an operation is not idempotent. 
Likewise, In HTTP, PUT method is defined as idempotent. No one would expect 
different PUT requests with different payload to be idempotent. 

Also, I don't think I am not the only one who think upsert is logically 
idempotent. Just a few seconds of searching on google.
- "The upsert action is a combination of insert and update. It allows us to 
write idempotent statements". 
(https://jonmeyers.io/blog/use-on-conflict-to-upsert-in-postgresql. )
-"One common example of an idempotent operation is an upsert (update or insert) 
statement in a database." 
(https://www.arecadata.com/core-data-engineering-concepts-idempotency)
- "A POST request means that the request is not idempotent, but an "upsert" 
request is idempotent." 
(https://softwareengineering.stackexchange.com/questions/426225/rest-how-to-upsert-a-resource-without-an-specific-resource-url#:~:text=A%20POST%20request%20means%20that,%22upsert%22%20request%20is%20idempotent.)
While no as direct as the article above, but the issue reported in those places 
are essentially stem from the same understanding. 
- 
https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
- 
https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f
So at least I am not only one imaging things?
- "a ready made statement for idempotent data like Mongo's upsert" 
(https://www.compose.com/articles/the-potency-of-idempotent-with-rabbitmq-and-mongodb-upsert/)

> And therein lies your problem, you are imagining something that does not
> exist and more to the point will most likely not exist as it would break all 
> code that depends on above behavior.
I never argue the way postgresql does for insert ... on conflict is wrong and 
need to change. 
I said "I think it is doing what it reads quite literally" just like in your 
words "It does what is advertised on the tin". So there is no disagreement 
here. So I am not saying "insert on conflict" need to change. 
What I am point out here is, the "insert...on conflict do update" is not a true 
"upsert". 
I am saying in my opinion it would be nice see UPSERT statement to be added to 
PostgreSQL so upsert can work "out of box" even when there're not null 
constraints on that table.

It might never happen judging from the replies I am getting but that's fine. 
The solution you have showed me is good enough as a workaround (to not 
depreciate it's value, just not as easy/bullet proof as it could be in my 
opinion). 
It was just a question just to confirm my understanding so I got what I need, 
so thank you all for that. 


Cheers,
Louis Tian


From: Adrian Klaver 
Sent: Sunday, April 9, 2023 7:51 AM
To: Louis Tian ; Peter Geoghegan 
Cc: pgsql-general@lists.postgresql.org 
Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres 
 
CAUTION: This email originated from outside of Envirada. Do not click links or 
open attachments unless you recognize the sender and know the content is safe.


On 4/6/23 17:49, Louis Tian wrote:
> Hi Peter,
>
> Thanks for your reply. Appreciate the help and discussion.
>
>>   In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
>
> "Idempotence is the property of certain operations in mathematics and 
> computer science whereby they can be applied multiple times without changing 
> the result beyond the initial application." from Wikipedia.
> the concept of Idempotence when applies to HTTP is consistent with the above. 
> https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or are you 
> going by a different defintion that I am not aware of?
> If you execute the same upsert multiple times, the state of the database 
> remains the same as if only execute once.
> If a row already exists, the first statement will update the row so does any 
> subsequent statements. executing the same update multiple time is the same as 
> executing it only once.
> If the row doesn't exist, the first statement will insert that row and any 
> subsequent will try to update, but the update has no real effect since it the 
> value is exactly the same as the insert.
> So by defintion, upsert is idempotent.

No it is not as Israel Brewster pointed out.

To his example I would add:

alter some_table add column ts_upsert_update timestamptz;

insert into some_table values('foo', 'bar') on conflict(tbl_id) do
update set foo_fld = excluded.foo_fld, bar_fld = some_table.bar_fld,
ts_upsert_update = now();

You are substituting whatever definition you have in your head for the
definition as it actually exists.

>
>> It could just be a unique index or a uni

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-09 Thread Louis Tian



From: Israel Brewster 
Sent: Sunday, April 9, 2023 3:09 AM
To: Louis Tian 
Cc: Peter Geoghegan ; pgsql-general@lists.postgresql.org 

Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

Thanks Israel. Your example really helped me to understand where we differ.

> Not necessarily. Consider the following UPSERT statement:
> INSERT INTO customer_order_counts (customer_id,order_count) VALUES (123456, 
> 1) ON CONFLICT (customer_id) DO UPDATE SET 
> order_count=customer_order_counts.order_count+1;

Yes, you are 100% right that is not idempotent statement. But that's not 
because UPSERT is idempotent ​*operation*. I am making a distinction between an 
operation and statement here.
This statement is not idempotent is not because UPSERT operation is not 
idempotent, it is because the value being upsert is different in the statement 
each time you run that statement.

> Not really. Generally when I am doing an UPSERT, I am NOT using the primary 
> key, but rather some other UNIQUE-ly indexed column(s). My primary key is 
> typically an ID column that is defined as a serial, > and automatically 
> generated by the database. The unique column I use for the upset, however, is 
> generally something that would identify the row to a human - such as Station 
> ID and timestamp
> columns in a database I have of seismic readings. Each reading gets a unique 
> ID (the primary key) that identifies it to the database and is used in joins. 
> However, occasionally a reading gets updated, so I > update the row, based 
> not on the primary key (which I don’t know for incoming data), but on the 
> station and timestamp. UPSERT, but not based on the primary key.

I am using "primary key" to refer to the "row identifier" in a conceptual way.
Definition from Wiki, "In the relational 
model of 
databases, a primary key is a specific 
choice of a minimal set of attributes 
(columns) that uniquely 
specify a tuple (row) in a 
relation 
(table)."
So, it's not necessarily the primary key you declared on the table (for 
example, the id column in your example).
To put it in another way, the Station ID and timestamp columns work in the 
capability of a primary key.
I was merely setting a scene in my orginal email. What I was trying to say is 
"UPSERT only makes sense when each row is uniquely identifiable".
So please don't get me wrong there, I am not arguing that being able to do on 
conflict on any unique index is wrong or bad.

Cheers,
Louis Tian

> On Apr 6, 2023, at 4:49 PM, Louis Tian  wrote:
>
> Hi Peter,
>
> Thanks for your reply. Appreciate the help and discussion.
>
>>  In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
>
> "Idempotence is the property of certain operations in mathematics and 
> computer science whereby they can be applied multiple times without changing 
> the result beyond the initial application." from Wikipedia.
> the concept of Idempotence when applies to HTTP is consistent with the above. 
> https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or are you 
> going by a different defintion that I am not aware of?
> If you execute the same upsert multiple times, the state of the database 
> remains the same as if only execute once.

Not necessarily. Consider the following UPSERT statement:

INSERT INTO customer_order_counts (customer_id,order_count) VALUES (123456, 1) 
ON CONFLICT (customer_id) DO UPDATE SET 
order_count=customer_order_counts.order_count+1;

That is completely valid (I tested it), and actually makes sense as something 
you might want to do - keep track of how many orders a customer has placed, for 
example if you only keep the order records for 6 months, but still want to know 
the total number of orders the customer has placed. If it is a new customer, 
you insert a record for the customer with an order count of 1. Otherwise, you 
update the record to increment the order count. Clearly this is NOT an 
idempotent operation - every time you run it, it changes the order count, so 
the state of the database does NOT remain the same as if you only execute it 
once.

> If a row already exists, the first statement will update the row so does any 
> subsequent statements. executing the same update multiple time is the same as 
> executing it only once.
> If the row doesn't exist, the first statement will insert that row and any 
> subsequent will try to update, but the update has no real effect since it the 
> value is exactly the same as the insert.
> So by defintion, upsert is idempotent.
>
>> It could just be a unique index or a unique constraint. So you can
>> upsert on any individual unique constraint/index, or the primary