SV: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

2021-09-21 Thread Niels Jespersen
-Oprindelig meddelelse-
>Fra: Laurenz Albe  
>Sendt: 20. september 2021 15:18
>Til: Niels Jespersen ; pgsql-gene...@postgresql.org
>Emne: Re: Possibilities for optimizing inserts across oracle_fdw foreign data 
>wrapper
>
>On Sun, 2021-09-19 at 10:28 +, Niels Jespersen wrote:
>> We are often using the oracle_fdw to transfer data between Postgres (version 
>> 11+) and Oracle (version 18+). It works great.
>>  
>> However I have a task at hand that requires inserting a few billion rows in 
>> an Oracle table from a Postgres query.
>> If I could make the Oracle insert direct load, that would usually also 
>> increase throughput.
> But, is that possible here. There are no constraints defined on the 
> destinaton tables.
>
>The cause of the bad performance for bulk data modifications is that the FDW 
>API is built that way: each row INSERTed means a round trip between PostgreSQL 
>and Oracle.
>
>That could be improved by collecting rows and inserting them in bulk on the 
>Oracle side, but I don't feel like implementing that and complicating the 
>code.>
>
>From my point of view, oracle_fdw is good for reading, but not for bulk writes.
>
>Yours,
>Laurenz Albe

Thank you for clairifying. I will use a python script for doing this, then. 
Read large chunks of data from Postgres and insert equally large chunks into 
Oracle, possibly using direct load. I know this can work. 

Regards Niels


Re: Timestamp with vs without time zone.

2021-09-21 Thread Laurenz Albe
On Tue, 2021-09-21 at 18:00 +1200, Tim Uckun wrote:
> I am hoping to get some clarification on timestamp with time zone.
> 
> My understanding is that timestamp with time zone stores data in UTC
> but displays it in your time zone.

That is correct.
When a timestamp is rendered as string, it it shown in the time zone
specified by the current setting of the "timezone" parameter in your
database session.

> Does this also work on queries? If
> I query between noon and 2:00 PM on some date in time zone XYZ does pg
> translate the query to UTC before sending it to the server?

Yes.

> To provide context I have the following situation.
> 
> I have a data file to import. All the dates in the time zone
> pacific/auckland. My app reads the data , does some processing and
> cleaning up and then saves it to the database.
> 
> The language I am using creates the time data type with the right time
> zone. The processing is being done on a server which is on UTC, the
> database server is also on UTC.  I am pretty sure the ORM isn't
> appending "at time zone pacific/Auckland" to the data when it appends
> it to the database.
> 
> So does the database know the timestamp is in auckland time when the
> client is connecting from a server on UTC?

It does, but only if you set "timezone" appropriately in the database
session.  You could use ALTER ROLE to change the default setting for a
database user, but it might be best to set that from the application.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Synchronous logical replication

2021-09-21 Thread asaba.takan...@fujitsu.com
Hello,

I have a question about synchronous logical replication.

There is the following description in the document "49.8. Synchronous 
Replication Support for Logical Decoding". 
> A synchronous replica receiving changes via logical decoding will work in the 
> scope of a single database. 
> Since, in contrast to that, synchronous_standby_names currently is server 
> wide, 
> this means this technique will not work properly if more than one database is 
> actively used.

What kind of problem will it occur?
For example, 

publisher:
db1=# \dRp
   List of publications
Name |  Owner   | All tables | Inserts | Updates | Deletes | Truncates 
| Via root
 
-+--++-+-+-+---+--
 pub1| user | f  | t   | t   | t   | t 
| f
(1 row)

db2=# \dRp
   List of publications
Name |  Owner   | All tables | Inserts | Updates | Deletes | Truncates 
| Via root
-+--++-+-+-+---+--
 pub2| user | f  | t   | t   | t   | t 
| f
(1 row)

synchronous_standby_names = 'FIRST 2 (sub1, sub2)'

subscriber:
db1=# \dRs
 List of subscriptions
Name | Owner  | Enabled |  Publication
-++-+---
 sub1| user   | t   | {pub1}
 (1 row)

db2=# \dRs
 List of subscriptions
Name | Owner  | Enabled |  Publication
-++-+---
 sub2| user   | t   | {pub2}
 (1 row)

In the above example, will not synchronous replication for both sub1 and sub2 
work?
Also, if sub2 is changed asynchronously, will any problem occur?

Regards,

--
Takanori Asaba






Re: Timestamp with vs without time zone.

2021-09-21 Thread Tim Uckun
It seems like it would be so much more useful if the timestamp with
time zone type actually stored the time zone in the record.


On Tue, Sep 21, 2021 at 7:25 PM Laurenz Albe  wrote:
>
> On Tue, 2021-09-21 at 18:00 +1200, Tim Uckun wrote:
> > I am hoping to get some clarification on timestamp with time zone.
> >
> > My understanding is that timestamp with time zone stores data in UTC
> > but displays it in your time zone.
>
> That is correct.
> When a timestamp is rendered as string, it it shown in the time zone
> specified by the current setting of the "timezone" parameter in your
> database session.
>
> > Does this also work on queries? If
> > I query between noon and 2:00 PM on some date in time zone XYZ does pg
> > translate the query to UTC before sending it to the server?
>
> Yes.
>
> > To provide context I have the following situation.
> >
> > I have a data file to import. All the dates in the time zone
> > pacific/auckland. My app reads the data , does some processing and
> > cleaning up and then saves it to the database.
> >
> > The language I am using creates the time data type with the right time
> > zone. The processing is being done on a server which is on UTC, the
> > database server is also on UTC.  I am pretty sure the ORM isn't
> > appending "at time zone pacific/Auckland" to the data when it appends
> > it to the database.
> >
> > So does the database know the timestamp is in auckland time when the
> > client is connecting from a server on UTC?
>
> It does, but only if you set "timezone" appropriately in the database
> session.  You could use ALTER ROLE to change the default setting for a
> database user, but it might be best to set that from the application.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>




Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

2021-09-21 Thread Laurenz Albe
On Mon, 2021-09-20 at 15:41 +0200, Magnus Hagander wrote:
> Just as a note, with PostgreSQL 14 the FDW APIs allow batch insert. It
> should be possible to update oracle_fdw to take advantage of that as
> well, right?

Yes.  The exercise is more complicated in Oracle, because they don't
support multi-line INSERTs.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Timestamp with vs without time zone.

2021-09-21 Thread Laurenz Albe
On Tue, 2021-09-21 at 19:35 +1200, Tim Uckun wrote:
> It seems like it would be so much more useful if the timestamp with
> time zone type actually stored the time zone in the record.

This has been requested before, and it would be closer to the intention
of the SQL standard, but I guess it won't happen.

For one, it would change on-disk storage, which would make it
impossible to use pg_upgrade.  It also would require timestamps to
occupy more than 8 bytes.

The best solution is probably to explicitly store the time zone as
an additional column.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Aw: Re: Timestamp with vs without time zone.

2021-09-21 Thread Karsten Hilbert
> It seems like it would be so much more useful if the timestamp with
> time zone type actually stored the time zone in the record.

Which one ?

Karsten





Aw: Re: Timestamp with vs without time zone.

2021-09-21 Thread Karsten Hilbert
> > It seems like it would be so much more useful if the timestamp with
> > time zone type actually stored the time zone in the record.
>
> This has been requested before, and it would be closer to the intention
> of the SQL standard, but I guess it won't happen.
>
> For one, it would change on-disk storage, which would make it
> impossible to use pg_upgrade.  It also would require timestamps to
> occupy more than 8 bytes.
>
> The best solution is probably to explicitly store the time zone as
> an additional column.

Would that not be a perfect candidate for a fully fleshed out,
exemplary composite type ?

(but, then, yes it would beg the question what the TZ field of
 the composite is to actually mean...)

Karsten




Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Tim Uckun
One specified by the user.  Many date formats carry either an offset
or the time zone information.

On Tue, Sep 21, 2021 at 7:39 PM Karsten Hilbert  wrote:
>
> > It seems like it would be so much more useful if the timestamp with
> > time zone type actually stored the time zone in the record.
>
> Which one ?
>
> Karsten
>




Aw: Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Karsten Hilbert
> > > It seems like it would be so much more useful if the timestamp with
> > > time zone type actually stored the time zone in the record.
> >
> > Which one ?
> >
> One specified by the user.  Many date formats carry either an offset
> or the time zone information.

What would that TZ mean, exactly, especially in relation to the timestamp 
itself ?

Would the timestamp be stored as that TZ ?

Karsten




Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Peter J. Holzer
On 2021-09-21 09:39:59 +0200, Karsten Hilbert wrote:
> > It seems like it would be so much more useful if the timestamp with
> > time zone type actually stored the time zone in the record.
> 
> Which one ?

To expand on that question a bit:

There are several formats to specify a time zone: By offset, by name
(several nomenclatures), etc. 

For example, Karsten's mail had a timestamp of "2021-09-21 09:39:59
+0200". Thst's enough information to convert it to UTC, but not enough
for date arithmetic. For example what is 
'2021-09-21 09:39:59 +0200' + '2 months'::interval?

Should the result be '2021-11-21 09:39:59 +0200' or '2021-11-21 09:39:59
+0100'? I'm guessing that Karsten is in Germany, so it's probably the
latter. But to compute that you need to know that the timezone is
Europe/Berlin (or at least CET). Even that is not enough for dates in
the more distant future. The EU has decided to abolish DST (that should
have happened in 2020, but of course there was that little problem that
got in the way), but we don't know when that will happen and which
timezone Germany will choose. So for a date in e.g. 2025 we simply don't
know what the timezone offset will be.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Tim Uckun
Yes it would record the timestamp and then also record the time zone.
That way all the conversion functions would still work.

That's the way it works in the programming languages I use anyway.

On Tue, Sep 21, 2021 at 8:09 PM Karsten Hilbert  wrote:
>
> > > > It seems like it would be so much more useful if the timestamp with
> > > > time zone type actually stored the time zone in the record.
> > >
> > > Which one ?
> > >
> > One specified by the user.  Many date formats carry either an offset
> > or the time zone information.
>
> What would that TZ mean, exactly, especially in relation to the timestamp 
> itself ?
>
> Would the timestamp be stored as that TZ ?
>
> Karsten




Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Tim Uckun
That's all true and I won't argue about the madness that is timezones
in the world. I am simply thinking it would be some sort of a struct
like thing which would store the numerical value of the time stamp and
also the time zone that time was recorded in.  Presumably everything
else is an insane calculation from there. What was the offset on that
day? I guess it depends on the daylight savings time. What would the
conversion to another time zone be? That would depend on the DST
settings on that day in both places.

Mankind can't agree on what side of the road to drive on, what the
electrical voltage should be at the wall, what those plugs should be,
how you should charge your phone or anything else for that matter so
there is no way of avoiding the insanity.  It's just that the phrase
"timestamp with time zone" would seem to indicate the time zone is
stored somewhere in there.


On Tue, Sep 21, 2021 at 8:44 PM Peter J. Holzer  wrote:
>
> On 2021-09-21 09:39:59 +0200, Karsten Hilbert wrote:
> > > It seems like it would be so much more useful if the timestamp with
> > > time zone type actually stored the time zone in the record.
> >
> > Which one ?
>
> To expand on that question a bit:
>
> There are several formats to specify a time zone: By offset, by name
> (several nomenclatures), etc.
>
> For example, Karsten's mail had a timestamp of "2021-09-21 09:39:59
> +0200". Thst's enough information to convert it to UTC, but not enough
> for date arithmetic. For example what is
> '2021-09-21 09:39:59 +0200' + '2 months'::interval?
>
> Should the result be '2021-11-21 09:39:59 +0200' or '2021-11-21 09:39:59
> +0100'? I'm guessing that Karsten is in Germany, so it's probably the
> latter. But to compute that you need to know that the timezone is
> Europe/Berlin (or at least CET). Even that is not enough for dates in
> the more distant future. The EU has decided to abolish DST (that should
> have happened in 2020, but of course there was that little problem that
> got in the way), but we don't know when that will happen and which
> timezone Germany will choose. So for a date in e.g. 2025 we simply don't
> know what the timezone offset will be.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"




Aw: Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Karsten Hilbert
> It's just that the phrase
> "timestamp with time zone" would seem to indicate the time zone is
> stored somewhere in there.

Now, I can fully agree with _that_ :-)

Karsten





Question about behavior of conditional indexes

2021-09-21 Thread Koen De Groote
Greetings all,

Working on postgres 11.

I'm researching an index growing in size and never shrinking, and not being
used anymore after a while.

The index looks like this:

"index002" btree (action_performed, should_still_perform_action,
action_performed_at DESC) WHERE should_still_perform_action = false
AND action_performed = true

So, there are 2 boolean fields, and a conditional clause for both. The
table has about 50M rows, the index barely ever goes over 100K matched rows.

The idea is to search for rows matching these conditions quickly, and then
update them. This happens daily.

This means the condition no longer match the index. At this point, does the
row get removed from the index? Or does it stay in there as a dead row?

I'm noticing index bloat on this index and am wondering if all these
updated rows just stay in the index?

The index currently sits at 330MB. If I drop and re-create it, it's 1.5MB.

A cronjob runs a vacuum once per day, I can see the amount of dead rows
dropping in monitoring software.

But should this also take care of indexes? In postgres 11, you can't
reindex concurrently, so I was wondering if indexes are skipped by vacuum?
Or only in case of conditional indexes?



So I'm wondering if the behavior is as I described.

Regards,
Koen De Groote


Re: Question about behavior of conditional indexes

2021-09-21 Thread Gavin Flower

On 21/09/21 22:28, Koen De Groote wrote:

Greetings all,

Working on postgres 11.

I'm researching an index growing in size and never shrinking, and not 
being used anymore after a while.


The index looks like this:

|"index002" btree (action_performed, should_still_perform_action, 
action_performed_at DESC) WHERE should_still_perform_action = false 
AND action_performed = true |
So, there are 2 boolean fields, and a conditional clause for both. The 
table has about 50M rows, the index barely ever goes over 100K matched 
rows.


The idea is to search for rows matching these conditions quickly, and 
then update them. This happens daily.


This means the condition no longer match the index. At this point, 
does the row get removed from the index? Or does it stay in there as a 
dead row?


I'm noticing index bloat on this index and am wondering if all these 
updated rows just stay in the index?


The index currently sits at 330MB. If I drop and re-create it, it's 1.5MB.

A cronjob runs a vacuum once per day, I can see the amount of dead 
rows dropping in monitoring software.


But should this also take care of indexes? In postgres 11, you can't 
reindex concurrently, so I was wondering if indexes are skipped by 
vacuum? Or only in case of conditional indexes?




So I'm wondering if the behavior is as I described.

Regards,
Koen De Groote


Can you upgrade to pg13?  A lot of work was done on indexes in pg12 & 
13.  So possibly your problem may have been resolved, at least in part.


Note that pg 14 is due out this month.


Cheers,
Gavin





Re: Question about behavior of conditional indexes

2021-09-21 Thread Ninad Shah
On Tue, 21 Sept 2021 at 15:59, Koen De Groote  wrote:

> Greetings all,
>
> Working on postgres 11.
>
> I'm researching an index growing in size and never shrinking, and not
> being used anymore after a while.
>
> The index looks like this:
>
> "index002" btree (action_performed, should_still_perform_action, 
> action_performed_at DESC) WHERE should_still_perform_action = false AND 
> action_performed = true
>
> So, there are 2 boolean fields, and a conditional clause for both. The
> table has about 50M rows, the index barely ever goes over 100K matched rows.
>
> The idea is to search for rows matching these conditions quickly, and then
> update them. This happens daily.
>
> This means the condition no longer match the index. At this point, does
> the row get removed from the index? Or does it stay in there as a dead row?
>
> I'm noticing index bloat on this index and am wondering if all these
> updated rows just stay in the index?
>
> The index currently sits at 330MB. If I drop and re-create it, it's 1.5MB.
>
> A cronjob runs a vacuum once per day, I can see the amount of dead rows
> dropping in monitoring software.
>
   - This doesn't reclaim the space. VACUUM operation cleans up space above
the upper edge(High-water mark). Interleaved fragmentation will be marked
for reuse.

>
> But should this also take care of indexes? In postgres 11, you can't
> reindex concurrently, so I was wondering if indexes are skipped by vacuum?
> Or only in case of conditional indexes?
>
  - They aren't left untouched by VACUUM, but as I mentioned reclaiming
space is not a job of VACUUM operation. You must execute VACUUM FULL.
AFAIK, reindexing the 1.5MB index may not need a lot of time(regardless of
concurrent reindexing feature).

>
>
>
> So I'm wondering if the behavior is as I described.
>
> Regards,
> Koen De Groote
>


Re: Timestamp with vs without time zone.

2021-09-21 Thread Adrian Klaver

On 9/20/21 11:00 PM, Tim Uckun wrote:

I am hoping to get some clarification on timestamp with time zone.

My understanding is that timestamp with time zone stores data in UTC
but displays it in your time zone.  Does this also work on queries? If
I query between noon and 2:00 PM on some date in time zone XYZ does pg
translate the query to UTC before sending it to the server?




So does the database know the timestamp is in auckland time when the
client is connecting from a server on UTC?


My question would be why does it matter? The whole purpose of 
timestamptz is that you know it is stored as UTC, from there you can 
transform to whatever time zone you want.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Timestamp with vs without time zone.

2021-09-21 Thread Tom Lane
Adrian Klaver  writes:
> On 9/20/21 11:00 PM, Tim Uckun wrote:
>> I am hoping to get some clarification on timestamp with time zone.
>> 
>> My understanding is that timestamp with time zone stores data in UTC
>> but displays it in your time zone.  Does this also work on queries? If
>> I query between noon and 2:00 PM on some date in time zone XYZ does pg
>> translate the query to UTC before sending it to the server?
>> 
>> So does the database know the timestamp is in auckland time when the
>> client is connecting from a server on UTC?

It might be more clear if you realize that there is no client-side logic
involved here.  The rotation to/from UTC happens in timestamptz_in or
timestamptz_out, based on the server's TimeZone setting.

It's incumbent on clients to set TimeZone correctly if they want
unlabeled timestamps to be interpreted in a particular zone.

regards, tom lane




Re: Question about behavior of conditional indexes

2021-09-21 Thread Michael Lewis
You likely need to tune your autovacuum settings such that the index bloat
reaches a steady-state and does not continue to increase. When vacuum runs,
it will remove dead (and no longer visible to any running transactions)
tuples aka row versions from each page (8KB block by default) in the file
for that table's data. It will also update the index, except in newer
versions of Postgres where that behavior becomes optional (see manual for
which version and whether it still runs by default). If you are I/O bound
and cannot tune the system defaults to autovacuum more proactively (when a
smaller percent of rows are dead), then perhaps just change the settings
for that table as it seems to be functioning a bit like a queue. Or you
might consider setting up a daily job to vacuum analyze on all tables, if
there is a period of low system activity. If you didn't have the index on
the columns you are updating, then reducing fillfactor would be an option
to increase HOT updates and thereby prevent the index bloat. Alas, that's
not an option with the index needing to reference those values that are
being changed.

"index002" btree (action_performed, should_still_perform_action,
action_performed_at DESC) WHERE should_still_perform_action = false
AND action_performed = true

That index seems odd to me. Why index a single value for the first two
columns? I would recreate it with those skipped. Also, based on the
names, I'd expect the opposite for true and false. That is, the
"interesting" rows are where the action has NOT yet been performed yet
and it is needed. I'd expect the index as defined to cover most of the
table rather than a small fraction. Perhaps just a typo from manual
obfuscation.

For what it's worth, you can create new concurrently, drop old
concurrently, then rename new to old. That would be the same result as a
reindex concurrently.

>


Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Peter J. Holzer
On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
> That's all true and I won't argue about the madness that is timezones
> in the world. I am simply thinking it would be some sort of a struct
> like thing which would store the numerical value of the time stamp and
> also the time zone that time was recorded in.  Presumably everything
> else is an insane calculation from there. What was the offset on that
> day? I guess it depends on the daylight savings time. What would the
> conversion to another time zone be? That would depend on the DST
> settings on that day in both places.

Yes, but HOW IS THAT TIME ZONE STORED? 

As a user you can say "I don't care, just make it work somehow".

But as a developer you have to decide on a specific way. And as a
database developer in particular you would have to choose a way which
works for almost everybody.

And that's the problem because ...

> Mankind can't agree on what side of the road to drive on, what the
> electrical voltage should be at the wall, what those plugs should be,
> how you should charge your phone or anything else for that matter

... people have different needs and it would be difficult to satisfy
them all.

Simply storing an offset from UTC is simple, fast, doesn't take much
space - but it would be almost as misleading as the current state. A
simple offset is not a time zone.

Storing the IANA timezone names (e.g. 'Europe/Vienna') would store an
identifier for what most people think of as a time zone - but that takes
a lot of space, it needs a lookup for almost any operation and worst of
all, you couldn't index such a column (at least not with a btree index)
because the comparison functions aren't stable.

You could use a numeric indentifier instead of the name, that would take
less space but wouldn't solve the other problems (and add the problem
that now you have just added another mapping which you need to maintain).

There are other ways, but I'm sure they all have some pros and some
cons. None will be perfect.

So I don't think there is an obvious (or even non-obvious, but clearly
good) way for the PostgreSQL developers to add a real "timestamp with
timezone" type.

As an application developer however, you can define a compound type (or
just use two or three columns together) which satisfies the needs of
your specific application.

> It's just that the phrase "timestamp with time zone" would seem to
> indicate the time zone is stored somewhere in there.

I absolutely agree. Calling a type which doesn't include a timezone
"timestamp with timezone" is - how do I put this? - more than just
weird. "timestamp without timezone" should be called "local timestamp
with unspecified timezone" and "timestamp with timezone" should be
called "global timestamp without timezone". However, those aren't SQL
names.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Dave Cramer
On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer  wrote:

> On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
> > That's all true and I won't argue about the madness that is timezones
> > in the world. I am simply thinking it would be some sort of a struct
> > like thing which would store the numerical value of the time stamp and
> > also the time zone that time was recorded in.  Presumably everything
> > else is an insane calculation from there. What was the offset on that
> > day? I guess it depends on the daylight savings time. What would the
> > conversion to another time zone be? That would depend on the DST
> > settings on that day in both places.
>
> Yes, but HOW IS THAT TIME ZONE STORED?
>
> As a user you can say "I don't care, just make it work somehow".
>
> But as a developer you have to decide on a specific way. And as a
> database developer in particular you would have to choose a way which
> works for almost everybody.
>
> And that's the problem because ...
>
> > Mankind can't agree on what side of the road to drive on, what the
> > electrical voltage should be at the wall, what those plugs should be,
> > how you should charge your phone or anything else for that matter
>
> ... people have different needs and it would be difficult to satisfy
> them all.
>
> Simply storing an offset from UTC is simple, fast, doesn't take much
> space - but it would be almost as misleading as the current state. A
> simple offset is not a time zone.
>
> Storing the IANA timezone names (e.g. 'Europe/Vienna') would store an
> identifier for what most people think of as a time zone - but that takes
> a lot of space, it needs a lookup for almost any operation and worst of
> all, you couldn't index such a column (at least not with a btree index)
> because the comparison functions aren't stable.
>
> You could use a numeric indentifier instead of the name, that would take
> less space but wouldn't solve the other problems (and add the problem
> that now you have just added another mapping which you need to maintain).
>
> There are other ways, but I'm sure they all have some pros and some
> cons. None will be perfect.
>
> So I don't think there is an obvious (or even non-obvious, but clearly
> good) way for the PostgreSQL developers to add a real "timestamp with
> timezone" type.
>
> As an application developer however, you can define a compound type (or
> just use two or three columns together) which satisfies the needs of
> your specific application.
>
> > It's just that the phrase "timestamp with time zone" would seem to
> > indicate the time zone is stored somewhere in there.
>
> I absolutely agree. Calling a type which doesn't include a timezone
> "timestamp with timezone" is - how do I put this? - more than just
> weird. "timestamp without timezone" should be called "local timestamp
> with unspecified timezone" and "timestamp with timezone" should be
> called "global timestamp without timezone". However, those aren't SQL
> names.
>
>
I would say this is a perspective thing. It's a timestamp with a time zone
from the client's perspective.


> Dave Cramer
> www.postgres.rocks
>
>


Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Peter J. Holzer
On 2021-09-21 13:34:21 -0400, Dave Cramer wrote:
> On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer  wrote:
> On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
> > It's just that the phrase "timestamp with time zone" would seem to
> > indicate the time zone is stored somewhere in there.
> 
> I absolutely agree. Calling a type which doesn't include a timezone
> "timestamp with timezone" is - how do I put this? - more than just
> weird. "timestamp without timezone" should be called "local timestamp
> with unspecified timezone" and "timestamp with timezone" should be
> called "global timestamp without timezone". However, those aren't SQL
> names.
> 
> 
> 
> I would say this is a perspective thing. It's a timestamp with a time
> zone from the client's perspective.

I disagree. When I read back the value the original timezone is lost. So
it clearly DOESN'T store the timestamp WITH the timezone.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Dave Cramer
On Tue, 21 Sept 2021 at 13:40, Peter J. Holzer  wrote:

> On 2021-09-21 13:34:21 -0400, Dave Cramer wrote:
> > On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer  wrote:
> > On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
> > > It's just that the phrase "timestamp with time zone" would seem to
> > > indicate the time zone is stored somewhere in there.
> >
> > I absolutely agree. Calling a type which doesn't include a timezone
> > "timestamp with timezone" is - how do I put this? - more than just
> > weird. "timestamp without timezone" should be called "local timestamp
> > with unspecified timezone" and "timestamp with timezone" should be
> > called "global timestamp without timezone". However, those aren't SQL
> > names.
> >
> >
> >
> > I would say this is a perspective thing. It's a timestamp with a time
> > zone from the client's perspective.
>
> I disagree. When I read back the value the original timezone is lost. So
> it clearly DOESN'T store the timestamp WITH the timezone.
>
> I never said it stored the timezone.  I said that it has a timezone.


Dave Cramer
www.postgres.rocks

>
>


Re: Timestamp with vs without time zone.

2021-09-21 Thread FWS Neil


> On Sep 21, 2021, at 12:34 PM, Dave Cramer  wrote:
> On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer  > wrote:
> On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
> > That's all true and I won't argue about the madness that is timezones
> > in the world. I am simply thinking it would be some sort of a struct
> > like thing which would store the numerical value of the time stamp and
> > also the time zone that time was recorded in.  Presumably everything
> > else is an insane calculation from there. What was the offset on that
> > day? I guess it depends on the daylight savings time. What would the
> > conversion to another time zone be? That would depend on the DST
> > settings on that day in both places.
> 
> Yes, but HOW IS THAT TIME ZONE STORED? 
> 
> As a user you can say "I don't care, just make it work somehow".
> 
> But as a developer you have to decide on a specific way. And as a
> database developer in particular you would have to choose a way which
> works for almost everybody.
> 
> And that's the problem because ...
> 
> > Mankind can't agree on what side of the road to drive on, what the
> > electrical voltage should be at the wall, what those plugs should be,
> > how you should charge your phone or anything else for that matter
> 
> ... people have different needs and it would be difficult to satisfy
> them all.
> 
> Simply storing an offset from UTC is simple, fast, doesn't take much
> space - but it would be almost as misleading as the current state. A
> simple offset is not a time zone.
> 
> Storing the IANA timezone names (e.g. 'Europe/Vienna') would store an
> identifier for what most people think of as a time zone - but that takes
> a lot of space, it needs a lookup for almost any operation and worst of
> all, you couldn't index such a column (at least not with a btree index)
> because the comparison functions aren't stable.
> 
> You could use a numeric indentifier instead of the name, that would take
> less space but wouldn't solve the other problems (and add the problem
> that now you have just added another mapping which you need to maintain).
> 
> There are other ways, but I'm sure they all have some pros and some
> cons. None will be perfect.
> 
> So I don't think there is an obvious (or even non-obvious, but clearly
> good) way for the PostgreSQL developers to add a real "timestamp with
> timezone" type.
> 
> As an application developer however, you can define a compound type (or
> just use two or three columns together) which satisfies the needs of
> your specific application.
> 
> > It's just that the phrase "timestamp with time zone" would seem to
> > indicate the time zone is stored somewhere in there.
> 
> I absolutely agree. Calling a type which doesn't include a timezone
> "timestamp with timezone" is - how do I put this? - more than just
> weird. "timestamp without timezone" should be called "local timestamp
> with unspecified timezone" and "timestamp with timezone" should be
> called "global timestamp without timezone". However, those aren't SQL
> names.
> 
> 
> I would say this is a perspective thing. It's a timestamp with a time zone 
> from the client's perspective.

A timestamp cannot have a time zone and be a valid timestamp.

Let me explain.

A timestamp is a single time that exists in the world.  For example March 1, 
2021, 4:15 am is a timestamp.

If you add a time zone (other than UTC) then a time stamp is not always a 
single time that exists in the world.

For example in the spring using time zone American/Chicago, on April 14, 2021 
the time zone time changes at 2am to become 3am.  The time April 14, 2021, 2:30 
am simply does not exists.  And therefore cannot be a timestamp.  Apple’s APIs 
will by default automatically change 2:30am to 3:00am.  Is that correct?  Or 
should it change to 3:30am?  Apple has the option for the latter, but the APIs 
don’t work.

In the fall it is even worse.  Using time zone America/Chicago, on November 7, 
2021, 1:30 am occurs twice.  That does not work as a timestamp.  Which one do 
you use, the early one or the late one.  Apple’s APIs give you a choice.

The point being that people do expect to see times in local time, but the only 
real timestamp is UTC and I can’t ever imagine a need to store time zone 
information related to a timestamp.  If you need to store the location that 
data originated from, then store the location or the Time Zone, but it should 
not be connected to the timestamp.  Location data is completely different than 
time data.

Neil
www.fairwindsoft.com




Re: Re: Timestamp with vs without time zone.

2021-09-21 Thread Peter J. Holzer
On 2021-09-21 13:43:46 -0400, Dave Cramer wrote:
> 
> 
> On Tue, 21 Sept 2021 at 13:40, Peter J. Holzer  wrote:
> 
> On 2021-09-21 13:34:21 -0400, Dave Cramer wrote:
> > On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer  wrote:
> >     On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
> >     Calling a type which doesn't include a timezone
> >     "timestamp with timezone" is - how do I put this? - more than just
> >     weird.
> >
> > I would say this is a perspective thing. It's a timestamp with a time
> > zone from the client's perspective.
> 
> I disagree. When I read back the value the original timezone is lost. So
> it clearly DOESN'T store the timestamp WITH the timezone.
> 
> 
> I never said it stored the timezone.  I said that it has a timezone. 

The raison d’être of a database is to store data. If some data isn't
stored, the database doesn't have it, in my opinion.

As a different example, I can store a number with 15 decimal digits in a float4
and I can get 15 decimal digits out again:

hjp=> create table t (f float4);
CREATE TABLE
hjp=> insert into t(f) values(1.23456789012345);
INSERT 0 1
hjp=> select f::float8::numeric from t;
╔══╗
║f ║
╟──╢
║ 1.23456788063049 ║
╚══╝
(1 row)

But those digits aren't the same I stored. So a float4 doesn't "have" 15
decimal digits of accuracy. Not even 8, although in this specific case
the first 8 digits happen to be correct.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Timestamp with vs without time zone.

2021-09-21 Thread Peter J. Holzer
[Can y'all please trim your quotes? Having to scroll down 80+ lines to
get to the content is cumbersome]

On 2021-09-21 13:17:39 -0500, FWS Neil wrote:
> A timestamp cannot have a time zone and be a valid timestamp.
> 
> Let me explain.
> 
> A timestamp is a single time that exists in the world.

Correct.

> For example March 1, 2021, 4:15 am is a timestamp.

This is NOT a timestamp, because it is not a single time that exists in
the world. It is about 30 different times that exist in the world.

You can turn it into a timestamp by adding timezone information:

March 1 2021, 4:15 am, America/Chicago

Now it's a single time, equivalent to 2021-03-01T10:15:00Z.

However, if you don't have to store the timezone if you have some other
way to unambiguously specify the time, e.g. by always using UTC or by
storing seconds since an epoch.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Timestamp with vs without time zone.

2021-09-21 Thread Adrian Klaver

On 9/21/21 11:45 AM, Peter J. Holzer wrote:

On 2021-09-21 13:43:46 -0400, Dave Cramer wrote:



On Tue, 21 Sept 2021 at 13:40, Peter J. Holzer  wrote:

 On 2021-09-21 13:34:21 -0400, Dave Cramer wrote:
 > On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer  wrote:
 >     On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
 >     Calling a type which doesn't include a timezone
 >     "timestamp with timezone" is - how do I put this? - more than just
 >     weird.
 >
 > I would say this is a perspective thing. It's a timestamp with a time
 > zone from the client's perspective.

 I disagree. When I read back the value the original timezone is lost. So
 it clearly DOESN'T store the timestamp WITH the timezone.


I never said it stored the timezone.  I said that it has a timezone.


The raison d’être of a database is to store data. If some data isn't
stored, the database doesn't have it, in my opinion.



But if you use timestamptz it does have it. The data(timestamp) is 
stored with time zone UTC. From there you can reconstruct the timestamp 
at any time zone you want, given the clients needs. I'm not sure why 
where it started is important when people care how it is presented to 
them on retrieval.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Timestamp with vs without time zone.

2021-09-21 Thread Peter J. Holzer
On 2021-09-21 12:37:49 -0700, Adrian Klaver wrote:
> On 9/21/21 11:45 AM, Peter J. Holzer wrote:
> > On 2021-09-21 13:43:46 -0400, Dave Cramer wrote:
> > > On Tue, 21 Sept 2021 at 13:40, Peter J. Holzer  wrote:
> > >  On 2021-09-21 13:34:21 -0400, Dave Cramer wrote:
> > >  > I would say this is a perspective thing. It's a timestamp with a 
> > > time
> > >  > zone from the client's perspective.
> > > 
> > >  I disagree. When I read back the value the original timezone is 
> > > lost. So
> > >  it clearly DOESN'T store the timestamp WITH the timezone.
> > > 
> > > 
> > > I never said it stored the timezone.  I said that it has a timezone.
> > 
> > The raison d’être of a database is to store data. If some data isn't
> > stored, the database doesn't have it, in my opinion.
> > 
> 
> But if you use timestamptz it does have it.

No.

> The data(timestamp) is stored with time zone UTC. From there you can
> reconstruct the timestamp at any time zone you want, given the clients
> needs. I'm not sure why where it started is important when people care
> how it is presented to them on retrieval.

Whether it is important or not is besides the point. It isn't there and
it cannot be reconstructed. If you need it, you have to store it
separately.

hjp=> set timezone to 'America/Chicago'; 
SET
hjp=> insert into t values('2021-03-01 04:15'); 
INSERT 0 1
hjp=> set timezone to 'Europe/Vienna'; 
SET
hjp=> insert into t values('2021-03-01 11:15'); 
INSERT 0 1
hjp=> select * from t; 
╔╗
║   t║
╟╢
║ 2021-03-01 11:15:00+01 ║
║ 2021-03-01 11:15:00+01 ║
╚╝
(2 rows)

These two values are completely indistinguishable. That's good for a
timestamp (they are the same time after all). But they are not a
"timestamp with time zone". The time zone is not part of the value but
of the environment.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Timestamp with vs without time zone.

2021-09-21 Thread Adrian Klaver

On 9/21/21 11:17 AM, FWS Neil wrote:



On Sep 21, 2021, at 12:34 PM, Dave Cramer > wrote:
On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer > wrote:


On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
> That's all true and I won't argue about the madness that is
timezones
> in the world. I am simply thinking it would be some sort of a struct
> like thing which would store the numerical value of the time
stamp and
> also the time zone that time was recorded in.  Presumably everything
> else is an insane calculation from there. What was the offset on
that
> day? I guess it depends on the daylight savings time. What would the
> conversion to another time zone be? That would depend on the DST
> settings on that day in both places.

Yes, but HOW IS THAT TIME ZONE STORED?

As a user you can say "I don't care, just make it work somehow".

But as a developer you have to decide on a specific way. And as a
database developer in particular you would have to choose a way which
works for almost everybody.

And that's the problem because ...

> Mankind can't agree on what side of the road to drive on, what the
> electrical voltage should be at the wall, what those plugs
should be,
> how you should charge your phone or anything else for that matter

... people have different needs and it would be difficult to satisfy
them all.

Simply storing an offset from UTC is simple, fast, doesn't take much
space - but it would be almost as misleading as the current state. A
simple offset is not a time zone.

Storing the IANA timezone names (e.g. 'Europe/Vienna') would store an
identifier for what most people think of as a time zone - but that
takes
a lot of space, it needs a lookup for almost any operation and
worst of
all, you couldn't index such a column (at least not with a btree
index)
because the comparison functions aren't stable.

You could use a numeric indentifier instead of the name, that
would take
less space but wouldn't solve the other problems (and add the problem
that now you have just added another mapping which you need to
maintain).

There are other ways, but I'm sure they all have some pros and some
cons. None will be perfect.

So I don't think there is an obvious (or even non-obvious, but clearly
good) way for the PostgreSQL developers to add a real "timestamp with
timezone" type.

As an application developer however, you can define a compound
type (or
just use two or three columns together) which satisfies the needs of
your specific application.

> It's just that the phrase "timestamp with time zone" would seem to
> indicate the time zone is stored somewhere in there.

I absolutely agree. Calling a type which doesn't include a timezone
"timestamp with timezone" is - how do I put this? - more than just
weird. "timestamp without timezone" should be called "local timestamp
with unspecified timezone" and "timestamp with timezone" should be
called "global timestamp without timezone". However, those aren't SQL
names.


I would say this is a perspective thing. It's a timestamp with a time 
zone from the client's perspective.


A timestamp cannot have a time zone and be a valid timestamp.

Let me explain.

A timestamp is a single time that exists in the world.  For example 
March 1, 2021, 4:15 am is a timestamp.


If you add a time zone (other than UTC) then a time stamp is not always 
a single time that exists in the world.


You have the above backwards. A time zone locks the timestamp to a 
single point in time.




For example in the spring using time zone American/Chicago, on April 14, 
2021 the time zone time changes at 2am to become 3am.  The time April 
14, 2021, 2:30 am simply does not exists.  And therefore cannot be a 
timestamp.  Apple’s APIs will by default automatically change 2:30am to 
3:00am.  Is that correct?  Or should it change to 3:30am?  Apple has the 
option for the latter, but the APIs don’t work.


In a sense it does, it becomes 3:30 am CDT.

The DST change was actually 03/14/2021.

An example for my timezone US/Pacific:

test(5432)=# select '03/13/2021 2:30 am'::timestamptz;
  timestamptz

 2021-03-13 02:30:00-08
(1 row)

test(5432)=# select '03/14/2021 2:30 am'::timestamptz;
  timestamptz

 2021-03-14 03:30:00-07




In the fall it is even worse.  Using time zone America/Chicago, on 
November 7, 2021, 1:30 am occurs twice.  That does not work as a 
timestamp.  Which one do you use, the early one or the late one. 
  Apple’s APIs give you a choice.


It occurs in two different time zones CDT then CST which makes it a 
different time in each case.




The point being that people do expect to see times in local time, but 
the only real timestamp is UTC and I can’t

Re: Timestamp with vs without time zone.

2021-09-21 Thread David G. Johnston
On Tuesday, September 21, 2021, Peter J. Holzer  wrote:

>
> These two values are completely indistinguishable. That's good for a
> timestamp (they are the same time after all). But they are not a
> "timestamp with time zone". The time zone is not part of the value but
> of the environment.
>

No, it’s inherent to the data type itself.  The data type has an implied
time zone of UTC.  That is a reasonable definition of “with” in my book.
If I see “20:05” I need to know the type of data to know whether its a time
or interval - values without types are incomplete.  i.e., the type can
supply relevant context, like a reference time zone.

David J.


Re: Timestamp with vs without time zone.

2021-09-21 Thread Peter J. Holzer
On 2021-09-21 12:58:13 -0700, David G. Johnston wrote:
> On Tuesday, September 21, 2021, Peter J. Holzer  wrote:
> These two values are completely indistinguishable. That's good for a
> timestamp (they are the same time after all). But they are not a
> "timestamp with time zone". The time zone is not part of the value but
> of the environment.
> 
> 
> No, it’s inherent to the data type itself.  The data type has an implied time
> zone of UTC.  That is a reasonable definition of “with” in my book.

If you can store only a single value that's 0 bits of information (which
is exactly the storage used).

But I'm going to stop arguing here. I'm obviously not able to convince
you and you aren't going to convince me either.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Timestamp with vs without time zone.

2021-09-21 Thread cen
From my experience, and some might disagree, I prefer to do db stores 
purely in UTC and handle timezones in ORM or client side.


The only time I actually needed to store timezone information in a 
dedicated column is when needing to convey that information to the end 
user, for example "your plane will arrive at this date and time in this 
destination timezone". The majority of other cases are just a 
localization issue and don't require you to store the timezone info.


Having to rely on database to muck around with timezones or doing it in 
session settings (which some advocate) is just asking for trouble in my 
opinion.



On 9/21/21 9:35 AM, Tim Uckun wrote:

It seems like it would be so much more useful if the timestamp with
time zone type actually stored the time zone in the record.


On Tue, Sep 21, 2021 at 7:25 PM Laurenz Albe  wrote:

On Tue, 2021-09-21 at 18:00 +1200, Tim Uckun wrote:

I am hoping to get some clarification on timestamp with time zone.

My understanding is that timestamp with time zone stores data in UTC
but displays it in your time zone.

That is correct.
When a timestamp is rendered as string, it it shown in the time zone
specified by the current setting of the "timezone" parameter in your
database session.


Does this also work on queries? If
I query between noon and 2:00 PM on some date in time zone XYZ does pg
translate the query to UTC before sending it to the server?

Yes.


To provide context I have the following situation.

I have a data file to import. All the dates in the time zone
pacific/auckland. My app reads the data , does some processing and
cleaning up and then saves it to the database.

The language I am using creates the time data type with the right time
zone. The processing is being done on a server which is on UTC, the
database server is also on UTC.  I am pretty sure the ORM isn't
appending "at time zone pacific/Auckland" to the data when it appends
it to the database.

So does the database know the timestamp is in auckland time when the
client is connecting from a server on UTC?

It does, but only if you set "timezone" appropriately in the database
session.  You could use ALTER ROLE to change the default setting for a
database user, but it might be best to set that from the application.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com








Re: Timestamp with vs without time zone.

2021-09-21 Thread Michael Lewis
Related to this current discussion and exchange of ideas... is there a best
practice for retrieving data in such a way as the rows are localized to a
timezone for where/group by purposes. That is, if I have a table which has
events, but those events belong to a tenant or some entity that has a
location which implies a timezone (or at least an offset), is there a best
way to write a query similar to the below? Please forgive and overlook if
there is some obvious syntax error, as this is just a quick and dirty
example. Might it make sense to store a "localized" version of the
timestamp *without* timezone on the event record such that an index can be
used for fast retrieval and even grouping?

select
date_trunc( 'month', e.event_datetime AT TIMEZONE t.time_zone_name ) AS
event_date,
count( e.id )
from events AS e
join tenants AS t ON t.id = e.tenant_id
where e.event_datetime AT TIMEZONE t.time_zone_name >= '01/01/2021'::DATE
AND e.event_datetime AT TIMEZONE t.time_zone_name < '09/01/2021'::DATE;


Re: Timestamp with vs without time zone.

2021-09-21 Thread Tom Lane
"David G. Johnston"  writes:
> On Tuesday, September 21, 2021, Peter J. Holzer  wrote:
>> These two values are completely indistinguishable. That's good for a
>> timestamp (they are the same time after all). But they are not a
>> "timestamp with time zone". The time zone is not part of the value but
>> of the environment.

> No, it’s inherent to the data type itself.  The data type has an implied
> time zone of UTC.  That is a reasonable definition of “with” in my book.

The SQL standard is pretty clear that they intend "timestamp with time
zone" to include a time zone represented by a specific UTC offset.
We didn't implement it that way, which is not great, but with
twenty-some years of history there it seems unlikely that we'll change.
Especially since the SQL spec hasn't exactly covered itself with glory
in terms of providing *useful* datetime types.  (Anybody ever found a
use for timetz?)

I think there is plenty of application for timestamps that actually
include (civil) time zones.  Calendaring, for example.  If I make an
appointment to see a friend at 2PM some months from now, it's
understood that that's in the local time zone; if some lawmakers take
it on themselves to fool with the DST rules before then, we're still
going to meet at 2PM local time.  And it'd be useful to know whether
that now conflicts with appointments defined by reference to some
other zone, so the easy way of "assume it's all local time" doesn't
cut it.

This same example demonstrates why the SQL spec's idea of "timestamp
with time zone" doesn't cut it.  You really need some DST-aware notion
of timezone identity.  But, as already covered upthread, the can of
worms that that opens is so large that nobody has wanted to try to
tackle it as a primitive Postgres datatype.

regards, tom lane




Re: Timestamp with vs without time zone.

2021-09-21 Thread Steve Crawford
My 2-cents.

"Time stamp with time zone" is a terrible name for the data type. "Point in
time" would be better but we are stuck with historical baggage in that
regard. The following are equivalent points in time (AKA timesamptz):
2021-09-21 12:34:56-07
Tue 21 Sep 2021 12:34:56 PM PDT
2021-09-21 12:34:56+00
2021-09-21 12:34:56+08:45 (Yes, Australia and some other areas have
non-full-hour offsets).
1632252896 (Epoch)

The server's internal representation of timestamptz is completely
irrelevant except to the PostgreSQL developers. Who cares if it's UTC- it
can be in micro-gerbils as far as users are concerned. All that matters is
that the server can store a point-in-time value and can return that value
represented in the time zone and format that the user requests.

I have never personally encountered a situation where I needed to track the
timezone of the person/process that inserted point-in-time data but I
almost always encounter the need to retrieve point-in-time data represented
in the end-user's desired zone and locale/format (join the conference call
at, the rocket launch is scheduled for, ...). If needed for a specific
case, the user can always include a column indicating the desired offset or
the time-zone to use to determine the offset, whichever is better suited
for their needs.

Cheers,
Steve


Surprising results from tests intended to show the difference in semantics between transaction_timestamp(), statement_timestamp(), and clock_timestamp()

2021-09-21 Thread Bryn Llewellyn
I'm surprised by the results that I describe below. Please help me interpret 
them.

Briefly, why does transaction_timestamp() report a later value than 
statement_timestamp() when they're both invoked in the very first statement 
after "start transaction". (They report identical values in an implicitly 
started txn.)
— — — — — — — — — — — — — — — — — — — — — — — — —

*Introduction*

I'm presently documenting (in the YugabyteDB YSQL section) the various PG 
date-time functions that report the "current" moment. There are two degrees of 
freedom: the return data type; and the kind of moment. This would suggest that 
there's eighteen differently-named functions (considering also timetz—which the 
PG doc recommends against using). There are, in fact, far fewer. Never mind, 
timestamptz is almost always the best choice for the data type of the moment; 
and the three functions that I name in my "subject" return that data type.

I thought that this demo would be simple and convincing:

-- SINGLE STATEMENT TEST (IMPLICIT TXN)

select
  transaction_timestamp() as txn_11,
  statement_timestamp()   as stm_11,
  clock_timestamp()   as clk_11,
  pg_sleep(:slp_1)as dummy,
  transaction_timestamp() as txn_12,
  statement_timestamp()   as stm_12,
  clock_timestamp()   as clk_12;

-- TWO STATEMENT TEST (EXPLICIT TXN)

start transaction;

select
  transaction_timestamp() as txn_11,
  statement_timestamp()   as stm_11,
  clock_timestamp()   as clk_11,
  pg_sleep(:slp_1)as dummy,
  transaction_timestamp() as txn_12,
  statement_timestamp()   as stm_12,
  clock_timestamp()   as clk_12;

select pg_sleep(:slp_2);

select
  transaction_timestamp() as txn_2,
  statement_timestamp()   as stm_2,
  clock_timestamp()   as clk_2;

commit;

The output from this takes a huge mental effort to comprehend. So I wrote a 
naive reporting function:

function expected_tstz_diff_error(
  captionin text,
  t_finish   in timestamptz,
  t_startin timestamptz,
  expected_diff  in double precision)
  returns   text

I test what I get like this:

-- SINGLE STATEMENT TEST (IMPLICIT TXN)
select expected_tstz_diff_error('txn_12 - txn_11 = 0',:s1_txn_12, 
:s1_txn_12, :zero);
select expected_tstz_diff_error('stm_12 - stm_11 = 0',:s1_stm_12, 
:s1_stm_12, :zero);
select expected_tstz_diff_error('txn_11 - stm_11 = 0',:s1_txn_11, 
:s1_stm_11, :zero);
select expected_tstz_diff_error('clk_12 - clk_11 = slp_1',:s1_clk_12, 
:s1_clk_11, :slp_1);

and this:

-- TWO STATEMENT TEST (EXPLICIT TXN)

-- Within the first "select".
select expected_tstz_diff_error('txn_12 - txn_11 = 0',:s1_txn_12, 
:s1_txn_12, :zero);
select expected_tstz_diff_error('stm_12 - stm_11 = 0',:s1_stm_12, 
:s1_stm_12, :zero);
select expected_tstz_diff_error('txn_11 - stm_11 = 0',:s1_txn_11, 
:s1_stm_11, :zero);
select expected_tstz_diff_error('clk_12 - clk_11 = slp_1',:s1_clk_12, 
:s1_clk_11, :slp_1);

-- Within the trasaction but across the first and the second "select".
select expected_tstz_diff_error('txn_2 - txn_11 = 0', :s2_txn_2,  
:s1_txn_11, :zero);
select expected_tstz_diff_error('stm_2 - stm_11 = slp_1 + slp_2', :s2_stm_2,  
:s1_stm_11, (:slp_1 + :slp_2));
select expected_tstz_diff_error('clk_2 - clk_11 = slp_1 + slp_2', :s2_clk_2,  
:s1_clk_11, (:slp_1 + :slp_2));

The function reports either "exact" or the error in milliseconds.

The runnable code is copied at the very end, after my account of the problem. 
You can guess that there's a malarkey of quoting with \gset to populate all 
those psql "substitution variables". But it all works. The design concept is 
informed by these ideas:

— I can't use a PL/pgSQL encapsulation because, with this, it's all regarded as 
a single top-level statement and so I can't easily demonstrate the difference 
in semantics between transaction_timestamp() and statement_timestamp().

— I want to get my results back with the minimum of time-consuming fuss to get 
the most reliable timings. So I do the reporting after I have all the results 
to hand in psql "substitution variables".

*Results*

Here's what I get with my PG Version 13.4 installation on my MacBook. (I 
removed blank lines to improve the readability.)

SINGLE STATEMENT TEST (IMPLICIT TXN)

txn_12 - txn_11 = 0: exact
stm_12 - stm_11 = 0: exact
txn_11 - stm_11 = 0: exact
clk_12 - clk_11 = slp_1: error:  0.1 ms

TWO STATEMENT TEST (EXPLICIT TXN)

txn_12 - txn_11 = 0: exact
stm_12 - stm_11 = 0: exact
txn_11 - stm_11 = 0: error: -0.1 ms
clk_12 - clk_11 = slp_1: error:  0.6 ms

txn_2 - txn_11 = 0 : exact
stm_2 - stm_11 = slp_1 + slp_2 : error:  1.6 ms
clk_2 - clk_11 = slp_1 + slp_2 : error:  1.6 ms

*Discussion*

The stand-alone “select" that implicitly starts the txn gives the results that 
I'd expect.

I expected that, using the explicitl

Re: Surprising results from tests intended to show the difference in semantics between transaction_timestamp(), statement_timestamp(), and clock_timestamp()

2021-09-21 Thread David G. Johnston
On Tuesday, September 21, 2021, Bryn Llewellyn  wrote:

> I'm surprised by the results that I describe below. Please help me
> interpret them.
>
> Briefly, why does transaction_timestamp() report a later value
> than statement_timestamp() when they're both invoked in the very first
> statement after "start transaction". (They report identical values in an
> implicitly started txn.)
>
>
>
> *  transaction_timestamp() as txn_11,  statement_timestamp()   as stm_11, *
>
>
>
> *  t_finish   in timestamptz,  t_startin
> timestamptz,  expected_diff  in double precision) *
>
> *select expected_tstz_diff_error('txn_11 - stm_11 = 0',
>   :s1_txn_11, :s1_stm_11, :zero);*
>
> *txn_11 - stm_11 = 0: error: -0.1 ms*
>


>
>
> *  err  constant double precision not null := (
> (extract(epoch from t_finish) - extract(epoch from t_start)) -
> expected_diff )*dp_1000; *
>

So, finish is the txn, start is the stm, and expected is 0 - thus if finish
(txn) is earlier than start (stm) your error will be negative.  Which it
is, and since transaction should be earlier than statement this would seem
to be correct.  In short, you seem to have mis-interpreted the error sign’s
meaning.

David J.


Re: Surprising results from tests intended to show the difference in semantics between transaction_timestamp(), statement_timestamp(), and clock_timestamp()

2021-09-21 Thread Adrian Klaver

On 9/21/21 4:23 PM, Bryn Llewellyn wrote:
I'm surprised by the results that I describe below. Please help me 
interpret them.


Briefly, why does transaction_timestamp() report a later value 
than statement_timestamp() when they're both invoked in the very first 
statement after "start transaction". (They report identical values in an 
implicitly started txn.)


I'm not seeing it:

test(5432)=# begin;

select
  transaction_timestamp() as txn_11,
  statement_timestamp()   as stm_11,
  clock_timestamp()   as clk_11,
  pg_sleep(0.5)as dummy,
  transaction_timestamp() as txn_12,
  statement_timestamp()   as stm_12,
  clock_timestamp()   as clk_12;

select pg_sleep(1);

select
  transaction_timestamp() as txn_2,
  statement_timestamp()   as stm_2,
  clock_timestamp()   as clk_2;

commit;

BEGIN
-[ RECORD 1 ]-
txn_11 | 2021-09-21 17:29:18.272042-07
stm_11 | 2021-09-21 17:29:18.2722-07
clk_11 | 2021-09-21 17:29:18.272297-07
dummy  |
txn_12 | 2021-09-21 17:29:18.272042-07
stm_12 | 2021-09-21 17:29:18.2722-07
clk_12 | 2021-09-21 17:29:18.772863-07

-[ RECORD 1 ]
pg_sleep |

-[ RECORD 1 ]
txn_2 | 2021-09-21 17:29:18.272042-07
stm_2 | 2021-09-21 17:29:19.774436-07
clk_2 | 2021-09-21 17:29:19.77456-07


No matter how many times I run this the transaction_timestamp() is less 
then statement_timestamp().  This is as it should be, as 
transaction_timestamp() fires with the begin; and statement_timestamp() 
fires at select ... and then updates with the subsequent select  and 
clock_timestamp() just keeps marching on through the whole process.


--
Adrian Klaver
adrian.kla...@aklaver.com




Thanks, David! Re: Surprising results from tests intended to show the difference in semantics between transaction_timestamp(), statement_timestamp(), and clock_timestamp()

2021-09-21 Thread Bryn Llewellyn
> David G. Johnston wrote:
> 
> Bryn wrote:
> 
> I'm surprised by the results that I describe below. Please help me interpret 
> them.
> 
> Briefly, why does transaction_timestamp() report a later value than 
> statement_timestamp() when they're both invoked in the very first statement 
> after "start transaction". (They report identical values in an implicitly 
> started txn.)
> 
>   transaction_timestamp() as txn_11,
>   statement_timestamp()   as stm_11,
>  
>   t_finish   in timestamptz,
>   t_startin timestamptz,
>   expected_diff  in double precision)
>  
> select expected_tstz_diff_error('txn_11 - stm_11 = 0',:s1_txn_11, 
> :s1_stm_11, :zero);
> 
> txn_11 - stm_11 = 0: error: -0.1 ms
>  
>   err  constant double precision not null :=
>  ( (extract(epoch from t_finish) - extract(epoch from t_start)) - 
> expected_diff )*dp_1000;
>  
> 
> So, finish is the txn, start is the stm, and expected is 0 - thus if finish 
> (txn) is earlier than start (stm) your error will be negative.  Which it is, 
> and since transaction should be earlier than statement this would seem to be 
> correct.  In short, you seem to have mis-interpreted the error sign’s meaning.

Thanks, David.

Damn... I'm so embarrassed. I'd been staring at my code for too long and was 
blind to my stupidity. Just goes to show how important it is to choose names 
sensibly. "Start" and "finish" were silly (a left-over from a previous use of 
that code). I changed the names to "t_expected_later" and "t_expected_sooner" — 
and twizzled my arguments where needed. All is clear now.

In summary, the first statement after the explicit "start transaction" is a bit 
later than the moment that the explicitly started txn starts—in line with 
intuition. And, of course, the round trip to invoke pg_sleep() takes time too.

I was talking nonsense when I wrote "commit is bound to take time and I'm not 
using an estimate for this". Doubtless "commit" does take some time. But it 
happens after my last recorded time measurement. That, too, was a left-over 
from a different test. Sigh...

Case closed.



Re: Surprising results from tests intended to show the difference in semantics between transaction_timestamp(), statement_timestamp(), and clock_timestamp()

2021-09-21 Thread Bryn Llewellyn
> Adrian Klaver wrote:
> 
> On 9/21/21 4:23 PM, Bryn Llewellyn wrote:
>> I'm surprised by the results that I describe below. Please help me interpret 
>> them.
>> Briefly, why does transaction_timestamp() report a later value than 
>> statement_timestamp() when they're both invoked in the very first statement 
>> after "start transaction". (They report identical values in an implicitly 
>> started txn.)
> 
> I'm not seeing it:
> 
> ...
> 
> No matter how many times I run this the transaction_timestamp() is less then 
> statement_timestamp().  This is as it should be, as transaction_timestamp() 
> fires with the begin; and statement_timestamp() fires at select ... and then 
> updates with the subsequent select  and clock_timestamp() just keeps 
> marching on through the whole process.

Oh dear...

David Johnson pointed out that I was misinterpreting my results. Sigh... It all 
makes sense now.

Sorry to have troubled you all with this.

Re: Timestamp with vs without time zone.

2021-09-21 Thread Tim Cross


FWS Neil  writes:

>  On Sep 21, 2021, at 12:34 PM, Dave Cramer  wrote:
>  On Tue, 21 Sept 2021 at 13:20, Peter J. Holzer  wrote:
>
>  On 2021-09-21 20:50:44 +1200, Tim Uckun wrote:
>  > That's all true and I won't argue about the madness that is timezones
>  > in the world. I am simply thinking it would be some sort of a struct
>  > like thing which would store the numerical value of the time stamp and
>  > also the time zone that time was recorded in.  Presumably everything
>  > else is an insane calculation from there. What was the offset on that
>  > day? I guess it depends on the daylight savings time. What would the
>  > conversion to another time zone be? That would depend on the DST
>  > settings on that day in both places.
>
>  Yes, but HOW IS THAT TIME ZONE STORED? 
>
>  As a user you can say "I don't care, just make it work somehow".
>
>  But as a developer you have to decide on a specific way. And as a
>  database developer in particular you would have to choose a way which
>  works for almost everybody.
>
>  And that's the problem because ...
>
>  > Mankind can't agree on what side of the road to drive on, what the
>  > electrical voltage should be at the wall, what those plugs should be,
>  > how you should charge your phone or anything else for that matter
>
>  ... people have different needs and it would be difficult to satisfy
>  them all.
>
>  Simply storing an offset from UTC is simple, fast, doesn't take much
>  space - but it would be almost as misleading as the current state. A
>  simple offset is not a time zone.
>
>  Storing the IANA timezone names (e.g. 'Europe/Vienna') would store an
>  identifier for what most people think of as a time zone - but that takes
>  a lot of space, it needs a lookup for almost any operation and worst of
>  all, you couldn't index such a column (at least not with a btree index)
>  because the comparison functions aren't stable.
>
>  You could use a numeric indentifier instead of the name, that would take
>  less space but wouldn't solve the other problems (and add the problem
>  that now you have just added another mapping which you need to maintain).
>
>  There are other ways, but I'm sure they all have some pros and some
>  cons. None will be perfect.
>
>  So I don't think there is an obvious (or even non-obvious, but clearly
>  good) way for the PostgreSQL developers to add a real "timestamp with
>  timezone" type.
>
>  As an application developer however, you can define a compound type (or
>  just use two or three columns together) which satisfies the needs of
>  your specific application.
>
>  > It's just that the phrase "timestamp with time zone" would seem to
>  > indicate the time zone is stored somewhere in there.
>
>  I absolutely agree. Calling a type which doesn't include a timezone
>  "timestamp with timezone" is - how do I put this? - more than just
>  weird. "timestamp without timezone" should be called "local timestamp
>  with unspecified timezone" and "timestamp with timezone" should be
>  called "global timestamp without timezone". However, those aren't SQL
>  names.
>
>  I would say this is a perspective thing. It's a timestamp with a time zone 
> from the client's perspective.
>
> A timestamp cannot have a time zone and be a valid timestamp.
>
> Let me explain.
>
> A timestamp is a single time that exists in the world.  For example March 1, 
> 2021, 4:15 am is a timestamp.
>
> If you add a time zone (other than UTC) then a time stamp is not always a 
> single time that exists in the world.
>
> For example in the spring using time zone American/Chicago, on April 14, 2021 
> the time zone time changes at 2am to become
> 3am.  The time April 14, 2021, 2:30 am simply does not exists.  And therefore 
> cannot be a timestamp.  Apple’s APIs will by
> default automatically change 2:30am to 3:00am.  Is that correct?  Or should 
> it change to 3:30am?  Apple has the option for the
> latter, but the APIs don’t work.
>
> In the fall it is even worse.  Using time zone America/Chicago, on November 
> 7, 2021, 1:30 am occurs twice.  That does not work
> as a timestamp.  Which one do you use, the early one or the late one.  
> Apple’s APIs give you a choice.
>
> The point being that people do expect to see times in local time, but the 
> only real timestamp is UTC and I can’t ever imagine a
> need to store time zone information related to a timestamp.  If you need to 
> store the location that data originated from, then
> store the location or the Time Zone, but it should not be connected to the 
> timestamp.  Location data is completely different than
> time data.
>

+1. This is the key point often overlooked. To make matters even more
complex, the daylight savings switch over dates can change, often at the
whim of politicians. For example, the daylight savings time has been
changed in Australia because of major events (such as the Olympics). As
soon as you bring time zones into the mix, any calculations based on
differences in dates must now als

Re: Timestamp with vs without time zone.

2021-09-21 Thread cen


On 21. 09. 21 23:34, Michael Lewis wrote:
Related to this current discussion and exchange of ideas... is there a 
best practice for retrieving data in such a way as the rows are 
localized to a timezone for where/group by purposes. That is, if I 
have a table which has events, but those events belong to a tenant or 
some entity that has a location which implies a timezone (or at least 
an offset), is there a best way to write a query similar to the below? 
Please forgive and overlook if there is some obvious syntax error, as 
this is just a quick and dirty example. Might it make sense to store a 
"localized" version of the timestamp *without* timezone on the event 
record such that an index can be used for fast retrieval and even 
grouping?


select
date_trunc( 'month', e.event_datetime AT TIMEZONE t.time_zone_name ) 
AS event_date,

count( e.id  )
from events AS e
join tenants AS t ON t.id  = e.tenant_id
where e.event_datetime AT TIMEZONE t.time_zone_name >= 
'01/01/2021'::DATE AND e.event_datetime AT TIMEZONE t.time_zone_name < 
'09/01/2021'::DATE;



This is an interesting case. A simplified query example would be to 
"give me all events for this year".


I am not sure what the cost of shifting UTC is, probably not much, but 
depending on use case it might make sense to deconstruct into date and 
time for query optimization.