Re: Support for dates before 4713 BC

2024-03-04 Thread Andreas Joseph Krogh


On Monday, 4 March 2024 at 17:01:46, Koschicek-Krombholz, Bernhard
<bernhard.koschicek-krombh...@oeaw.ac.at> wrote:

It would be very useful if PostgreSQL supported dates before 4713 BC.

Out of curiosity, are there any plans to implement it in the near future? 

https://www.postgresql.org/message-id/flat/ca438ff8331c4e109aa1b75a130948ac%40oeaw.ac.at

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


CTEs and concurrency

2024-03-04 Thread Brecht De Rooms
Dear,

I am currently running a chaos test on a system (essentially starting nodes
that process something and randomly knocking them out). It appeared to work
fine with regular tests, but I am seeing occasional duplicate key value
violations of a uniqueness constraint on one of the more complex CTE-based
queries. This only happens with concurrency, where nodes restart under ample
load.
I cannot reproduce it by taking the query out and running it manually in
pgAdmin; it behaves fine if I do so and does exactly what I expect.

The query looks like this (it uses Rust SQLx, which is why there is some
unnesting happening on the parameters).

WITH unnested_inputs AS (
SELECT * FROM (
SELECT
unnest($1::uuid[]) AS event_id,
unnest($2::varchar[]) AS type,
unnest($3::int[]) AS version,
unnest($4::uuid[]) AS causation_id,
unnest($5::uuid[]) AS correlation_id,
unnest($6::text[]) AS idempotency_key,
unnest($7::jsonb[]) AS data,
unnest($8::jsonb[]) AS metadata,
unnest($9::text[]) AS subscription_id,
unnest($10::text[]) AS subscription_instance_identifier,
unnest($11::bigint[]) AS applied_order_id
) AS inputs
),
to_update_subscription_logs AS (
SELECT sl.id as subscription_log_id, sl.node_id, sl.status, ui.*
FROM subscription_log sl
JOIN unnested_inputs ui
ON sl.event_id = ui.causation_id
AND sl.node_id = $12
AND sl.status = 'assigned'
AND sl.subscription_id = ui.subscription_id
AND sl.subscription_instance_identifier =
ui.subscription_instance_identifier
FOR UPDATE NOWAIT -- if something is updating it, we probably shouldn't touch it anymore.
),
updated_logs AS (
UPDATE subscription_log sl
SET status = 'processed',
updated_at = CURRENT_TIMESTAMP
FROM to_update_subscription_logs usl
WHERE sl.id = usl.subscription_log_id
AND usl.node_id = $12
),
inserted_event_log AS (
INSERT INTO event_log (
event_id, type, version, causation_id, correlation_id,
idempotency_key, data, metadata, created_at
)
SELECT
event_id, type, version, usl.causation_id, correlation_id,
idempotency_key, data, metadata, CURRENT_TIMESTAMP
FROM to_update_subscription_logs usl
),
inserted_output_routing_info AS (
INSERT INTO output_event_routing (event_id, subscription_id,
subscription_instance_identifier, applied_order_id)
SELECT event_id, subscription_id, subscription_instance_identifier,
applied_order_id
FROM to_update_subscription_logs usl
)

SELECT * FROM to_update_subscription_logs

The tables look as follows:

CREATE TABLE event_log (
event_id UUID PRIMARY KEY,
event_order_id BIGINT REFERENCES event(order_id),
type varchar NOT NULL,
version int NOT NULL,
causation_id UUID,
correlation_id UUID,
idempotency_key TEXT NOT NULL,
data JSONB NOT NULL,
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT constraint_event_log_unique_idempotency_key
UNIQUE(idempotency_key) -- idempotent writes.
);

CREATE TABLE output_event_routing (
event_id UUID REFERENCES event_log(event_id),
subscription_id TEXT NOT NULL,
subscription_instance_identifier TEXT,
applied_order_id BIGINT,
CONSTRAINT constraint_output_event_routing_uniqueness
UNIQUE(subscription_id, subscription_instance_identifier, applied_order_id)
);

CREATE TABLE subscription_log (
id UUID NOT NULL PRIMARY KEY,
event_id UUID NOT NULL,
event_order_id BIGINT NOT NULL,
event_correlation_id UUID NOT NULL,
subscription_instance_identifier TEXT NOT NULL,
subscription_id TEXT NOT NULL REFERENCES subscription(name),
status processing_status NOT NULL DEFAULT 'enqueued',
node_id UUID REFERENCES node(id) -- is null until assigned.
);

Since I'm trying to avoid using PL/pgSQL upon request, I tried to achieve the
following behaviour with CTEs:
- For the given events, update the subscription log to 'processed' only if we
are still the node that is processing them and the status is still 'assigned'.
- Only for the events where the previous step succeeded, continue processing
by inserting into event_log and into output_event_routing.
The mechanism aims to make sure we don't insert the results of event
processing twice.
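
For reference, a minimal two-session sketch of the NOWAIT behaviour the first
CTE relies on; this is illustrative only, not the actual workload:

-- session 1: lock some 'assigned' rows and keep the transaction open
BEGIN;
SELECT id FROM subscription_log WHERE status = 'assigned' FOR UPDATE NOWAIT;

-- session 2: running the same statement while session 1 still holds the locks
-- fails immediately, with an error along the lines of
--   ERROR:  could not obtain lock on row in relation "subscription_log"
-- instead of blocking until session 1 commits or rolls back.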

When logging the input values, we can see that the same value sets (exactly
the same) are indeed passed twice, by different nodes; that's to be expected
and exactly what has to be caught by this logic. Same values, but another
node. What we see is that one node succeeds and the other node fails due to
the uniqueness violation. Which is actually fine from a business perspective,
since rolling back has the same effect, albeit with an error that I didn't
expect. However, I would love to understand this: how can one node succeed
and set the status of the log to 'processed'
and continue to insert 

Support for dates before 4713 BC

2024-03-04 Thread Koschicek-Krombholz, Bernhard
It would be very useful if PostgreSQL supported dates before 4713 BC.
Out of curiosity, are there any plans to implement it in the near future?

All the best,
Bernhard Koschicek-Krombholz


Re: When manual analyze is needed

2024-03-04 Thread veem v
On Mon, 4 Mar 2024 at 21:46, Greg Sabino Mullane  wrote:

> On Mon, Mar 4, 2024 at 12:23 AM veem v  wrote:
>
>> Additionally, if a query was working fine but suddenly takes a suboptimal
>> plan because of missing stats, is there any hash value column on any
>> performance view, associated with the queryid, that we can refer to in
>> order to see the difference between past and current plans and identify
>> and fix such issues quickly?
>>
>
> You can use auto_explain; nothing else tracks things at that fine a level.
> You can use pg_stat_statements to track the average and max time for each
> query. Save and reset periodically to make it more useful.
>
> https://www.postgresql.org/docs/current/auto-explain.html
>
> https://www.postgresql.org/docs/current/pgstatstatements.html
>
>

Thank you so much Greg. That helps.

We were planning to have the auto_explain extension added, with
log_min_duration set to ~5 seconds and log_analyze set to true, so that all
queries running longer than that threshold get logged with detailed
information on the exact point of the bottleneck. Would it be a good idea to
set this on a production DB which is a highly active database? Or should we
only have the extension added, set the parameters only while we debug a
performance issue, and reset them afterwards?
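
For reference, a sketch of the settings being discussed (the values are the
ones proposed above, not a recommendation; note that auto_explain has no SQL
objects and is loaded via a preload-libraries setting rather than CREATE
EXTENSION):

# postgresql.conf
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '5s'  # only log statements running longer than 5s
auto_explain.log_analyze = on         # include actual timings/row counts (adds overhead)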


Is partition pruning impacted by data type

2024-03-04 Thread sud
Hi,
We are designing an application that is currently restricted to users in one
time zone but may go global in the future. Some of the transaction tables are
going to be daily range partitioned on the transaction_create_date column.
Because the "date" data type has no time component, we are thinking of making
the column a timestamp type (timestamptz(6)), which would help us in two ways.

Firstly, in the current use cases the majority of queries cover a day or
multiple days of transactions, but if we ever have a use case that needs lower
granularity, such as hourly ranges, having a "timestamp" data type with an
index created on it will help. And if in the future we plan to partition at a
lower granularity such as hourly, that can be accommodated easily with a
"timestamp" data type.

However, the questions we have are:
1) Is there any downside to having a "timestamp with time zone" partition key?
Will it impact partition pruning by introducing a run-time "time zone"
conversion function during query planning/execution? (A small test sketch
follows after this list.)
2) As it will take the default server time, the server time will shift during
daylight saving; can that cause any unforeseen issue?
3) Will this cause the data to be spread unevenly across partitions and make
the partitions unevenly sized? If we go for UTC/GMT as the database time, a
user's one-day transactions might span two daily partitions.
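
For reference, a minimal pruning test sketch (table and partition names are
made up, not from our design):

CREATE TABLE txn (
    txn_id bigint,
    transaction_create_date timestamptz NOT NULL,
    amount numeric
) PARTITION BY RANGE (transaction_create_date);

CREATE TABLE txn_2024_03_04 PARTITION OF txn
    FOR VALUES FROM ('2024-03-04 00:00:00+00') TO ('2024-03-05 00:00:00+00');
CREATE TABLE txn_2024_03_05 PARTITION OF txn
    FOR VALUES FROM ('2024-03-05 00:00:00+00') TO ('2024-03-06 00:00:00+00');

-- With constants (or values resolvable at plan/execution time) on the
-- partition key, EXPLAIN should show only the matching partition being scanned:
EXPLAIN SELECT * FROM txn
WHERE transaction_create_date >= timestamptz '2024-03-04 00:00:00+00'
  AND transaction_create_date <  timestamptz '2024-03-05 00:00:00+00';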


Thanks and Regards
Sud


Re: When manual analyze is needed

2024-03-04 Thread Greg Sabino Mullane
On Mon, Mar 4, 2024 at 12:23 AM veem v  wrote:

> Additionally, if a query was working fine but suddenly takes a suboptimal
> plan because of missing stats, is there any hash value column on any
> performance view, associated with the queryid, that we can refer to in
> order to see the difference between past and current plans and identify
> and fix such issues quickly?
>

You can use auto_explain; nothing else tracks things at that fine a level.
You can use pg_stat_statements to track the average and max time for each
query. Save and reset periodically to make it more useful.

https://www.postgresql.org/docs/current/auto-explain.html

https://www.postgresql.org/docs/current/pgstatstatements.html


> I am not seeing any such column in pg_stat_activity or pg_stat_statements
> to hold the hash value of the plan, and also the query column is showing
> "<insufficient privilege>" for many of the entries, why so?
>

Ordinary users are not allowed to see what other people are running. You
can add a user to the pg_read_all_stats role to allow this:

GRANT pg_read_all_stats TO alice;

Oftentimes someone needing access to the stats also needs a little more
access, so consider the pg_monitor role as well. Both are documented here:

https://www.postgresql.org/docs/current/predefined-roles.html
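
For example, mirroring the grant above:

GRANT pg_monitor TO alice;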

Cheers,
Greg


Re: postgres_fdw aggregate pushdown for group by with expressions

2024-03-04 Thread Tom Lane
Michał Kłeczek  writes:
> Does that mean that *any* expression containing text constants is not going
> to be pushed down??
> That would be a really serious issue, I'd say.

If it contains text constants and not text variables, possibly so.
Keep in mind though that this rule is applied after constant-folding,
so that what we're left with is normally going to contain variables
at every level.  extract() is problematic because it combines a
text constant with a datetime (collation-less) variable.

regards, tom lane




Re: Inconsistent results in timestamp/interval comparison

2024-03-04 Thread Francisco Olarte
On Mon, 4 Mar 2024 at 14:06,  wrote:
> Am 04.03.2024 13:45 schrieb Francisco Olarte:
> > Intervals are composed of months, days and seconds, as not every month
> > has 30 days and not every day has 86400 seconds, so to compare them
> > you have to normalize them somehow, which can lead to bizarre results.
> Ah, I see, thanks for the explanation.  I had the (apparently wrong)
> impression that Postgres _internally_ always uses numerical values (i.e.
> the equivalent of EXTRACT(EPOCH …)) for such calculations.  My bad…
> However, a clarification in the docs might be helpful!

Ah, the elusive timestamp/interval clarification. Intervals being
three numbers, and timestamp(tz) being just a fancy one with fancy
text conversions and arithmetic rules, take up a lot of this list's
bandwidth.

I work with telephony and similar things which use a lot of durations,
which I just store in numeric/real/integer columns, but due to history
and some reporting convenience I store points in time as timestamps.
Adding is easy, just do ts_col + duration_col * '1 second'::interval,
but I'm really looking forward to converting ts[tz] to numbers too, so I
can subtract them. Subtracting is a bit more hairy, but extracting the
epoch from the result normally solves it, as they are justified
intervals. But I just ran some memory-refreshing queries and they still
manage to surprise me, and I've been using Postgres since before SQL.
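
As a sketch of that pattern (table and column names here are hypothetical):

-- add a numeric duration in seconds to a timestamp
SELECT call_start + duration_secs * interval '1 second' AS call_end
FROM call_log;

-- and the other direction: the difference between two timestamps, in seconds
SELECT extract(epoch FROM call_end - call_start) AS duration_secs
FROM call_log;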

Francisco Olarte.




Re: Inconsistent results in timestamp/interval comparison

2024-03-04 Thread albrecht . dress

Am 04.03.2024 13:45 schrieb Francisco Olarte:

Intervals are composed of months, days and seconds, as not every month
has 30 days and not every day has 86400 seconds, so to compare them
you have to normalize them somehow, which can lead to bizarre results.


Ah, I see, thanks for the explanation.  I had the (apparently wrong) 
impression that Postgres _internally_ always uses numerical values (i.e. 
the equivalent of EXTRACT(EPOCH …)) for such calculations.  My bad…


However, a clarification in the docs might be helpful!


If you want to do point-in-time arithmetic, you will be better off
extracting the epoch from your timestamps and subtracting that.


I can confirm that using the query

select now(), t1,
       extract(epoch from now() - t1) >= extract(epoch from '2 years'::interval),
       now() >= (t1 + '2 years'::interval)
from testtab;


produces consistent results.

Thanks a lot for your help,
Albrecht.




Re: Inconsistent results in timestamp/interval comparison

2024-03-04 Thread Alban Hertroys
On Mon, 4 Mar 2024 at 13:46, Francisco Olarte wrote:

> On Mon, 4 Mar 2024 at 13:10,  wrote:
> > According to the documentation, Table 9.31, IMHO both comparisons should
> > produce the same results, as
>
> > timestamp - timestamp → interval
> > timestamp + interval → timestamp
> Your problem may be due to interval comparison.
>
> Intervals are composed of months, days and seconds, as not every month
> has 30 days and not every day has 86400 seconds, so to compare them
> you have to normalize them somehow, which can lead to bizarre results.
>
> => select '2 years'::interval > '1 year 362 days'::interval;
>  ?column?
> --
>  f
> (1 row)
>
> => select '2 years'::interval > '1 year 359 days'::interval;
>  ?column?
> --
>  t
> (1 row)
>
> => select '2 years'::interval > '1 year 360 days'::interval;
>  ?column?
> --
>  f
> (1 row)
>
> => select '2 years'::interval = '1 year 360 days'::interval;
>  ?column?
> --
>  t
> (1 row)
>
> If you want to do point-in-time arithmetic, you will be better off
> extracting the epoch from your timestamps and subtracting that. Intervals
> are more for calendar arithmetic of the type "set me a date two
> months, three days and four hours from the last".
>
> Francisco Olarte.
>

To elaborate, justify_interval(t) shows how the length of the interval ends
up when there is no timestamp to base the end of the interval on:

=> with testtab(t1) as (
select cast(v as timestamp with time zone)
from (values ('2022-02-27 11:46:33'), ('2022-03-11 23:39:17'),
('2022-03-21 17:49:02')) x(v)
)
select now(), t1, now() - t1 "now()-t1", justify_interval(now() -t1)
from testtab;
              now              |           t1           |        now()-t1         |            justify_interval
-------------------------------+------------------------+-------------------------+----------------------------------------
 2024-03-04 13:00:31.00386+00  | 2022-02-27 11:46:33+00 | 736 days 01:13:58.00386 | 2 years 16 days 01:13:58.00386
 2024-03-04 13:00:31.00386+00  | 2022-03-11 23:39:17+00 | 723 days 13:21:14.00386 | 2 years 3 days 13:21:14.00386
 2024-03-04 13:00:31.00386+00  | 2022-03-21 17:49:02+00 | 713 days 19:11:29.00386 | 1 year 11 mons 23 days 19:11:29.00386
(3 rows)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Inconsistent results in timestamp/interval comparison

2024-03-04 Thread Francisco Olarte
On Mon, 4 Mar 2024 at 13:10,  wrote:
> According to the documentation, Table 9.31, IMHO both comparisons should
> produce the same results, as

> timestamp - timestamp → interval
> timestamp + interval → timestamp
Your problem may be due to interval comparison.

Intervals are composed of months, days and seconds, as not every month
has 30 days and not every day has 86400 seconds, so to compare them
you have to normalize them somehow, which can lead to bizarre results.

=> select '2 years'::interval > '1 year 362 days'::interval;
 ?column?
--
 f
(1 row)

=> select '2 years'::interval > '1 year 359 days'::interval;
 ?column?
--
 t
(1 row)

=> select '2 years'::interval > '1 year 360 days'::interval;
 ?column?
--
 f
(1 row)

=> select '2 years'::interval = '1 year 360 days'::interval;
 ?column?
--
 t
(1 row)

If you want to do point-in-time arithmetic, you will be better off
extracting the epoch from your timestamps and subtracting that. Intervals
are more for calendar arithmetic of the type "set me a date two
months, three days and four hours from the last".

Francisco Olarte.




Re: postgres_fdw aggregate pushdown for group by with expressions

2024-03-04 Thread Michał Kłeczek
Hi All,

> On 3 Mar 2024, at 10:34, Michał Kłeczek  wrote:
> 
> Hi,
> 
> I have the following foreign table:
> 
> CREATE FOREIGN TABLE t1 (
>  grouping_column text,
>  date_column date,
>  whatever_data int
> );
> 
> The query is:
> 
> SELECT
>  sum(whatever_data)
> FROM
>  t1
> GROUP BY
>  grouping_column, extract(YEAR FROM date_column), extract(MONTH FROM
> date_column);
> 
> From my (preliminary) testing, postgres_fdw will not push down this
> aggregate query
> - it will happily push down a query with only "grouping_column" or
> "grouping_column, date_column" in the GROUP BY

I was able to make it work by creating an extension and installing it on both 
sides with functions:

year(date) IMMUTABLE
month(date) IMMUTABLE
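
A minimal sketch of such wrappers (the function bodies, extension name and
server name below are illustrative assumptions, not taken from the thread):

CREATE FUNCTION year(d date) RETURNS int
    LANGUAGE sql IMMUTABLE AS $$ SELECT extract(year FROM d)::int $$;

CREATE FUNCTION month(d date) RETURNS int
    LANGUAGE sql IMMUTABLE AS $$ SELECT extract(month FROM d)::int $$;

-- Once packaged in an extension installed on both sides, postgres_fdw treats
-- the functions as shippable only if the extension is listed on the server:
ALTER SERVER remote_server OPTIONS (ADD extensions 'my_date_funcs');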

But it looks like there are more issues. The following queries work fine (i.e.
are pushed down to the remote server):

SELECT * FROM t1 WHERE grouping_column LIKE 'abcd%';
and
SELECT sum(whatever_data) FROM t1 GROUP BY grouping_column, year(date_column),
month(date_column)

But together, unfortunately, not:

SELECT grouping_column, sum(whatever_data) FROM t1 WHERE grouping_column LIKE
'abcd%' GROUP BY grouping_column, year(date_column)

In this case the aggregate is performed locally.
I am not sure if this is because of statistics (but I set fdw_tuple_cost to a
very high value, 9, to try to force the planner to push down aggregates).
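
One way to see where the aggregation happens is EXPLAIN VERBOSE, which prints
the Remote SQL actually sent to the foreign server. A sketch using the names
from this thread:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT grouping_column, sum(whatever_data)
FROM t1
WHERE grouping_column LIKE 'abcd%'
GROUP BY grouping_column, year(date_column);
-- If the aggregate is pushed down, the GROUP BY appears inside the Foreign
-- Scan's "Remote SQL"; otherwise a local aggregate node sits on top of the
-- Foreign Scan.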

—
Michal



Inconsistent results in timestamp/interval comparison

2024-03-04 Thread albrecht . dress

Hi all,

I run the "official" deb package postgresql-16 v. 16.2-1.pgdg120+2 on a
Debian Bookworm system, and have observed confusing behavior in a
calculation with timestamps and intervals.


To reproduce, consider the following trivial example:


create table testtab (t1 timestamp without time zone);
insert into testtab values ('2022-02-27 11:46:33'), ('2022-03-11 23:39:17'),
    ('2022-03-21 17:49:02');
test=# select now(), t1, (now() - t1) >= '2 years'::interval,
           now() >= (t1 + '2 years'::interval) from testtab;
              now              |         t1          | ?column? | ?column?
-------------------------------+---------------------+----------+----------
 2024-03-04 12:59:39.796969+01 | 2022-02-27 11:46:33 | t        | t
 2024-03-04 12:59:39.796969+01 | 2022-03-11 23:39:17 | t        | f
 2024-03-04 12:59:39.796969+01 | 2022-03-21 17:49:02 | f        | f
(3 rows)


According to the documentation, Table 9.31, IMHO both comparisons should 
produce the same results, as


timestamp - timestamp → interval
timestamp + interval → timestamp

i.e.

(now() - t1) >= '2 years'::interval    # add t1 on both sides of the comparison

now() >= (t1 + '2 years'::interval)

As only the second row is wrong for the first comparison method, this might
indicate some rounding and/or insufficient-precision issue.


Or did I miss something here?

Thanks in advance,
Albrecht.





Re: When manual analyze is needed

2024-03-04 Thread Laurenz Albe
On Mon, 2024-03-04 at 10:16 +0530, veem v wrote:
> So the partitioned table stats is nothing but the rolledover stats of all the 
> partitions.
> As you mentioned, autoanalyze only works for child partitions but not the 
> partition tables,
> so does it mean we have to schedule some jobs manually(through some 
> scheduletr like pg_cron)
> to analyze these partitioned tables at certain time intervals to keep those 
> up to date for
> partitioned tables?

Something like that, yes.
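
For reference, a sketch of such a job, assuming the pg_cron extension and a
hypothetical partitioned table called "measurements":

-- run ANALYZE on the partitioned parent every night at 02:00
SELECT cron.schedule('0 2 * * *', 'ANALYZE public.measurements');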

> And won't that scan all the child partitions again to have the stats
> aggregated/rolled over from the child partition level to the table level?

Yes.

> Additionally, if a query was working fine but suddenly takes a suboptimal
> plan because of missing stats, is there any hash value column on any
> performance view, associated with the queryid, that we can refer to in
> order to see the difference between past and current plans and identify
> and fix such issues quickly?

Not that I know of.

Yours,
Laurenz Albe