Re: POC: Cleaning up orphaned files using undo logs

2021-09-19 Thread Antonin Houska
Amit Kapila  wrote:

> On Thu, Sep 9, 2021 at 8:33 PM Antonin Houska  wrote:

> > The problem of the temporary undo log is that it's loaded into local buffers
> > and that the backend can exit w/o flushing local buffers to disk, and thus
> > we are not guaranteed to find enough information when trying to discard the
> > undo log the backend wrote. I'm thinking about the following solutions:
> >
> > 1. Let the backend manage temporary undo log on its own (even the slot
> >    metadata would stay outside the shared memory, and in particular the
> >    insertion pointer could start from 1 for each session) and remove the
> >    segment files at the same moment the temporary relations are removed.
> >
> >    However, by moving the temporary undo slots away from the shared memory,
> >    computation of oldestFullXidHavingUndo (see the PROC_HDR structure) would
> >    be affected. It might seem that a transaction which only writes undo log
> >    for temporary relations does not need to affect oldestFullXidHavingUndo,
> >    but it needs to be analyzed thoroughly. Since oldestFullXidHavingUndo
> >    prevents transactions from being truncated away from the CLOG too early,
> >    I wonder if the following is possible (this scenario is only applicable
> >    to the zheap storage engine [1], which is not included in this patch,
> >    but should already be considered):
> >
> >    A transaction creates a temporary table, does some (many) changes and
> >    then gets rolled back. The undo records are being applied and it takes
> >    some time. Since the XID of the transaction did not affect
> >    oldestFullXidHavingUndo, the XID can disappear from the CLOG due to
> >    truncation.
> >
> 
> By above do you mean to say that in zheap code, we don't consider XIDs
> that operate on temp table/undo for oldestFullXidHavingUndo?

I was referring to the code

    /* We can't process temporary undo logs. */
    if (log->meta.persistence == UNDO_TEMP)
        continue;

in undodiscard.c:UndoDiscard().

> 
> > However zundo.c in
> >[1] indicates that the transaction status *is* checked during undo
> >execution, so we might have a problem.
> >
> 
> It would be easier to follow if you can tell which exact code you are
> referring to here?

I meant the call of TransactionIdDidCommit() in
zundo.c:zheap_exec_pending_rollback().

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: WIP: System Versioned Temporal Table

2021-09-19 Thread Corey Huinker
On Sun, Sep 19, 2021 at 3:12 PM Hannu Krosing  wrote:

> A side table has the nice additional benefit that we can very easily
> version the *table structure*, so when we ALTER TABLE and the table
> structure changes we just make a new side table with the now-current
> structure.
>

It's true that would allow for perfect capture of changes to the table
structure, but how would you query the thing?

If a system versioned table was created with a column foo that is type
float, and then we dropped that column, how would we ever query what the
value of foo was in the past?

Would the columns returned from SELECT * change based on the timeframe
requested?

If we then later added another column that happened to also be named foo
but now was type JSONB, would we change the datatype returned based on the
time period being queried?

Is the change in structure itself a system-versioned change which must be
captured?
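
To make the ambiguity concrete, here is a sketch of the scenario. The syntax
is only illustrative (WITH SYSTEM VERSIONING and FOR SYSTEM_TIME follow
SQL:2011 and the WIP patch; none of this is committed PostgreSQL syntax, and
the table is made up):

CREATE TABLE t (id int, foo float) WITH SYSTEM VERSIONING;
-- ... rows are inserted and updated for a while ...
ALTER TABLE t DROP COLUMN foo;
ALTER TABLE t ADD COLUMN foo jsonb;

-- Does this return the float foo, the jsonb foo, or neither?
SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2021-01-01 00:00:00';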


> Also we may want a different set of indexes on historic table(s) for
> whatever reason
>

+1


>
> And we may even want to partition history tables for speed, storage
> cost, or just to drop very ancient history
>

+1


Re: WIP: System Versioned Temporal Table

2021-09-19 Thread Corey Huinker
>
> Thanks for giving this a lot of thought. When you asked the question
> the first time you hadn't discussed how that might work, but now we
> have something to discuss.
>

My ultimate goal is to unify this effort with the application period
effort. Step 1 in that was to understand what each was doing and why they
were doing it. If you check out the other thread, you'll see a highly
similar message that I sent over there.


> There are 3 implementation routes that I see, so let me explain so
> that others can join the discussion.
>
> 1. Putting all data in one table. This makes DROP SYSTEM VERSIONING
> effectively impossible. It requires access to the table to be
> rewritten to add in historical quals for non-historical access and it
> requires some push-ups around indexes. (The current patch adds the
> historic quals by kludging the parser, which is the wrong place, since it
> doesn't work for joins etc. However, given that issue, the rest seems
> to follow on naturally).
>
> 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
> fairly trivial, but it complicates many DDL commands (please make a
> list?) and requires the optimizer to know about this and cater to it,
> possibly complicating plans. Neither issue is insurmountable, but it
> becomes more intrusive.
>
> The current patch could go in either of the first 2 directions with
> further work.
>
> 3. Let the Table Access Method handle it. I call this out separately
> since it avoids making changes to the rest of Postgres, which might be
> a good thing, with the right TAM implementation.
>

I'd like to hear more about this idea number 3.

I could see value in allowing the history table to be a foreign table,
perhaps writing to csv/parquet/whatever files, and that sort of setup could
be persuasive to a regulator who wants extra-double-secret-proof that
auditing cannot be tampered with. But with that we'd have to give up the
relkind idea, which itself was going to be a cheap way to prevent updates
outside of the system triggers.


Re: POC: Cleaning up orphaned files using undo logs

2021-09-19 Thread Antonin Houska
Amit Kapila  wrote:

> On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> >
> > > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> >
> > * What happened with the idea of abandoning discard worker for the sake
> >   of simplicity? From what I see limiting everything to foreground undo
> >   could reduce the core of the patch series to the first four patches
> >   (forgetting about test and docs, but I guess it would be enough at
> >   least for the design review), which is already less overwhelming.
> >
> 
> I think the discard worker would be required even if we decide to
> apply all the undo in the foreground. We need to forget/remove the
> undo of committed transactions as well which we can't remove
> immediately after the commit.

I think I proposed foreground discarding at some point, but you reminded me
that the undo may still be needed for some time even after transaction
commit. Thus the discard worker is indispensable.

What we can do without, at least for the cleanup of the orphaned files, is the
*undo worker*. In this patch series the cleanup is handled by the startup
process.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Deduplicate code updating ControleFile's DBState.

2021-09-19 Thread Amul Sul
On Thu, Sep 16, 2021 at 5:17 AM Michael Paquier  wrote:
>
> On Wed, Sep 15, 2021 at 10:49:39PM +, Bossart, Nathan wrote:
> > Ah, I was missing this context.  Perhaps this should be included in
> > the patch set for the other thread, especially if it will need to be
> > exported.
>
> This part of the patch is mentioned at the top of the thread:
> -   LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> -   ControlFile->state = DB_IN_PRODUCTION;
> -   ControlFile->time = (pg_time_t) time(NULL);
> -
> +   SetControlFileDBState(DB_IN_PRODUCTION);
> SpinLockAcquire(&XLogCtl->info_lck);
> XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
> SpinLockRelease(&XLogCtl->info_lck);
>
> There is an assumption in this code to update SharedRecoveryState
> *while* holding ControlFileLock.  For example, see the following
> comment in xlog.c, ReadRecord():
> /*
>  * We update SharedRecoveryState while holding the lock on
>  * ControlFileLock so both states are consistent in shared
>  * memory.
>  */

Ok, understood, let's do that update with ControlFileLock, thanks.

Regards,
Amul




Re: Undocumented AT TIME ZONE INTERVAL syntax

2021-09-19 Thread Corey Huinker
> Yeah, I really didn't expect to change the behavior, but wanted to make
> sure that the existing behavior was understood. I'll whip up a patch.

Attached is an attempt at an explanation of the edge cases I was
encountering, as well as some examples. If nothing else, the examples will
draw eyes and searches to the explanations that were already there.
From 618a7fbd5606510b993697a0a1968fde5f02fbb2 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Sun, 19 Sep 2021 22:34:42 -0400
Subject: [PATCH 1/2] Explain some of the nuances with INTERVAL data when the
 string literal offers fewer positions of information than the fields
 specification requires.

---
 doc/src/sgml/datatype.sgml | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 50a2c8e5f1..83ebb68333 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2816,7 +2816,30 @@ P  years-months-
  defined with a fields specification, the interpretation of
  unmarked quantities depends on the fields.  For
  example INTERVAL '1' YEAR is read as 1 year, whereas
- INTERVAL '1' means 1 second.  Also, field values
+ INTERVAL '1' means 1 second. If the string provided contains
+ a : then the number to the left of the first :
+ will be used to fill in the most significant sub-day part of the
+ fields specification. 
+
+
+SELECT INTERVAL '2' HOUR TO SECOND;
+ interval
+--
+ 00:00:02
+
+SELECT INTERVAL '4:2' HOUR TO SECOND;
+ interval
+--
+ 04:02:00
+
+
+SELECT INTERVAL '2:' DAY TO SECOND;
+ interval
+--
+ 02:00:00
+
+
+ Also, field values
  to the right of the least significant field allowed by the
  fields specification are silently discarded.  For
  example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE
-- 
2.30.2

From 042467c84a4f9d3cbbdb7660b0b174d99dadde9d Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Sun, 19 Sep 2021 23:26:28 -0400
Subject: [PATCH 2/2] Add example to show the effect of AT TIME ZONE INTERVAL.

---
 doc/src/sgml/func.sgml | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 78812b2dbe..eee10f2db9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10435,6 +10435,9 @@ SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/D
 
 SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'America/Chicago';
 Result: 2001-02-16 05:38:40
+
+SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'UTC' AT TIME ZONE INTERVAL '3:21:20' HOUR TO SECOND;
+Result: 2001-02-17 00:00:00
 
 The first example adds a time zone to a value that lacks it, and
 displays the value using the current TimeZone
@@ -10442,7 +10445,8 @@ SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'A
 to the specified time zone, and returns the value without a time zone.
 This allows storage and display of values different from the current
 TimeZone setting.  The third example converts
-Tokyo time to Chicago time.
+Tokyo time to Chicago time. The fourth example adds the time zone UTC
+to a value that lacks it, and then applies a highly contrived fixed offset.

 

-- 
2.30.2



Unintended interaction between bottom-up deletion and deduplication's single value strategy

2021-09-19 Thread Peter Geoghegan
The return value of _bt_bottomupdel_pass() is advisory; it reports
"failure" for a deletion pass that was just suboptimal (it rarely
reports failure because it couldn't delete anything at all). Bottom-up
deletion preemptively falls back on a version-orientated deduplication
pass when it didn't quite delete as many items as it hoped to delete.
This policy avoids thrashing, particularly with low cardinality
indexes, where the number of distinct TIDs per tableam/heapam block
tends to drive when and how TIDs get deleted.

I have noticed an unintended and undesirable interaction between this
new behavior and an older deduplication behavior: it's possible for
_bt_bottomupdel_pass() to return false to trigger a preemptive
version-orientated deduplication pass that ends up using
deduplication's "single value" strategy. This is just contradictory on
its face: a version-orientated deduplication pass tries to prevent a
version-driven page split altogether, whereas a single value strategy
deduplication pass is specifically supposed to set things up for an
imminent page split (a split that uses nbtsplitloc.c's single value
strategy). Clearly we shouldn't prepare for a page split and try to
avoid a page split at the same time!

The practical consequence of this oversight is that leaf pages full of
duplicates (all duplicates of the same single value) are currently
much more likely to have a version-driven page split (from non-HOT
updates) than similar pages that have two or three distinct key
values. Another undesirable consequence is that we'll waste cycles in
affected cases; any future bottom-up index deletion passes will waste
time on the tuples that the intervening deduplication pass
deliberately declined to merge together (as any single value dedup
pass will). In other words, the heuristics described in comments above
_bt_bottomupdel_finish_pending() can become confused by this
misbehavior (after an initial round of deletion + deduplication, in a
later round of deletion). This interaction is clearly a bug. It's easy
to avoid.
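
To illustrate, here is a minimal sketch of the kind of workload described
above (the table and index names are made up, and the exact timing of
deletion and deduplication passes depends on page fill):

-- Low cardinality index: every leaf page holds duplicates of one value.
CREATE TABLE t (id int, val int, ts timestamptz);
CREATE INDEX t_val_idx ON t (val);
CREATE INDEX t_ts_idx ON t (ts);
INSERT INTO t SELECT g, 1, now() FROM generate_series(1, 100000) g;

-- Updating only ts makes the updates non-HOT, so t_val_idx accumulates
-- logically-unchanged duplicates; this is where bottom-up deletion passes
-- run, sometimes "fail", and could then trigger a single value strategy
-- deduplication pass on the same pages.
UPDATE t SET ts = now();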

Attached patch fixes the bug by slightly narrowing the conditions
under which we'll consider if we should apply deduplication's single
value strategy. We were already not even considering it with a unique
index, where it was always clear that this is only a
version-orientated deduplication pass. It seems natural to also check
whether or not we just had a "failed" call to _bt_bottomupdel_pass()
-- this is a logical extension of what we do already. Barring
objections, I will apply this patch (and backpatch to Postgres 14) in
a few days.

-- 
Peter Geoghegan


v1-0001-Fix-single-value-strategy-index-deletion-bug.patch
Description: Binary data


Re: Release 14 Schedule

2021-09-19 Thread Amit Kapila
On Mon, Sep 20, 2021 at 3:15 AM Jonathan S. Katz  wrote:
>
> On 9/19/21 12:32 PM, Justin Pryzby wrote:
> > On Sat, Sep 18, 2021 at 01:37:19PM -0400, Tom Lane wrote:
> >> We don't yet have a list-of-major-features for the v14 release notes.
> >> Anybody care to propose one?
> >
> >   . Allow extended statistics on column expressions;
> >   . Memoize node which can improve speed of nested loop joins;
> >   . Allow use of LZ4 compression for faster access to TOASTed fields;
> >   . JSONB and H-store types may be subscripted, as may be participating 
> > data types provided by extensions.
> >   . Many improvements to performance of VACUUM;
> >
> > Maybe these??
>
> I would propose a few different ones. I'm looking at the overall breadth
> of user impact as I propose these and the reactions I've seen in the field.
>
> - General performance improvements for databases with multiple
> connections (the MVCC snapshot work).
>
> - The reduction in bloat on frequently updated B-trees; that was a
> longstanding complaint against PostgreSQL that was resolved.
>
> - I agree with the JSON improvements; I'd bucket this in data types and
> include the support of multiranges.
>
> - Logical decoding / replication received some significant performance
> improvements
>
> - Many improvements in query parallelism. One that stands out is how
> parallel queries can be leveraged using FDWs now, in particular the
> postgres_fdw.
>
> - I agree with VACUUM suggestion as well.
>

+1 to this list. One enhancement which we might want to consider is:
Improve the performance of updates/deletes on partitioned tables when
only a few partitions are affected (Amit Langote, Tom Lane)

I think this will be quite useful for customers using partitions.

-- 
With Regards,
Amit Kapila.




Re: Improved PostgreSQL Mathematics Support.

2021-09-19 Thread Private Information Retrieval(PIR)
Your request is essentially to wrap the GMP library into native types in
Postgres. This can be done with custom types, packaged as a Postgres extension
as you suggested originally. The work to be done is straightforward, but there
is a lot of it, so it would take a while to implement. The big integer part is
rather simple; while there is some work to be done there, the fractional part
will take significantly longer for various reasons (think testing edge cases),
but it is doable if there is interest in implementing this.
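
For the archives, here is a minimal sketch of the custom-type route (the type
name mpz and the I/O function names are hypothetical, and the C module
implementing them over GMP is not shown; this is the standard CREATE TYPE
pattern, not an existing extension):

-- CREATE FUNCTION against a not-yet-defined type creates a shell type.
CREATE FUNCTION mpz_in(cstring) RETURNS mpz
    AS 'MODULE_PATHNAME', 'mpz_in' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION mpz_out(mpz) RETURNS cstring
    AS 'MODULE_PATHNAME', 'mpz_out' LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE mpz (
    INPUT = mpz_in,
    OUTPUT = mpz_out,
    INTERNALLENGTH = VARIABLE,
    STORAGE = extended
);

-- Operators, casts and functions would then be layered on top, e.g.:
-- CREATE OPERATOR + (LEFTARG = mpz, RIGHTARG = mpz, FUNCTION = mpz_add);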

-The MuchPIR Team

Sent with [ProtonMail](https://protonmail.com/) Secure Email.

‐‐‐ Original Message ‐‐‐
On Sunday, September 19th, 2021 at 9:40 PM, A Z  wrote:

> Dear PostgreSQL Hackers,
>
> I have been trying to get a reply or interest in either updating
> PostgreSQL to support High Precision mathematical types,
> with arithmetic and elementary functions support, or release
> of an Extension which has accomplished the same thing.
>
> Is there someone on this email list who could please have a look
> at the specifications that I have posted, and reply and get back to
> me? I would be more than thrilled if something could be done
> to improve PostgreSQL in this area.
>
> Yours Sincerely,
>
> Z.M.

Re: Improved PostgreSQL Mathematics Support.

2021-09-19 Thread David G. Johnston
On Sunday, September 19, 2021, A Z  wrote:

>
> Is there someone on this email list who could please have a look
> at the specifications that I have posted, and reply and get back to
> me?
>

Given the number of posts you’ve made I would have to conclude that the
answer to that question is no.  There is presently no interest in this from
the people who read these mailing lists.

David J.


Re: Logical replication keepalive flood

2021-09-19 Thread Greg Nancarrow
On Sun, Sep 19, 2021 at 3:47 PM Amit Kapila  wrote:
>
> >
> > Yes, pg_recvlogical seems to be relying on receiving a keepalive for
> > its "--endpos" logic to work (and the 006 test is relying on '' record
> > output from pg_recvlogical in this case).
> > But is it correct to be relying on a keepalive for this?
> >
>
> I don't think this experiment/test indicates that pg_recvlogical's
> "--endpos" relies on keepalive. It would just print the records till
> --endpos and then exit. In the test under discussion, as per
> confirmation by Hou-San, the BEGIN record received has the same LSN as
> --endpos, so it would just output that and exit which is what is
> mentioned in pg_recvlogical docs as well (If there's a record with LSN
> exactly equal to lsn, the record will be output).
>

It seems to be relying on keepalive to get ONE specific record per
--endpos value, because once we apply the
"v1-0002-WIP-skip-the-keepalive-on-FIRST-loop-iteration.patch" patch,
then when pg_recvlogical is invoked for a second time with the same
--endos, it outputs the next record (BEGIN) too. So now for the same
--endpos value, we've had two different records output by
pg_recvlogical.
I have not seen this described in the documentation, so I think it
will need to be updated, should keepalives be reduced as per the
patch. The current documentation seems to be implying that one
particular record will be returned for a given --endpos  (at least,
there is no mention of the possibility of different records being
output for the one --endpos, or that the first_lsn of a transaction is
always equal to end_lsn of the previous transaction).

--endpos=lsn

   In --start mode, automatically stop replication and exit with
normal exit status 0 when receiving reaches the specified LSN. If
specified when not in --start mode, an error is raised.

   If there's a record with LSN exactly equal to lsn, the record will be output.

   The --endpos option is not aware of transaction boundaries and may
truncate output partway through a transaction. Any partially output
transaction will not be consumed and will be replayed again when the
slot is next read from. Individual messages are never truncated.


Regards,
Greg Nancarrow
Fujitsu Australia




Improved PostgreSQL Mathematics Support.

2021-09-19 Thread A Z
Dear PostgreSQL Hackers,

I have been trying to get a reply or interest in either updating
PostgreSQL to support High Precision mathematical types,
with arithmetic and elementary functions support, or release
of an Extension which has accomplished the same thing.

Is there someone on this email list who could please have a look
at the specifications that I have posted, and reply and get back to
me? I would be more than thrilled if something could be done
to improve PostgreSQL in this area.

Yours Sincerely,

Z.M.




Re: PoC/WIP: Extended statistics on expressions

2021-09-19 Thread Tomas Vondra

On 9/3/21 5:56 AM, Justin Pryzby wrote:
> On Wed, Sep 01, 2021 at 06:45:29PM +0200, Tomas Vondra wrote:
>> However while polishing the second patch, I realized we're allowing
>> statistics on expressions referencing system attributes. So this fails;
>>
>> CREATE STATISTICS s ON ctid, x FROM t;
>>
>> but this passes:
>>
>> CREATE STATISTICS s ON (ctid::text), x FROM t;
>>
>> IMO we should reject such expressions, just like we reject direct references
>> to system attributes - patch attached.
>
> Right, same as indexes.  +1

I've pushed this check, disallowing extended stats on expressions
referencing system attributes. This means we'll reject both ctid and
(ctid::text), just like for indexes.

>> Furthermore, I wonder if we should reject expressions without any Vars? This
>> works now:
>>
>> CREATE STATISTICS s ON (11::text) FROM t;
>>
>> but it seems rather silly / useless, so maybe we should reject it.
>
> To my surprise, this is also allowed for indexes...
>
> But (maybe this is what I was remembering) it's prohibited to have a constant
> expression as a partition key.
>
> Expressions without a var seem like a case where the user did something
> deliberately silly, and dissimilar from the case of making a stats expression
> on a simple column - that seemed like it could be a legitimate
> mistake/confusion (it's not unreasonable to write an extra parenthesis, but
> it's strange if that causes it to behave differently).
>
> I think it's not worth too much effort to prohibit this: if they're determined,
> they can still write an expression with a var which is constant.  I'm not going
> to say it's worth zero effort, though.

I've decided not to push this. The statistics objects on expressions not
referencing any variables seem useless, but maybe not entirely - we
allow volatile expressions, like

  CREATE STATISTICS s ON (random()) FROM t;

which I suppose might be useful. And we reject similar cases (except for
the volatility, of course) for indexes too.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Release 14 Schedule

2021-09-19 Thread Jonathan S. Katz
On 9/19/21 12:32 PM, Justin Pryzby wrote:
> On Sat, Sep 18, 2021 at 01:37:19PM -0400, Tom Lane wrote:
>> We don't yet have a list-of-major-features for the v14 release notes.
>> Anybody care to propose one?
> 
>   . Allow extended statistics on column expressions;
>   . Memoize node which can improve speed of nested loop joins;
>   . Allow use of LZ4 compression for faster access to TOASTed fields;
>   . JSONB and H-store types may be subscripted, as may be participating data 
> types provided by extensions.
>   . Many improvements to performance of VACUUM;
> 
> Maybe these??

I would propose a few different ones. I'm looking at the overall breadth
of user impact as I propose these and the reactions I've seen in the field.

- General performance improvements for databases with multiple
connections (the MVCC snapshot work).

- The reduction in bloat on frequently updated B-trees; that was a
longstanding complaint against PostgreSQL that was resolved.

- I agree with the JSON improvements; I'd bucket this in data types and
include the support of multiranges.

- Logical decoding / replication received some significant performance
improvements

- Many improvements in query parallelism. One that stands out is how
parallel queries can be leveraged using FDWs now, in particular the
postgres_fdw.

- I agree with VACUUM suggestion as well.

I can try proposing some wording on this in a bit; I'm working on the
overdue draft of the press release, and thought I'd chime in here first.

Thanks,

Jonathan





Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-19 Thread Andrew Dunstan



On 9/17/21 5:35 PM, Greg Stark wrote:
> Hm. Let's Encrypt's FAQ tells me I'm on the right track with that
> question but the distinctions are far more coarse than I was worried
> about:
>
>
> Does Let’s Encrypt issue certificates for anything other than SSL/TLS
> for websites?
>
> Let’s Encrypt certificates are standard Domain Validation
> certificates, so you can use them for any server that uses a domain
> name, like web servers, mail servers, FTP servers, and many more.
>
> Email encryption and code signing require a different type of
> certificate that Let’s Encrypt does not issue.



Presumably this should be a certificate something like our client certs,
where the subject designates a user id or similar (e.g. an email
address) rather than a domain name.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Undocumented AT TIME ZONE INTERVAL syntax

2021-09-19 Thread Corey Huinker
On Sun, Sep 19, 2021 at 10:56 AM Tom Lane  wrote:

> Corey Huinker  writes:
> >> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;
>
> > ... But none of this is in our own documentation.
>
> That's not entirely true.  [1] says
>
> When writing an interval constant with a fields specification, or when
> assigning a string to an interval column that was defined with a
> fields specification, the interpretation of unmarked quantities
> depends on the fields. For example INTERVAL '1' YEAR is read as 1
> year, whereas INTERVAL '1' means 1 second. Also, field values “to the
> right” of the least significant field allowed by the fields
> specification are silently discarded. For example, writing INTERVAL '1
> day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but
> not the day field.
>

That text addresses the case of the unadorned string (seconds) and the
overflow case (more string values than places to put them), but doesn't
really address the underflow.


>
> But I'd certainly agree that a couple of examples are not a specification.
> Looking at DecodeInterval, it looks like the rule is that unmarked or
> ambiguous fields are matched to the lowest field mentioned by the typmod
> restriction.  Thus
>
> regression=# SELECT INTERVAL '4:2' HOUR TO MINUTE;
>  interval
> --
>  04:02:00
> (1 row)
>
> regression=# SELECT INTERVAL '4:2' MINUTE TO SECOND;
>  interval
> --
>  00:04:02
> (1 row)


# SELECT INTERVAL '04:02' HOUR TO SECOND;
 interval
--
 04:02:00


This result was a bit unexpected, and the existing documentation doesn't
address underflow cases like this.

So, restating all this to get ready to document it, the rule seems to be:

1. Integer strings with no spaces or colons will always apply to the
rightmost end of the restriction given; lack of a restriction means seconds.

Example:

# SELECT INTERVAL '2' HOUR TO SECOND, INTERVAL '2' HOUR TO MINUTE, INTERVAL '2';
 interval | interval | interval
--+--+--
 00:00:02 | 00:02:00 | 00:00:02
(1 row)

2. Strings with time context (space separator for days, : for everything
else) will apply starting with the leftmost part of the spec that fits,
continuing to the right until string values are exhausted.

Examples:

# SELECT INTERVAL '4:2' HOUR TO SECOND, INTERVAL '4:2' DAY TO SECOND;
 interval | interval
--+--
 04:02:00 | 04:02:00
(1 row)



> If you wanted to improve this para it'd be cool with me.
>

I think people's eyes are naturally drawn to the example tables, and
because the rules for handling string underflow are subtle, I think a few
concrete examples are the way to go.



>
> > Before I write a patch to add this to the documentation, I'm curious what
> > level of sloppiness we should tolerate in the interval calculation. Should
> > we enforce the time string to actually conform to the format laid out in
> > the X TO Y spec?
>
> We have never thrown away high-order fields:
>

And with the above I'm now clear that we're fine with the existing behavior
for underflow.


>
> I'm not sure what the SQL spec says here, but I'd be real hesitant to
> change the behavior of cases that we've accepted for twenty-plus
> years, unless they're just obviously insane.  Which these aren't IMO.
>

Yeah, I really didn't expect to change the behavior, but wanted to make
sure that the existing behavior was understood. I'll whip up a patch.


Re: WIP: System Versioned Temporal Table

2021-09-19 Thread Hannu Krosing
A side table has the nice additional benefit that we can very easily
version the *table structure*, so when we ALTER TABLE and the table
structure changes we just make a new side table with the now-current
structure.

Also we may want a different set of indexes on historic table(s) for
whatever reason

And we may even want to partition history tables for speed, storage
cost, or just to drop very ancient history
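
As a sketch of that last point (table and column names made up, and assuming
the side-table design discussed in this thread):

-- History side table partitioned by end-of-validity, so old history
-- can be detached or dropped cheaply.
CREATE TABLE t_history (
    LIKE t,
    sys_start timestamptz NOT NULL,
    sys_end   timestamptz NOT NULL
) PARTITION BY RANGE (sys_end);

CREATE TABLE t_history_2020 PARTITION OF t_history
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Dropping very ancient history is then just:
DROP TABLE t_history_2020;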

-
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.

On Sun, Sep 19, 2021 at 8:32 PM Simon Riggs wrote:
>
> On Sun, 19 Sept 2021 at 01:16, Corey Huinker  wrote:
> >>
> >> 1. Much of what I have read about temporal tables seemed to imply or 
> >> almost assume that system temporal tables would be implemented as two 
> >> actual separate tables. Indeed, SQLServer appears to do it that way [1] 
> >> with syntax like
> >>
> >> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));
> >>
> >>
> >> Q 1.1. Was that implementation considered and if so, what made this 
> >> implementation more appealing?
> >>
> >
> > I've been digging some more on this point, and I've reached the conclusion 
> > that a separate history table is the better implementation. It would make 
> > the act of removing system versioning into little more than a DROP TABLE, 
> > plus adjusting the base table to reflect that it is no longer system 
> > versioned.
>
> Thanks for giving this a lot of thought. When you asked the question
> the first time you hadn't discussed how that might work, but now we
> have something to discuss.
>
> > 10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use 
> > FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table 
> > directly with no quals to add.
> > 11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF 
> > CURRENT_TIMESTAMP, then the query would do a union of the base table and 
> > the history table with quals applied to both.
>
>
> > 14. DROP SYSTEM VERSIONING from a table would be quite straightforward - 
> > the history table would be dropped along with the triggers that reference 
> > it, setting relissystemversioned = 'f' on the base table.
> >
> > I think this would have some key advantages:
> >
> > 1. MVCC bloat is no worse than it was before.
>
> The number of row versions stored in the database is the same for
> both, just it would be split across two tables in this form.
>
> > 2. No changes whatsoever to referential integrity.
>
> The changes were fairly minor, but I see your thinking about indexes
> as a simplification.
>
> > 3. DROP SYSTEM VERSIONING becomes an O(1) operation.
>
> It isn't top of mind to make this work well. The whole purpose of the
> history is to keep it, not to be able to drop it quickly.
>
>
> > Thoughts?
>
> There are 3 implementation routes that I see, so let me explain so
> that others can join the discussion.
>
> 1. Putting all data in one table. This makes DROP SYSTEM VERSIONING
> effectively impossible. It requires access to the table to be
> rewritten to add in historical quals for non-historical access and it
> requires some push-ups around indexes. (The current patch adds the
> historic quals by kludging the parser, which is the wrong place, since it
> doesn't work for joins etc. However, given that issue, the rest seems
> to follow on naturally).
>
> 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
> fairly trivial, but it complicates many DDL commands (please make a
> list?) and requires the optimizer to know about this and cater to it,
> possibly complicating plans. Neither issue is insurmountable, but it
> becomes more intrusive.
>
> The current patch could go in either of the first 2 directions with
> further work.
>
> 3. Let the Table Access Method handle it. I call this out separately
> since it avoids making changes to the rest of Postgres, which might be
> a good thing, with the right TAM implementation.
>
> My preferred approach would be to do this "for free" in the table
> access method, but we're a long way from this in terms of actual
> implementation. When Corey suggested earlier that we just put the
> syntax in there, this was the direction I was thinking.
>
> After waiting a day since I wrote the above, I think we should go with
> (2) as Corey suggests, at least for now, and we can always add (3)
> later.
>
> --
> Simon Riggs    http://www.EnterpriseDB.com/
>
>




Re: WIP: System Versioned Temporal Table

2021-09-19 Thread Simon Riggs
On Sun, 19 Sept 2021 at 01:16, Corey Huinker  wrote:
>>
>> 1. Much of what I have read about temporal tables seemed to imply or almost 
>> assume that system temporal tables would be implemented as two actual 
>> separate tables. Indeed, SQLServer appears to do it that way [1] with syntax 
>> like
>>
>> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));
>>
>>
>> Q 1.1. Was that implementation considered and if so, what made this 
>> implementation more appealing?
>>
>
> I've been digging some more on this point, and I've reached the conclusion 
> that a separate history table is the better implementation. It would make the 
> act of removing system versioning into little more than a DROP TABLE, plus 
> adjusting the base table to reflect that it is no longer system versioned.

Thanks for giving this a lot of thought. When you asked the question
the first time you hadn't discussed how that might work, but now we
have something to discuss.

> 10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use 
> FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table 
> directly with no quals to add.
> 11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF 
> CURRENT_TIMESTAMP, then the query would do a union of the base table and the 
> history table with quals applied to both.


> 14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the 
> history table would be dropped along with the triggers that reference it, 
> setting relissystemversioned = 'f' on the base table.
>
> I think this would have some key advantages:
>
> 1. MVCC bloat is no worse than it was before.

The number of row versions stored in the database is the same for
both, just it would be split across two tables in this form.

> 2. No changes whatsoever to referential integrity.

The changes were fairly minor, but I see your thinking about indexes
as a simplification.

> 3. DROP SYSTEM VERSIONING becomes an O(1) operation.

It isn't top of mind to make this work well. The whole purpose of the
history is to keep it, not to be able to drop it quickly.


> Thoughts?

There are 3 implementation routes that I see, so let me explain so
that others can join the discussion.

1. Putting all data in one table. This makes DROP SYSTEM VERSIONING
effectively impossible. It requires access to the table to be
rewritten to add in historical quals for non-historical access and it
requires some push-ups around indexes. (The current patch adds the
historic quals by kludging the parser, which is the wrong place, since it
doesn't work for joins etc. However, given that issue, the rest seems
to follow on naturally).

2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
fairly trivial, but it complicates many DDL commands (please make a
list?) and requires the optimizer to know about this and cater to it,
possibly complicating plans. Neither issue is insurmountable, but it
becomes more intrusive.

The current patch could go in either of the first 2 directions with
further work.

3. Let the Table Access Method handle it. I call this out separately
since it avoids making changes to the rest of Postgres, which might be
a good thing, with the right TAM implementation.

My preferred approach would be to do this "for free" in the table
access method, but we're a long way from this in terms of actual
implementation. When Corey suggested earlier that we just put the
syntax in there, this was the direction I was thinking.

After waiting a day since I wrote the above, I think we should go with
(2) as Corey suggests, at least for now, and we can always add (3)
later.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Release 14 Schedule

2021-09-19 Thread Justin Pryzby
On Sat, Sep 18, 2021 at 01:37:19PM -0400, Tom Lane wrote:
> We don't yet have a list-of-major-features for the v14 release notes.
> Anybody care to propose one?

  . Allow extended statistics on column expressions;
  . Memoize node which can improve speed of nested loop joins;
  . Allow use of LZ4 compression for faster access to TOASTed fields;
  . JSONB and H-store types may be subscripted, as may be participating data 
types provided by extensions.
  . Many improvements to performance of VACUUM;

Maybe these??
Improve the performance of updates/deletes on partitioned tables when only a 
few partitions are affected (Amit Langote, Tom Lane)
Add SQL-standard SEARCH and CYCLE clauses for common table expressions (Peter 
Eisentraut)
Allow REINDEX to process all child tables or indexes of a partitioned relation 
(Justin Pryzby, Michael Paquier)

BTW I wondered if this should be mentioned as an incompatible change:

commit 3d351d916b20534f973eda760cde17d96545d4c4
Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.

-- 
Justin




Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-19 Thread Alexander Korotkov
On Sat, Sep 18, 2021 at 11:35 PM Alvaro Herrera  wrote:
> On 2021-Sep-18, Alexander Korotkov wrote:
>
> > I see now.  I think I'm rather favoring splitting visibilitymap.h.
>
> Agreed, this looks sane to me.  However, I think the
> VM_ALL_{VISIBLE,FROZEN} macros should remain in visibilitymap.h, since
> they depend on the visibilitymap_get_status function (and pg_upgrade
> doesn't use them).
>
> There's a typo "maros" for "macros" in the new header file.  (Also, why
> does the copyright line say "portions" if no portion under another
> copyright?  I think we don't say "portions" when there is only one
> copyright statement line.)

Thank you for the feedback.  All changes are accepted.

Any objections to pushing this?

--
Regards,
Alexander Korotkov


0001-Split-macros-from-visibilitymap.h-into-a-separate--3.patch
Description: Binary data


Re: Undocumented AT TIME ZONE INTERVAL syntax

2021-09-19 Thread Tom Lane
Corey Huinker  writes:
>> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;

> ... But none of this is in our own documentation.

That's not entirely true.  [1] says

When writing an interval constant with a fields specification, or when
assigning a string to an interval column that was defined with a
fields specification, the interpretation of unmarked quantities
depends on the fields. For example INTERVAL '1' YEAR is read as 1
year, whereas INTERVAL '1' means 1 second. Also, field values “to the
right” of the least significant field allowed by the fields
specification are silently discarded. For example, writing INTERVAL '1
day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but
not the day field.

But I'd certainly agree that a couple of examples are not a specification.
Looking at DecodeInterval, it looks like the rule is that unmarked or
ambiguous fields are matched to the lowest field mentioned by the typmod
restriction.  Thus

regression=# SELECT INTERVAL '4:2' HOUR TO MINUTE;
 interval 
--
 04:02:00
(1 row)

regression=# SELECT INTERVAL '4:2' MINUTE TO SECOND;
 interval 
--
 00:04:02
(1 row)

If you wanted to improve this para it'd be cool with me.

> Before I write a patch to add this to the documentation, I'm curious what
> level of sloppiness we should tolerate in the interval calculation. Should
> we enforce the time string to actually conform to the format laid out in
> the X TO Y spec?

We have never thrown away high-order fields:

regression=# SELECT INTERVAL '1 day 4:2' MINUTE TO SECOND;
 interval
--
 1 day 00:04:02
(1 row)

AFAICS we consider that the typmod provides a rounding rule, not a
license to transform the value to something entirely different.

I'm not sure what the SQL spec says here, but I'd be real hesitant to
change the behavior of cases that we've accepted for twenty-plus
years, unless they're just obviously insane.  Which these aren't IMO.

regards, tom lane

[1] 
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT