Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

2023-03-05 Thread Michael Paquier
On Sat, Mar 04, 2023 at 09:47:05AM +0530, Bharath Rupireddy wrote:
> Okay, here's a patch attached.

Thanks.

+ * When source == XLOG_FROM_ANY, this function searches for the segment with a
+ * TLI in the archive first and, if it is not found there, then in pg_wal. This
+ * way, if a WAL segment with the same passed-in segno but a different TLI is
+ * present in both the archive and pg_wal, it prefers the one with the higher
+ * TLI. The reason for this is that if, for example, we try to do archive
+ * recovery to timeline 2, which branched off timeline 1, but the WAL for
+ * timeline 2 is not archived yet, we would replay past the timeline switch
+ * point on timeline 1 using the archived WAL segment, before even looking at
+ * timeline 2's WAL segments in pg_wal.

This is pretty much what the commit has mentioned.  The first half
provides enough details, IMO.
--
Michael




Re: add PROCESS_MAIN to VACUUM

2023-03-05 Thread Michael Paquier
On Wed, Mar 01, 2023 at 10:53:59PM -0800, Nathan Bossart wrote:
> I don't feel a strong need for that, especially now that we aren't
> modifying params anymore.

That was mostly OK for me, so applied after tweaking a couple of
places in the tests (extra explanations, for one), the comments and
the code.

The part that improved the tests of PROCESS_TOAST was useful on its
own, so I have done that separately as 46d490a.  Another thing that I
found incorrect is the need for two pg_stat_reset() calls that would
reset statistics for the whole database, in the middle of a test
running in parallel with other things.  As far as I understand, you
added that to provide fresh data after a single command while relying
on vactst, but it is possible to get the same amount of coverage by
relying on cumulative counts, and that gets even more solid with all
these commands run on an independent table.
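
For anyone unfamiliar with that test style, a minimal sketch of checking
cumulative counters instead of resetting them (table and option names here
are purely illustrative, not the committed test):

-- Capture the counter before the command under test, then check it grew.
SELECT vacuum_count AS vac_before
  FROM pg_stat_user_tables WHERE relname = 'vac_option_tab' \gset
VACUUM (PROCESS_TOAST FALSE) vac_option_tab;
SELECT pg_stat_force_next_flush();
SELECT vacuum_count > :vac_before AS vacuumed
  FROM pg_stat_user_tables WHERE relname = 'vac_option_tab';
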
--
Michael




Re: shoud be get_extension_schema visible?

2023-03-05 Thread Michael Paquier
On Mon, Mar 06, 2023 at 08:34:49AM +0100, Pavel Stehule wrote:
>> Note for other reviewers / committers: this is something actually already
>> wanted for 3rd party code.  As an example, here's Pavel's code in the
>> plpgsql_check extension that internally has to duplicate this function
>> (and deal with compatibility):
>> https://github.com/okbob/plpgsql_check/blob/master/src/catalog.c#L205

I can see why you'd want that, so OK from here to provide this routine
for external consumption.  Let's first wait a bit and see if others
have any kind of objections or comments.
--
Michael




Re: psql: Add role's membership options to the \du+ command

2023-03-05 Thread Pavel Luzanov

On 03.03.2023 19:21, David G. Johnston wrote:
I'd be fine with "pg_can_admin_role" being a newly created function 
that provides this true/false answer but it seems indisputable that 
today there is no core-provided means to answer the question "can one 
role get ADMIN rights on another role".  Modifying \du to show this 
seems out-of-scope but the pg_has_role function already provides that 
question for INHERIT and SET so it is at least plausible to extend it 
to include ADMIN, even if the phrase "has role" seems a bit of a 
misnomer.  I do cover this aspect with the Role Graph pseudo-extension 
but given the presence and ease-of-use of a boolean-returning function 
this seems like a natural addition.  We've also survived quite long 
without it - this isn't a new concept in v16, just a bit refined.


I must admit that I am slowly coming to the same conclusions that you 
have already outlined in previous messages.


Indeed, adding ADMIN to pg_has_role looks logical. The function will 
show whether one role can manage another directly or indirectly (via SET 
ROLE).
Adding ADMIN will lead to the question of naming other values. It is 
more reasonable to have INHERIT instead of USAGE.
And it is not very clear whether (except for backward compatibility) a 
separate MEMBER value is needed at all.
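
For concreteness, the modes that exist today can already be checked, and the
discussed addition might look like this (the ADMIN mode is hypothetical, not
something pg_has_role accepts yet):

-- existing: membership / usable privileges
SELECT pg_has_role(current_user, 'pg_monitor', 'MEMBER');
SELECT pg_has_role(current_user, 'pg_monitor', 'USAGE');
-- discussed: "can this role get ADMIN OPTION on that role?"
-- SELECT pg_has_role(current_user, 'pg_monitor', 'ADMIN');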


I wouldn't bother starting yet another thread in this area right now, 
this one can absorb some related changes as well as the subject line item.


I agree.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com


Re: shoud be get_extension_schema visible?

2023-03-05 Thread Pavel Stehule
On Mon, Mar 6, 2023 at 8:33 AM Julien Rouhaud  wrote:

> Hi,
>
> On Sun, Feb 19, 2023 at 06:40:39AM +0100, Pavel Stehule wrote:
> >
> > On Fri, Feb 17, 2023 at 6:45 AM Pavel Stehule 
> > wrote:
> >
> > > Several times I have needed to get the extension's assigned namespace.
> > > There is already a ready-made function, get_extension_schema, but it is static.
> > >
> > > I need to find a function with a known name, but a possibly unknown
> > > schema, from a known extension.
> > >
> >
> > Here is a patch
>
> The patch is trivial so I don't have much to say about it, and it also
> seems quite reasonable generally.
>
> Note for other reviewers / committers: this is something actually already
> wanted for 3rd party code.  As an example, here's Pavel's code in the
> plpgsql_check extension that internally has to duplicate this function
> (and deal with compatibility):
> https://github.com/okbob/plpgsql_check/blob/master/src/catalog.c#L205
>
> I'm marking this entry as Ready For Committer.
>

Thank you very much

Pavel


Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-05 Thread Drouvot, Bertrand

Hi,

On 2/16/23 10:21 PM, Andres Freund wrote:

Hi,

On 2023-02-15 09:21:48 +0100, Drouvot, Bertrand wrote:

diff --git a/src/backend/utils/activity/pgstat_relation.c 
b/src/backend/utils/activity/pgstat_relation.c
index f793ac1516..b26e2a5a7a 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
   * Find any existing PgStat_TableStatus entry for rel_id in the current
   * database. If not found, try finding from shared tables.
   *
+ * If an entry is found, copy it and increment the copy's counters with their
+ * subtransaction counterparts. Then return the copy. There is no need for the
+ * caller to pfree the copy as the MemoryContext will be reset soon after.
+ *


The "There is no need" bit seems a bit off. Yes, that's true for the current
callers, but who says that it has to stay that way?



Good point. Wording has been changed in V6 attached.


Otherwise this looks ready, on a casual scan.



Thanks for having looked at it!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/backend/utils/activity/pgstat_relation.c 
b/src/backend/utils/activity/pgstat_relation.c
index f793ac1516..c030e1782e 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
  * Find any existing PgStat_TableStatus entry for rel_id in the current
  * database. If not found, try finding from shared tables.
  *
+ * If an entry is found, copy it and increment the copy's counters with their
+ * subtransaction counterparts. Then return the copy. The caller may need to
+ * pfree the copy (in case the MemoryContext is not reset soon after).
+ *
  * If no entry found, return NULL, don't create a new one
  */
 PgStat_TableStatus *
 find_tabstat_entry(Oid rel_id)
 {
PgStat_EntryRef *entry_ref;
+   PgStat_TableXactStatus *trans;
+   PgStat_TableStatus *tabentry = NULL;
+   PgStat_TableStatus *tablestatus = NULL;
 
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, 
MyDatabaseId, rel_id);
if (!entry_ref)
+   {
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, 
InvalidOid, rel_id);
+   if(!entry_ref)
+   return tablestatus;
+   }
+
+   tabentry = (PgStat_TableStatus *) entry_ref->pending;
+   tablestatus = palloc(sizeof(PgStat_TableStatus));
+   *tablestatus = *tabentry;
+
+   /*
+* Live subtransactions' counts aren't in t_counts yet. This is not a hot
+* code path so it sounds ok to reconcile for tuples_inserted,
+* tuples_updated and tuples_deleted even if this is not what the caller
+* is interested in.
+*/
+   for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
+   {
+   tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
+   tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
+   tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
+   }
 
-   if (entry_ref)
-   return entry_ref->pending;
-   return NULL;
+   return tablestatus;
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c 
b/src/backend/utils/adt/pgstatfuncs.c
index b61a12382b..b4de57c535 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1546,17 +1546,11 @@ pg_stat_get_xact_tuples_inserted(PG_FUNCTION_ARGS)
Oid relid = PG_GETARG_OID(0);
int64   result;
PgStat_TableStatus *tabentry;
-   PgStat_TableXactStatus *trans;
 
if ((tabentry = find_tabstat_entry(relid)) == NULL)
result = 0;
else
-   {
-   result = tabentry->t_counts.t_tuples_inserted;
-   /* live subtransactions' counts aren't in t_tuples_inserted yet */
-   for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
-   result += trans->tuples_inserted;
-   }
+   result = (int64) (tabentry->t_counts.t_tuples_inserted);
 
PG_RETURN_INT64(result);
 }
@@ -1567,17 +1561,11 @@ pg_stat_get_xact_tuples_updated(PG_FUNCTION_ARGS)
Oid relid = PG_GETARG_OID(0);
int64   result;
PgStat_TableStatus *tabentry;
-   PgStat_TableXactStatus *trans;
 
if ((tabentry = find_tabstat_entry(relid)) == NULL)
result = 0;
else
-   {
-   result = tabentry->t_counts.t_tuples_updated;
-   /* live subtransactions' counts aren't in t_tuples_updated yet */
-   for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
-   

using memoize in parallel query decreases performance

2023-03-05 Thread Pavel Stehule
Hi

In one query I can see a very big overhead from the Memoize node -
unfortunately with hits = 0.

The estimate is almost exact. See details in the attachment.

Regards

Pavel




Re: shoud be get_extension_schema visible?

2023-03-05 Thread Julien Rouhaud
Hi,

On Sun, Feb 19, 2023 at 06:40:39AM +0100, Pavel Stehule wrote:
>
> On Fri, Feb 17, 2023 at 6:45 AM Pavel Stehule 
> wrote:
>
> > Several times I have needed to get the extension's assigned namespace.
> > There is already a ready-made function, get_extension_schema, but it is static.
> >
> > I need to find a function with a known name, but a possibly unknown
> > schema, from a known extension.
> >
>
> Here is a patch

The patch is trivial so I don't have much to say about it, and it also seems
quite reasonable generally.

Note for other reviewers / committers: this is something actually already
wanted for 3rd party code.  As an example, here's Pavel's code in the
plpgsql_check extension that internally has to duplicate this function (and
deal with compatibility):
https://github.com/okbob/plpgsql_check/blob/master/src/catalog.c#L205

I'm marking this entry as Ready For Committer.
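
For anyone following along, the schema that get_extension_schema() looks up
is the same thing this catalog query returns (the extension name is only an
example):

SELECT extnamespace::regnamespace AS extension_schema
  FROM pg_extension
 WHERE extname = 'plpgsql_check';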




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-03-05 Thread Kyotaro Horiguchi
At Mon, 06 Mar 2023 15:24:25 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> In any case, I think we need to avoid such concurrent autovacuum/analyze.

If that is correct, I believe the attached fix works.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 937b2101b3..023ec5ecc4 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1137,7 +1137,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
-CREATE TABLE test_io_shared(a int);
+CREATE TABLE test_io_shared(a int) WITH (autovacuum_enabled = 'false');
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
  pg_stat_force_next_flush 
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 74e592aa8a..aa6552befd 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -549,7 +549,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
-CREATE TABLE test_io_shared(a int);
+CREATE TABLE test_io_shared(a int) WITH (autovacuum_enabled = 'false');
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-05 Thread Amit Kapila
On Mon, Mar 6, 2023 at 10:12 AM Peter Smith  wrote:
>
> 4. IdxIsRelationIdentityOrPK
>
> +/*
> + * Given a relation and OID of an index, returns true if the
> + * index is relation's replica identity index or relation's
> + * primary key's index.
> + *
> + * Returns false otherwise.
> + */
> +bool
> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> +{
> + Assert(OidIsValid(idxoid));
> +
> + return GetRelationIdentityOrPK(rel) == idxoid;
> +}
>
> I think you've "simplified" this function in v28 but AFAICT now it has
> a different logic to v27.
>
> PREVIOUSLY it was coded like
> + return RelationGetReplicaIndex(rel) == idxoid ||
> + RelationGetPrimaryKeyIndex(rel) == idxoid;
>
> You can see if 'idxoid' did NOT match RI but if it DID match PK
> previously it would return true. But now in that scenario, it won't
> even check the PK if there was a valid RI. So it might return false
> when previously it returned true. Is it deliberate?
>

I don't see any problem with this because by default the PK will be the
replica identity. So we will try to get the PK only if the user specifies
replica identity full or changes the replica identity to some other index,
which seems valid for this case. Am I missing something that makes this
code do something bad?
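
To illustrate the point about the default replica identity (a quick sketch,
not part of the patch):

CREATE TABLE demo_ri (id int PRIMARY KEY, val text);
SELECT relreplident FROM pg_class WHERE relname = 'demo_ri';
-- 'd': default, i.e. the primary key acts as the replica identity
ALTER TABLE demo_ri REPLICA IDENTITY FULL;
SELECT relreplident FROM pg_class WHERE relname = 'demo_ri';
-- 'f': full, no replica identity index, so the lookup falls back to the PK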

Few other comments on latest code:

1.
   
-   A published table must have a replica identity configured in
+   A published table must have a replica
identity configured in

How is the above change related to this patch?

2.
certain additional requirements) can also be set to be the replica
-   identity.  If the table does not have any suitable key, then it can be set
+   identity. If the table does not have any suitable key, then it can be set

I don't think we should change the spacing of the existing docs (two
spaces after a full stop to one space), and the patch does so
inconsistently anyway. I suggest adding new changes with the same
spacing as the existing docs. If you are adding an entirely new section
then we can consider it differently.

3.
to replica identity full, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
-   fallback if no other solution is possible.  If a replica identity other
-   than full is set on the publisher side, a replica identity
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See  for details on
+   the key. When replica identity FULL is specified,
+   indexes can be used on the subscriber side for searching the rows. These

Shouldn't specifying FULL be consistent with the existing docs?

-- 
With Regards,
Amit Kapila.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-05 Thread Masahiko Sawada
On Fri, Mar 3, 2023 at 8:04 PM John Naylor  wrote:
>
> On Wed, Mar 1, 2023 at 6:59 PM Masahiko Sawada  wrote:
> >
> > On Wed, Mar 1, 2023 at 3:37 PM John Naylor  
> > wrote:
> > >
> > > I think we're trying to solve the wrong problem here. I need to study 
> > > this more, but it seems that code that needs to stay within a memory 
> > > limit only needs to track what's been allocated in chunks within a block, 
> > > since writing there is what invokes a page fault.
> >
> > Right. I guess we've discussed what we use for calculating the *used*
> > memory amount but I don't remember.
> >
> > I think I was confused by the fact that we use some different
> > approaches to calculate the amount of used memory. Parallel hash and
> > tidbitmap use the allocated chunk size whereas hash_agg_check_limits()
> > in nodeAgg.c uses MemoryContextMemAllocated(), which uses the
> > allocated block size.
>
> That's good to know. The latter says:
>
>  * After adding a new group to the hash table, check whether we need to enter
>  * spill mode. Allocations may happen without adding new groups (for instance,
>  * if the transition state size grows), so this check is imperfect.
>
> I'm willing to claim that vacuum can be imperfect also, given the tid store's 
> properties: 1) on average much more efficient in used space, and 2) no longer 
> bound by the 1GB limit.
>
> > > I'm not sure how this affects progress reporting, because it would be 
> > > nice if it didn't report dead_tuple_bytes bigger than 
> > > max_dead_tuple_bytes.
> >
> > Yes, the progress reporting could be confusing. Particularly, in
> > shared tidstore cases, the dead_tuple_bytes could be much bigger than
> > max_dead_tuple_bytes. Probably what we need are functions for
> > MemoryContext and dsa_area to get the amount of memory that has been
> > allocated, without tracking every chunk's space. For example, the
> > functions would be like what SlabStats() does: iterate over every
> > block and calculate the total/free memory usage.
>
> I'm not sure we need to invent new infrastructure for this. Looking at v29 in 
> vacuumlazy.c, the order of operations for memory accounting is:
>
> First, get the block-level space -- stop and vacuum indexes if we exceed the 
> limit:
>
> /*
>  * Consider if we definitely have enough space to process TIDs on page
>  * already.  If we are close to overrunning the available space for
>  * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
>  * this page.
>  */
> if (TidStoreIsFull(vacrel->dead_items)) --> which is basically "if 
> (TidStoreMemoryUsage(ts) > ts->control->max_bytes)"
>
> Then, after pruning the current page, store the tids and then get the 
> block-level space again:
>
> else if (prunestate.num_offsets > 0)
> {
>   /* Save details of the LP_DEAD items from the page in dead_items */
>   TidStoreSetBlockOffsets(...);
>
>   pgstat_progress_update_param(PROGRESS_VACUUM_DEAD_TUPLE_BYTES,
>TidStoreMemoryUsage(dead_items));
> }
>
> Since the block-level measurement is likely overestimating quite a bit, I 
> propose to simply reverse the order of the actions here, effectively 
> reporting progress for the *last page* and not the current one: First update 
> progress with the current memory usage, then add tids for this page. If this 
> allocated a new block, only a small bit of that will be written to. If this 
> block pushes it over the limit, we will detect that up at the top of the 
> loop. It's kind of like our earlier attempts at a "fudge factor", but simpler 
> and less brittle. And, as far as OS pages we have actually written to, I 
> think it'll effectively respect the memory limit, at least in the local mem 
> case. And the numbers will make sense.
>
> Thoughts?

It looks like it works, but it still doesn't work in a case where a shared
tidstore is created with a 64kB memory limit, right?
TidStoreMemoryUsage() returns 1MB and TidStoreIsFull() returns true
from the beginning.

BTW I realized that since the caller can pass a dsa_area to the tidstore
(and radix tree), if other data are allocated in the same DSA area,
TidStoreMemoryUsage() (and RT_MEMORY_USAGE()) returns a memory usage that
includes not only the tidstore itself but also that other data. Probably
it's better to comment that the passed dsa_area should be dedicated to a
tidstore (or a radix tree).

>
> But now that I'm looking more closely at the details of memory accounting, I 
> don't like that TidStoreMemoryUsage() is called twice per page pruned (see 
> above). Maybe it wouldn't noticeably slow things down, but it's a bit sloppy. 
> It seems like we should call it once per loop and save the result somewhere. 
> If that's the right way to go, that possibly indicates that TidStoreIsFull() 
> is not a useful interface, at least in this form.

Agreed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-03-05 Thread Kyotaro Horiguchi
At Sat, 04 Mar 2023 18:21:09 -0500, Tom Lane  wrote in 
> Andres Freund  writes:
> > Just pushed the actual pg_stat_io view, the splitting of the tablespace 
> > test,
> > and the pg_stat_io tests.
> 
> One of the test cases is flapping a bit:
> 
> diff -U3 
> /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out
>  
> /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/results/stats.out
> --- 
> /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out
>  2023-03-04 21:30:05.891579466 +0100
> +++ 
> /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/results/stats.out
>   2023-03-04 21:34:26.745552661 +0100
> @@ -1201,7 +1201,7 @@
>  SELECT :io_sum_shared_after_reads > :io_sum_shared_before_reads;
>   ?column? 
>  --
> - t
> + f
>  (1 row)
>  
>  DROP TABLE test_io_shared;
> 
> There are two instances of this today [1][2], and I've seen it before
> but failed to note down where.

The concurrent autoanalyze below is logged as performing at least one
page read from the table. It is unclear, however, how that analyze
operation resulted in 19 hits and 2 reads on the (I think) single-page
relation.

In any case, I think we need to avoid such concurrent autovacuum/analyze.


https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2023-03-04%2021%3A19%3A39

2023-03-04 22:36:27.781 CET [4073:106] pg_regress/stats LOG:  statement: ALTER 
TABLE test_io_shared SET TABLESPACE regress_tblspace;
2023-03-04 22:36:27.838 CET [4073:107] pg_regress/stats LOG:  statement: SELECT 
COUNT(*) FROM test_io_shared;
2023-03-04 22:36:27.864 CET [4255:5] LOG:  automatic analyze of table 
"regression.public.test_io_shared"
avg read rate: 5.208 MB/s, avg write rate: 5.208 MB/s
buffer usage: 17 hits, 2 misses, 2 dirtied
2023-03-04 22:36:28.024 CET [4073:108] pg_regress/stats LOG:  statement: SELECT 
pg_stat_force_next_flush();
2023-03-04 22:36:28.024 CET [4073:108] pg_regress/stats LOG:  statement: SELECT 
pg_stat_force_next_flush();
2023-03-04 22:36:28.027 CET [4073:109] pg_regress/stats LOG:  statement: SELECT 
sum(reads) AS io_sum_shared_after_reads
  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 
'relation'  



> [1] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2023-03-04%2021%3A19%3A39
> [2] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2023-03-04%2020%3A30%3A05

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Inaccurate comment for pg_get_partkeydef

2023-03-05 Thread Japin Li

PSA a patch to fix an inaccurate comment.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 6dc117dea8..bcb493b56c 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1855,7 +1855,7 @@ pg_get_statisticsobjdef_expressions(PG_FUNCTION_ARGS)
  *
  * Returns the partition key specification, ie, the following:
  *
- * PARTITION BY { RANGE | LIST | HASH } (column opt_collation opt_opclass [, ...])
+ * { RANGE | LIST | HASH } (column opt_collation opt_opclass [, ...])
  */
 Datum
 pg_get_partkeydef(PG_FUNCTION_ARGS)
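
For reference, the function's output indeed carries no PARTITION BY prefix,
e.g.:

CREATE TABLE measurement (logdate date, peaktemp int) PARTITION BY RANGE (logdate);
SELECT pg_get_partkeydef('measurement'::regclass);
-- returns: RANGE (logdate)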


Re: Removing unneeded self joins

2023-03-05 Thread Michał Kłeczek
Hi All,

I just wanted to ask about the status and plans for this patch.
I can see it has been stuck at "Waiting for Author" status in several commitfests.

I think this patch would be really beneficial for us as we heavily use views to
structure our code.
Each view is responsible for providing some calculated values and they are 
joined in a query to retrieve the full set of information.

Not sure how the process works and how I could help (I am absolutely not 
capable of helping with coding I am afraid - but could sponsor a (small :) ) 
bounty to speed things up).

Thanks,
Michal

> On 16 Dec 2022, at 07:45, Andrey Lepikhov  wrote:
> 
> On 12/6/22 23:46, Andres Freund wrote:
>> This doesn't pass the main regression tests due to a plan difference.
>> https://cirrus-ci.com/task/5537518245380096
>> https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs
>> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out 
>> /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
>> --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out  2022-12-05 
>> 19:11:52.453920838 +
>> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out  
>> 2022-12-05 19:15:21.864183651 +
>> @@ -5806,7 +5806,7 @@
>>   Nested Loop
>> Join Filter: (sj_t3.id = sj_t1.id)
>> ->  Nested Loop
>> - Join Filter: (sj_t3.id = sj_t2.id)
>> + Join Filter: (sj_t2.id = sj_t3.id)
>>   ->  Nested Loop Semi Join
>> ->  Nested Loop
>>   ->  HashAggregate
> This change in the test behaviour is induced by commit a5fc4641,
> "Avoid making commutatively-duplicate clauses in EquivalenceClasses."
> Nothing special, as far as I can see. The attached patch fixes this.
> 
> -- 
> Regards
> Andrey Lepikhov
> Postgres Professional
> 





Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-05 Thread Peter Smith
Here are some review comments for v28-0001.

==
doc/src/sgml/logical-replication.sgml

1.
A published table must have a replica identity configured in order to
be able to replicate UPDATE and DELETE operations, so that appropriate
rows to update or delete can be identified on the subscriber side. By
default, this is the primary key, if there is one. Another unique
index (with certain additional requirements) can also be set to be the
replica identity. When replica identity FULL is specified, indexes can
be used on the subscriber side for searching the rows. These indexes
should be btree, non-partial and have at least one column reference
(e.g., should not consist of only expressions). These restrictions on
the non-unique index properties are in essence the same restrictions
that are enforced for primary keys. Internally, we follow the same
approach for supporting index scans within logical replication scope.
If there are no such suitable indexes, the search on the subscriber
side can be very inefficient, therefore replica identity FULL should
only be used as a fallback if no other solution is possible. If a
replica identity other than full is set on the publisher side, a
replica identity comprising the same or fewer columns must also be set
on the subscriber side. See REPLICA IDENTITY for details on how to set
the replica identity. If a table without a replica identity is added
to a publication that replicates UPDATE or DELETE operations then
subsequent UPDATE or DELETE operations will cause an error on the
publisher. INSERT operations can proceed regardless of any replica
identity.

~

1a.
Changes include:
"should" --> "must"
"e.g." --> "i.e."

BEFORE
These indexes should be btree, non-partial and have at least one
column reference (e.g., should not consist of only expressions).

SUGGESTION
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e., cannot consist of only expressions).

~

1b.
The fix for my v27 review comment #2b (changing "full" to FULL) was
not made correctly. It should be uppercase FULL, not full:
"other than full" --> "other than FULL"

==
src/backend/executor/execReplication.c

2.
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.
  */
-static bool
+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)

~

"(e.g., should not consist of only expressions)" --> "(i.e., cannot
consist of only expressions)"

==
src/backend/replication/logical/relation.c

3. FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns the oid of an index that can be used via the apply
+ * worker. The index should be btree, non-partial and have at
+ * least one column reference (e.g., should not consist of
+ * only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle
+ * PK/RI and these limitations are inherent to PK/RI.

The 2nd sentence of this comment should match the same changes in the
Commit message --- "must not" instead of "should not", "i.e." instead
of "e.g.", etc. See the review comment #1a above.

~~~

4. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's replica identity index or relation's
+ * primary key's index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return GetRelationIdentityOrPK(rel) == idxoid;
+}

I think you've "simplified" this function in v28 but AFAICT now it has
a different logic to v27.

PREVIOUSLY it was coded like
+ return RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid;

You can see if 'idxoid' did NOT match RI but if it DID match PK
previously it would return true. But now in that scenario, it won't
even check the PK if there was a valid RI. So it might return false
when previously it returned true. Is it deliberate?

==
.../subscription/t/032_subscribe_use_index.pl

5.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB different data
+#
+# The subscriber has duplicate tuples that publisher does not have.
+# When publsher updates/deletes 1 row, subscriber uses indexes and
+# exactly updates/deletes 1 row.

"and exactly updates/deletes 1 row." --> "and 

Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2023-03-05 Thread Michael Paquier
On Mon, Feb 20, 2023 at 01:54:00PM +0530, Bharath Rupireddy wrote:
> I ran some tests on my dev system [1] and I don't see much difference
> between v3 and v4. So, +1 for v3 patch (+ argument order swap) from
> Andres to keep the code simple and elegant.

This thread has stalled for a couple of weeks, so I have gone back to
it.  Testing on a tmpfs I am not seeing a difference in performance
for any of the approaches discussed.  In the end, as I am the one
behind the introduction of pg_pwrite_zeros(), I have applied v3 after
switching the size and offset parameters to be ordered the same way as in v4.
--
Michael




pg_rewind: Skip log directory for file type check like pg_wal

2023-03-05 Thread Soumyadeep Chakraborty
Hello hackers,

I think we should extend the "log" directory the same courtesy as was
done for pg_wal (pg_xlog) in 0e42397f42b.

Today, even if BOTH source and target servers have symlinked "log"
directories, pg_rewind fails with:

file "log" is of different type in source and target.

Attached is a repro patch using the 004_pg_xlog_symlink.pl test to
demonstrate the failure.
Running make check PROVE_TESTS='t/004_pg_xlog_symlink.pl'
in src/bin/pg_rewind should suffice after applying.

This is because when we use the libpq query to fetch the filemap from
the source server, we consider the log directory as a directory, even if
it is a symlink. This is because pg_stat_file() is used in that query in
libpq_traverse_files() and pg_stat_file() returns isdir=t for symlinks
to directories.
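
The behaviour is easy to confirm from SQL on a source server whose log
directory is a symlink (a sketch, assuming log_directory = 'log' under the
data directory and sufficient privileges):

-- stat() follows the symlink, so isdir comes back true
SELECT size, isdir FROM pg_stat_file('log');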

This shortcoming is somewhat called out:

 * XXX: There is no backend function to get a symbolic link's target in
 * general, so if the admin has put any custom symbolic links in the data
 * directory, they won't be copied correctly.

We could fix the query and/or pg_stat_file(). However, we would also
like to support deployments where only one of the primaries and/or
standbys have the symlink. That is not hard to conceive, given primaries
and standbys can have drastically disparate log volume and/or log
collection requirements.

Attached is a patch that treats "log" like we treat "pg_wal".

Regards,
Soumyadeep (VMware)
From 697414d2b630efdad0a9137ea9cc93f8576a9792 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Sun, 5 Mar 2023 17:57:55 -0800
Subject: [PATCH v1 1/1] Fix pg_rewind when log is a symlink

The log directory can often be symlinked in the same way the pg_wal
directory is. Today, even if BOTH the source and target servers have
their log directories symlinked, pg_rewind will run into the error:

"log" is of different type in source and target

This is because when we use the libpq query to fetch the filemap from
the source server, we consider the log directory as a directory, even if
it is a symlink. This is because pg_stat_file() is used in that query in
libpq_traverse_files() and pg_stat_file() returns isdir=t for symlinks
to directories.

This shortcoming is somewhat called out:

 * XXX: There is no backend function to get a symbolic link's target in
 * general, so if the admin has put any custom symbolic links in the data
 * directory, they won't be copied correctly.

We could fix the query and/or pg_stat_file(). However, we would also
like to support deployments where only one of the primaries and/or
standbys have the symlink. That is not hard to conceive, given primaries
and standbys can have drastically disparate log volume and/or log
collection requirements.

So, we decide to extend the log directory the same courtesy as was done
for pg_wal (pg_xlog) in 0e42397f42b.
---
 src/bin/pg_rewind/filemap.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index bd5c598e200..a076bb33996 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -221,11 +221,11 @@ process_source_file(const char *path, file_type_t type, size_t size,
 	file_entry_t *entry;
 
 	/*
-	 * Pretend that pg_wal is a directory, even if it's really a symlink. We
+	 * Pretend that pg_wal/log is a directory, even if it's really a symlink. We
 	 * don't want to mess with the symlink itself, nor complain if it's a
 	 * symlink in source but not in target or vice versa.
 	 */
-	if (strcmp(path, "pg_wal") == 0 && type == FILE_TYPE_SYMLINK)
+	if (((strcmp(path, "pg_wal") == 0 || strcmp(path, "log") == 0)) && type == FILE_TYPE_SYMLINK)
 		type = FILE_TYPE_DIRECTORY;
 
 	/*
@@ -263,9 +263,9 @@ process_target_file(const char *path, file_type_t type, size_t size,
 	 */
 
 	/*
-	 * Like in process_source_file, pretend that pg_wal is always a directory.
+	 * Like in process_source_file, pretend that pg_wal/log is always a directory.
 	 */
-	if (strcmp(path, "pg_wal") == 0 && type == FILE_TYPE_SYMLINK)
+	if (((strcmp(path, "pg_wal") == 0 || strcmp(path, "log") == 0)) && type == FILE_TYPE_SYMLINK)
 		type = FILE_TYPE_DIRECTORY;
 
 	/* Remember this target file */
-- 
2.34.1

diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index 5fb7fa9077c..f95ba1a1486 100644
--- a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
+++ b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
@@ -2,7 +2,7 @@
 # Copyright (c) 2021-2023, PostgreSQL Global Development Group
 
 #
-# Test pg_rewind when the target's pg_wal directory is a symlink.
+# Test pg_rewind when the target's log directory is a symlink.
 #
 use strict;
 use warnings;
@@ -27,11 +27,12 @@ sub run_test
 	RewindTest::setup_cluster($test_mode);
 
 	my $test_primary_datadir = $node_primary->data_dir;
+	mkdir("$test_primary_datadir/log") or die;
 
 	# turn pg_wal into a symlink
-	print("moving $test_primary_datadir/pg_wal to $primary_xlogdir\n");
-	

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Thomas Munro
On Mon, Mar 6, 2023 at 11:20 AM Jim Jones  wrote:
> On 05.03.23 22:00, Thomas Munro wrote:
> > could be something to do with
> > our environment, since .cirrus.yml sets LANG=C in the 32 bit test run
> > -- maybe try that locally?

> Also using LANGUAGE=C the result is the same for me - all tests pass
> just fine.

I couldn't reproduce that locally either, but I just tested on CI with
your patch applied and saw the failure, and then removed
"PYTHONCOERCECLOCALE=0 LANG=C" and it's all green:

https://github.com/macdice/postgres/commit/91999f5d13ac2df6f7237a301ed6cf73f2bb5b6d

Without looking too closely, my first guess would have been that this
just isn't going to work without UTF-8 database encoding, so you might
need to skip the test (see for example
src/test/regress/expected/unicode_1.out).  It's annoying that "xml"
already has 3 expected variants... hmm.  BTW shouldn't it be failing
in a more explicit way somewhere sooner if the database encoding is
not UTF-8, rather than getting confused?




Re: Refactor to introduce pg_strcoll().

2023-03-05 Thread Thomas Munro
+/* Win32 does not have UTF-8, so we need to map to UTF-16 */

I wonder if this is still true.  I think in Windows 10+ you can enable
UTF-8 support.  Then could you use strcoll_l() directly?  I struggled
to understand that, but I am a simple Unix hobbit from the shire so I
dunno.  (Perhaps the *whole OS* has to be in that mode, so you might
have to do a runtime test?  This was discussed in another thread that
mostly left me confused[1].).

And that leads to another thought.  We have an old comment
"Unfortunately, there is no strncoll(), so ...".  Curiously, Windows
does actually have strncoll_l() (as do some other libcs out there).
So after skipping the expansion to wchar_t, one might think you could
avoid the extra copy required to nul-terminate the string (and hope
that it doesn't make an extra copy internally, far from given).
Unfortunately it seems to be defined in a strange way that doesn't
look like your pg_strncoll_XXX() convention: it has just one length
parameter, not one for each string.  That is, it's designed for
comparing prefixes of strings, not for working with
non-null-terminated strings.  I'm not entirely sure if the interface
makes sense at all!  Is it measuring in 'chars' or 'encoded
characters'?  I would guess the former, like strncpy() et al, but then
what does it mean if it chops a UTF-8 sequence in half?  And at a
higher level, if you wanted to use it for our purpose, you'd
presumably need Min(s1_len, s2_len), but I wonder if there are string
pairs that would sort in a different order if the collation algorithm
could see more characters after that?  For example, in Dutch "ij" is
sometimes treated like a letter that sorts differently than "i" + "j"
normally would, so if you arbitrarily chop that "j" off while
comparing common-length prefix you might get into trouble; likewise
for "aa" in Danish.  Perhaps these sorts of problems explain why it's
not in the standard (though I see it was at some point in some kind of
draft; I don't grok the C standards process enough to track down what
happened but WG20/WG14 draft N1027[2] clearly contains strncoll_l()
alongside the stuff that we know and use today).  Or maybe I'm
underthinking it.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com
[2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1027.pdf




Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Jim Jones

On 05.03.23 22:00, Thomas Munro wrote:

The CI run for that failed in an interesting way, only on Debian +
Meson, 32 bit.  The diffs appear to show that psql has a different
opinion of the column width, while building its header (the "--"
you get at the top of psql's output), even though the actual column
contents was the same.  regression.diff[2] shows that there is a "Â£1"
in the output, which is how UTF-8 "£1" looks if you view it with
Latin1 glasses on.  Clearly this patch involves transcoding, Latin1
and UTF-8
One of the use cases of this patch is exactly the transcoding of a
non-UTF-8 document to UTF-8 - as described in the XML canonical spec.

and I haven't studied it, but it's pretty weird for the 32
bit build to give a different result...
Yeah, it's pretty weird indeed. I'll try to reproduce this environment 
in a container to see if I get the same diff. Although I'm not sure that 
by "fixing" the result set for this environment it won't break all the 
others.

could be something to do with
our environment, since .cirrus.yml sets LANG=C in the 32 bit test run
-- maybe try that locally?
Also using LANGUAGE=C the result is the same for me - all tests pass 
just fine.

That run also generated a core dump, but I think that's just our open
SIGQUIT problem[3] and not relevant here.

[1] https://cirrus-ci.com/build/6319462375227392
[2] 
https://api.cirrus-ci.com/v1/artifact/task/5800598633709568/testrun/build-32/testrun/regress/regress/regression.diffs
[3] 
https://www.postgresql.org/message-id/flat/20230214202927.xgb2w6b7gnhq6tvv%40awork3.anarazel.de


Thanks for the quick reply. Much appreciated!







Re: Date-Time dangling unit fix

2023-03-05 Thread Joseph Koshakow
Also I removed some dead code from the previous patch.

- Joe Koshakow
From 2ff08d729bca87992514d0651fdb62455e43cd8a Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Remove unknown ISO format, handle dangling units

This commit removes the date format of "y2001m02d04" and the time
format of "h04mm05s06". These were never documented and don't seem to
be valid ISO formats.

Additionally this commit handles repeated and dangling julian units
in DecodeDateTime.
---
 src/backend/utils/adt/datetime.c   | 219 ++---
 src/test/regress/expected/horology.out |  41 ++---
 src/test/regress/sql/horology.sql  |   4 +
 3 files changed, 36 insertions(+), 228 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d166613895..bf7cb94b52 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO y2001m02d04 format */
+	int			ptype = 0;		/* "prefix type" for ISO and Julian formats */
 	int			i;
 	int			val;
 	int			dterr;
@@ -1174,10 +1174,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 			case DTK_NUMBER:
 
-/*
- * Was this an "ISO date" with embedded field labels? An
- * example is "y2001m02d04" - thomas 2001-02-04
- */
 if (ptype != 0)
 {
 	char	   *cp;
@@ -1188,84 +1184,11 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	if (errno == ERANGE)
 		return DTERR_FIELD_OVERFLOW;
 
-	/*
-	 * only a few kinds are allowed to have an embedded
-	 * decimal
-	 */
-	if (*cp == '.')
-		switch (ptype)
-		{
-			case DTK_JULIAN:
-			case DTK_TIME:
-			case DTK_SECOND:
-break;
-			default:
-return DTERR_BAD_FORMAT;
-break;
-		}
-	else if (*cp != '\0')
+	if (*cp != '.' && *cp != '\0')
 		return DTERR_BAD_FORMAT;
 
 	switch (ptype)
 	{
-		case DTK_YEAR:
-			tm->tm_year = value;
-			tmask = DTK_M(YEAR);
-			break;
-
-		case DTK_MONTH:
-
-			/*
-			 * already have a month and hour? then assume
-			 * minutes
-			 */
-			if ((fmask & DTK_M(MONTH)) != 0 &&
-(fmask & DTK_M(HOUR)) != 0)
-			{
-tm->tm_min = value;
-tmask = DTK_M(MINUTE);
-			}
-			else
-			{
-tm->tm_mon = value;
-tmask = DTK_M(MONTH);
-			}
-			break;
-
-		case DTK_DAY:
-			tm->tm_mday = value;
-			tmask = DTK_M(DAY);
-			break;
-
-		case DTK_HOUR:
-			tm->tm_hour = value;
-			tmask = DTK_M(HOUR);
-			break;
-
-		case DTK_MINUTE:
-			tm->tm_min = value;
-			tmask = DTK_M(MINUTE);
-			break;
-
-		case DTK_SECOND:
-			tm->tm_sec = value;
-			tmask = DTK_M(SECOND);
-			if (*cp == '.')
-			{
-dterr = ParseFractionalSecond(cp, fsec);
-if (dterr)
-	return dterr;
-tmask = DTK_ALL_SECS_M;
-			}
-			break;
-
-		case DTK_TZ:
-			tmask = DTK_M(TZ);
-			dterr = DecodeTimezone(field[i], tzp);
-			if (dterr)
-return dterr;
-			break;
-
 		case DTK_JULIAN:
 			/* previous field was a label for "julian date" */
 			if (value < 0)
@@ -1510,6 +1433,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 	case UNITS:
 		tmask = 0;
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -1536,7 +1462,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 			 ftype[i + 1] != DTK_TIME &&
 			 ftype[i + 1] != DTK_DATE))
 			return DTERR_BAD_FORMAT;
-
 		ptype = val;
 		break;
 
@@ -1567,6 +1492,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -1933,7 +1862,7 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO h04mm05s06 format */
+	int			ptype = 0;		/* "prefix type" for ISO format */
 	int			i;
 	int			val;
 	int			dterr;
@@ -2060,133 +1989,12 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 			case DTK_NUMBER:
 
 /*
- * Was this an "ISO time" with embedded field labels? An
- * example is "h04mm05s06" - thomas 2001-02-04
+ * Was this an "ISO time" An example is "T040506.789"
  */
 if (ptype != 0)
 {
-	char	   *cp;
-	int			value;
-
-	/* Only accept a date under limited circumstances */
 	switch (ptype)
 	{
-		case DTK_JULIAN:
-		case DTK_YEAR:
-		case DTK_MONTH:
-			

Re: Date-Time dangling unit fix

2023-03-05 Thread Joseph Koshakow
On Sun, Mar 5, 2023 at 12:54 PM Tom Lane  wrote:
>
> We do accept this:
>
> => select '12:34'::time;
>time
> --
>  12:34:00
> (1 row)
>
> so that must be going through a different code path, which I didn't
> try to identify yet.

That query will contain a single field of "12:34" with ftype DTK_TIME.
That will call into DecodeTime(), which calls into DecodeTimeCommon(),
where we have:

*tmask = DTK_TIME_M;

- Joe Koshakow


Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Thomas Munro
On Mon, Mar 6, 2023 at 7:44 AM Jim Jones  wrote:
> The attached version includes documentation and tests to the patch.

The CI run for that failed in an interesting way, only on Debian +
Meson, 32 bit.  The diffs appear to show that psql has a different
opinion of the column width, while building its header (the "--"
you get at the top of psql's output), even though the actual column
contents was the same.  regression.diff[2] shows that there is a "Â£1"
in the output, which is how UTF-8 "£1" looks if you view it with
Latin1 glasses on.  Clearly this patch involves transcoding, Latin1
and UTF-8 and I haven't studied it, but it's pretty weird for the 32
bit build to give a different result...  could be something to do with
our environment, since .cirrus.yml sets LANG=C in the 32 bit test run
-- maybe try that locally?

That run also generated a core dump, but I think that's just our open
SIGQUIT problem[3] and not relevant here.

[1] https://cirrus-ci.com/build/6319462375227392
[2] 
https://api.cirrus-ci.com/v1/artifact/task/5800598633709568/testrun/build-32/testrun/regress/regress/regression.diffs
[3] 
https://www.postgresql.org/message-id/flat/20230214202927.xgb2w6b7gnhq6tvv%40awork3.anarazel.de




Re: Should vacuum process config file reload more often

2023-03-05 Thread Melanie Plageman
On Thu, Mar 2, 2023 at 6:37 PM Melanie Plageman
 wrote:
>
> On Thu, Mar 2, 2023 at 2:36 AM Masahiko Sawada  wrote:
> >
> > On Thu, Mar 2, 2023 at 10:41 AM Melanie Plageman
> >  wrote:
> > > On another topic, I've just realized that when autovacuuming we only
> > > update tab->at_vacuum_cost_delay/limit from
> > > autovacuum_vacuum_cost_delay/limit for each table (in
> > > table_recheck_autovac()) and then use that to update
> > > MyWorkerInfo->wi_cost_delay/limit. MyWorkerInfo->wi_cost_delay/limit is
> > > what is used to update VacuumCostDelay/Limit in AutoVacuumUpdateDelay().
> > > So, even if we reload the config file in vacuum_delay_point(), if we
> > > don't use the new value of autovacuum_vacuum_cost_delay/limit it will
> > > have no effect for autovacuum.
> >
> > Right, but IIUC wi_cost_limit (and VacuumCostDelayLimit) might be
> > updated. After the autovacuum launcher reloads the config file, it
> > calls autovac_balance_cost() that updates that value of active
> > workers. I'm not sure why we don't update workers' wi_cost_delay,
> > though.
>
> Ah yes, I didn't realize this. Thanks. I went back and did more code
> reading/analysis, and I see no reason why we shouldn't update
> worker->wi_cost_delay to the new value of autovacuum_vac_cost_delay in
> autovac_balance_cost(). Then, as you said, the autovac launcher will
> call autovac_balance_cost() when it reloads the configuration file.
> Then, the next time the autovac worker calls AutoVacuumUpdateDelay(), it
> will update VacuumCostDelay.
>
> > > I started writing a little helper that could be used to update these
> > > workerinfo->wi_cost_delay/limit in vacuum_delay_point(),
> >
> > Since we set vacuum delay parameters for autovacuum workers so that we
> > ration out I/O equally, I think we should keep the current mechanism
> > that the autovacuum launcher sets workers' delay parameters and they
> > update accordingly.
>
> Yes, agreed, it should go in the same place as where we update
> wi_cost_limit (autovac_balance_cost()). I think we should potentially
> rename autovac_balance_cost() because its name and all its comments
> point to its only purpose being to balance the total of the workers
> wi_cost_limits to no more than autovacuum_vacuum_cost_limit. And the
> autovacuum_vacuum_cost_delay doesn't need to be balanced in this way.
>
> Though, since this change on its own would make autovacuum pick up new
> values of autovacuum_vacuum_cost_limit (without having the worker reload
> the config file), I wonder if it makes sense to try and have
> vacuum_delay_point() only reload the config file if it is an explicit
> vacuum or an analyze not being run in an outer transaction (to avoid
> overhead of reloading config file)?
>
> The lifecycle of this different vacuum delay-related gucs and how it
> differs between autovacuum workers and explicit vacuum is quite tangled
> already, though.

So, I've attached a new version of the patch which is quite different
from the previous versions.

In this version I've removed wi_cost_delay from WorkerInfoData. There is
no synchronization of cost_delay amongst workers, so there is no reason
to keep it in shared memory.

One consequence of not updating VacuumCostDelay from wi_cost_delay is
that we have to have a way to keep track of whether or not autovacuum
table options are in use.

This patch does this in a cringeworthy way. I added two global
variables, one to track whether or not cost delay table options are in
use and the other to store the value of the table option cost delay. I
didn't want to use a single variable with a special value to indicate
that table option cost delay is in use because
autovacuum_vacuum_cost_delay already has special values that mean
certain things. My code needs a better solution.
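
(For clarity, "autovacuum table options" here means the per-table
reloptions, e.g. something like the following, with an illustrative table
name.)

ALTER TABLE big_history SET (autovacuum_vacuum_cost_delay = 10,
                             autovacuum_vacuum_cost_limit = 1000);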

It is worth mentioning that I think that in master,
AutoVacuumUpdateDelay() was incorrectly reading wi_cost_limit and
wi_cost_delay from shared memory without holding a lock.

I've added in a shared lock for reading from wi_cost_limit in this
patch. However, AutoVacuumUpdateLimit() is called unconditionally in
vacuum_delay_point(), which is called quite often (per block-ish), so I
was trying to think if there is a way we could avoid having to check
this shared memory variable on every call to vacuum_delay_point().
Rebalances shouldn't happen very often (done by the launcher when a new
worker is launched and by workers between vacuuming tables). Maybe we
can read from it less frequently?

Also not sure how the patch interacts with failsafe autovac and parallel
vacuum.

- Melanie
From 9b5cbbc0c8f892dde3e220f0945b2c1e0d175b84 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sun, 5 Mar 2023 14:39:16 -0500
Subject: [PATCH v2] Reload config file more often while vacuuming

---
 src/backend/commands/vacuum.c   | 38 ---
 src/backend/postmaster/autovacuum.c | 97 ++---
 src/include/postmaster/autovacuum.h |  2 +
 3 files changed, 104 insertions(+), 33 deletions(-)

diff 

[PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Jim Jones

On 27.02.23 14:16, I wrote:

Hi,

In order to compare pairs of XML documents for equivalence it is
necessary to convert them first to their canonical form, as described
in W3C Canonical XML 1.1.[1] This spec basically defines a standard
physical representation of XML documents that have more than one
possible representation, so that it is possible to compare them, e.g.
by forcing UTF-8 encoding, entity reference replacement, attribute
normalization, etc.


Although it is not part of the XML/SQL standard, it would be nice to 
have the option CANONICAL in xmlserialize. Additionally, we could also 
add the attribute WITH [NO] COMMENTS to keep or remove xml comments 
from the documents.


Something like this:

WITH t(col) AS (
 VALUES
  ('
  
  
  ]>

  
  http://postgresql.org;>

    
    
 http://postgresql.org; >

    
    

    
    

    
    
     321 
  
  '::xml)
)
SELECT xmlserialize(DOCUMENT col AS text CANONICAL) FROM t;
xmlserialize
 

 http://postgresql.org; ns:a="1" ns:b="2" 
ns:c="3">42©attr="default"> 321 

(1 row)

-- using WITH COMMENTS

WITH t(col) AS (
 VALUES
  (' http://postgresql.org;>
    
     321 
  '::xml)
)
SELECT xmlserialize(DOCUMENT col AS text CANONICAL WITH COMMENTS) FROM t;
xmlserialize
 

 http://postgresql.org; ns:a="1" ns:b="2" ns:c="3"> 321 

(1 row)
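
To give a self-contained picture of the proposed syntax, a minimal example
(the result shown is what the W3C C14N rules, i.e. sorted attributes and
expanded empty elements, would be expected to produce):

SELECT xmlserialize(DOCUMENT '<doc b="2" a="1"><!-- note --><x/></doc>'::xml
                    AS text CANONICAL WITH COMMENTS);
-- expected: <doc a="1" b="2"><!-- note --><x></x></doc>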


Another option would be to simply create a new function, e.g. 
xmlcanonical(doc xml, keep_comments boolean), but I'm not sure if this 
would be the right approach.


Attached a very short draft. What do you think?

Best, Jim

1- https://www.w3.org/TR/xml-c14n11/


The attached version includes documentation and tests to the patch.

I hope things are clearer now :)

Best, Jim
From 1a3b8bc66c451863ace0488d32f1c0a876ab8f04 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Tue, 28 Feb 2023 23:06:30 +0100
Subject: [PATCH v1] Add CANONICAL format to xmlserialize

This patch introduces the CANONICAL option to xmlserialize, which
serializes xml documents in their canonical form - as described in
the W3C Canonical XML Version 1.1 specification. This option can
be used with the additional parameter WITH [NO] COMMENTS to keep
or remove xml comments from the canonical xml output. This feature
is based on the function xmlC14NDocDumpMemory from the C14N module
of libxml2.

This patch also includes regression tests and documentation.
---
 doc/src/sgml/datatype.sgml|  41 +-
 src/backend/executor/execExprInterp.c |  12 ++-
 src/backend/parser/gram.y |  14 +++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  60 ++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   9 +++
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 108 ++
 src/test/regress/expected/xml_1.out   | 104 +
 src/test/regress/expected/xml_2.out   | 108 ++
 src/test/regress/sql/xml.sql  |  61 +++
 13 files changed, 516 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..46ec95dbb8 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,7 +4460,7 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ CANONICAL [ WITH [NO] COMMENTS ] ])
 
 type can be
 character, character varying, or
@@ -4470,6 +4470,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS 
 
+   
+The option CANONICAL converts a given
+ XML document to its https://www.w3.org/TR/xml-c14n11/#Terminology;>canonical form
+ based on the https://www.w3.org/TR/xml-c14n11/;>W3C Canonical XML 1.1 Specification.
+ It is basically designed to provide applications the ability to compare xml documents or test if they
+ have been changed. The optional parameter WITH [NO] COMMENTS removes or keeps XML comments
+ from the given document.
+
+
+
+ Example:
+
+
+   

 When a character string value is cast to or from type
 xml without going through XMLPARSE or
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..f8f10f0ed9 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 Datum	   *argvalue = op->d.xmlexpr.argvalue;
 bool	   *argnull = op->d.xmlexpr.argnull;
+XmlSerializeFormat	format = 

Re: [Question] Similar Cost but variable execution time in sort

2023-03-05 Thread Ankit Kumar Pandey




On 05/03/23 22:21, Tom Lane wrote:



Ankit Kumar Pandey  writes:
> From my observation, we only account for data in cost computation but 
> not number of columns sorted.

> Should we not account for number of columns in sort as well?

I'm not sure whether simply charging more for 2 sort columns than 1
would help much.  The traditional reasoning for not caring was that
data and I/O costs would swamp comparison costs anyway, but maybe with
ever-increasing memory sizes we're getting to the point where it is
worth refining the model for in-memory sorts.  But see the header
comment for cost_sort().

Also ... not too long ago we tried and failed to install more-complex
sort cost estimates for GROUP BY.  The commit log message for f4c7c410e
gives some of the reasons why that failed, but what it boils down to
is that useful estimates would require information we don't have, such
as a pretty concrete idea of the relative costs of different datatypes'
comparison functions.

In short, maybe there's something to be done here, but I'm afraid
there is a lot of infrastructure slogging needed first, if you want
estimates that are better than garbage-in-garbage-out.

regards, tom lane


Thanks, I can see the challenges in this.

Regards,
Ankit






Re: Date-Time dangling unit fix

2023-03-05 Thread Tom Lane
[ I removed Lockhart, because he's taken no part in Postgres work for
  more than twenty years; if that address even still works, you're
  just bugging him ]

Alexander Lakhin  writes:
> In fact,
> SELECT time 'h04mm05s06';
> doesn't work for many years, but
> SELECT time 'h04mm05s06.0';
> still does.

I traced that down to this in DecodeTimeOnly:

if ((fmask & DTK_TIME_M) != DTK_TIME_M)
return DTERR_BAD_FORMAT;

where we have

#define DTK_ALL_SECS_M  (DTK_M(SECOND) | DTK_M(MILLISECOND) | 
DTK_M(MICROSECOND))
#define DTK_TIME_M  (DTK_M(HOUR) | DTK_M(MINUTE) | DTK_ALL_SECS_M)

So in other words, this test insists on seeing hour, minute, second,
*and* fractional-second fields.  That seems obviously too picky.
It might not matter if we rip out this syntax, but I see other similar
tests so I suspect some of them will still be reachable.

Personally I'd say that hh:mm is a plenty complete enough time, and
whether you write seconds is optional, let alone fractional seconds.
We do accept this:

=> select '12:34'::time;
   time   
--
 12:34:00
(1 row)

so that must be going through a different code path, which I didn't
try to identify yet.

regards, tom lane




Re: zstd compression for pg_dump

2023-03-05 Thread Justin Pryzby
On Sat, Feb 25, 2023 at 07:22:27PM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 01:18:40PM -0600, Justin Pryzby wrote:
> > This is a draft patch - review is welcome and would help to get this
> > ready to be considered for v16, if desired.
> > 
> > I'm going to add this thread to the old CF entry.
> > https://commitfest.postgresql.org/31/2888/
> 
> This resolves cfbot warnings: windows and cppcheck.
> And refactors zstd routines.
> And updates docs.
> And includes some fixes for earlier patches that these patches conflict
> with/depend on.

This rebases over the TAP and doc fixes to LZ4.
And adds necessary ENV to makefile and meson.
And adds an annoying boilerplate header.
And removes supports_compression(), which is what I think Tomas meant
when referring to "annoying unsupported cases".
And updates zstd.c: fix an off-by-one, allocate in init depending on
readF/writeF, do not reset the input buffer on each iteration, and show
parameter name in errors.

I'd appreciate help checking that this is doing the right things and
works correctly with zstd threaded workers.  The zstd API says: "use one
different context per thread for parallel execution" and "For parallel
execution, use one separate ZSTD_CStream per thread".
https://github.com/facebook/zstd/blob/dev/lib/zstd.h

I understand that to mean that, if pg_dump *itself* were using threads,
then each thread would need to call ZSTD_createCStream().  pg_dump isn't
threaded, so there's nothing special needed, right?

Except that, under Windows, pg_dump -Fd -j actually uses threads instead
of forking.  I *think* that's still safe, since the pg_dump threads are
created *before* any zstd functions are called (see _PrintTocData and
_StartData of the custom and directory formats), so it happens naturally
that there's a separate zstd stream for each thread of pg_dump.

-- 
Justin
>From 6a8b88a3dd37d24ebfdaa6a96505476b8a1efe92 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 7 Jan 2023 15:45:06 -0600
Subject: [PATCH 1/3] pg_dump: zstd compression

Previously proposed at: 20201221194924.gi30...@telsasoft.com
---
 doc/src/sgml/ref/pg_dump.sgml |   8 +-
 src/bin/pg_dump/Makefile  |   2 +
 src/bin/pg_dump/compress_io.c |  79 ++--
 src/bin/pg_dump/compress_zstd.c   | 515 ++
 src/bin/pg_dump/compress_zstd.h   |   9 +
 src/bin/pg_dump/meson.build   |   2 +
 src/bin/pg_dump/pg_backup_archiver.c  |  28 +-
 src/bin/pg_dump/pg_backup_directory.c |   2 +
 src/bin/pg_dump/pg_dump.c |  13 -
 src/bin/pg_dump/t/002_pg_dump.pl  |  79 +++-
 src/tools/pginclude/cpluspluscheck|   1 +
 11 files changed, 654 insertions(+), 84 deletions(-)
 create mode 100644 src/bin/pg_dump/compress_zstd.c
 create mode 100644 src/bin/pg_dump/compress_zstd.h

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 334e4b7fd14..1fb66be1818 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -330,8 +330,9 @@ PostgreSQL documentation
machine-readable format that pg_restore
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
-   can be compressed with the gzip or
-   lz4 tools.
+   can be compressed with the gzip,
+   lz4, or
+   zstd tools.
This format is compressed by default using gzip
and also supports parallel dumps.
   
@@ -655,7 +656,8 @@ PostgreSQL documentation

 Specify the compression method and/or the compression level to use.
 The compression method can be set to gzip,
-lz4, or none for no compression.
+lz4, zstd,
+or none for no compression.
 A compression detail string can optionally be specified.  If the
 detail string is an integer, it specifies the compression level.
 Otherwise, it should be a comma-separated list of items, each of the
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index eb8f59459a1..24de7593a6a 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -18,6 +18,7 @@ include $(top_builddir)/src/Makefile.global
 
 export GZIP_PROGRAM=$(GZIP)
 export LZ4
+export ZSTD
 export with_icu
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
@@ -29,6 +30,7 @@ OBJS = \
 	compress_io.o \
 	compress_lz4.o \
 	compress_none.o \
+	compress_zstd.o \
 	dumputils.o \
 	parallel.o \
 	pg_backup_archiver.o \
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index ce06f1eac9c..a3c2f36bb67 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -52,8 +52,8 @@
  *
  *	InitDiscoverCompressFileHandle tries to infer the compression by the
  *	filename suffix. If the suffix is not yet known then it tries to simply
- *	open the file and if it fails, it tries to open the same file with the .gz
- *	suffix, and then 

Re: [Question] Similar Cost but variable execution time in sort

2023-03-05 Thread Tom Lane
Ankit Kumar Pandey  writes:
> From my observation, we only account for data in cost computation but 
> not number of columns sorted.
> Should we not account for number of columns in sort as well?

I'm not sure whether simply charging more for 2 sort columns than 1
would help much.  The traditional reasoning for not caring was that
data and I/O costs would swamp comparison costs anyway, but maybe with
ever-increasing memory sizes we're getting to the point where it is
worth refining the model for in-memory sorts.  But see the header
comment for cost_sort().

Also ... not too long ago we tried and failed to install more-complex
sort cost estimates for GROUP BY.  The commit log message for f4c7c410e
gives some of the reasons why that failed, but what it boils down to
is that useful estimates would require information we don't have, such
as a pretty concrete idea of the relative costs of different datatypes'
comparison functions.

In short, maybe there's something to be done here, but I'm afraid
there is a lot of infrastructure slogging needed first, if you want
estimates that are better than garbage-in-garbage-out.

regards, tom lane




Re: Date-Time dangling unit fix

2023-03-05 Thread Joseph Koshakow
Attached is a patch for removing the discussed format of date-times.
From f35284762c02ed466496e4e562b5f95a884b5ef1 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Remove unknown ISO format, handle dangling units

This commit removes the date format of "y2001m02d04" and the time
format of "h04mm05s06". These were never documented and don't seem to
be valid ISO formats.

Additionally this commit handles repeated and dangling julian units
in DecodeDateTime.
---
 src/backend/utils/adt/datetime.c   | 210 ++---
 src/test/regress/expected/horology.out |  41 ++---
 src/test/regress/sql/horology.sql  |   4 +
 3 files changed, 37 insertions(+), 218 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d166613895..51b72ad6c2 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO y2001m02d04 format */
+	int			ptype = 0;		/* "prefix type" for ISO and Julian formats */
 	int			i;
 	int			val;
 	int			dterr;
@@ -1174,10 +1174,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 			case DTK_NUMBER:
 
-/*
- * Was this an "ISO date" with embedded field labels? An
- * example is "y2001m02d04" - thomas 2001-02-04
- */
 if (ptype != 0)
 {
 	char	   *cp;
@@ -1188,84 +1184,11 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	if (errno == ERANGE)
 		return DTERR_FIELD_OVERFLOW;
 
-	/*
-	 * only a few kinds are allowed to have an embedded
-	 * decimal
-	 */
-	if (*cp == '.')
-		switch (ptype)
-		{
-			case DTK_JULIAN:
-			case DTK_TIME:
-			case DTK_SECOND:
-break;
-			default:
-return DTERR_BAD_FORMAT;
-break;
-		}
-	else if (*cp != '\0')
+	if (*cp != '.' && *cp != '\0')
 		return DTERR_BAD_FORMAT;
 
 	switch (ptype)
 	{
-		case DTK_YEAR:
-			tm->tm_year = value;
-			tmask = DTK_M(YEAR);
-			break;
-
-		case DTK_MONTH:
-
-			/*
-			 * already have a month and hour? then assume
-			 * minutes
-			 */
-			if ((fmask & DTK_M(MONTH)) != 0 &&
-(fmask & DTK_M(HOUR)) != 0)
-			{
-tm->tm_min = value;
-tmask = DTK_M(MINUTE);
-			}
-			else
-			{
-tm->tm_mon = value;
-tmask = DTK_M(MONTH);
-			}
-			break;
-
-		case DTK_DAY:
-			tm->tm_mday = value;
-			tmask = DTK_M(DAY);
-			break;
-
-		case DTK_HOUR:
-			tm->tm_hour = value;
-			tmask = DTK_M(HOUR);
-			break;
-
-		case DTK_MINUTE:
-			tm->tm_min = value;
-			tmask = DTK_M(MINUTE);
-			break;
-
-		case DTK_SECOND:
-			tm->tm_sec = value;
-			tmask = DTK_M(SECOND);
-			if (*cp == '.')
-			{
-dterr = ParseFractionalSecond(cp, fsec);
-if (dterr)
-	return dterr;
-tmask = DTK_ALL_SECS_M;
-			}
-			break;
-
-		case DTK_TZ:
-			tmask = DTK_M(TZ);
-			dterr = DecodeTimezone(field[i], tzp);
-			if (dterr)
-return dterr;
-			break;
-
 		case DTK_JULIAN:
 			/* previous field was a label for "julian date" */
 			if (value < 0)
@@ -1510,6 +1433,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 	case UNITS:
 		tmask = 0;
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -1536,7 +1462,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 			 ftype[i + 1] != DTK_TIME &&
 			 ftype[i + 1] != DTK_DATE))
 			return DTERR_BAD_FORMAT;
-
 		ptype = val;
 		break;
 
@@ -1567,6 +1492,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -1933,7 +1862,7 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO h04mm05s06 format */
+	int			ptype = 0;		/* "prefix type" for ISO format */
 	int			i;
 	int			val;
 	int			dterr;
@@ -2060,133 +1989,23 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 			case DTK_NUMBER:
 
 /*
- * Was this an "ISO time" with embedded field labels? An
- * example is "h04mm05s06" - thomas 2001-02-04
+ * Was this an "ISO time" An example is "T040506.789"
  */
 if (ptype != 0)
 {
 	char	   *cp;
 	int			value;
 
-	/* Only accept a date under limited circumstances */
-	switch (ptype)
-	{
-		case DTK_JULIAN:
-		case DTK_YEAR:
-		case DTK_MONTH:
-	

Re: SQL JSON path enhanced numeric literals

2023-03-05 Thread Peter Eisentraut

On 03.03.23 21:16, Dean Rasheed wrote:

I think this new feature ought to be mentioned in the docs somewhere.
Perhaps a sentence or two in the note below table 9.49 would suffice,
since it looks like that's where jsonpath numbers are mentioned for
the first time.


Done.  I actually put it into the data types chapter, where some other 
differences between SQL and SQL/JSON syntax were already discussed.



In jsonpath_scan.l, I think the hex/oct/bininteger cases could do with
a comment, such as

/* Non-decimal integers in ECMAScript; must not have underscore after radix */
hexinteger    0[xX]{hexdigit}(_?{hexdigit})*
octinteger    0[oO]{octdigit}(_?{octdigit})*
bininteger    0[bB]{bindigit}(_?{bindigit})*

since that's different from the main lexer's syntax.


done


Perhaps it's worth mentioning that difference in the docs.


done


Otherwise, this looks good to me.


committed
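
For reference, a quick sketch of what the new literal forms allow (my own
examples, assuming they follow the lexer rules quoted above; the values
should come out normalized to plain decimal):

-- underscores as digit separators
SELECT '1_000_000'::jsonpath;
-- hexadecimal, octal and binary integers
SELECT '0x1A'::jsonpath;
SELECT '0o17'::jsonpath;
SELECT '0b1010'::jsonpath;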





Re: How does pg implement the visibility of one tuple for a specified transaction?

2023-03-05 Thread Ankit Kumar Pandey

Hi Jacktby,

Did you try looking at the HeapTupleSatisfiesVisibility function (in 
src/backend/access/heap/heapam_visibility.c)? I think it might give you 
some idea.
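
If it helps, here is a rough SQL-level illustration (my example) of the raw
inputs that function works with: the xmin/xmax system columns record the
inserting and deleting transaction IDs, which are then compared against the
transaction's snapshot:

CREATE TABLE vis_demo (x int);
INSERT INTO vis_demo VALUES (1);
-- xmin is the inserting transaction's xid; xmax is 0 while the row is live
SELECT xmin, xmax, x FROM vis_demo;
-- the current transaction's own xid, for comparison
SELECT pg_current_xact_id();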


Thanks,

Ankit




Re: Date-Time dangling unit fix

2023-03-05 Thread Alexander Lakhin

Hello,

05.03.2023 02:31, Joseph Koshakow wrote:

I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c

/*
* Was this an "ISO date" with embedded field labels? An
* example is "y2001m02d04" - thomas 2001-02-04
*/

which comes from this commit [1], which was authored by Thomas Lockhart
(presumably the same thomas from the comment).


I've also seen another interesting comment in datetime.c:
    /*
 * Was this an "ISO time" with embedded field labels? An
 * example is "h04mm05s06" - thomas 2001-02-04
 */
In fact,
SELECT time 'h04mm05s06';
doesn't work for many years, but
SELECT time 'h04mm05s06.0';
still does.
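
A self-contained reproduction (the accepted form presumably parses as
04:05:06):

SELECT time 'h04mm05s06';     -- rejected
SELECT time 'h04mm05s06.0';   -- still accepted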

I've just found that I mentioned it some time ago:
https://www.postgresql.org/message-id/dff75442-2468-f74f-568c-6006e141062f%40gmail.com

Best regards,
Alexander

How does pg implement the visibility of one tuple for a specified transaction?

2023-03-05 Thread jack...@gmail.com

Suppose there is a transaction running; how does it know which tuples are
visible to it?


jack...@gmail.com


[Question] Similar Cost but variable execution time in sort

2023-03-05 Thread Ankit Kumar Pandey

Hi,

This was noticed in 
https://www.postgresql.org/message-id/caaphdvo2y9s2ao-bpyo7gmpyd0xe2lo-kflnqx80fcftqbc...@mail.gmail.com


I am bringing it up again.


Consider the following example:

Setup (the tuples should be in memory so that disk I/O does not overshadow 
the sort cost in this experiment):


work_mem = 2048MB

create table abcd(a int, b int, c int, d int);
insert into abcd select x*random(), x*random(), x*random(), x*random() 
from generate_series(1, 100000) x;


select pg_prewarm('abcd');


1. explain analyze select * from abcd order by a;

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9845.82..10095.82 rows=100000 width=16) (actual time=134.113..155.990 rows=100000 loops=1)
   Sort Key: a
   Sort Method: quicksort  Memory: 8541kB
   ->  Seq Scan on abcd  (cost=0.00..1541.00 rows=100000 width=16) (actual time=0.013..28.418 rows=100000 loops=1)
 Planning Time: 0.392 ms
 Execution Time: 173.702 ms
(6 rows)

2. explain analyze select * from abcd order by a,b;

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9845.82..10095.82 rows=100000 width=16) (actual time=174.676..204.065 rows=100000 loops=1)
   Sort Key: a, b
   Sort Method: quicksort  Memory: 8541kB
   ->  Seq Scan on abcd  (cost=0.00..1541.00 rows=100000 width=16) (actual time=0.018..29.213 rows=100000 loops=1)
 Planning Time: 0.055 ms
 Execution Time: 229.119 ms
(6 rows)


3. explain analyze select * from abcd order by a,b,c;

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9845.82..10095.82 rows=100000 width=16) (actual time=159.829..179.675 rows=100000 loops=1)
   Sort Key: a, b, c
   Sort Method: quicksort  Memory: 8541kB
   ->  Seq Scan on abcd  (cost=0.00..1541.00 rows=100000 width=16) (actual time=0.018..31.207 rows=100000 loops=1)
 Planning Time: 0.055 ms
 Execution Time: 195.393 ms
(6 rows)

In the above queries, the startup and total costs are the same, yet the 
execution times vary considerably.

Question: if the cost is the same for similar queries, shouldn't the 
execution time be similar as well?

From my observation, we only account for the amount of data in the cost 
computation, but not for the number of columns being sorted.

Should we not account for the number of sort columns as well?
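
(For what it's worth, if I read cost_sort() correctly, the bulk of the
startup cost above is the comparison term 2 * cpu_operator_cost * N * log2(N),
i.e. roughly 2 * 0.0025 * 100000 * 16.6 ≈ 8300 on top of the 1541 scan cost,
and that term does not depend on the number of sort keys at all.)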


Relevant discussion: 
https://www.postgresql.org/message-id/CAApHDvoc1m_vo1+XVpMUj+Mfy6rMiPQObM9Y-jZ=xrwc1gk...@mail.gmail.com



Regards,

Ankit






Re: Missing free_var() at end of accum_sum_final()?

2023-03-05 Thread Joel Jacobson
On Fri, Mar 3, 2023, at 16:11, Dean Rasheed wrote:
> Attachments:
> * make-result-using-vars-buf-v2.patch

One suggestion: maybe add a comment explaining why the allocated buffer,
whose size is based on strlen(cp) for the decimal digit values,
is guaranteed to be large enough for the result's digit buffer as well?

I.e. some kind of proof that

   (NUMERIC_HDRSZ + strlen(cp) + DEC_DIGITS * 2) >= ((ndigits + 1) * sizeof(NumericDigit))

holds true in general.

/Joel




Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-03-05 Thread Gilles Darold

Le 04/03/2023 à 19:18, Tom Lane a écrit :

Gilles Darold  writes:

But I disagree the use of --table-with-childs and
--exclude-table-with-childs because we already have the --table and
--exclude-table, and it will add lot of code where we just need a switch
to include children tables.

I quite dislike the idea of a separate --with-whatever switch, because
it will (presumably) apply to all of your --table and --exclude-table
switches, where you may need it to apply to just some of them.
Spelling considerations aside, attaching the property to the
individual switches seems far superior.  And I neither believe that
this would add a lot of code, nor accept that as an excuse even if
it's true.



Right, this is not a lot of code, but it is still more code where I think 
we just need a switch. I would much prefer that it apply to all --table / 
--exclude-table switches, because this is generally the behavior we want 
for all root/parent tables. But I agree that in some cases users may want 
this behavior to apply only to selected tables, so the proposed new 
options can answer that need, even if in such cases several separate 
pg_dump commands can generally be used instead. This is just my opinion; 
I will adapt the patch to use the proposed new options.



But what do you think about having pg_dump dump child tables by default 
with --table / --exclude-table? I was very surprised that this was not 
the case the first time I saw it. In that case we could add 
--[exclude-]table-no-child-tables. I think this form would be used less 
often than the form where the child tables need to be dumped together 
with the parent table, meaning that most of the time pg_dump commands 
using --table and --exclude-table would be kept untouched, and those 
using extra regexps to dump child tables could be simplified. I'm not 
sure that backward compatibility is an argument here for not changing 
the default behavior of pg_dump.
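
To make the current behaviour concrete, a minimal reproduction as I
understand it (inheritance children; the pg_dump invocations are only
indicative):

CREATE TABLE parent (id int, logdate date);
CREATE TABLE parent_2022 () INHERITS (parent);
CREATE TABLE parent_2023 () INHERITS (parent);

-- Today, selecting the root table does not pull in its children:
--     pg_dump --table=parent mydb        (dumps only "parent")
-- The usual workaround relies on a naming convention:
--     pg_dump --table='parent*' mydb     (works only because of the name prefix)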


--

Gilles



--
Gilles Darold