EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS

2021-08-01 Thread Andres Freund
Hi,

While looking at a patch I noticed that SubPostmasterMain() for bgworkers
unconditionally does

/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
InitProcess();

after which StartBackgroundWorker() does

/*
 * If we're not supposed to have shared memory access, then detach from
 * shared memory.  If we didn't request shared memory access, the
 * postmaster won't force a cluster-wide restart if we exit unexpectedly,
 * so we'd better make sure that we don't mess anything up that would
 * require that sort of cleanup.
 */
if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0)
{
ShutdownLatchSupport();
dsm_detach_all();
PGSharedMemoryDetach();
}

which presents a problem: we've initialized all kinds of references to shared
memory and own a PGPROC, but have detached from shared memory.

In practice this will lead pretty quickly to a segfault, because process exit
will run proc_exit callbacks, which in turn will try to do a ProcKill(). Or
logging dereferences MyProc, or ...

It seems the above code block would need to at least do shmem_exit() before
the PGSharedMemoryDetach()?

This code has been introduced in

commit 4d155d8b08fe08c1a1649fdbad61c6dcf4a8671f
Author: Robert Haas 
Date:   2014-05-07 14:54:43 -0400

Detach shared memory from bgworkers without shmem access.

Since the postmaster won't perform a crash-and-restart sequence
for background workers which don't request shared memory access,
we'd better make sure that they can't corrupt shared memory.

Patch by me, review by Tom Lane.

but before that things were just slightly differently broken...


I tested both the crash and that a shmem_exit() fixes it with an ugly hack in
regress.c. I don't really know how to write a good testcase for this, given
that the communication facilities of an unconnected bgworker are quite
limited... I guess the bgworker could create files or something :(.

Greetings,

Andres Freund




RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

2021-08-01 Thread houzj.f...@fujitsu.com
On August 2, 2021 2:04 PM Greg Nancarrow  wrote:
> On Mon, Aug 2, 2021 at 2:52 PM Amit Kapila  wrote:
> >
> > On Fri, Jul 30, 2021 at 6:53 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Friday, July 30, 2021 2:52 PM Greg Nancarrow 
> wrote:
> > > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila 
> wrote:
> > > > >
> > > > > > Besides, I think we need a new default value about parallel
> > > > > > dml safety. Maybe 'auto' or 'null'(different from
> > > > > > safe/restricted/unsafe). Because, user is likely to alter the
> > > > > > safety to the default value to get the automatic safety check,
> > > > > > a independent default value can make it more clear.
> > > > > >
> > > > >
> > > > > Hmm, but auto won't work for partitioned tables, right? If so,
> > > > > that might appear like an inconsistency to the user and we need
> > > > > to document the same. Let me summarize the discussion so far in
> > > > > this thread so that it is helpful to others.
> > > > >
> > > >
> > > > To avoid that inconsistency, UNSAFE could be the default for
> > > > partitioned tables (and we would disallow setting AUTO for these).
> > > > So then AUTO is the default for non-partitioned tables only.
> > >
> > > I think this approach is reasonable, +1.
> > >
> >
> > I see the need to change to default via Alter Table but I am not sure
> > if Auto is the most appropriate way to handle that. How about using
> > DEFAULT itself as we do in the case of REPLICA IDENTITY? So, if users
> > have to alter parallel safety value to default, they need to just say
> > Parallel DML DEFAULT. The default would mean automatic behavior for
> > non-partitioned relations and ignore parallelism for partitioned
> > tables.
> >
> 
> Hmm, I'm not so sure I'm sold on that.
> I personally think "DEFAULT" here is vague, and users then need to know what
> DEFAULT equates to, based on the type of table (partitioned or non-partitioned
> table).
> Also, then there's two ways to set the actual "default" DML parallel-safety
> for partitioned tables: DEFAULT or UNSAFE.
> At least "AUTO" is a meaningful default option name for non-partitioned tables
> - "automatic" parallel-safety checking, and the fact that it isn't the default
> (and can't be set) for partitioned tables highlights the difference in the way
> being proposed to treat them (i.e. use automatic checking only for
> non-partitioned tables).
> I'd be interested to hear what others think.
> I think a viable alternative would be to record whether an explicit DML
> parallel-safety has been specified, and if not, apply default behavior (i.e.
> by default use automatic checking for non-partitioned tables and treat
> partitioned tables as UNSAFE). I'm just not sure whether this kind of
> distinction (explicit vs implicit default) has been used before in Postgres
> options.

I think both approaches are fine, but using "DEFAULT" might have a disadvantage:
if we somehow support automatic safety checks for partitioned tables in the
future, the meaning of "DEFAULT" for partitioned tables would change from
UNSAFE to automatic checking. That could also put some burden on users to
modify their SQL scripts.

Best regards,
houzj


Re: Skip partition tuple routing with constant partition key

2021-08-01 Thread Amit Langote
I noticed that there is no CF entry for this, so created one in the next CF:

https://commitfest.postgresql.org/34/3270/

Rebased patch attached.


v8-0001-Teach-get_partition_for_tuple-to-cache-bound-offs.patch
Description: Binary data


Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

2021-08-01 Thread Greg Nancarrow
On Mon, Aug 2, 2021 at 2:52 PM Amit Kapila  wrote:
>
> On Fri, Jul 30, 2021 at 6:53 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Friday, July 30, 2021 2:52 PM Greg Nancarrow  wrote:
> > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila  
> > > wrote:
> > > >
> > > > > Besides, I think we need a new default value about parallel dml
> > > > > safety. Maybe 'auto' or 'null'(different from
> > > > > safe/restricted/unsafe). Because, user is likely to alter the safety
> > > > > to the default value to get the automatic safety check, a independent 
> > > > > default
> > > > > value can make it more clear.
> > > > >
> > > >
> > > > Hmm, but auto won't work for partitioned tables, right? If so, that
> > > > might appear like an inconsistency to the user and we need to document
> > > > the same. Let me summarize the discussion so far in this thread so
> > > > that it is helpful to others.
> > > >
> > >
> > > To avoid that inconsistency, UNSAFE could be the default for partitioned 
> > > tables
> > > (and we would disallow setting AUTO for these).
> > > So then AUTO is the default for non-partitioned tables only.
> >
> > I think this approach is reasonable, +1.
> >
>
> I see the need to change to default via Alter Table but I am not sure
> if Auto is the most appropriate way to handle that. How about using
> DEFAULT itself as we do in the case of REPLICA IDENTITY? So, if users
> have to alter parallel safety value to default, they need to just say
> Parallel DML DEFAULT. The default would mean automatic behavior for
> non-partitioned relations and ignore parallelism for partitioned
> tables.
>

Hmm, I'm not so sure I'm sold on that.
I personally think "DEFAULT" here is vague, and users then need to
know what DEFAULT equates to, based on the type of table (partitioned
or non-partitioned table).
Also, then there's two ways to set the actual "default" DML
parallel-safety for partitioned tables: DEFAULT or UNSAFE.
At least "AUTO" is a meaningful default option name for
non-partitioned tables - "automatic" parallel-safety checking, and the
fact that it isn't the default (and can't be set) for partitioned
tables highlights the difference in the way being proposed to treat
them (i.e. use automatic checking only for non-partitioned tables).
I'd be interested to hear what others think.
I think a viable alternative would be to record whether an explicit
DML parallel-safety has been specified, and if not, apply default
behavior (i.e. by default use automatic checking for non-partitioned
tables and treat partitioned tables as UNSAFE). I'm just not sure
whether this kind of distinction (explicit vs implicit default) has
been used before in Postgres options.

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Failed transaction statistics to measure the logical replication progress

2021-08-01 Thread osumi.takami...@fujitsu.com
On Thursday, July 29, 2021 10:50 AM Masahiko Sawada  
wrote:
> On Thu, Jul 8, 2021 at 3:55 PM osumi.takami...@fujitsu.com 
>  wrote:
> > When the current HEAD fails during logical decoding, the failure 
> > increments txns count in pg_stat_replication_slots - [1] and adds 
> > the transaction size to the sum of bytes in the same repeatedly on 
> > the publisher, until the problem is solved.
> > One of the good examples is duplication error on the subscriber side 
> > and this applies to both streaming and spill cases as well.
> >
> > This update prevents users from grasping the exact number and size 
> > of successful and unsuccessful transactions. Accordingly, we need to 
> > have new columns of failed transactions that will work to 
> > differentiate both of them for all types, which means spill, 
> > streaming and normal transactions. This will help users to measure 
> > the exact status of logical replication.
> 
> Could you please elaborate on use cases of the proposed statistics?
> For example, the current statistics on pg_replication_slots can be 
> used for tuning logical_decoding_work_mem as well as inferring the 
> total amount of bytes passed to the output plugin. How will the user use 
> those statistics?
> 
> Also, if we want the stats of successful transactions why don't we 
> show the stats of successful transactions in the view instead of ones 
> of failed transactions?
It works to show the ratio of successful to unsuccessful transactions,
which should be helpful from an administrator's perspective.
FYI, the POC patch added the columns with 'failed' prefixed to their names.
But, substantially, it represented that ratio when the user compared the normal
columns with the newly introduced columns in pg_stat_replication_slots.


> > Attached file is the POC patch for this.
> > Current design is to save failed stats data in the ReplicationSlot struct.
> > This is because after the error, I'm not able to access the 
> > ReorderBuffer
> object.
> > Thus, I chose the object where I can interact with at the
> ReplicationSlotRelease timing.
> 
> When discussing the pg_stat_replication_slots view, there was an idea 
> to store the slot statistics on ReplicationSlot struct. But the idea 
> was rejected mainly because the struct is on the shared buffer[1]. If 
> we store those counts on ReplicationSlot struct it increases the usage 
> of shared memory. And those statistics are used only by logical slots 
> and don’t necessarily need to be shared among the server processes.
Yes, I was aware of this.
I was not sure whether this design would be acceptable for the enhancement;
I thought of changing the design accordingly once the idea gets accepted by the
community.

> Moreover, if we want to add more statistics on the view in the future, 
> it further increases the usage of shared memory. If we want to track 
> the stats of successful transactions, I think it's easier to track 
> them on the subscriber side rather than the publisher side. We can 
> increase counters when applying [stream]commit/abort logical changes on the 
> subscriber.
It's true that tracking the stats of successful and unsuccessful transactions
is easier on the *subscriber* than on the publisher. After some survey, it
turned out that, on the publisher, I cannot distinguish the protocol messages
of failure cases (e.g. a duplication error on the subscriber) from intentional
and successful user operations (e.g. DROP SUBSCRIPTION and
ALTER SUBSCRIPTION DISABLE).

If we truly want to achieve this change on the publisher side, a protocol
change would be required to make the above cases distinguishable, so now I feel
that it is better to do this on the subscriber side.

Accordingly, I'm thinking of having both unsuccessful and successful stats on
the subscriber side.
Sawada-san is now implementing a new view in [1].
Do you think I should write a patch to introduce a new separate view,
or a patch to add more columns to the new view
"pg_stat_subscription_errors" that is added in [1]?


[1] - 
https://www.postgresql.org/message-id/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK%3D30xJfUVihNZDA%40mail.gmail.com


Best Regards,
Takamichi Osumi



Re: Background writer and checkpointer in crash recovery

2021-08-01 Thread Thomas Munro
On Sat, Jul 31, 2021 at 2:16 AM Robert Haas  wrote:
> On Fri, Jul 30, 2021 at 4:42 AM Aleksander Alekseev
>  wrote:
> > v2-0001 and v2-0002 look fine, but I don't like much the idea of 
> > introducing a new GUC in v2-0003. It's for very specific needs, which most 
> > of the users, I believe, don't care about. I suggest dealing with v2-0001 
> > and v2-0002 first and then maybe submit and discuss v2-0003 as a separate 
> > CF entry.

Thanks.

> Thanks for bumping this thread; I had forgotten all about this effort,
> but having just spent a bunch of time struggling with the thicket of
> cases in StartupXLOG(), I'm now feeling highly motivated to make some
> more progress in simplifying things over there. I am still of the
> opinion that 0001 is a good idea, and I don't have any suggestions for
> how it could be improved,

That's good news, and thanks.  Yes, clearly there is much more that
can be simplified here.

> except perhaps that the call to
> PublishStartupProcessInformation() could maybe have a one-line
> comment.

Done.  BTW that is temporary, as I'm planning to remove that machinery soon[1].

> Thomas, are you planning to press forward with committing
> this soon? If not, do you mind if I do?

I pushed 0001.  Let me think about 0002, and flesh out 0003 a bit more.

[1] 
https://www.postgresql.org/message-id/flat/CA+hUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD=q9hmwsfrr...@mail.gmail.com




Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

2021-08-01 Thread Amit Kapila
On Fri, Jul 30, 2021 at 6:53 PM houzj.f...@fujitsu.com
 wrote:
>
> On Friday, July 30, 2021 2:52 PM Greg Nancarrow  wrote:
> > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila  wrote:
> > >
> > > > Besides, I think we need a new default value about parallel dml
> > > > safety. Maybe 'auto' or 'null'(different from
> > > > safe/restricted/unsafe). Because, user is likely to alter the safety
> > > > to the default value to get the automatic safety check, a independent 
> > > > default
> > > > value can make it more clear.
> > > >
> > >
> > > Hmm, but auto won't work for partitioned tables, right? If so, that
> > > might appear like an inconsistency to the user and we need to document
> > > the same. Let me summarize the discussion so far in this thread so
> > > that it is helpful to others.
> > >
> >
> > To avoid that inconsistency, UNSAFE could be the default for partitioned 
> > tables
> > (and we would disallow setting AUTO for these).
> > So then AUTO is the default for non-partitioned tables only.
>
> I think this approach is reasonable, +1.
>

I see the need to change to default via Alter Table but I am not sure
if Auto is the most appropriate way to handle that. How about using
DEFAULT itself as we do in the case of REPLICA IDENTITY? So, if users
have to alter parallel safety value to default, they need to just say
Parallel DML DEFAULT. The default would mean automatic behavior for
non-partitioned relations and ignore parallelism for partitioned
tables.

-- 
With Regards,
Amit Kapila.




Re: Speeding up GIST index creation for tsvectors

2021-08-01 Thread Amit Khandekar
On Sat, 20 Mar 2021 at 02:19, John Naylor  wrote:
> On Fri, Mar 19, 2021 at 8:57 AM Amit Khandekar  wrote:
> > Regarding the alignment changes... I have removed the code that
> > handled the leading identically unaligned bytes, for lack of evidence
> > that percentage of such cases is significant. Like I noted earlier,
> > for the tsearch data I used, identically unaligned cases were only 6%.
> > If I find scenarios where these cases can be significant after all and
> > if we cannot do anything in the gist index code, then we might have to
> > bring back the unaligned byte handling. I didn't get a chance to dig
> > deeper into the gist index implementation to see why they are not
> > always 8-byte aligned.
>
> I find it stranger that something equivalent to char* is not randomly 
> misaligned, but rather only seems to land on 4-byte boundaries.
>
> [thinks] I'm guessing it's because of VARHDRSZ, but I'm not positive.
>
> FWIW, I anticipate some push back from the community because of the fact that 
> the optimization relies on statistical phenomena.

I dug into this issue for the tsvector type, and found that it's the way the
sign array elements are arranged that causes the pointers to be
misaligned:

Datum
gtsvector_picksplit(PG_FUNCTION_ARGS)
{
..
cache = (CACHESIGN *) palloc(sizeof(CACHESIGN) * (maxoff + 2));
cache_sign = palloc(siglen * (maxoff + 2));

for (j = 0; j < maxoff + 2; j++)
cache[j].sign = &cache_sign[siglen * j];

}

If siglen is not a multiple of 8 (say 700), cache[j].sign will in some
cases point to non-8-byte-aligned addresses, as you can see in the
above code snippet.

Replacing siglen with MAXALIGN64(siglen) in the above snippet gets rid
of the misalignment. This change applied on top of the 0001-v3 patch gives an
additional ~15% benefit. MAXALIGN64(siglen) will use a bit more
space, but for not-so-small siglens this looks worth doing. I haven't
yet checked types other than tsvector.

I will get back to your other review comments; meanwhile, I thought I could
post the above update first.




Re: Skipping logical replication transactions on subscriber side

2021-08-01 Thread Amit Kapila
On Mon, Aug 2, 2021 at 7:45 AM Masahiko Sawada  wrote:
>
> On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila  wrote:
> >
> > On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada  
> > wrote:
> >
> > Setting up logical rep error context in a generic function looks a bit
> > odd to me. Do we really need to set up error context here? I
> > understand we can't do this in caller but anyway I think we are not
> > sending this to logical replication view as well, so not sure we need
> > to do it here.
>
> Yeah, I'm not convinced of this part yet. I wanted to show relid also
> in truncate cases but I came up with only this idea.
>
> If an error happens during truncating the table (in
> ExecuteTruncateGuts()), relid set by
> set_logicalrep_error_context_rel() is actually sent to the view. If we
> don’t have it, the view always shows relid as NULL in truncate cases.
> On the other hand, it doesn’t cover all cases. For example, it doesn’t
> cover an error that the target table doesn’t exist on the subscriber,
> which happens when opening the target table. Anyway, in most cases,
> even if relid is NULL, the error message in the view helps users to
> know which relation the error happened on. What do you think?
>

Yeah, I also think at this stage error message is sufficient in such cases.

-- 
With Regards,
Amit Kapila.




Re: Record a Bitmapset of non-pruned partitions

2021-08-01 Thread Amit Langote
On Sun, Aug 1, 2021 at 9:31 PM David Rowley  wrote:
> On Fri, 30 Jul 2021 at 19:10, Amit Langote  wrote:
> > interleaved_parts idea looks clever.  I wonder if you decided that
> > it's maybe not worth setting that field in the joinrel's
> > PartitionBoundInfos?  For example, adding the code that you added in
> > create_list_bounds() also in merge_list_bounds().
>
> Currently generate_orderedappend_paths() only checks
> partitions_are_ordered() for base and other member rels, so setting
> the field for join rels would be a waste of effort given that it's not
> used for anything.
>
> I've not really looked into the possibility of enabling this
> optimization for partition-wise joined rels. I know that there's a bit
> more complexity now due to c8434d64c. I'm not really all that clear on
> which cases could be allowed here and which couldn't. It would require
> more analysis and I'd say that's a different patch.

Yeah, that makes sense.

> > ...  The definition of interleaved
> > + * is any partition that can contain multiple different values where 
> > exists at
> > + * least one other partition which could contain a value which is between 
> > the
> > + * multiple possible values in the other partition.
> >
> > The sentence sounds a bit off starting at "...where exists".  How about:
>
> I must have spent too long writing SQL queries.

Hah.

> I had another self review of these and I'm pretty happy with them. I'm
> quite glad to see the performance of querying a single partition of a
> table with large numbers of partitions no longer tails off as much as
> it used to.

Nice, glad to see the apply_scanjoin_target_to_paths() loop taken care of.

Thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2021-08-01 Thread Masahiko Sawada
On Fri, Jul 30, 2021 at 3:47 PM houzj.f...@fujitsu.com
 wrote:
>
> On July 29, 2021 1:48 PM Masahiko Sawada  wrote:
> >
> > Sorry I've attached wrong ones. Reattached the correct version patches.
>
> Hi,
>
> I had some comments on the new version patches.

Thank you for the comments!

>
> 1)
>
> -   relstate = (SubscriptionRelState *) 
> palloc(sizeof(SubscriptionRelState));
> -   relstate->relid = subrel->srrelid;
> +   relstate = (SubscriptionRelState *) hash_search(htab, (void *) 
> &subrel->srrelid,
> +   HASH_ENTER, NULL);
>
> I found the new version patch changes the List type 'relstate' to hash table 
> type
> 'relstate'. Will this bring significant performance improvements ?

For pgstat_vacuum_stat() purposes, I think it's better to use a hash
table to avoid O(N) lookups. But it might not be good to change the
return type of GetSubscriptionNotReadyRelations(), since the returned
value is used by other functions to iterate over the elements, and list
iteration is faster than hash table iteration. It would be better to
change it so that pgstat_vacuum_stat() constructs a hash table for its
own purposes.

>
> 2)
> + * PgStat_StatSubRelErrEntry represents a error happened during logical
>
> a error => an error

Will fix.

>
> 3)
> +CREATE VIEW pg_stat_subscription_errors AS
> +SELECT
> +   d.datname,
> +   sr.subid,
> +   s.subname,
>
> It seems the 'subid' column is not mentioned in the document of the
> pg_stat_subscription_errors view.

Will fix.

>
>
> 4)
> +
> +   if (fread(&nrels, 1, sizeof(long), fpin) != sizeof(long))
> +   {
> ...
> +   for (int i = 0; i < nrels; i++)
>
> the type of 'i' (int) seems different from the type of 'nrels' (long); it
> might be better to use the same type.

Will fix.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-08-01 Thread Masahiko Sawada
On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila  wrote:
>
> On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada  
> wrote:
> >
> > On Thu, Jul 29, 2021 at 2:04 PM Masahiko Sawada  
> > wrote:
> > >
> > > > Yeah, it seems to be introduced by commit 0926e96c493. I've attached
> > > > the patch for that.
> > > >
> > > > Also, I've attached the updated version patches. This version patch
> > > > has pg_stat_reset_subscription_error() SQL function and sends a clear
> > > > message after skipping the transaction. 0004 patch includes the
> > > > skipping transaction feature and introducing RESET to ALTER
> > > > SUBSCRIPTION. It would be better to separate them.
> > > >
>
> +1, to separate out the reset part.

Okay, I'll do that.

>
> > >
> > > I've attached the new version patches that fix cfbot failure.
> >
> > Sorry I've attached wrong ones. Reattached the correct version patches.
> >
>
> Pushed the 0001* patch that removes the unused parameter.

Thanks!

>
> Few comments on v4-0001-Add-errcontext-to-errors-of-the-applying-logical-
> ===

Thank you for the comments!

> 1.
> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -78,6 +78,7 @@
>  #include "partitioning/partbounds.h"
>  #include "partitioning/partdesc.h"
>  #include "pgstat.h"
> +#include "replication/logicalworker.h"
>  #include "rewrite/rewriteDefine.h"
>  #include "rewrite/rewriteHandler.h"
>  #include "rewrite/rewriteManip.h"
> @@ -1899,6 +1900,9 @@ ExecuteTruncateGuts(List *explicit_rels,
>   continue;
>   }
>
> + /* Set logical replication error callback info if necessary */
> + set_logicalrep_error_context_rel(rel);
> +
>   /*
>   * Build the lists of foreign tables belonging to each foreign server
>   * and pass each list to the foreign data wrapper's callback function,
> @@ -2006,6 +2010,9 @@ ExecuteTruncateGuts(List *explicit_rels,
>   pgstat_count_truncate(rel);
>   }
>
> + /* Reset logical replication error callback info */
> + reset_logicalrep_error_context_rel();
> +
>
> Setting up logical rep error context in a generic function looks a bit
> odd to me. Do we really need to set up error context here? I
> understand we can't do this in caller but anyway I think we are not
> sending this to logical replication view as well, so not sure we need
> to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

>
> 2.
> +/* Struct for saving and restoring apply information */
> +typedef struct ApplyErrCallbackArg
> +{
> + LogicalRepMsgType command; /* 0 if invalid */
> +
> + /* Local relation information */
> + char*nspname; /* used for error context */
> + char*relname; /* used for error context */
> +
> + TransactionId remote_xid;
> + TimestampTz committs;
> +} ApplyErrCallbackArg;
> +static ApplyErrCallbackArg apply_error_callback_arg =
> +{
> + .command = 0,
> + .relname = NULL,
> + .nspname = NULL,
> + .remote_xid = InvalidTransactionId,
> + .committs = 0,
> +};
> +
>
> Better to have a space between the above two declarations.

Will fix.

>
> 3. commit message:
> This commit adds the error context to errors happening during applying
> logical replication changes, showing the command, the relation
> relation, transaction ID, and commit timestamp in the server log.
>
> 'relation' is mentioned twice.

Will fix.

>
> The patch is not getting applied probably due to yesterday's commit in
> this area.

Okay. I'll rebase the patches to the current HEAD.

I'm incorporating all comments from you and Houzj, and will submit the
new patch soon.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Tom Lane
I wrote:
> ... aside from the question of whether
> a too-large subexpression number should be an error or not.

Oh ... poking around some more, I noticed a very nearby precedent.
regexp_replace's replacement string can include \1 to \9 to insert
the substring matching the N'th parenthesized subexpression.  But
if there is no such subexpression, you don't get an error, just
an empty insertion.  So that seems like an argument for not
throwing an error for an out-of-range subexpr parameter.

regards, tom lane




Re: amcheck verification for GiST and GIN

2021-08-01 Thread Nikolay Samokhvalov
Thank you, v5 didn't find any issues at all. One thing: for my 29 indexes,
the tool generated 3.5 GiB of output. I guess many INFO messages should be
downgraded to something like DEBUG1?

On Fri, Jul 30, 2021 at 2:35 AM Heikki Linnakangas  wrote:

> On 29/07/2021 21:34, Nikolay Samokhvalov wrote:
> > I was trying to check a bunch of GINs on some production after switching
> > from Ubuntu 16.04 to 18.04 and got many errors. So decided to check for
> > 16.04 first (that is still used on prod for that DB), without any
> > OS/glibc changes.
> >
> > On 16.04, I still saw errors and it was not really expected because this
> > should mean that production is corrupted too. So, REINDEX should fix it.
> > But it didn't -- see output below. I cannot give data and thinking how
> > to create a synthetic demo of this. Any suggestions?
> >
> > And is this a sign that the tool is wrong rather that we have a real
> > corruption cases? (I assume if we did, we would see no errors after
> > REINDEXing -- of course, if GIN itself doesn't have bugs).
> >
> > Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
> > slightly adjusted to work with PG12 (
> > https://gitlab.com/postgres/postgres/-/merge_requests/5
> > ) snippet used
> > to run amcheck:
> > https://gitlab.com/-/snippets/2001962
> >  (see file #3)
>
> Almost certainly the tool is wrong. We went back and forth a few times
> with Pawel, fixing various bugs in the amcheck patch at this thread:
>
> https://www.postgresql.org/message-id/9fdbb584-1e10-6a55-ecc2-9ba8b5dca1cf%40iki.fi.
>
> Can you try again with the latest patch version from that thread,
> please? That's v5-0001-Amcheck-for-GIN-13stable.patch.
>
> - Heikki
>


Re: A qsort template

2021-08-01 Thread Thomas Munro
On Mon, Aug 2, 2021 at 12:40 PM Thomas Munro  wrote:
> Great!  I saw similar sorts of numbers.  It's really just a few
> crumbs, nothing compared to the gains David just found over in the
> thread "Use generation context to speed up tuplesorts", but on the
> bright side, these crumbs will be magnified by that work.

(Hmm, that also makes me wonder about using a smaller SortTuple when
possible...)




Re: A qsort template

2021-08-01 Thread Thomas Munro
On Fri, Jul 30, 2021 at 12:34 PM John Naylor
 wrote:
> I got around to getting a benchmark together to serve as a starting point. I 
> based it off something I got from the archives, but don't remember where (I 
> seem to remember Tomas Vondra wrote the original, but not sure). To start I 
> just used types that were there already -- int, text, numeric. The latter two 
> won't be helped by this patch, but I wanted to keep something like that so we 
> can see what kind of noise variation there is. I'll probably cut text out in 
> the future and just keep numeric for that purpose.

Thanks, that's very useful.

> I've attached both the script and a crude spreadsheet. I'll try to figure out 
> something nicer for future tests, and maybe some graphs. The "comparison" 
> sheet has the results side by side (min of five). There are 6 distributions 
> of values:
> - random
> - sorted
> - "almost sorted"
> - reversed
> - organ pipe (first half ascending, second half descending)
> - rotated (sorted but then put the smallest at the end)
> - random 0s/1s
>
> I included both "select a" and "select *" to make sure we have the recent 
> datum sort optimization represented. The results look pretty good for ints -- 
> about the same speed up master gets going from tuple sorts to datum sorts, 
> and those got faster in turn also.

Great!  I saw similar sorts of numbers.  It's really just a few
crumbs, nothing compared to the gains David just found over in the
thread "Use generation context to speed up tuplesorts", but on the
bright side, these crumbs will be magnified by that work.

> Next I think I'll run microbenchmarks on int64s with the test harness you 
> attached earlier, and experiment with the qsort parameters a bit.

Cool.  I haven't had much luck experimenting with that yet, though I
consider the promotion from magic numbers to names as an improvement
in any case.

> I'm also attaching your tuplesort patch so others can see what exactly I'm 
> comparing.

We've been bouncing around quite a few different ideas and patches in
this thread; soon I'll try to bring it back to one patch set with the
ideas that are looking good so far in a more tidied up form.  For the
tuplesort.c part, I added some TODO notes in
v3-0001-WIP-Accelerate-tuple-sorting-for-common-types.patch's commit
message (see reply to Peter).




Re: A qsort template

2021-08-01 Thread Thomas Munro
On Fri, Jul 30, 2021 at 7:11 PM Peter Geoghegan  wrote:
> If you're going to specialize the sort routine for unsigned integer
> style abbreviated keys then you might as well cover all relevant
> opclasses/types. Almost all abbreviated key schemes produce
> conditioned datums that are designed to use simple 3-way unsigned int
> comparator. It's not just text. (Actually, the only abbreviated key
> scheme that doesn't do it that way is numeric.)

Right, that was the plan, but this was just experimenting with an
idea.  Looks like John's also seeing evidence that it may be worth
pursuing.

(Re numeric, I guess it must be possible to rearrange things so it can
use ssup_datum_signed_cmp; maybe something like NaN -> INT64_MAX, +inf
-> INT64_MAX - 1, -inf -> INT64_MIN, and then -1 - (whatever we're
doing now for normal values).)

> Offhand I know that UUID, macaddr, and inet all have abbreviated keys
> that can use your new ssup_datum_binary_cmp() comparator instead of
> their own duplicated comparator (which will make them use the
> corresponding specialized sort routine inside tuplesort.c).

Thanks, I've added these ones, and also gist_bbox_zorder_cmp_abbrev.

I also renamed that function to ssup_datum_unsigned_cmp(), because
"binary" was misleading.
From a1ca8e9ca989de5f61f1b8f067cb9785034ce9cc Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 3 Jul 2021 19:02:10 +1200
Subject: [PATCH v3] WIP: Accelerate tuple sorting for common types.

Several data types have their own fast comparator functions that can be
replaced with common binary or signed comparator functions.  These
functions can then be recognized by tuplesort.c and used to dispatch to
corresponding specialized sort functions, to accelerate sorting.

XXX WIP, experiment grade only, many open questions... such as:

XXX Can we avoid repeating the null-handling logic?
XXX Is it worth specializing for reverse sort?
XXX Is it worth specializing for nulls first, nulls last, not null?
XXX Should we have separate cases for "result is authoritative", "need
XXX tiebreaker for atts 1..n (= abbrev case)", "need tie breaker for
XXX atts 2..n"?
XXX If we did all of the above, that'd give:
XXX  {nulls_first, nulls_last, not_null} x
XXX  {asc, desc} x
XXX  {tiebreak_none, tiebreak_first, tiebreak_rest} x
XXX  {unsigned, signed, int32}
XXX  = 54 specializations!  Seems a little over the top.
XXX You could use a smaller SortTuple for some cases.
XXX How should we handle numeric?
---
 src/backend/access/gist/gistproc.c |  18 +--
 src/backend/access/nbtree/nbtcompare.c |  22 ++--
 src/backend/utils/adt/date.c   |  15 +--
 src/backend/utils/adt/mac.c|  23 +---
 src/backend/utils/adt/network.c|  17 +--
 src/backend/utils/adt/timestamp.c  |  11 ++
 src/backend/utils/adt/uuid.c   |  25 +---
 src/backend/utils/adt/varlena.c|  34 +-
 src/backend/utils/sort/tuplesort.c | 160 -
 src/include/utils/sortsupport.h| 115 ++
 10 files changed, 308 insertions(+), 132 deletions(-)

diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index d474612b77..20135ea0ec 100644
--- a/src/backend/access/gist/gistproc.c
+++ b/src/backend/access/gist/gistproc.c
@@ -37,7 +37,6 @@ static uint64 part_bits32_by2(uint32 x);
 static uint32 ieee_float32_to_uint32(float f);
 static int	gist_bbox_zorder_cmp(Datum a, Datum b, SortSupport ssup);
 static Datum gist_bbox_zorder_abbrev_convert(Datum original, SortSupport ssup);
-static int	gist_bbox_zorder_cmp_abbrev(Datum z1, Datum z2, SortSupport ssup);
 static bool gist_bbox_zorder_abbrev_abort(int memtupcount, SortSupport ssup);
 
 
@@ -1726,21 +1725,6 @@ gist_bbox_zorder_abbrev_convert(Datum original, SortSupport ssup)
 #endif
 }
 
-static int
-gist_bbox_zorder_cmp_abbrev(Datum z1, Datum z2, SortSupport ssup)
-{
-	/*
-	 * Compare the pre-computed Z-orders as unsigned integers. Datum is a
-	 * typedef for 'uintptr_t', so no casting is required.
-	 */
-	if (z1 > z2)
-		return 1;
-	else if (z1 < z2)
-		return -1;
-	else
-		return 0;
-}
-
 /*
  * We never consider aborting the abbreviation.
  *
@@ -1764,7 +1748,7 @@ gist_point_sortsupport(PG_FUNCTION_ARGS)
 
 	if (ssup->abbreviate)
 	{
-		ssup->comparator = gist_bbox_zorder_cmp_abbrev;
+		ssup->comparator = ssup_datum_unsigned_cmp;
 		ssup->abbrev_converter = gist_bbox_zorder_abbrev_convert;
 		ssup->abbrev_abort = gist_bbox_zorder_abbrev_abort;
 		ssup->abbrev_full_comparator = gist_bbox_zorder_cmp;
diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index 7ac73cb8c2..204cf778fb 100644
--- a/src/backend/access/nbtree/nbtcompare.c
+++ b/src/backend/access/nbtree/nbtcompare.c
@@ -119,26 +119,12 @@ btint4cmp(PG_FUNCTION_ARGS)
 		PG_RETURN_INT32(A_LESS_THAN_B);
 }
 
-static int
-btint4fastcmp(Datum x, Datum y, SortSupport ssup)
-{
-	int32		a = DatumGetInt32(x);
-	int32		b = DatumGetInt32(y);
-
-	if (a > b)
-		return A_GREATER_THAN_B;

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Tom Lane
Gilles Darold  writes:
> [ v5-0001-regexp-foo-functions.patch ]

I've gone through this whole patch now, and found quite a lot that I did
not like.  In no particular order:

* Wrapping parentheses around the user's regexp doesn't work.  It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error.  With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected.  Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp.  So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.

* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code.  I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need.  In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.

* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it.  If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence".  Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better.  "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else.  The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead.  I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation.  Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).

* I spent a lot of time on the docs too.  A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.

* Also, the regression tests seemed misguided.  This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that.  So I don't think we need complex regexps
here.  We just need to verify that the parameters of these functions
act properly, and check their error cases.  That can be done much
more quickly and straightforwardly than what you had.


So here's a revised version that I like better.  I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..80aac4965e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,78 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text
+ [, start integer
+ [, flags text ] ] )
+integer
+   
+   
+Returns the number of times the POSIX regular
+expression pattern matches in
+the string; see
+.
+   
+   
+regexp_count('123456789012', '\d\d\d', 2)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text
+ [, start integer
+ [, N integer
+ [, endoption integer
+ [, flags text
+ [, subexpr integer ] ] ] ] ] )
+integer
+   
+   
+Returns the position within string where
+the N'th match of the POSIX regular
+expression pattern occurs, or zero if there is
+no such match; see .
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')
+3
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i', 2)
+5
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text
+ [, flags text ] )
+boolean
+   
+   
+Checks whether a match of the POSIX regular
+expression pattern occurs
+within string; see
+.
+   
+   
+regexp_like('Hello World', 'world$', 'i')
+t
+   
+  
+
   

 
@@ -3117,8 +3189,9 @@ repeat('Pg', 4) PgPgPgPg
 text[]


-Returns captured substrings resulting from the first match of a POSIX
-regular expression to the string; see
+R

Re: Corrected documentation of data type for the logical replication message formats.

2021-08-01 Thread Peter Smith
On Mon, Aug 2, 2021 at 1:32 AM vignesh C  wrote:
>
> On Sun, Aug 1, 2021 at 4:11 PM Peter Smith  wrote:
> >
> > On Sat, Jul 31, 2021 at 7:00 AM Tom Lane  wrote:
> > >
> > > vignesh C  writes:
> > > [ v6-0001-Included-the-actual-datatype-used-in-logical-repl.patch ]
> > >
> > > I see what you want to do here, but the way you did it seems quite
> > > detrimental to the readability of the field descriptions.
> > > Parenthesized interjections should be used sparingly.
> > >
> > > I'm inclined to think that the equivalent data type is part of the
> > > field data type specification, and thus that we ought to put it in
> > > the data type part of each entry.  So we'd have something like
> > >
> > > 
> > > 
> > > Int64 (XLogRecPtr)
> > > 
> > > 
> > > 
> > > The final LSN of the transaction.
> > > 
> > > 
> > > 
> > >
> > > instead of what you did here.  Parentheses might not be the best
> > > punctuation to use, given the existing convention about parenthesized
> > > specific values, but we could probably settle on some other markup.
> > > Or just ignore the ambiguity.
> >
> > +1 to change it like suggested above.
> >
> > The specific value for the flags might then look like below, but that
> > does not look too bad to me.
> >
> > 
> > Int8 (uint8) (0)
> > 
>
> I felt we can change it like:
> 
> Int8(0) (uint8)
> 
>
> I felt the flag value can be kept first followed by the data type since it is 
> used similarly for the other message types like below:
> 
> Byte1('C')
> 
>
> I have made changes in similar lines and posted the patch at [1].
> Thoughts?

I agree. The specified value looks better when it comes first, as you did.

~~

Option #1:
Int8(0) (uint8)
Int64 (XLogRecPtr)

Option #2:
Int8(0) [uint8]
Int64 [XLogRecPtr]

Option #3:
Int8(0) -- uint8
Int64 -- XLogRecPtr

etc...

Probably my slight favourite is Option #2 above, but YMMV. Any format
you choose which is similar to those above is fine by me.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Record a Bitmapset of non-pruned partitions

2021-08-01 Thread David Rowley
Thanks for having a look at this.

On Mon, 2 Aug 2021 at 02:33, Zhihong Yu  wrote:
> Here we look for partitions which
> +* might be interleaved with other partitions and set the
> +* interleaved_parts field with the partition indexes of any partitions
> +* which may be interleaved with another partition.
>
> The above seems a little bit repetitive. It can be shortened to remove 
> repetition.

I agree that the word "partition" is mentioned quite a few times. The
only one I can see that could be removed is the "partition indexes"
one. Likely the details about which bit we set can be left up to the
struct field comment in partbounds.h

I've adjusted this to become:

/*
* Calculate interleaved partitions.  Here we look for partitions which
* might be interleaved with other partitions and set a bit in
* interleaved_parts for any partitions which may be interleaved with
* another partition.
*/

David




Re: slab allocator performance issues

2021-08-01 Thread Tomas Vondra



On 8/1/21 11:07 PM, Andres Freund wrote:
> Hi,
>
> On 2021-08-01 19:59:18 +0200, Tomas Vondra wrote:
>> In the attached .ods file with results, the "comparison" sheets are the
>> interesting ones - the last couple columns compare the main metrics for the
>> two patches (labeled patch-1 and patch-2) to master.
>
> I assume with patch-1/2 you mean the ones after the benchmark patch
> itself?

Yes, those are the two WIP patches you shared on 19/7.

>> Overall, the results look quite good - patch-1 is mostly on par with master,
>> with maybe 5% variability in both directions. That's expected, considering
>> the patch does not aim to improve performance.
>
> Not for slab anyway...

Maybe the hot/cold separation could have some effect, but probably not 
for the workloads I've tested.

>> The second patch brings some nice improvements - 30%-50% in most cases (for
>> both allocation and free) seems pretty nice. But for the "increase" FIFO
>> pattern (incrementally allocating/freeing more memory) there's a significant
>> regression - particularly for the allocation time. In some cases (larger
>> chunks, block size does not matter too much) it jumps from 25ms to almost
>> 200ms.
>
> I'm not surprised to see some memory usage increase some, but that
> degree of time overhead does surprise me. ISTM there's something wrong.

Yeah, the higher amount of allocated memory is due to the couple of 
fields added to the SlabBlock struct, but even that only affects a 
single case with 480B chunks and 1kB blocks. Seems fine to me, 
especially if we end up growing the slab blocks.

Not sure about the allocation time, though.

> It'd probably be worth benchmarking the different improvements inside the
> WIP: slab performance. patch. There's some that I'd expect to be all
> around improvements, whereas others likely aren't quite that clear
> cut. I assume you'd prefer that I split the patch up?

Yeah, if you split that patch into sensible parts, I'll benchmark those. 
Also, we can add more interesting workloads if you have some ideas.

>> This seems unfortunate - the allocation pattern (FIFO, allocating more
>> memory over time) seems pretty common, and the slowdown is significant.
>
> Did you analyze what causes the regressions?

No, not yet. I'll run the same set of benchmarks for the Generation 
context, discussed in the other thread, and then I'll investigate this. 
But if you split the patch, that'd probably help.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Replace l337sp34k in comments.

2021-08-01 Thread Andrew Dunstan


On 8/1/21 5:10 PM, Andres Freund wrote:
> Hi,
>
> On 2021-07-31 12:15:34 +0300, Peter Geoghegan wrote:
>> On Sat, Jul 31, 2021 at 11:22 AM Andrey Borodin  wrote:
>>> FWIW, my 2 cents.
>>> I do not see much difference between up2date, up-to-date, up to date, 
>>> current, recent, actual, last, newest, correct, fresh etc.
>> +1.
>> To me it seems normal to debate wording/terminology with new code
>> comments, but that's about it. I find this zeal to change old code
>> comments misguided. It's okay if they're clearly wrong or have typos.
>> Anything else is just hypercorrection. And in any case there is a very
>> real chance of making the overall situation worse rather than better.
>> Probably in some subtle but important way.
> Same here. I find them quite distracting, even.
>
It's one thing for such patches to target blindingly obvious typos etc, but
> they often also end up including less clear cut changes, which cost a
> fair bit of time to review/judge.
>

I agree. Errors, ambiguities and typos should be fixed, but purely
stylistic changes should not be made. In any case, I don't think we need
to hold the code comments to the same standard as the docs. I think a
little more informality is acceptable in code comments.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Replace l337sp34k in comments.

2021-08-01 Thread Andres Freund
Hi,

On 2021-07-31 12:15:34 +0300, Peter Geoghegan wrote:
> On Sat, Jul 31, 2021 at 11:22 AM Andrey Borodin  wrote:
> > FWIW, my 2 cents.
> > I do not see much difference between up2date, up-to-date, up to date, 
> > current, recent, actual, last, newest, correct, fresh etc.
> 
> +1.

> To me it seems normal to debate wording/terminology with new code
> comments, but that's about it. I find this zeal to change old code
> comments misguided. It's okay if they're clearly wrong or have typos.
> Anything else is just hypercorrection. And in any case there is a very
> real chance of making the overall situation worse rather than better.
> Probably in some subtle but important way.

Same here. I find them quite distracting, even.

It's one thing for such patches to target blindingly obvious typos etc, but
they often also end up including less clear cut changes, which cost a
fair bit of time to review/judge.

Greetings,

Andres Freund




Re: slab allocator performance issues

2021-08-01 Thread Andres Freund
Hi,

On 2021-08-01 19:59:18 +0200, Tomas Vondra wrote:
> In the attached .ods file with results, the "comparison" sheets are the
> interesting ones - the last couple columns compare the main metrics for the
> two patches (labeled patch-1 and patch-2) to master.

I assume with patch-1/2 you mean the ones after the benchmark patch
itself?


> Overall, the results look quite good - patch-1 is mostly on par with master,
> with maybe 5% variability in both directions. That's expected, considering
> the patch does not aim to improve performance.

Not for slab anyway...


> The second patch brings some nice improvements - 30%-50% in most cases (for
> both allocation and free) seems pretty nice. But for the "increase" FIFO
> pattern (incrementally allocating/freeing more memory) there's a significant
> regression - particularly for the allocation time. In some cases (larger
> chunks, block size does not matter too much) it jumps from 25ms to almost
> 200ms.

I'm not surprised to see some memory usage increase some, but that
degree of time overhead does surprise me. ISTM there's something wrong.

It'd probably be worth benchmarking the different improvements inside the
WIP: slab performance. patch. There's some that I'd expect to be all
around improvements, whereas others likely aren't quite that clear
cut. I assume you'd prefer that I split the patch up?


> This seems unfortunate - the allocation pattern (FIFO, allocating more
> memory over time) seems pretty common, and the slowdown is significant.

Did you analyze what causes the regressions?

Greetings,

Andres Freund




pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

2021-08-01 Thread Andres Freund
Hi,

Since

commit 960869da0803427d14335bba24393f414b476e2c
Author: Magnus Hagander 
Date:   2021-01-17 13:34:09 +0100

Add pg_stat_database counters for sessions and session time

pgstat_report_stat() does another timestamp computation via
pgstat_send_connstats(), despite typically having computed one just a few
lines before.

Given that timestamp computation isn't all that cheap, that's not great. Even
more, that additional timestamp computation makes things *less* accurate:

void
pgstat_report_stat(bool disconnect)
...
/*
 * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
 * msec since we last sent one, or the backend is about to exit.
 */
now = GetCurrentTransactionStopTimestamp();
if (!disconnect &&
!TimestampDifferenceExceeds(last_report, now, 
PGSTAT_STAT_INTERVAL))
return;

/* for backends, send connection statistics */
if (MyBackendType == B_BACKEND)
pgstat_send_connstats(disconnect, last_report);

last_report = now;

and then pgstat_send_connstats() does:
/* session time since the last report */
TimestampDifference(((last_report == 0) ? MyStartTimestamp : 
last_report),
GetCurrentTimestamp(),
&secs, &usecs);
msg.m_session_time = secs * 1000000 + usecs;

We maintain last_report as GetCurrentTransactionStopTimestamp(), but then use
a separate timestamp in pgstat_send_connstats() to compute the difference from
last_report, which is then set to GetCurrentTransactionStopTimestamp()'s
return value.


I'm also not all that happy with sending yet another UDP packet for this. This
doubles the UDP packets to the stats collector in the common cases (unless
more than TABSTAT_QUANTUM tables have stats to report, or shared tables have
been accessed). We already send plenty of "summary" information via
PgStat_MsgTabstat, one of which is sent unconditionally, why do we need a
separate message for connection stats?


Alternatively we could just not send an update to connection stats every 500ms
for every active connection, but only do so at connection end. The database
stats would only contain stats for disconnected sessions, while the stats for
"live" connections are maintained via backend_status.c.  That'd give us *more*
information at less cost, because we could then see idle/active times for
individual connections.

That'd likely be more change than what we would want to do at this point in
the release cycle though. But I think we ought to do something about the
increased overhead...

Greetings,

Andres Freund




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Gilles Darold
Le 01/08/2021 à 19:23, Tom Lane a écrit :
> I've been working through this patch, and trying to verify
> compatibility against Oracle and DB2, and I see some points that need
> discussion or at least recording for the archives.
>
> * In Oracle, while the documentation for regexp_instr says that
> return_option should only be 0 or 1, experimentation with sqlfiddle
> shows that any nonzero value is silently treated as 1.  The patch
> raises an error for other values, which I think is a good idea.
> (IBM's docs say that DB2 raises an error too, though I can't test
> that.)  We don't need to be bug-compatible to that extent.
>
> * What should happen when the subexpression/capture group number of
> regexp_instr or regexp_substr exceeds the number of parenthesized
> subexpressions of the regexp?  Oracle silently returns a no-match
> result (0 or NULL), as does this patch.  However, IBM's docs say
> that DB2 raises an error.  I'm inclined to think that this is
> likewise taking bug-compatibility too far, and that we should
> raise an error like DB2.  There are clearly cases where throwing
> an error would help debug a faulty call, while I'm less clear on
> a use-case where not throwing an error would be useful.
>
> * IBM's docs say that both regexp_count and regexp_like have
> arguments "string, pattern [, start] [, flags]" --- that is,
> each of start and flags can be independently specified or omitted.
> The patch follows Oracle, which has no start option for 
> regexp_like, and where you can't write flags for regexp_count
> without writing start.  This is fine by me, because doing these
> like DB2 would introduce the same which-argument-is-this issues
> as we're being forced to cope with for regexp_replace.  I don't
> think we need to accept ambiguity in these cases too.  But it's
> worth memorializing this decision in the thread.
>
> * The patch has most of these functions silently ignoring the 'g'
> flag, but I think they should raise errors instead.  Oracle doesn't
> accept a 'g' flag for these, so why should we?  The only case where
> that logic doesn't hold is regexp_replace, because depending on which
> syntax you use the 'g' flag might or might not be meaningful.  So
> for regexp_replace, I'd vote for silently ignoring 'g' if the
> occurrence-number parameter is given, while honoring it if not.
>
> I've already made changes in my local copy per the last item,
> but I've not done anything about throwing errors for out-of-range
> subexpression numbers.  Anybody have an opinion about that one?


I thought about this while I was implementing the functions and chose
not to throw an error because of the Oracle behavior, and also to match
other regular expression implementations. For example, in Perl there is
no error:


$ perl -e '$str="hello world"; $str =~ s/(l)/$20/; print "$str\n";'
helo world


Usually a regular expression is tested by its creator to be sure that
it is the right one and that it does what is expected. But I agree that
an error could help the writer debug the RE.


Also, if I recall correctly, Oracle and DB2 limit capture-group back
references to \1 through \9 for Oracle and \0 through \9 for DB2. I
have chosen not to apply this limit; I don't see the point of such a
limitation.



-- 
Gilles Darold
http://www.darold.net/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Gilles Darold
Le 30/07/2021 à 23:38, Tom Lane a écrit :
> Gilles Darold  writes:
>> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>>> I'm inclined to just drop the regexp_replace additions.  I don't think
>>> that the extra parameters Oracle provides here are especially useful.
>>> They're definitely not useful enough to justify creating compatibility
>>> hazards for.
>> I would not say that being able to replace the Nth occurrence of a
>> pattern matching is not useful but i agree that this is not a common
>> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
>> and I have though that we can not have compatibility issues because of
>> the different data type at the 4th parameter.
> Well, here's an example of the potential issues:
>
> [...]


Thanks for pointing out this case; I did not think that prepared
statements could lead to this confusion.


>> Anyway, maybe we can just
>> rename the function even if I would prefer that regexp_replace() be
>> extended. For example:
>>     regexp_replace(source, pattern, replacement [, flags ]);
>>     regexp_substitute(source, pattern, replacement [, position ] [,
>> occurrence ] [, flags ]);
> Hmm.  Of course the entire selling point of this patch seems to be
> bug-compatibility with Oracle, so using different names is largely
> defeating the point :-(
>
> Maybe we should just hold our noses and do it.  The point that
> you'd get a recognizable failure if the wrong function were chosen
> reassures me a little bit.  We've seen a lot of cases where this
> sort of ambiguity results in the system just silently doing something
> different from what you expected, and I was afraid that that could
> happen here.


I have attached a new version of the patch that includes a check of the
option parameter in the basic form of regexp_replace() and returns an
error in ambiguous cases.


PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and
wise','(\w+)thy', '\1ish', $1);
EXECUTE rr(1);
ERROR:  ambiguous use of the option parameter in regex_replace(),
value: 1
HINT:  you might set the occurrence parameter to force the use of
the extended form of regex_replace()


This is done by checking whether the option parameter value is an
integer and throwing the error in that case. I can't think of anything
better.


Best regards,

-- 
Gilles Darold

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..02d1f72e1e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
regexp_instr ( string text, pattern text [, position integer ] [, occurrence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3156,7 +3216,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurrence integer ] [, flags text ] )
 text


@@ -3171,6 +3231,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
regexp_substr ( string text, pattern text [, position integer ] [, occurrence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

   

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Tom Lane
I've been working through this patch, and trying to verify
compatibility against Oracle and DB2, and I see some points that need
discussion or at least recording for the archives.

* In Oracle, while the documentation for regexp_instr says that
return_option should only be 0 or 1, experimentation with sqlfiddle
shows that any nonzero value is silently treated as 1.  The patch
raises an error for other values, which I think is a good idea.
(IBM's docs say that DB2 raises an error too, though I can't test
that.)  We don't need to be bug-compatible to that extent.

* What should happen when the subexpression/capture group number of
regexp_instr or regexp_substr exceeds the number of parenthesized
subexpressions of the regexp?  Oracle silently returns a no-match
result (0 or NULL), as does this patch.  However, IBM's docs say
that DB2 raises an error.  I'm inclined to think that this is
likewise taking bug-compatibility too far, and that we should
raise an error like DB2.  There are clearly cases where throwing
an error would help debug a faulty call, while I'm less clear on
a use-case where not throwing an error would be useful.

* IBM's docs say that both regexp_count and regexp_like have
arguments "string, pattern [, start] [, flags]" --- that is,
each of start and flags can be independently specified or omitted.
The patch follows Oracle, which has no start option for 
regexp_like, and where you can't write flags for regexp_count
without writing start.  This is fine by me, because doing these
like DB2 would introduce the same which-argument-is-this issues
as we're being forced to cope with for regexp_replace.  I don't
think we need to accept ambiguity in these cases too.  But it's
worth memorializing this decision in the thread.

* The patch has most of these functions silently ignoring the 'g'
flag, but I think they should raise errors instead.  Oracle doesn't
accept a 'g' flag for these, so why should we?  The only case where
that logic doesn't hold is regexp_replace, because depending on which
syntax you use the 'g' flag might or might not be meaningful.  So
for regexp_replace, I'd vote for silently ignoring 'g' if the
occurrence-number parameter is given, while honoring it if not.
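The global-versus-single-occurrence distinction in the last point can be sketched with Python's re.sub (again an analogy, not the SQL function itself):

```python
import re

s = 'a1b2c3'
# No occurrence given: a 'g' flag is meaningful -- replace all.
print(re.sub(r'\d', 'X', s))           # aXbXcX
# Occurrence given: 'g' adds no information -- replace just one.
# (count=1 approximates "occurrence 1"; the SQL function would
# also accept later occurrences.)
print(re.sub(r'\d', 'X', s, count=1))  # aXb2c3
```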

I've already made changes in my local copy per the last item,
but I've not done anything about throwing errors for out-of-range
subexpression numbers.  Anybody have an opinion about that one?

regards, tom lane




Re: Corrected documentation of data type for the logical replication message formats.

2021-08-01 Thread vignesh C
On Sun, Aug 1, 2021 at 4:11 PM Peter Smith  wrote:
>
> On Sat, Jul 31, 2021 at 7:00 AM Tom Lane  wrote:
> >
> > vignesh C  writes:
> > [ v6-0001-Included-the-actual-datatype-used-in-logical-repl.patch ]
> >
> > I see what you want to do here, but the way you did it seems quite
> > detrimental to the readability of the field descriptions.
> > Parenthesized interjections should be used sparingly.
> >
> > I'm inclined to think that the equivalent data type is part of the
> > field data type specification, and thus that we ought to put it in
> > the data type part of each entry.  So we'd have something like
> >
> > 
> > 
> > Int64 (XLogRecPtr)
> > 
> > 
> > 
> > The final LSN of the transaction.
> > 
> > 
> > 
> >
> > instead of what you did here.  Parentheses might not be the best
> > punctuation to use, given the existing convention about parenthesized
> > specific values, but we could probably settle on some other markup.
> > Or just ignore the ambiguity.
>
> +1 to change it like suggested above.
>
> The specific value for the flags might then look like below, but that
> does not look too bad to me.
>
> 
> Int8 (uint8) (0)
> 

I felt we could change it to:

Int8(0) (uint8)


The flag value can be kept first, followed by the data type, since that
is consistent with how specific values appear in the other message types,
like below:

Byte1('C')


I have made changes in similar lines and posted the patch at [1].
Thoughts?

[1] -
https://www.postgresql.org/message-id/CALDaNm3sK75Mo%2BVzLmNGe29gYtJoeKHshAK0GDiAzfAj6LQPdw%40mail.gmail.com

Regards,
Vignesh


Re: Corrected documentation of data type for the logical replication message formats.

2021-08-01 Thread vignesh C
On Sat, Jul 31, 2021 at 2:30 AM Tom Lane  wrote:
>
> vignesh C  writes:
> [ v6-0001-Included-the-actual-datatype-used-in-logical-repl.patch ]
>
> I see what you want to do here, but the way you did it seems quite
> detrimental to the readability of the field descriptions.
> Parenthesized interjections should be used sparingly.
>
> I'm inclined to think that the equivalent data type is part of the
> field data type specification, and thus that we ought to put it in
> the data type part of each entry.  So we'd have something like
>
> 
> 
> Int64 (XLogRecPtr)
> 
> 
> 
> The final LSN of the transaction.
> 
> 
> 
>

I made the changes based on this feedback; since Peter was also in favour
of this approach, I went with it. The attached v7 patch has the changes.

Regards,
Vignesh
From 285f4fe337222690f27f164d7210afbda84c37d0 Mon Sep 17 00:00:00 2001
From: Vigneshwaran c 
Date: Sun, 1 Aug 2021 20:21:31 +0530
Subject: [PATCH v7] Included the actual datatype used in logical replication
 message format documentation.

Included the actual datatype used in logical replication message format
documentation.
---
 doc/src/sgml/protocol.sgml | 159 +++--
 1 file changed, 100 insertions(+), 59 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9843953b05..ac53c82d19 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -6411,7 +6411,8 @@ This section describes the detailed format of each logical replication message.
 These messages are returned either by the replication slot SQL interface or are
 sent by a walsender. In case of a walsender they are encapsulated inside the replication
 protocol WAL messages as described in 
-and generally obey same message flow as physical replication.
+and generally obey the same message flow as physical replication. The base data types
+used are described in .
 
 
 
@@ -6436,7 +6437,7 @@ Begin
 
 
 
-Int64
+Int64 (XLogRecPtr)
 
 
 
@@ -6446,7 +6447,7 @@ Begin
 
 
 
-Int64
+Int64 (TimestampTz)
 
 
 
@@ -6457,7 +6458,7 @@ Begin
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -6491,7 +6492,7 @@ Message
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -6502,7 +6503,7 @@ Message
 
 
 
-Int8
+Int8 (uint8)
 
 
 
@@ -6513,7 +6514,7 @@ Message
 
 
 
-Int64
+Int64 (XLogRecPtr)
 
 
 
@@ -6579,17 +6580,17 @@ Commit
 
 
 
-Int8
+Int8(0) (uint8)
 
 
 
-Flags; currently unused (must be 0).
+Flags; currently unused.
 
 
 
 
 
-Int64
+Int64 (XLogRecPtr)
 
 
 
@@ -6599,7 +6600,7 @@ Commit
 
 
 
-Int64
+Int64 (XLogRecPtr)
 
 
 
@@ -6609,7 +6610,7 @@ Commit
 
 
 
-Int64
+Int64 (TimestampTz)
 
 
 
@@ -6644,7 +6645,7 @@ Origin
 
 
 
-Int64
+Int64 (XLogRecPtr)
 
 
 
@@ -6693,7 +6694,7 @@ Relation
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -6704,7 +6705,7 @@ Relation
 
 
 
-Int32
+Int32 (Oid)
 
 
 
@@ -6760,7 +6761,7 @@ Relation
 
 
 
-Int8
+Int8 (uint8)
 
 
 
@@ -6781,7 +6782,7 @@ Relation
 
 
 
-Int32
+Int32 (Oid)
 
 
 
@@ -6825,7 +6826,7 @@ Type
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -6836,7 +6837,7 @@ Type
 
 
 
-Int32
+Int32 (Oid)
 
 
 
@@ -6890,7 +6891,7 @@ Insert
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -6901,7 +6902,7 @@ Insert
 
 
 
-Int32
+Int32 (Oid)
 
 
 
@@ -6957,7 +6958,7 @@ Update
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -6968,7 +6969,7 @@ Update
 
 
 
-Int32
+Int32 (Oid)
 
 
 
@@ -7071,7 +7072,7 @@ Delete
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -7082,7 +7083,7 @@ Delete
 
 
 
-Int32
+Int32 (Oid)
 
 
 
@@ -7160,7 +7161,7 @@ Truncate
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -7192,7 +7193,7 @@ Truncate
 
 
 
-Int32
+Int32 (Oid)
 
 
 
@@ -7238,7 +7239,7 @@ Stream Start
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -7307,7 +7308,7 @@ Stream Commit
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -7317,17 +7318,17 @@ Stream Commit
 
 
 
-Int8
+Int8(0) (uint8)
 
 
 
-Flags; currently unused (must be 0).
+Flags; currently unused.
 
 
 
 
 
-Int64
+Int64 (XLogRecPtr)
 
 
 
@@ -7337,7 +7338,7 @@ Stream Commit
 
 
 
-Int64
+Int64 (XLogRecPtr)
 
 
 
@@ -7347,7 +7348,7 @@ Stream Commit
 
 
 
-Int64
+Int64 (TimestampTz)
 
 
 
@@ -7382,7 +7383,7 @@ Stream Abort
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -7392,7 +7393,7 @@ Stream Abort
 
 
 
-Int32
+Int32 (TransactionId)
 
 
 
@@ -7432,21 +7433,27 @@ are available since protocol version 3.
 
 
 
-Int64
+
+Int64 (XLogR
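The field layout being documented (e.g. the Begin message: Int64 final LSN as XLogRecPtr, Int64 commit timestamp as TimestampTz, Int32 xid as TransactionId, all in network byte order) can be illustrated by packing and decoding a message body with Python's struct module — a sketch with made-up values, not the actual walsender wire code:

```python
import struct

# Sketch: round-trip a logical replication Begin ('B') message body.
# '!' = network byte order; 'Q' = unsigned 64-bit (Int64 fields);
# 'I' = unsigned 32-bit (Int32 xid).  Values below are invented.
body = struct.pack('!QQI', 0x16B6E30, 681000000000000, 745)
final_lsn, commit_ts, xid = struct.unpack('!QQI', body)
print(hex(final_lsn), xid)  # 0x16b6e30 745
```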

Re: Record a Bitmapset of non-pruned partitions

2021-08-01 Thread Zhihong Yu
On Sun, Aug 1, 2021 at 5:31 AM David Rowley  wrote:

> On Fri, 30 Jul 2021 at 19:10, Amit Langote 
> wrote:
> >
> > 0001 looks mostly fine, except I thought the following could be worded
> > to say that the bitmap members are offsets into the part_rels array.
> > To avoid someone confusing them with RT indexes, for example.
> >
> > +   Bitmapset  *live_parts; /* Bitmap with members to indicate which
> > +* partitions survived partition
> pruning. */
>
> Yeah, agreed. I've adjusted that.
>
> > On 0002:
> >
> > interleaved_parts idea looks clever.  I wonder if you decided that
> > it's maybe not worth setting that field in the joinrel's
> > PartitionBoundInfos?  For example, adding the code that you added in
> > create_list_bounds() also in merge_list_bounds().
>
> Currently generate_orderedappend_paths() only checks
> partitions_are_ordered() for base and other member rels, so setting
> the field for join rels would be a waste of effort given that it's not
> used for anything.
>
> I've not really looked into the possibility of enabling this
> optimization for partition-wise joined rels. I know that there's a bit
> more complexity now due to c8434d64c. I'm not really all that clear on
> which cases could be allowed here and which couldn't. It would require
> more analysis and I'd say that's a different patch.
>
> > ...  The definition of interleaved
> > + * is any partition that can contain multiple different values where
> exists at
> > + * least one other partition which could contain a value which is
> between the
> > + * multiple possible values in the other partition.
> >
> > The sentence sounds a bit off starting at "...where exists".  How about:
>
> I must have spent too long writing SQL queries.
>
> > "A partition is considered interleaved if it contains multiple values
> > such that there exists at least one other partition containing a value
> > that lies between those values [ in terms of partitioning-defined
> > ordering ]."
>
> That looks better. I took that with some small adjustments.
>
> > Looks fine otherwise.
>
> Thanks for the review.
>
> I had another self review of these and I'm pretty happy with them. I'm
> quite glad to see the performance of querying a single partition of a
> table with large numbers of partitions no longer tails off as much as
> it used to.
>
> David
>
Hi,
Some minor comments.

bq. Here we pass which partitioned

 partitioned -> partitions

Here we look for partitions which
+* might be interleaved with other partitions and set the
+* interleaved_parts field with the partition indexes of any partitions
+* which may be interleaved with another partition.

The above seems a bit repetitive and could be shortened.

Cheers


Re: Record a Bitmapset of non-pruned partitions

2021-08-01 Thread David Rowley
On Fri, 30 Jul 2021 at 19:10, Amit Langote  wrote:
>
> 0001 looks mostly fine, except I thought the following could be worded
> to say that the bitmap members are offsets into the part_rels array.
> To avoid someone confusing them with RT indexes, for example.
>
> +   Bitmapset  *live_parts; /* Bitmap with members to indicate which
> +* partitions survived partition pruning. */

Yeah, agreed. I've adjusted that.

> On 0002:
>
> interleaved_parts idea looks clever.  I wonder if you decided that
> it's maybe not worth setting that field in the joinrel's
> PartitionBoundInfos?  For example, adding the code that you added in
> create_list_bounds() also in merge_list_bounds().

Currently generate_orderedappend_paths() only checks
partitions_are_ordered() for base and other member rels, so setting
the field for join rels would be a waste of effort given that it's not
used for anything.

I've not really looked into the possibility of enabling this
optimization for partition-wise joined rels. I know that there's a bit
more complexity now due to c8434d64c. I'm not really all that clear on
which cases could be allowed here and which couldn't. It would require
more analysis and I'd say that's a different patch.

> ...  The definition of interleaved
> + * is any partition that can contain multiple different values where exists 
> at
> + * least one other partition which could contain a value which is between the
> + * multiple possible values in the other partition.
>
> The sentence sounds a bit off starting at "...where exists".  How about:

I must have spent too long writing SQL queries.

> "A partition is considered interleaved if it contains multiple values
> such that there exists at least one other partition containing a value
> that lies between those values [ in terms of partitioning-defined
> ordering ]."

That looks better. I took that with some small adjustments.
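For the archives, that definition can be sketched in a few lines for list partitions over comparable values (a simplified illustration only; the real code works on PartitionBoundInfo and must also deal with details like NULL values and the default partition, which are ignored here):

```python
def interleaved_parts(partitions):
    """partitions: one list of bound values per list partition.
    A partition is interleaved if it contains multiple values such
    that some other partition holds a value strictly between them."""
    result = set()
    for i, p in enumerate(partitions):
        lo, hi = min(p), max(p)
        for j, q in enumerate(partitions):
            if i != j and any(lo < v < hi for v in q):
                result.add(i)
    return result

# Partition 0 spans {1,3}; partition 1's value 2 lies between them,
# so partition 0 is interleaved and ordered-scan must exclude it.
print(interleaved_parts([[1, 3], [2], [10, 20]]))  # {0}
```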

> Looks fine otherwise.

Thanks for the review.

I had another self review of these and I'm pretty happy with them. I'm
quite glad to see the performance of querying a single partition of a
table with large numbers of partitions no longer tails off as much as
it used to.

David


v4-0001-Track-non-pruned-partitions-in-RelOptInfo.patch
Description: Binary data


v4-0002-Allow-ordered-partition-scans-in-more-cases.patch
Description: Binary data


Re: Corrected documentation of data type for the logical replication message formats.

2021-08-01 Thread Peter Smith
On Sat, Jul 31, 2021 at 7:00 AM Tom Lane  wrote:
>
> vignesh C  writes:
> [ v6-0001-Included-the-actual-datatype-used-in-logical-repl.patch ]
>
> I see what you want to do here, but the way you did it seems quite
> detrimental to the readability of the field descriptions.
> Parenthesized interjections should be used sparingly.
>
> I'm inclined to think that the equivalent data type is part of the
> field data type specification, and thus that we ought to put it in
> the data type part of each entry.  So we'd have something like
>
> 
> 
> Int64 (XLogRecPtr)
> 
> 
> 
> The final LSN of the transaction.
> 
> 
> 
>
> instead of what you did here.  Parentheses might not be the best
> punctuation to use, given the existing convention about parenthesized
> specific values, but we could probably settle on some other markup.
> Or just ignore the ambiguity.

+1 to change it like suggested above.

The specific value for the flags might then look like below, but that
does not look too bad to me.


Int8 (uint8) (0)


--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [HACKERS] logical decoding of two-phase transactions

2021-08-01 Thread Peter Smith
On Sun, Aug 1, 2021 at 3:05 AM vignesh C  wrote:
>
> On Sat, Jul 31, 2021 at 11:12 AM Ajin Cherian  wrote:
> >
> > On Sat, Jul 31, 2021 at 2:39 PM Amit Kapila  wrote:
> >
> > > Here, the test is expecting 2 prepared transactions corresponding to
> > > two subscriptions but it waits for just one subscription via
> > > appname_copy. It should wait for the second subscription using
> > > $appname as well.
> > >
> > > What do you think?
> >
> > I agree with this analysis. The test needs to wait for both
> > subscriptions to catch up.
> > Attached is a patch that addresses this issue.
>
> The changes look good to me.
>

The patch to the test code posted by Ajin also LGTM.

I applied the patch and re-ran the TAP subscription tests. All OK.

--
Kind Regards,
Peter Smith.
Fujitsu Australia