Re: Force streaming every change in logical decoding

2022-12-22 Thread Masahiko Sawada
On Thu, Dec 22, 2022 at 4:18 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Amit,
>
> Thank you for updating the patch. I have also checked the patch
> and basically it worked well. Almost all the things I found were
> addressed in v4.
>
> One comment: while setting logical_decoding_mode to wrong value,
> I got unfriendly ERROR message.
>
> ```
> postgres=# SET logical_decoding_mode = 1;
> ERROR:  invalid value for parameter "logical_decoding_mode": "1"
> HINT:  Available values: , immediate
> ```
>
> Here all acceptable enum values should be output in the HINT, but we
> cannot see the empty string.
> Should we modify config_enum_get_options() to treat the empty string
> specially, maybe printing it as (empty)?

Good point. I think the hint message can say "The only allowed value
is \"immediate\"", as recovery_target does. Or, considering the name of
logical_decoding_mode, we might want to have a non-empty string, say
'normal' as Kuroda-san proposed, as the default value.
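
For illustration, here is a standalone sketch of the special-casing
Kuroda-san suggests (a self-contained demo, not the actual guc.c code;
enum_entry and print_hint are made-up stand-ins for the real structures):

#include <stdio.h>

struct enum_entry { const char *name; };

static void print_hint(const struct enum_entry *entries, int n)
{
    printf("HINT:  Available values: ");
    for (int i = 0; i < n; i++)
    {
        const char *name = entries[i].name;
        /* render an empty-string member visibly, per the suggestion */
        printf("%s%s", name[0] ? name : "(empty)", i < n - 1 ? ", " : "\n");
    }
}

int main(void)
{
    struct enum_entry opts[] = {{""}, {"immediate"}};
    print_hint(opts, 2);   /* HINT:  Available values: (empty), immediate */
    return 0;
}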

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread Richard Guo
On Wed, Dec 21, 2022 at 9:45 AM David Rowley  wrote:

> Also, I think it might be better to take the opportunity to rewrite
> the function to not use recursion. I don't quite see the need for it
> here and it looks like that might have helped contribute to the
> reported issue.  Can't we just write this as a while loop instead of
> having the function call itself? It's not as if we need stack space
> for keeping track of multiple parents. A child relation can only have
> 1 parent. It seems to me that we can just walk there by looping.


My best guess is that this function is intended to share the same code
pattern as in adjust_appendrel_attrs_multilevel.  The recursion is
needed as 'rel' can be more than one inheritance level below the top
parent.  I think we can keep the recursion, as in other similar
functions, as long as we make it right, as in the attached patch.
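
For readers following along, a standalone sketch of the two shapes under
discussion (Rel and parent are schematic stand-ins, not the planner's
actual data structures):

#include <stdio.h>

typedef struct Rel
{
    int         relid;
    struct Rel *parent;     /* NULL at the top of the inheritance chain */
} Rel;

/* recursive walk, mirroring the adjust_appendrel_attrs_multilevel pattern */
static Rel *
top_parent_recursive(Rel *rel)
{
    if (rel->parent == NULL)
        return rel;
    return top_parent_recursive(rel->parent);
}

/* equivalent loop: no stack needed, since a child has exactly one parent */
static Rel *
top_parent_loop(Rel *rel)
{
    while (rel->parent != NULL)
        rel = rel->parent;
    return rel;
}

int main(void)
{
    Rel top = {1, NULL};
    Rel mid = {2, &top};
    Rel leaf = {3, &mid};

    printf("%d %d\n", top_parent_recursive(&leaf)->relid,
           top_parent_loop(&leaf)->relid);      /* prints: 1 1 */
    return 0;
}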

Thanks
Richard


v1-0001-Fix-translate_col_privs_multilevel.patch
Description: Binary data


Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-22 Thread Kyotaro Horiguchi
At Thu, 22 Dec 2022 10:09:23 +0530, Bharath Rupireddy 
 wrote in 
> On Thu, Dec 22, 2022 at 7:57 AM Michael Paquier  wrote:
> >
> > On Wed, Dec 21, 2022 at 10:22:02PM +0100, Magnus Hagander wrote:
> > > Basically, we take one thing and turn it into 3. That very naturally rings
> > > with "split" to me.
> > >
> > > Parse might work as well, certainly better than dissect. I'd still prefer
> > > split though.
> >
> > Honestly, I don't have any counter-arguments, so I am fine to switch
> > the name as you are suggesting.  And pg_split_walfile_name() it is?
> 
> +1. FWIW, a simple patch is here
> https://www.postgresql.org/message-id/CALj2ACXdZ7WGRD-_jPPeZugvWLN%2Bgxo3QtV-eZPRicUwjesM%3Dg%40mail.gmail.com.

By the way the function is documented as the follows.

>  Extracts the file sequence number and timeline ID from a WAL file name.

I didn't find a definition for the word "file sequence number" in
the docs. Instead I find "segment number" (a bit doubtful, though..).

In the first place, "file sequence number" and "segno" can hardly be
associated by appearance by readers, I think.  (Yeah, we can identify
that since the other parameter is identifiable alone.) Why don't we
spell out the parameter simply as "segment number"?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Force streaming every change in logical decoding

2022-12-22 Thread Kyotaro Horiguchi
At Thu, 22 Dec 2022 16:59:30 +0900, Masahiko Sawada  
wrote in 
> On Thu, Dec 22, 2022 at 4:18 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Amit,
> >
> > Thank you for updating the patch. I have also checked the patch
> > and basically it worked well. Almost all the things I found were
> > addressed in v4.
> >
> > One comment: while setting logical_decoding_mode to wrong value,
> > I got unfriendly ERROR message.
> >
> > ```
> > postgres=# SET logical_decoding_mode = 1;
> > ERROR:  invalid value for parameter "logical_decoding_mode": "1"
> > HINT:  Available values: , immediate
> > ```
> >
> > Here all acceptable enum values should be output in the HINT, but we
> > cannot see the empty string.
> > Should we modify config_enum_get_options() to treat the empty string
> > specially, maybe printing it as (empty)?
> 
> Good point. I think the hint message can say "The only allowed value
> is \"immediate\"", as recovery_target does. Or, considering the name of
> logical_decoding_mode, we might want to have a non-empty string, say
> 'normal' as Kuroda-san proposed, as the default value.

Oh. I missed this, and agree to have the explicit default option.
I slightly prefer "buffered" but "normal" also works fine for me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum

2022-12-22 Thread Masahiko Sawada
On Sat, Nov 12, 2022 at 12:28 AM Imseih (AWS), Sami  wrote:
>
> >Yeah, it's a little inconsistent.
>
> Yes, this should be corrected by calling the failsafe
> inside the parallel vacuum loops and handling the case by exiting
> the loop and parallel vacuum if failsafe kicks in.

I agree it's better to be consistent but I think we cannot simply call
lazy_check_wraparound_failsafe() inside the parallel vacuum loops.
IIUC the failsafe is heap (or lazy vacuum) specific, whereas parallel
vacuum is a common infrastructure to do index vacuum in parallel. We
should not break this design. For example, we would need to have a
callback for the index scan loop so that the caller (i.e. lazy vacuum)
can do its work.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum

2022-12-22 Thread Masahiko Sawada
On Wed, Dec 21, 2022 at 2:44 AM Imseih (AWS), Sami  wrote:
>
> Attached is a patch to check scanned pages rather
> than blockno.

Thank you for the patch. It looks good to me.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Force streaming every change in logical decoding

2022-12-22 Thread Amit Kapila
On Thu, Dec 22, 2022 at 1:55 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 22 Dec 2022 16:59:30 +0900, Masahiko Sawada  
> wrote in
> > On Thu, Dec 22, 2022 at 4:18 PM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Dear Amit,
> > >
> > > Thank you for updating the patch. I have also checked the patch
> > > and basically it worked well. Almost all the things I found were
> > > addressed in v4.
> > >
> > > One comment: while setting logical_decoding_mode to wrong value,
> > > I got unfriendly ERROR message.
> > >
> > > ```
> > > postgres=# SET logical_decoding_mode = 1;
> > > ERROR:  invalid value for parameter "logical_decoding_mode": "1"
> > > HINT:  Available values: , immediate
> > > ```
> > >
> > > Here all acceptable enum values should be output in the HINT, but we
> > > cannot see the empty string.
> > > Should we modify config_enum_get_options() to treat the empty string
> > > specially, maybe printing it as (empty)?
> >
> > Good point. I think the hint message can say "The only allowed value
> > is \"immediate\"", as recovery_target does. Or, considering the name of
> > logical_decoding_mode, we might want to have a non-empty string, say
> > 'normal' as Kuroda-san proposed, as the default value.
>
> Oh. I missed this, and agree to have the explicit default option.
> I slightly prefer "buffered" but "normal" also works fine for me.
>

+1 for "buffered" as that seems to convey the meaning better.

-- 
With Regards,
Amit Kapila.




Re: appendBinaryStringInfo stuff

2022-12-22 Thread David Rowley
On Thu, 22 Dec 2022 at 20:56, Tom Lane  wrote:
>
> David Rowley  writes:
> >   22.57%  postgres  [.] escape_json
>
> Hmm ... shouldn't we do something like
>
> -appendStringInfoString(buf, "\\b");
> +appendStringInfoCharMacro(buf, '\\');
> +appendStringInfoCharMacro(buf, 'b');
>
> and so on in that function?  I'm not convinced that this one
> hotspot justifies inlining appendStringInfoString everywhere.

It improves things slightly:

Test 1 (from earlier)

master + escape_json using appendStringInfoCharMacro
$ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
latency average = 1.807 ms
latency average = 1.800 ms
latency average = 1.812 ms (~4.8% faster than master)

  23.05%  postgres  [.] pg_utf_mblen
  22.55%  postgres  [.] escape_json
   8.58%  postgres  [.] JsonbIteratorNext.part.0
   6.80%  postgres  [.] AllocSetAlloc
   4.23%  postgres  [.] pg_mbstrlen_with_len
   3.88%  postgres  [.] JsonbToCStringWorker
   3.79%  postgres  [.] fillJsonbValue
   3.18%  postgres  [.] appendBinaryStringInfo
   2.43%  postgres  [.] enlargeStringInfo
   2.02%  postgres  [.] palloc
   1.61%  postgres  [.] jsonb_put_escaped_value

David




Re: Force streaming every change in logical decoding

2022-12-22 Thread Amit Kapila
On Thu, Dec 22, 2022 at 12:47 PM Masahiko Sawada  wrote:
>
> On Thu, Dec 22, 2022 at 4:05 PM Amit Kapila  wrote:
> >
>
> >  I think
> > instead of adding new tests with this patch, it may be better to
> > change some of the existing tests related to streaming to use this
> > parameter as that will clearly show one of the purposes of this patch.
>
> +1. I think test_decoding/sql/stream.sql and spill.sql are good
> candidates and we change logical replication TAP tests in a separate
> patch.
>

I prefer the other way: let's first do TAP tests because that will
also help immediately with the parallel apply feature. We need to
execute most of those tests in parallel mode.

-- 
With Regards,
Amit Kapila.




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread David Rowley
On Thu, 22 Dec 2022 at 21:18, Richard Guo  wrote:
> My best guess is that this function is intended to share the same code
> pattern as in adjust_appendrel_attrs_multilevel.  The recursion is
> needed as 'rel' can be more than one inheritance level below the top
> parent.  I think we can keep the recursion, as in other similar
> functions, as long as we make it right, as in the attached patch.

I still think we should have a test to ensure this is actually
working. Do you want to write one?

David




Re: Force streaming every change in logical decoding

2022-12-22 Thread Amit Kapila
On Thu, Dec 22, 2022 at 1:15 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 22 Dec 2022 12:35:46 +0530, Amit Kapila  
> wrote in
> > I have addressed these comments in the attached. Additionally, I have
> > modified the docs and commit messages to make those clear. I think
> > instead of adding new tests with this patch, it may be better to
> > change some of the existing tests related to streaming to use this
> > parameter as that will clearly show one of the purposes of this patch.
>
> Being late, but I'm worried that we might sometime regret the lack
> of an explicit default. Don't we really need it?
>

For this, I like your proposal for "buffered" as an explicit default value.

> +Allows streaming or serializing changes immediately in logical decoding.
> +The allowed values of logical_decoding_mode are the
> +empty string and immediate. When set to
> +immediate, stream each change if
> +streaming option is enabled, otherwise, serialize
> +each change.  When set to an empty string, which is the default,
> +decoding will stream or serialize changes when
> +logical_decoding_work_mem is reached.
>
> With (really) fresh eyes, I took a fairly long time to understand what
> the "streaming" option is. Couldn't we augment the description a
> bit?
>

Okay, how about modifying it to: "When set to
immediate, stream each change if
streaming option (see optional parameters set by
CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change."

-- 
With Regards,
Amit Kapila.




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-22 Thread Amit Kapila
On Thu, Dec 22, 2022 at 11:39 AM Masahiko Sawada  wrote:
>
> Thank you for updating the patch. Here are some comments on v64 patches:
>
> While testing the patch, I realized that if all streamed transactions
> are handled by parallel workers, there is no chance for the leader to
> call maybe_reread_subscription() except for when waiting for the next
> message. Due to this, the leader didn't stop for a while even if the
> subscription gets disabled. It's an extreme case since my test was
> that pgbench runs 30 concurrent transactions and logical_decoding_mode
> = 'immediate', but we might want to make sure to call
> maybe_reread_subscription() at least after committing/preparing a
> transaction.
>

Won't it be better to call it only if we handle the transaction by the
parallel worker?

> ---
> +if (pg_atomic_read_u32(&MyParallelShared->pending_stream_count) == 0)
> +{
> +if (pa_has_spooled_message_pending())
> +return;
> +
> +elog(ERROR, "invalid pending streaming block number");
> +}
>
> I think it's helpful if the error message shows the invalid block number.
>

+1. Additionally, I suggest changing the message to "invalid pending
streaming chunk"?

> ---
> On Wed, Dec 7, 2022 at 10:13 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Wednesday, December 7, 2022 7:51 PM Masahiko Sawada 
> >  wrote:
> > > ---
> > > If a value of max_parallel_apply_workers_per_subscription is not
> > > sufficient, we get the LOG "out of parallel apply workers" every time
> > > when the apply worker doesn't launch a worker. But do we really need
> > > this log? It seems not consistent with
> > > max_sync_workers_per_subscription behavior. I think we can check if
> > > the number of running parallel workers is less than
> > > max_parallel_apply_workers_per_subscription before calling
> > > logicalrep_worker_launch(). What do you think?
> >
> > (Sorry, I missed this comment in last email)
> >
> > I personally feel giving a hint might help the user realize that the
> > max_parallel_applyxxx is not enough for the current workload and then
> > they can adjust the parameter. Otherwise, the user might not have an
> > easy way to check if more workers are needed.
> >
>
> Sorry, I missed this comment.
>
> I think the number of concurrent transactions on the publisher could
> be several hundred, and the number of streamed transactions among
> them could be several dozen. I agree that setting
> max_parallel_apply_workers_per_subscription to a high enough value is
> ideal but I'm not sure we want to inform users immediately that the
> setting value is not enough. I think that with the default value
> (i.e., 2), it will not be enough for many systems and the server logs
> could be flooded with the LOG "out of parallel apply workers".
>

It seems currently we give a similar message when the logical
replication worker slots are exhausted ("out of logical replication
worker slots") or when we are not able to register background workers
("out of background worker slots"). OTOH, when we exceed the limit
of sync workers (max_sync_workers_per_subscription), we don't display
any message. Personally, I think if any user has set the streaming
option to "parallel", she wants all large transactions to be performed
in parallel, and if the system is not able to deal with that,
displaying a LOG message will be useful for users. This is because the
performance difference for large transactions between parallel and
non-parallel is big (30-40%) and it is better for users to know as
soon as possible instead of expecting them to run some monitoring
query to notice the same.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Add function to_oct

2022-12-22 Thread Dag Lem
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

This is a mini-review of to_oct :-)

The function seems useful to me, and is obviously correct.

I don't know whether optimization at such a low level is considered in 
PostgreSQL, but here goes.

The calculation of quotient and remainder can be replaced by less costly 
masking and shifting.

Defining

#define OCT_DIGIT_BITS 3
#define OCT_DIGIT_BITMASK 0x7

the content of the loop can be replaced by

*--ptr = digits[value & OCT_DIGIT_BITMASK];
value >>= OCT_DIGIT_BITS;

Also, the check for ptr > buf in the while loop can be removed. The check is
superfluous, since buf cannot possibly be exhausted by a 32 bit integer (the
maximum octal number being 37777777777).
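
For reference, a self-contained sketch of the suggested shift-and-mask
conversion (to_oct_demo is a standalone stand-in for the patch's function,
not its actual code):

#include <stdio.h>
#include <stdint.h>

#define OCT_DIGIT_BITS 3
#define OCT_DIGIT_BITMASK 0x7

/* A 32-bit value needs at most 11 octal digits (37777777777),
 * so 12 bytes always suffice for the digits plus the terminating NUL. */
static const char *
to_oct_demo(uint32_t value, char *buf, int buflen)
{
    char *ptr = buf + buflen;

    *--ptr = '\0';
    do
    {
        *--ptr = "01234567"[value & OCT_DIGIT_BITMASK];
        value >>= OCT_DIGIT_BITS;
    } while (value != 0);
    return ptr;
}

int main(void)
{
    char buf[12];

    printf("%s\n", to_oct_demo(4294967295u, buf, sizeof(buf)));  /* 37777777777 */
    return 0;
}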


Best regards

Dag Lem

The new status of this patch is: Waiting on Author


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-12-22 Thread John Naylor
On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada 
wrote:
>
> On Tue, Dec 20, 2022 at 3:09 PM John Naylor
>  wrote:

> >
> > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> >
> > I'm guessing the hash join case can afford to be precise about memory
> > because it must spill to disk when exceeding workmem. We don't have that
> > design constraint.
>
> You mean that the memory used by the radix tree should be limited not
> by the amount of memory actually used, but by the amount of memory
> allocated? In other words, it checks by MomoryContextMemAllocated() in
> the local cases and by dsa_get_total_size() in the shared case.

I mean, if this patch set uses 10x less memory than v15 (not always, but
easy to find cases where it does), and if it's also expensive to track
memory use precisely, then we don't have an incentive to track memory
precisely. Even if we did, we don't want to assume that every future caller
of radix tree is willing to incur that cost.

> The idea of using up to half of maintenance_work_mem might be a good
> idea compared to the current flat-array solution. But since it only
> uses half, I'm concerned that there will be users who double their
> maintenance_work_mem. When it is improved, the user needs to restore
> maintenance_work_mem again.

I find it useful to step back and look at the usage patterns:

Autovacuum: Limiting the memory allocated by vacuum is important, since
there are multiple workers and they can run at any time (possibly most of
the time). This case will not use parallel index vacuum, so will use slab,
where the quick estimation of memory taken by the context is not terribly
far off, so we can afford to be more optimistic here.

Manual vacuum: The default configuration assumes we want to finish as soon
as possible (vacuum_cost_delay is zero). Parallel index vacuum can be used.
My experience leads me to believe users are willing to use a lot of memory
to make manual vacuum finish as quickly as possible, and are disappointed
to learn that even if maintenance work mem is 10GB, vacuum can only use 1GB.

So I don't believe anyone will have to double maintenance work mem after
upgrading (even with pessimistic accounting) because we'll be both
- much more efficient with memory on average
- free from the 1GB cap

That said, it's possible 50% is too pessimistic -- a 75% threshold will
bring us very close to powers of two for example:

2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
766 + 256 = 1022MB -> stop

I'm not sure if that calculation could cause going over the limit, or how
common that would be.
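
To make the arithmetic concrete, here is a standalone simulation of that
threshold check, assuming (as the series above implies) two blocks per size
class with the block size doubling after each pair; this is illustrative
only, not PostgreSQL code:

#include <stdio.h>

int main(void)
{
    int limit = 1024;           /* maintenance_work_mem in MB */
    int alloc = 0;              /* total allocated so far, MB */
    int size = 1;               /* current block size, MB */
    int nth = 0;

    for (;;)
    {
        alloc += size;
        nth++;
        printf("block %2d: %4d MB -> total %4d MB (%.1f%%)\n",
               nth, size, alloc, 100.0 * alloc / limit);
        if (alloc >= limit * 3 / 4)     /* stop at the 75% threshold */
            break;
        if (nth % 2 == 0)               /* two blocks per size class, then double */
            size *= 2;
    }
    return 0;
}

With limit = 1024 this reproduces the series above: the total passes 766MB
(74.8%) and stops at 1022MB.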

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-22 Thread Michael Paquier
On Thu, Dec 22, 2022 at 05:19:24PM +0900, Kyotaro Horiguchi wrote:
> In the first place, "file sequence number" and "segno" can hardly be
> associated by appearance by readers, I think.  (Yeah, we can identify
> that since the other parameter is identifiable alone.) Why don't we
> spell out the parameter simply as "segment number"?

As in using "sequence number", removing "file" from the docs, and
changing the OUT parameter name to segment_number rather than segno?
Fine by me.
--
Michael


signature.asc
Description: PGP signature


Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-22 Thread Bharath Rupireddy
On Thu, Dec 22, 2022 at 4:57 PM Michael Paquier  wrote:
>
> On Thu, Dec 22, 2022 at 05:19:24PM +0900, Kyotaro Horiguchi wrote:
> > In the first place, "file sequence number" and "segno" can hardly be
> > associated by appearance by readers, I think.  (Yeah, we can identify
> > that since the other parameter is identifiable alone.) Why don't we
> > spell out the parameter simply as "segment number"?
>
> As in using "sequence number", removing "file" from the docs, and
> changing the OUT parameter name to segment_number rather than segno?
> Fine by me.

+1.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Small miscellaneus fixes (Part II)

2022-12-22 Thread Ranier Vilela
On Wed, Dec 21, 2022 at 04:10, Michael Paquier wrote:

> On Tue, Dec 20, 2022 at 06:51:34PM -0600, Justin Pryzby wrote:
> > On Fri, Nov 25, 2022 at 06:27:04PM -0300, Ranier Vilela wrote:
> > > 5. Use boolean operator with boolean operands
> > > (b/src/backend/commands/tablecmds.c)
> >
> > tablecmds.c: right.  Since 074c5cfbf
>
> Most of this does not seem to be really worth poking at.
>
> newcons = AddRelationNewConstraints(rel, NIL,
>                                     list_make1(copyObject(constr)),
> -                                   recursing | is_readd,   /* allow_merge */
> +                                   recursing || is_readd,  /* allow_merge */
> There is no damage here, but that looks like a typo so no objections
> on this one.
>

Thanks Michael, for the commit.

regards,
Ranier Vilela


Re: Optimization issue of branching UNION ALL

2022-12-22 Thread Andrey Lepikhov

On 22/12/2022 06:50, Tom Lane wrote:
>> 2. Iterative passes along the append_rel_list for replacing vars in the
>> translated_vars field. I can't grasp the real necessity of passing all the
>> append_rel_list during flattening of a union all leaf subquery. No one
>> can reference this leaf, isn't it?
>
> After thinking about that for awhile, I believe we can go further:
> the containing_appendrel is actually the *only* part of the upper
> query that needs to be adjusted.  So we could do something like
> the attached.
>
> This passes check-world, but I don't have quite enough confidence
> in it to just commit it.

Thanks, I have written the letter because of some doubts too. But the only
weak point I could imagine is if someday the SQL standard is changed.

Your code looks better than the previous attempt.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: Exit walsender before confirming remote flush in logical replication

2022-12-22 Thread Ashutosh Bapat
On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
> (I added Amit as CC because we discussed this in another thread)
>
> This is a fork thread from time-delayed logical replication [1].
> While discussing, we thought that we could extend the condition of
> walsender shutdown [2][3].
>
> Currently, walsenders delay the shutdown request until confirming that all
> sent data is flushed on the remote side. This condition was added in
> 985bd7 [4], which is for supporting clean switchover. Suppose that there is
> a primary-secondary physical replication system and we do the following
> steps. If any changes come in during step 2 but the walsender does not
> confirm the remote flush, the reboot in step 3 may fail.
>
> 1. Stop the primary server.
> 2. Promote the secondary to the new primary.
> 3. Reboot the (old) primary as the new secondary.
>
> In the case of logical replication, however, we cannot support the use-case
> that switches the roles publisher <-> subscriber. Suppose the same case as
> above: additional transactions are committed while doing step 2. To catch
> up with such changes the subscriber must receive the WAL related to those
> transactions, but that cannot be done because the subscriber cannot request
> WAL from a specific position. In that case, we must truncate all data on
> the new subscriber once, and then create a new subscription with copy_data
> = true.
>
> Therefore, I think that we can ignore the condition for shutting down the
> walsender in logical replication.
>
> This change may be useful for time-delayed logical replication. The
> walsender delays its shutdown until all changes are applied on the
> subscriber, even if they are delayed. This means that the publisher cannot
> be stopped if a large delay-time is specified.

I think the current behaviour is an artifact of using the same WAL
sender code for both logical and physical replication.

I agree with you that the logical WAL sender need not wait for all the
WAL to be replayed downstream.

I have not reviewed the patch though.

-- 
Best Wishes,
Ashutosh Bapat




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-22 Thread Amit Kapila
On Wed, Dec 21, 2022 at 11:02 AM houzj.f...@fujitsu.com
 wrote:
>
> Attach the new patch set which also includes some
> cosmetic comment changes.
>

Few minor comments:
=
1.
+   t = spill the changes of in-progress transactions to
+   disk and apply at once after the transaction is
+   committed on the publisher,

Can we change this description to: "spill the changes of in-progress
transactions to disk and apply at once after the transaction is
committed on the publisher and received by the subscriber,"

2.
table is in progress, there will be additional workers for the tables
-   being synchronized.
+   being synchronized. Moreover, if the streaming transaction is applied in
+   parallel, there will be additional workers.

Do we need this change in the first patch? We skip parallel apply
workers from the view for the first patch. Am I missing something?

3.
I think we would need a catversion bump for the parallel apply feature
because of the below change:
@@ -7913,11 +7913,16 @@ SCRAM-SHA-256$:&l

  
   
-   substream bool
+   substream char
   

Am I missing something? If not, then I think we can note that in the
commit message to avoid forgetting it before commit.

4. Kindly change the below comments:
diff --git a/src/backend/replication/logical/applyparallelworker.c
b/src/backend/replication/logical/applyparallelworker.c
index 97f4a3037c..02bb608188 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -9,11 +9,10 @@
  *
  * This file contains the code to launch, set up, and teardown a parallel apply
  * worker which receives the changes from the leader worker and
invokes routines
- * to apply those on the subscriber database.
- *
- * This file contains routines that are intended to support setting up, using
- * and tearing down a ParallelApplyWorkerInfo which is required so the leader
- * worker and parallel apply workers can communicate with each other.
+ * to apply those on the subscriber database. Additionally, this file contains
+ * routines that are intended to support setting up, using, and tearing down a
+ * ParallelApplyWorkerInfo which is required so the leader worker and parallel
+ * apply workers can communicate with each other.
  *
  * The parallel apply workers are assigned (if available) as soon as xact's
  * first stream is received for subscriptions that have set their 'streaming'

-- 
With Regards,
Amit Kapila.




Re: appendBinaryStringInfo stuff

2022-12-22 Thread John Naylor
On Thu, Dec 22, 2022 at 4:19 PM David Rowley  wrote:
>
> Test 1 (from earlier)
>
> master + escape_json using appendStringInfoCharMacro
> $ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
> latency average = 1.807 ms
> latency average = 1.800 ms
> latency average = 1.812 ms (~4.8% faster than master)

> 23.05%  postgres  [.] pg_utf_mblen

I get about 20% improvement by adding an ascii fast path in
pg_mbstrlen_with_len, which I think would work with all encodings we
support:

@@ -1064,7 +1064,12 @@ pg_mbstrlen_with_len(const char *mbstr, int limit)

while (limit > 0 && *mbstr)
{
-   int l = pg_mblen(mbstr);
+   int l;
+
+   if (!IS_HIGHBIT_SET(*mbstr))
+   l = 1;
+   else
+   l = pg_mblen(mbstr);
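
To illustrate the idea outside the tree, a standalone sketch (mblen_stub is
a crude stand-in for pg_mblen; the fast path relies on every supported
server encoding being an ASCII superset):

#include <stdio.h>

#define IS_HIGHBIT_SET(c) ((unsigned char) (c) & 0x80)

/* crude stand-in for pg_mblen(): UTF-8 sequence length from the first byte */
static int
mblen_stub(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c >= 0xF0)
        return 4;
    if (c >= 0xE0)
        return 3;
    return 2;
}

static int
mbstrlen_demo(const char *s)
{
    int n = 0;

    while (*s)
    {
        /* ASCII bytes are always single-byte characters */
        s += IS_HIGHBIT_SET(*s) ? mblen_stub(s) : 1;
        n++;
    }
    return n;
}

int main(void)
{
    printf("%d\n", mbstrlen_demo("abc\xC3\xA9"));   /* 4: a, b, c, é */
    return 0;
}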

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-22 Thread Masahiko Sawada
On Thu, Dec 22, 2022 at 7:04 PM Amit Kapila  wrote:
>
> On Thu, Dec 22, 2022 at 11:39 AM Masahiko Sawada  
> wrote:
> >
> > Thank you for updating the patch. Here are some comments on v64 patches:
> >
> > While testing the patch, I realized that if all streamed transactions
> > are handled by parallel workers, there is no chance for the leader to
> > call maybe_reread_subscription() except for when waiting for the next
> > message. Due to this, the leader didn't stop for a while even if the
> > subscription gets disabled. It's an extreme case since my test was
> > that pgbench runs 30 concurrent transactions and logical_decoding_mode
> > = 'immediate', but we might want to make sure to call
> > maybe_reread_subscription() at least after committing/preparing a
> > transaction.
> >
>
> Won't it be better to call it only if we handle the transaction by the
> parallel worker?

Agreed. And we won't need to do that after handling stream_prepare as
we don't do that now.

>
> > ---
> > +if (pg_atomic_read_u32(&MyParallelShared->pending_stream_count) == 
> > 0)
> > +{
> > +if (pa_has_spooled_message_pending())
> > +return;
> > +
> > +elog(ERROR, "invalid pending streaming block number");
> > +}
> >
> > I think it's helpful if the error message shows the invalid block number.
> >
>
> +1. Additionally, I suggest changing the message to "invalid pending
> streaming chunk"?
>
> > ---
> > On Wed, Dec 7, 2022 at 10:13 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Wednesday, December 7, 2022 7:51 PM Masahiko Sawada 
> > >  wrote:
> > > > ---
> > > > If a value of max_parallel_apply_workers_per_subscription is not
> > > > sufficient, we get the LOG "out of parallel apply workers" every time
> > > > when the apply worker doesn't launch a worker. But do we really need
> > > > this log? It seems not consistent with
> > > > max_sync_workers_per_subscription behavior. I think we can check if
> > > > the number of running parallel workers is less than
> > > > max_parallel_apply_workers_per_subscription before calling
> > > > logicalrep_worker_launch(). What do you think?
> > >
> > > (Sorry, I missed this comment in last email)
> > >
> > > I personally feel giving a hint might help the user realize that the
> > > max_parallel_applyxxx is not enough for the current workload and then
> > > they can adjust the parameter. Otherwise, the user might not have an
> > > easy way to check if more workers are needed.
> > >
> >
> > Sorry, I missed this comment.
> >
> > I think the number of concurrent transactions on the publisher could
> > be several hundred, and the number of streamed transactions among
> > them could be several dozen. I agree that setting
> > max_parallel_apply_workers_per_subscription to a high enough value is
> > ideal but I'm not sure we want to inform users immediately that the
> > setting value is not enough. I think that with the default value
> > (i.e., 2), it will not be enough for many systems and the server logs
> > could be flooded with the LOG "out of parallel apply workers".
> >
>
> It seems currently we give a similar message when the logical
> replication worker slots are exhausted ("out of logical replication
> worker slots") or when we are not able to register background workers
> ("out of background worker slots"). OTOH, when we exceed the limit
> of sync workers (max_sync_workers_per_subscription), we don't display
> any message. Personally, I think if any user has set the streaming
> option to "parallel", she wants all large transactions to be performed
> in parallel, and if the system is not able to deal with that,
> displaying a LOG message will be useful for users. This is because the
> performance difference for large transactions between parallel and
> non-parallel is big (30-40%) and it is better for users to know as
> soon as possible instead of expecting them to run some monitoring
> query to notice the same.

I see your point. But looking at other parallel features such as
parallel queries, parallel vacuum and parallel index creation, we
don't give such messages even if the number of parallel workers
actually launched is lower than the ideal. They also bring a big
performance benefit.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: Force streaming every change in logical decoding

2022-12-22 Thread shiy.f...@fujitsu.com
On Thu, Dec 22, 2022 5:24 PM Amit Kapila  wrote:
> 
> On Thu, Dec 22, 2022 at 1:15 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 22 Dec 2022 12:35:46 +0530, Amit Kapila
>  wrote in
> > > I have addressed these comments in the attached. Additionally, I have
> > > modified the docs and commit messages to make those clear. I think
> > > instead of adding new tests with this patch, it may be better to
> > > change some of the existing tests related to streaming to use this
> > > parameter as that will clearly show one of the purposes of this patch.
> >
> > Being late, but I'm worried that we might sometime regret the lack
> > of an explicit default. Don't we really need it?
> >
> 
> For this, I like your proposal for "buffered" as an explicit default value.
> 
> > +Allows streaming or serializing changes immediately in logical decoding.
> > +The allowed values of logical_decoding_mode are the
> > +empty string and immediate. When set to
> > +immediate, stream each change if
> > +streaming option is enabled, otherwise, serialize
> > +each change.  When set to an empty string, which is the default,
> > +decoding will stream or serialize changes when
> > +logical_decoding_work_mem is reached.
> >
> > With (really) fresh eyes, I took a fairly long time to understand what
> > the "streaming" option is. Couldn't we augment the description a
> > bit?
> >
> 
> Okay, how about modifying it to: "When set to
> immediate, stream each change if
> streaming option (see optional parameters set by
> CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change."
> 

I updated the patch to use "buffered" as the explicit default value, and
included Amit's changes to the document.

Besides, I tried to reduce the data size in streaming subscription tap tests
with this new GUC (see 0002 patch). But I didn't convert all streaming tap
tests because I think we also need to cover the case where there are lots of
changes. So, 015* is not modified. And 017* is not modified because streaming
transactions and non-streaming transactions are tested alternately in this
test.

I collected the time to run these tests before and after applying the patch set
on my machine. In debug version, it saves about 5.3 s; and in release version,
it saves about 1.8 s. The time of each test is attached.

Regards,
Shi yu


v5-0001-Add-logical_decoding_mode-GUC.patch
Description: v5-0001-Add-logical_decoding_mode-GUC.patch


v5-0002-Converting-streaming-tap-tests-by-using-logical_d.patch
Description:  v5-0002-Converting-streaming-tap-tests-by-using-logical_d.patch
The unit is milliseconds.

- debug
+-+---+-+
| test_no |  HEAD | patched |
+-+---+-+
|   016   |  3425 |   3118  |
|   018   |  4873 |   3159  |
|   019   |  3241 |   3116  |
|   022   |  8273 |   7373  |
|   023   |  7156 |   4823  |
| |   | |
|   sum   | 26968 |  21589  |
+-+---+-+


- release
+-+---+-+
| test_no |  HEAD | patched |
+-+---+-+
|   016   |  1776 |   1649  |
|   018   |  2248 |   1689  |
|   019   |  1653 |   1648  |
|   022   |  4858 |   4462  |
|   023   |  3992 |   3292  |
| |   | |
|   sum   | 14527 |  12740  |
+-+---+-+


Re: daitch_mokotoff module

2022-12-22 Thread Dag Lem
I noticed that the Meson builds failed in Cfbot; the updated patch adds
a missing "include_directories" line to meson.build.

Best regards

Dag Lem

diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile
index 0704894f88..12baf2d884 100644
--- a/contrib/fuzzystrmatch/Makefile
+++ b/contrib/fuzzystrmatch/Makefile
@@ -3,14 +3,15 @@
 MODULE_big = fuzzystrmatch
 OBJS = \
 	$(WIN32RES) \
+	daitch_mokotoff.o \
 	dmetaphone.o \
 	fuzzystrmatch.o
 
 EXTENSION = fuzzystrmatch
-DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.0--1.1.sql
+DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.1--1.2.sql fuzzystrmatch--1.0--1.1.sql
 PGFILEDESC = "fuzzystrmatch - similarities and distance between strings"
 
-REGRESS = fuzzystrmatch
+REGRESS = fuzzystrmatch fuzzystrmatch_utf8
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -22,3 +23,14 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+# Force this dependency to be known even without dependency info built:
+daitch_mokotoff.o: daitch_mokotoff.h
+
+daitch_mokotoff.h: daitch_mokotoff_header.pl
+	perl $< $@
+
+distprep: daitch_mokotoff.h
+
+maintainer-clean:
+	rm -f daitch_mokotoff.h
diff --git a/contrib/fuzzystrmatch/daitch_mokotoff.c b/contrib/fuzzystrmatch/daitch_mokotoff.c
new file mode 100644
index 00..d4ad95c283
--- /dev/null
+++ b/contrib/fuzzystrmatch/daitch_mokotoff.c
@@ -0,0 +1,596 @@
+/*
+ * Daitch-Mokotoff Soundex
+ *
+ * Copyright (c) 2021 Finance Norway
+ * Author: Dag Lem 
+ *
+ * This implementation of the Daitch-Mokotoff Soundex System aims at high
+ * performance.
+ *
+ * - The processing of each phoneme is initiated by an O(1) table lookup.
+ * - For phonemes containing more than one character, a coding tree is traversed
+ *   to process the complete phoneme.
+ * - The (alternate) soundex codes are produced digit by digit in-place in
+ *   another tree structure.
+ *
+ *  References:
+ *
+ * https://www.avotaynu.com/soundex.htm
+ * https://www.jewishgen.org/InfoFiles/Soundex.html
+ * https://familypedia.fandom.com/wiki/Daitch-Mokotoff_Soundex
+ * https://stevemorse.org/census/soundex.html (dmlat.php, dmsoundex.php)
+ * https://github.com/apache/commons-codec/ (dmrules.txt, DaitchMokotoffSoundex.java)
+ * https://metacpan.org/pod/Text::Phonetic (DaitchMokotoff.pm)
+ *
+ * A few notes on other implementations:
+ *
+ * - All other known implementations have the same unofficial rules for "UE",
+ *   these are also adapted by this implementation (0, 1, NC).
+ * - The only other known implementation which is capable of generating all
+ *   correct soundex codes in all cases is the JOS Soundex Calculator at
+ *   https://www.jewishgen.org/jos/jossound.htm
+ * - "J" is considered (only) a vowel in dmlat.php
+ * - The official rules for "RS" are commented out in dmlat.php
+ * - Identical code digits for adjacent letters are not collapsed correctly in
+ *   dmsoundex.php when double digit codes are involved. E.g. "BESST" yields
+ *   744300 instead of 743000 as for "BEST".
+ * - "J" is considered (only) a consonant in DaitchMokotoffSoundex.java
+ * - "Y" is not considered a vowel in DaitchMokotoffSoundex.java
+ *
+ * Permission to use, copy, modify, and distribute this software and its
+ * documentation for any purpose, without fee, and without a written agreement
+ * is hereby granted, provided that the above copyright notice and this
+ * paragraph and the following two paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
+ * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
+ * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
+ * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+*/
+
+#include "postgres.h"
+
+#include "lib/stringinfo.h"
+#include "mb/pg_wchar.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+
+
+/*
+ * The soundex coding chart table is adapted from
+ * https://www.jewishgen.org/InfoFiles/Soundex.html
+ * See daitch_mokotoff_header.pl for details.
+*/
+
+#define DM_CODE_DIGITS 6
+
+/* Coding chart table: Soundex codes */
+typedef char dm_code[2 + 1];	/* One or two sequential code digits + NUL */
+typedef dm_code dm_codes[3];	/* Start of name, before a vowel, any other */
+
+/* Coding chart table: Letter in input sequence */
+struct dm_letter
+{
+	char		letter;			/* Present letter in sequence */
+	const struct dm_letter *letters;	/* List of possible successive letters */
+	const		dm_codes *codes;	/* Code sequenc

Timeout when changes are filtered out by the core during logical replication

2022-12-22 Thread Ashutosh Bapat
Hi All,
A customer ran a script dropping a few dozen users in a transaction.
Before dropping a user, the script changed the ownership of the tables
owned by that user to another user and revoked all access from that user,
all in the same transaction. There were a few thousand tables whose
privileges and ownership were changed by this transaction. Since all of
these changes were in catalog tables, those changes were filtered out in
ReorderBufferProcessTXN() by the following code

    if (!RelationIsLogicallyLogged(relation))
        goto change_done;

I tried to reproduce a similar situation through the attached TAP test. For
500 users and 1000 tables, we see that the transaction takes significant
time but logical decoding does not take much time. So with the default 1
min WAL sender and receiver timeout I could not reproduce the timeout.
Beyond that, our TAP test itself times out.

But I think there's a possibility that the logical receiver will time out
this way when decoding a sufficiently large transaction which takes more
than the timeout amount of time to decode. So I think we need to
call OutputPluginUpdateProgress() after a regular interval (in terms of
time or number of changes) to consume any feedback from the subscriber or
send a keep-alive message.
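
To sketch the shape of such a fix (a standalone demo of the throttling
pattern only: update_progress is a stand-in for the real
OutputPluginUpdateProgress machinery, and the threshold of 100 changes is
arbitrary):

#include <stdbool.h>
#include <stdio.h>

#define CHANGES_THRESHOLD 100   /* arbitrary batch size for this demo */

/* stand-in for OutputPluginUpdateProgress()/keepalive handling */
static void
update_progress(long nchanges)
{
    printf("keepalive after %ld filtered changes\n", nchanges);
}

int main(void)
{
    long changes_count = 0;

    for (long i = 0; i < 1000; i++)
    {
        bool filtered = true;   /* e.g. a catalog-only change */

        if (!filtered)
            continue;           /* real processing would happen elsewhere */

        /* even while skipping everything, report progress periodically */
        if (++changes_count % CHANGES_THRESHOLD == 0)
            update_progress(changes_count);
    }
    return 0;
}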

Following commit
```
commit 87c1dd246af8ace926645900f02886905b889718
Author: Amit Kapila 
Date:   Wed May 11 10:12:23 2022 +0530

Fix the logical replication timeout during large transactions.

 ```
fixed a similar problem when the changes were filtered by an output plugin,
but in this case the changes are not being handed over to the output plugin
at all. If we fix it in the core we may not need to handle it in the
output plugin as that commit does. The commit does not have a test case
which I could run to reproduce the timeout.

-- 
Best Wishes,
Ashutosh


032_long_decoding.pl
Description: Perl program


Apply worker fails if a relation is missing on subscriber even if refresh publication has not been refreshed yet

2022-12-22 Thread Melih Mutlu
Hi hackers,

I realized a behaviour of logical replication that seems unexpected to me,
but I'm not totally sure.

Let's say a new table is created and added into a publication and not
created on subscriber yet. Also "ALTER SUBSCRIPTION ... REFRESH
PUBLICATION" has not been called yet.
What I expect in that case would be that logical replication continues to
work as it was working before the new table was created. The new table does
not get replicated until "REFRESH PUBLICATION" as stated here [1].
This is indeed how it actually seems to work, until we insert a row into
the new table.

After a new row is inserted into the new table, the apply worker gets this
change and tries to apply it. As expected, it fails since the table does
not exist on the subscriber yet. And the worker keeps crashing and can't
apply any changes for any table.
The obvious way to resolve this is creating the table on the subscriber as
well. After that, the apply worker will be back to work, skipping changes
for the new table and moving on to other changes.
Since REFRESH PUBLICATION has not been called yet, any change for the new
table will not be replicated.

If replication of the new table will not start anyway (until REFRESH
PUBLICATION), do we really need to have that table on the subscriber for
the apply worker to work?
AFAIU any change to the publication does not affect the logical replication
setup until the publication gets refreshed on the subscriber. If this
understanding is correct, then the apply worker should be able to run
without needing the new tables.
What do you think?

Also, if you agree, the attached patch attempts to fix this issue.
It relies on the info from pg_subscription_rel so that the apply worker only
applies changes for the relations that exist in pg_subscription_rel.
Since new tables won't be there until the next REFRESH PUBLICATION,
missing those tables won't be a problem for existing subscriptions.

[1] https://www.postgresql.org/docs/current/sql-altersubscription.html
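
For reference, the catalog state the patch consults can be inspected
directly; a table added to the publication but not yet refreshed simply has
no row here:

SELECT s.subname, sr.srrelid::regclass AS relation, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
ORDER BY 1, 2;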

Thanks,
-- 
Melih Mutlu
Microsoft


0001-Do-not-apply-for-tables-which-not-exist-on-catalog.patch
Description: Binary data


Re: fixing CREATEROLE

2022-12-22 Thread Alvaro Herrera
Reading 0001:

+However, CREATEROLE does not convey the ability to
+create SUPERUSER roles, nor does it convey any
+power over SUPERUSER roles that already exist.
+Furthermore, CREATEROLE does not convey the power
+to create REPLICATION users, nor the ability to
+grant or revoke the REPLICATION privilege, nor the
+ability to the role properties of such users.

"... nor the ability to the role properties ..."
I think a verb is missing here.

The contents look good to me other than that problem, and I agree to
backpatch it.


Why did you choose to use two dots for ellipses in some commands
rather than three?  I know I've made that choice too on
occasion, but there aren't many such cases and maybe we should put a
stop to it (or a period) before it spreads too much.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: daitch_mokotoff module

2022-12-22 Thread Dag Lem
Dag Lem  writes:

> I noticed that the Meson builds failed in Cfbot; the updated patch adds
> a missing "include_directories" line to meson.build.
>

This should hopefully fix the last Cfbot failures, by excluding
daitch_mokotoff.h from headerscheck and cpluspluscheck.

Best regards

Dag Lem

diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile
index 0704894f88..12baf2d884 100644
--- a/contrib/fuzzystrmatch/Makefile
+++ b/contrib/fuzzystrmatch/Makefile
@@ -3,14 +3,15 @@
 MODULE_big = fuzzystrmatch
 OBJS = \
 	$(WIN32RES) \
+	daitch_mokotoff.o \
 	dmetaphone.o \
 	fuzzystrmatch.o
 
 EXTENSION = fuzzystrmatch
-DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.0--1.1.sql
+DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.1--1.2.sql fuzzystrmatch--1.0--1.1.sql
 PGFILEDESC = "fuzzystrmatch - similarities and distance between strings"
 
-REGRESS = fuzzystrmatch
+REGRESS = fuzzystrmatch fuzzystrmatch_utf8
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -22,3 +23,14 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+# Force this dependency to be known even without dependency info built:
+daitch_mokotoff.o: daitch_mokotoff.h
+
+daitch_mokotoff.h: daitch_mokotoff_header.pl
+	perl $< $@
+
+distprep: daitch_mokotoff.h
+
+maintainer-clean:
+	rm -f daitch_mokotoff.h
diff --git a/contrib/fuzzystrmatch/daitch_mokotoff.c b/contrib/fuzzystrmatch/daitch_mokotoff.c
new file mode 100644
index 00..d4ad95c283
--- /dev/null
+++ b/contrib/fuzzystrmatch/daitch_mokotoff.c
@@ -0,0 +1,596 @@
+/*
+ * Daitch-Mokotoff Soundex
+ *
+ * Copyright (c) 2021 Finance Norway
+ * Author: Dag Lem 
+ *
+ * This implementation of the Daitch-Mokotoff Soundex System aims at high
+ * performance.
+ *
+ * - The processing of each phoneme is initiated by an O(1) table lookup.
+ * - For phonemes containing more than one character, a coding tree is traversed
+ *   to process the complete phoneme.
+ * - The (alternate) soundex codes are produced digit by digit in-place in
+ *   another tree structure.
+ *
+ *  References:
+ *
+ * https://www.avotaynu.com/soundex.htm
+ * https://www.jewishgen.org/InfoFiles/Soundex.html
+ * https://familypedia.fandom.com/wiki/Daitch-Mokotoff_Soundex
+ * https://stevemorse.org/census/soundex.html (dmlat.php, dmsoundex.php)
+ * https://github.com/apache/commons-codec/ (dmrules.txt, DaitchMokotoffSoundex.java)
+ * https://metacpan.org/pod/Text::Phonetic (DaitchMokotoff.pm)
+ *
+ * A few notes on other implementations:
+ *
+ * - All other known implementations have the same unofficial rules for "UE",
+ *   these are also adapted by this implementation (0, 1, NC).
+ * - The only other known implementation which is capable of generating all
+ *   correct soundex codes in all cases is the JOS Soundex Calculator at
+ *   https://www.jewishgen.org/jos/jossound.htm
+ * - "J" is considered (only) a vowel in dmlat.php
+ * - The official rules for "RS" are commented out in dmlat.php
+ * - Identical code digits for adjacent letters are not collapsed correctly in
+ *   dmsoundex.php when double digit codes are involved. E.g. "BESST" yields
+ *   744300 instead of 743000 as for "BEST".
+ * - "J" is considered (only) a consonant in DaitchMokotoffSoundex.java
+ * - "Y" is not considered a vowel in DaitchMokotoffSoundex.java
+ *
+ * Permission to use, copy, modify, and distribute this software and its
+ * documentation for any purpose, without fee, and without a written agreement
+ * is hereby granted, provided that the above copyright notice and this
+ * paragraph and the following two paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
+ * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
+ * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
+ * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+*/
+
+#include "postgres.h"
+
+#include "lib/stringinfo.h"
+#include "mb/pg_wchar.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+
+
+/*
+ * The soundex coding chart table is adapted from
+ * https://www.jewishgen.org/InfoFiles/Soundex.html
+ * See daitch_mokotoff_header.pl for details.
+*/
+
+#define DM_CODE_DIGITS 6
+
+/* Coding chart table: Soundex codes */
+typedef char dm_code[2 + 1];	/* One or two sequential code digits + NUL */
+typedef dm_code dm_codes[3];	/* Start of name, before a vowel, any other */
+
+/* Coding chart table: Letter in input sequence */
+struct dm_letter
+{
+	char		letter;			/* 

Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-12-22 Thread Masahiko Sawada
On Thu, Dec 22, 2022 at 7:24 PM John Naylor
 wrote:
>
>
> On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada  wrote:
> >
> > On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> >  wrote:
>
> > > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> > >
> > > I'm guessing the hash join case can afford to be precise about memory 
> > > because it must spill to disk when exceeding workmem. We don't have that 
> > > design constraint.
> >
> > You mean that the memory used by the radix tree should be limited not
> > by the amount of memory actually used, but by the amount of memory
> > allocated? In other words, it checks by MomoryContextMemAllocated() in
> > the local cases and by dsa_get_total_size() in the shared case.
>
> I mean, if this patch set uses 10x less memory than v15 (not always, but easy 
> to find cases where it does), and if it's also expensive to track memory use 
> precisely, then we don't have an incentive to track memory precisely. Even if 
> we did, we don't want to assume that every future caller of radix tree is 
> willing to incur that cost.

Understood.

>
> > The idea of using up to half of maintenance_work_mem might be a good
> > idea compared to the current flat-array solution. But since it only
> > uses half, I'm concerned that there will be users who double their
> > maintenace_work_mem. When it is improved, the user needs to restore
> > maintenance_work_mem again.
>
> I find it useful to step back and look at the usage patterns:
>
> Autovacuum: Limiting the memory allocated by vacuum is important, since there 
> are multiple workers and they can run at any time (possibly most of the 
> time). This case will not use parallel index vacuum, so will use slab, where 
> the quick estimation of memory taken by the context is not terribly far off, 
> so we can afford to be more optimistic here.
>
> Manual vacuum: The default configuration assumes we want to finish as soon as 
> possible (vacuum_cost_delay is zero). Parallel index vacuum can be used. My 
> experience leads me to believe users are willing to use a lot of memory to 
> make manual vacuum finish as quickly as possible, and are disappointed to 
> learn that even if maintenance work mem is 10GB, vacuum can only use 1GB.

Agreed.

> So I don't believe anyone will have to double maintenance work mem after 
> upgrading (even with pessimistic accounting) because we'll be both
> - much more efficient with memory on average
> - free from the 1GB cap

Make sense.

>
> That said, it's possible 50% is too pessimistic -- a 75% threshold will bring 
> us very close to powers of two for example:
>
> 2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
> 766 + 256 = 1022MB -> stop
>
> I'm not sure if that calculation could cause going over the limit, or how 
> common that would be.
>

If the value is a power of 2, it seems to work perfectly fine. But for
example if it's 700MB, the total memory exceeds the limit:

2*(1+2+4+8+16+32+64+128) = 510MB (72.8% of 700MB) -> keep going
510 + 256 = 766MB -> stop, but it exceeds the limit.

In a bigger case, if it's 11000MB,

2*(1+2+...+2048) = 8190MB (74.4%)
8190 + 4096 = 12286MB

That being said, I don't think these are common cases. So the 75%
threshold seems to work fine in most cases.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: daitch_mokotoff module

2022-12-22 Thread Dag Lem
Dag Lem  writes:

> Hi Ian,
>
> Ian Lawrence Barwick  writes:
>

[...]

>> I see you provided some feedback on
>> https://commitfest.postgresql.org/36/3468/,
>> though the patch seems to have not been accepted (but not
>> conclusively rejected
>> either). If you still have the chance to review another patch (or
>> more) it would
>> be much appreciated, as there's quite a few piling up. Things like
>> documentation
>> or small improvements to client applications are always a good place to 
>> start.
>> Reviews can be provided at any time, there's no need to wait for the next
>> CommitFest.
>>
>
> OK, I'll try to find another patch to review.
>

I have scanned through all the patches in Commitfest 2023-01 with status
"Needs review", and it is difficult to find something which I can
meaningfully review.

The only thing I felt qualified to comment (or nit-pick?) on was
https://commitfest.postgresql.org/41/4071/

If something else should turn up which could be reviewed by someone
without intimate knowledge of PostgreSQL internals, then don't hesitate
to ask.

As for the Daitch-Mokotoff patch, the review by Andres Freund was very
helpful in order to improve the extension and to make it more idiomatic
- hopefully it is now a bit closer to being included.


Best regards

Dag Lem




ARRNELEMS Out-of-bounds possible errors

2022-12-22 Thread Ranier Vilela
Hi.

Per Coverity.

The commit ccff2d2 changed the behavior of the function ArrayGetNItems,
with the introduction of the function ArrayGetNItemsSafe.

Now ArrayGetNItems may return -1, according to the comment:
"instead of throwing an exception. -1 is returned after an error."

So the macro ARRNELEMS can fail entirely with a -1 return,
and there is code that uses the result without checking it.

Like (contrib/intarray/_int_gist.c):

    {
        int     nel;

        nel = ARRNELEMS(ent);
        memcpy(ptr, ARRPTR(ent), nel * sizeof(int32));
    }

Sources possibly affected:
contrib\cube\cube.c
contrib\intarray\_intbig_gist.c
contrib\intarray\_int_bool.c
contrib\intarray\_int_gin.c
contrib\intarray\_int_gist.c
contrib\intarray\_int_op.c
contrib\intarray\_int_tool.c
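
A guard along these lines would avoid the problem; a standalone sketch of
the pattern (arr_nelems_stub simulates the -1 return, it is not the real
ARRNELEMS macro):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* stand-in for ARRNELEMS; returns -1 to simulate the soft-error path */
static int
arr_nelems_stub(const void *ent)
{
    (void) ent;
    return -1;
}

int main(void)
{
    int dst[8];
    int src[8] = {0};
    int nel = arr_nelems_stub(src);

    if (nel < 0)                /* guard before any size arithmetic */
    {
        fprintf(stderr, "unexpected array element count: %d\n", nel);
        exit(EXIT_FAILURE);
    }
    memcpy(dst, src, nel * sizeof(int));
    return 0;
}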

Thoughts?

regards,
Ranier Vilela


Re: Optimization issue of branching UNION ALL

2022-12-22 Thread Tom Lane
Andrey Lepikhov  writes:
> Thanks, I have written the letter because of some doubts too. But the only
> weak point I could imagine is if someday the SQL standard is changed.

Yeah, if they ever decide that LATERAL should be allowed to reference a
previous sub-query of UNION ALL, that'd probably break this.  But it'd
break a lot of other code too, so I'm not going to worry about it.

I pushed the main fix to HEAD only, and the recursion checks to
all branches.

regards, tom lane




Re: Schema variables - new implementation for Postgres 15 (typo)

2022-12-22 Thread Dmitry Dolgov
Hi,

I'm continuing to review the patch slowly, and have one more issue plus one
philosophical question.

The issue has something to do with variable invalidation. Enabling
debug_discard_caches = 1 (about which I've learned from this thread) and
running this subset of the test suite:

CREATE SCHEMA svartest;
SET search_path = svartest;
CREATE VARIABLE var3 AS int;

CREATE OR REPLACE FUNCTION inc(int)
RETURNS int AS $$
BEGIN
  LET svartest.var3 = COALESCE(svartest.var3 + $1, $1);
  RETURN var3;
END;
$$ LANGUAGE plpgsql;

SELECT inc(1);
SELECT inc(1);
SELECT inc(1);

crashes in my setup like this:

#2  0x00b432d4 in ExceptionalCondition (conditionName=0xce9b99 
"n >= 0 && n < list->length", fileName=0xce9c18 "list.c", lineNumber=770) at 
assert.c:66
#3  0x007d3acd in list_delete_nth_cell (list=0x18ab248, 
n=-3388) at list.c:770
#4  0x007d3b88 in list_delete_cell (list=0x18ab248, 
cell=0x18dc028) at list.c:842
#5  0x006bcb52 in sync_sessionvars_all (filter_lxid=true) at 
session_variable.c:524
#6  0x006bd4cb in SetSessionVariable (varid=16386, value=2, 
isNull=false) at session_variable.c:844
#7  0x006bd617 in SetSessionVariableWithSecurityCheck 
(varid=16386, value=2, isNull=false) at session_variable.c:885
#8  0x7f763b890698 in exec_stmt_let (estate=0x7ffcc6fd5190, 
stmt=0x18aa920) at pl_exec.c:5030
#9  0x7f763b88a746 in exec_stmts (estate=0x7ffcc6fd5190, 
stmts=0x180) at pl_exec.c:2116
#10 0x7f763b88a247 in exec_stmt_block (estate=0x7ffcc6fd5190, 
block=0x18aabf8) at pl_exec.c:1935
#11 0x7f763b889a49 in exec_toplevel_block (estate=0x7ffcc6fd5190, 
block=0x18aabf8) at pl_exec.c:1626
#12 0x7f763b8879df in plpgsql_exec_function (func=0x18781b0, 
fcinfo=0x18be110, simple_eval_estate=0x0, simple_eval_resowner=0x0, 
procedure_resowner=0x0, atomic=true) at pl_exec.c:615
#13 0x7f763b8a2320 in plpgsql_call_handler (fcinfo=0x18be110) at 
pl_handler.c:277
#14 0x00721716 in ExecInterpExpr (state=0x18bde28, 
econtext=0x18bd3d0, isnull=0x7ffcc6fd56d7) at execExprInterp.c:730
#15 0x00723642 in ExecInterpExprStillValid (state=0x18bde28, 
econtext=0x18bd3d0, isNull=0x7ffcc6fd56d7) at execExprInterp.c:1855
#16 0x0077a78b in ExecEvalExprSwitchContext (state=0x18bde28, 
econtext=0x18bd3d0, isNull=0x7ffcc6fd56d7) at 
../../../src/include/executor/executor.h:344
#17 0x0077a7f4 in ExecProject (projInfo=0x18bde20) at 
../../../src/include/executor/executor.h:378
#18 0x0077a9dc in ExecResult (pstate=0x18bd2c0) at 
nodeResult.c:136
#19 0x00738ea0 in ExecProcNodeFirst (node=0x18bd2c0) at 
execProcnode.c:464
#20 0x0072c6e3 in ExecProcNode (node=0x18bd2c0) at 
../../../src/include/executor/executor.h:262
#21 0x0072f426 in ExecutePlan (estate=0x18bd098, 
planstate=0x18bd2c0, use_parallel_mode=false, operation=CMD_SELECT, 
sendTuples=true, numberTuples=0, direction=ForwardScanDirection, 
dest=0x18b3eb8, execute_once=true) at execMain.c:1691
#22 0x0072cf76 in standard_ExecutorRun (queryDesc=0x189c688, 
direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:423
#23 0x0072cdb3 in ExecutorRun (queryDesc=0x189c688, 
direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:367
#24 0x0099afdc in PortalRunSelect (portal=0x1866648, 
forward=true, count=0, dest=0x18b3eb8) at pquery.c:927
#25 0x0099ac99 in PortalRun (portal=0x1866648, 
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x18b3eb8, 
altdest=0x18b3eb8, qc=0x7ffcc6fd5a70) at pquery.c:771
#26 0x0099487d in exec_simple_query (query_string=0x17edcc8 
"SELECT inc(1);") at postgres.c:1238

It seems that sync_sessionvars_all tries to remove a cell that either doesn't
belong to xact_recheck_varids or is weird in some other way:

+>>> p l - xact_recheck_varids->elements
$81 = -3388

The second thing I wanted to ask about is a more strategic question. Does
anyone have a clear understanding of where this patch is going? The thread is quite
large, and it's hard to catch up with all the details -- it would be great if
someone could summarize the current state of things, are there any outstanding
design issues or not addressed concerns?

From a first look, it seems the major topics the discussion has evolved
around are:

* Validity of the use case. Seems to be quite convincingly addressed in [1] and
[2].

* Complicated logic around invalidation, concurrent create/drop etc. (I guess
the issue above is falling into the same category).

* Concerns that session variables could repeat some problems of temporary 
tables.

Is there anything else?

[1]: 
https://www

Re: Error-safe user functions

2022-12-22 Thread Tom Lane
Andrew Dunstan  writes:
> Yeah, I started there, but it's substantially more complex - unlike cube
> the jsonpath scanner calls the error routines as well as the parser.
> Anyway, here's a patch.

I looked through this and it seems generally OK.  A minor nitpick is
that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum
values.  A slightly bigger issue is that makeItemLikeRegex still allows
an error to be thrown from RE_compile_and_cache if a bogus regex is
presented.  But that could be dealt with later.

(I wonder why this is using RE_compile_and_cache at all, really,
rather than some other API.  There doesn't seem to be value in
forcing the regex into the cache at this point.)
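
(For what it's worth, a sketch of compiling without the cache; the
regex calls are the real API, but the surrounding plumbing and flags
are illustrative only:

    pg_wchar   *wpattern = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
    int         wlen = pg_mb2wchar_with_len(pattern, wpattern, plen);
    regex_t     re;

    if (pg_regcomp(&re, wpattern, wlen, cflags,
                   DEFAULT_COLLATION_OID) != REG_OKAY)
        return false;           /* report the bad pattern softly instead */
    pg_regfree(&re);            /* compiled only to validate; not cached */
)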

regards, tom lane




Re: Add LZ4 compression in pg_dump

2022-12-22 Thread Justin Pryzby
There's a couple of lz4 bits which shouldn't be present in 002: file
extension and comments.




Re: Small miscellaneus fixes (Part II)

2022-12-22 Thread Ranier Vilela
On Tue, Dec 20, 2022 at 9:51 PM, Justin Pryzby wrote:

> On Fri, Nov 25, 2022 at 06:27:04PM -0300, Ranier Vilela wrote:
> > 5. Use boolean operator with boolean operands
> > (b/src/backend/commands/tablecmds.c)
>
> tablecmds.c: right.  Since 074c5cfbf
>
> pg_dump.c: right.  Since b08dee24a
>
> > 4. Fix dead code (src/backend/utils/adt/formatting.c)
> > Np->sign == '+', is different than "!= '-'" and is different than "!=
> '+'"
> > So the else is never hit.
>
> formatting.c: I don't see the problem.
>
> if (Np->sign != '-')
> ...
> else if (Np->sign != '+' && IS_PLUS(Np->Num))
> ...
>
> You said that the "else" is never hit, but why ?
>
This is a Coverity report.

dead_error_condition: The condition Np->Num->flag & 0x200 cannot be true.
5671else if (Np->sign != '+' && IS_PLUS(Np->Num))

CID 1501076 (#1 of 1): Logically dead code (DEADCODE)dead_error_line: Execution
cannot reach this statement: Np->Num->flag &= 0x

So, the code is dead because IS_PLUS(Np->Num) is already tested and cannot
be true in the else branch.

v1 patch attached.

regards,
Ranier Vilela


v1-fix_dead_code_formatting.patch
Description: Binary data


Re: [PATCH] Add function to_oct

2022-12-22 Thread Eric Radman
On Thu, Dec 22, 2022 at 10:08:17AM +, Dag Lem wrote:
> 
> The calculation of quotient and remainder can be replaced by less costly 
> masking and shifting.
> 
> Defining
> 
> #define OCT_DIGIT_BITS 3
> #define OCT_DIGIT_BITMASK 0x7
> 
> the content of the loop can be replaced by
> 
>   *--ptr = digits[value & OCT_DIGIT_BITMASK];
>   value >>= OCT_DIGIT_BITS;
> 
> Also, the check for ptr > buf in the while loop can be removed. The
> check is superfluous, since buf cannot possibly be exhausted by a 32
> bit integer (the maximum octal number being 37777777777).

I integrated these suggestions in the attached -v2 patch and tested
range of values manually.

Also picked an OID > 8000 as suggested by unused_oids.

..Eric
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1f63dc6dba..b539e0dc0b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3699,6 +3699,23 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ to_oct
+
+to_oct ( integer )
+text
+   
+   
+Converts the number to its equivalent octal representation.
+   
+   
+to_oct(32)
+40
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 1c52deec55..586693ad8f 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5229,6 +5229,32 @@ to_hex64(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(ptr));
 }
 
+#define OCT_DIGIT_BITS 3
+#define OCT_DIGIT_BITMASK 0x7
+/*
+ * Convert an int32 to a string containing a base 8 (oct) representation of
+ * the number.
+ */
+Datum
+to_oct32(PG_FUNCTION_ARGS)
+{
+   uint32  value = (uint32) PG_GETARG_INT32(0);
+   char*ptr;
+   const char *digits = "01234567";
+   charbuf[32];/* bigger than needed, but 
reasonable */
+
+   ptr = buf + sizeof(buf) - 1;
+   *ptr = '\0';
+
+   do
+   {
+   *--ptr = digits[value & OCT_DIGIT_BITMASK];
+   value >>= OCT_DIGIT_BITS;
+   } while (value);
+
+   PG_RETURN_TEXT_P(cstring_to_text(ptr));
+}
+
 /*
  * Return the size of a datum, possibly compressed
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 98d90d9338..fde0b24563 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3687,6 +3687,9 @@
 { oid => '2090', descr => 'convert int8 number to hex',
   proname => 'to_hex', prorettype => 'text', proargtypes => 'int8',
   prosrc => 'to_hex64' },
+{ oid => '8335', descr => 'convert int4 number to oct',
+  proname => 'to_oct', prorettype => 'text', proargtypes => 'int4',
+  prosrc => 'to_oct32' },
 
 # for character set encoding support
 
diff --git a/src/test/regress/expected/strings.out 
b/src/test/regress/expected/strings.out
index f028c1f10f..5ebaeb4a80 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2143,6 +2143,15 @@ select 
to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS ""
  
 (1 row)
 
+--
+-- test to_oct
+--
+select to_oct(256*256*256 - 1) AS "";
+  
+--
+ 
+(1 row)
+
 --
 -- SHA-2
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 932f71cbca..c4c40949e7 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -691,6 +691,11 @@ select to_hex(256*256*256 - 1) AS "ff";
 
 select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS 
"";
 
+--
+-- test to_oct
+--
+select to_oct(256*256*256 - 1) AS "";
+
 --
 -- SHA-2
 --


Re: [PATCH] Infinite loop while acquiring new TOAST Oid

2022-12-22 Thread Nikita Malakhov
Hi hackers!

Any suggestions on the previous message (64-bit toast value ID with
individual counters)?
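
For reference, a minimal sketch of the split-field layout described in
the quoted message below (field and function names are hypothetical,
assuming PostgreSQL's c.h typedefs; see the linked branch for the real
definition):

typedef struct varatt_external64
{
    int32       va_rawsize;     /* original data size (includes header) */
    uint32      va_extinfo;     /* external size and compression method */
    Oid         va_toastrelid;  /* OID of the TOAST table */
    uint32      va_valueid_lo;  /* low 32 bits of the value ID */
    uint32      va_valueid_hi;  /* high 32 bits of the value ID */
} varatt_external64;            /* value ID kept last, split into two
                                 * 32-bit halves for alignment and
                                 * on-disk compatibility */

static inline uint64
varatt_valueid64(const varatt_external64 *ext)
{
    return ((uint64) ext->va_valueid_hi << 32) | ext->va_valueid_lo;
}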

On Tue, Dec 13, 2022 at 1:41 PM Nikita Malakhov  wrote:

> Hi hackers!
>
> I've prepared a patch with a 64-bit TOAST Value ID. It is a kind of
> prototype
> and needs some further work, but it is already working and ready to play
> with.
>
> I've introduced a 64-bit value ID field in varatt_external, but to keep it
> compatible with older bases the 64-bit value is stored as a structure of
> two 32-bit fields, and the value ID moved to be the last field in the
> varatt_external structure. Also, I've introduced a different ID assignment
> function: instead of using the global OID assignment function, I use
> individual counters for each TOAST table, automatically cached; after a
> server restart, when a new value is inserted into a TOAST table, the
> maximum used value is looked up and used to assign the next one.
>
> >Andres Freund:
> >I think we'll need to do something about the width of varatt_external to
> make
> >the conversion to 64bit toast oids viable - and if we do, I don't think
> >there's a decent argument for staying with 4 byte toast OIDs. I think the
> >varatt_external equivalent would end up being smaller in just about all
> cases.
> >And as you said earlier, the increased overhead inside the toast table /
> index
> >is not relevant compared to the size of toasted datums.
>
> There is some small overhead due to indexing 64-bit values. Also, for
> smaller tables we could use a 32-bit ID instead of 64-bit, but then we
> would have a problem when we reach the limit of 2^32.
>
> Pg_upgrade does not seem to be a very difficult case if we keep old TOAST
> tables using 32-bit values.
>
> Please have a look. I'd be grateful for some further directions.
>
> GIT branch with this feature, rebased onto current master:
> https://github.com/postgrespro/postgres/tree/toast_64bit
>
>
> --
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum

2022-12-22 Thread Peter Geoghegan
On Tue, Dec 20, 2022 at 9:44 AM Imseih (AWS), Sami  wrote:
> Attached is a patch to check scanned pages rather
> than blockno.

Pushed, thanks!

I adjusted the FAILSAFE_EVERY_PAGES comments, which now point out that
FAILSAFE_EVERY_PAGES is a power-of-two. The implication is that the
compiler is all but guaranteed to be able to reduce the modulo
division into a shift in the lazy_scan_heap loop, at the point of the
failsafe check. I doubt that it would really matter if the compiler
had to generate a DIV instruction, but it seems like a good idea to
avoid it on general principle, at least in performance sensitive code.
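
The identity at work: for unsigned x and a power-of-two N,
x % N == x & (N - 1).  A standalone sketch (the constant is
illustrative, 4GB of 8KB pages, not the actual definition):

#include <stdint.h>

#define FAILSAFE_EVERY_PAGES ((uint32_t) 524288)    /* 4GB / 8KB */

_Static_assert((FAILSAFE_EVERY_PAGES & (FAILSAFE_EVERY_PAGES - 1)) == 0,
               "FAILSAFE_EVERY_PAGES must be a power of two");

static inline int
failsafe_check_due(uint32_t scanned_pages)
{
    /* same as scanned_pages % FAILSAFE_EVERY_PAGES == 0, but the
     * compiler can lower it to a single AND instead of a DIV */
    return (scanned_pages & (FAILSAFE_EVERY_PAGES - 1)) == 0;
}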

--
Peter Geoghegan




Re: ARRNELEMS Out-of-bounds possible errors

2022-12-22 Thread Nikita Malakhov
Hi,

Actually, there would be much more sources affected, like
 nbytes += subbytes[outer_nelems];
 subnitems[outer_nelems] = ArrayGetNItems(this_ndims,
ARR_DIMS(array));
 nitems += subnitems[outer_nelems];
 havenulls |= ARR_HASNULL(array);
 outer_nelems++;
  }

Maybe it is better for most calls like this to keep the old behavior, by
passing a flag that says which behavior is expected by the caller?

On Thu, Dec 22, 2022 at 6:36 PM Ranier Vilela  wrote:

> Hi.
>
> Per Coverity.
>
> The commit ccff2d2 changed the behavior of the function ArrayGetNItems,
> with the introduction of the function ArrayGetNItemsSafe.
>
> Now ArrayGetNItems may return -1, according to the comment.
> " instead of throwing an exception. -1 is returned after an error."
>
> So the macro ARRNELEMS can fail entirely with a -1 return,
> and code that uses the result without checking the return value will misbehave.
>
> Like (contrib/intarray/_int_gist.c):
> {
> int nel;
>
> nel = ARRNELEMS(ent);
> memcpy(ptr, ARRPTR(ent), nel * sizeof(int32));
> }
>
> Sources possibly affected:
> contrib\cube\cube.c
> contrib\intarray\_intbig_gist.c
> contrib\intarray\_int_bool.c
> contrib\intarray\_int_gin.c
> contrib\intarray\_int_gist.c
> contrib\intarray\_int_op.c
> contrib\intarray\_int_tool.c:
>
> Thoughts?
>
> regards,
> Ranier Vilela
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: dynamic result sets support in extended query protocol

2022-12-22 Thread Alvaro Herrera
On 2022-Dec-21, Alvaro Herrera wrote:

> I suppose that in order for this to work, we would have to find another
> way to "advance" the queue that doesn't rely on the status being
> PGASYNC_READY.

I think the way to make this work is to increase the coupling between
fe-exec.c and fe-protocol.c by making the queue advance occur when
CommandComplete is received.  This is likely more correct protocol-wise
than what we're doing now: we would consider the command as done when
the server tells us it is done, rather than relying on internal libpq
state.
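
As a rough sketch of that idea (simplified from pqParseInput3()'s
message dispatch; real code would also have to handle pipeline aborts
and error paths, and the details may differ):

        case 'C':           /* command complete */
            if (pqGets(&conn->workBuffer, conn))
                return;
            if (conn->result == NULL)
                conn->result = PQmakeEmptyPGresult(conn, PGRES_COMMAND_OK);
            if (conn->result)
                strlcpy(conn->result->cmdStatus, conn->workBuffer.data,
                        CMDSTATUS_LEN);
            pqCommandQueueAdvance(conn);    /* advance on CommandComplete,
                                             * not on PGASYNC_READY */
            conn->asyncStatus = PGASYNC_READY;
            break;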

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Schema variables - new implementation for Postgres 15 (typo)

2022-12-22 Thread Pavel Stehule
On Thu, Dec 22, 2022 at 5:15 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> Hi,
>
> I'm continuing to review the patch slowly, and have one more issue plus one
> philosophical question.
>
> The issue has something to do with variable invalidation. Enabling
> debug_discard_caches = 1 (about which I've learned from this thread) and
> running this subset of the test suite:
>
> CREATE SCHEMA svartest;
> SET search_path = svartest;
> CREATE VARIABLE var3 AS int;
>
> CREATE OR REPLACE FUNCTION inc(int)
> RETURNS int AS $$
> BEGIN
>   LET svartest.var3 = COALESCE(svartest.var3 + $1, $1);
>   RETURN var3;
> END;
> $$ LANGUAGE plpgsql;
>
> SELECT inc(1);
> SELECT inc(1);
> SELECT inc(1);
>
> crashes in my setup like this:
>
> #2  0x00b432d4 in ExceptionalCondition
> (conditionName=0xce9b99 "n >= 0 && n < list->length", fileName=0xce9c18
> "list.c", lineNumber=770) at assert.c:66
> #3  0x007d3acd in list_delete_nth_cell (list=0x18ab248,
> n=-3388) at list.c:770
> #4  0x007d3b88 in list_delete_cell (list=0x18ab248,
> cell=0x18dc028) at list.c:842
> #5  0x006bcb52 in sync_sessionvars_all (filter_lxid=true)
> at session_variable.c:524
> #6  0x006bd4cb in SetSessionVariable (varid=16386,
> value=2, isNull=false) at session_variable.c:844
> #7  0x006bd617 in SetSessionVariableWithSecurityCheck
> (varid=16386, value=2, isNull=false) at session_variable.c:885
> #8  0x7f763b890698 in exec_stmt_let (estate=0x7ffcc6fd5190,
> stmt=0x18aa920) at pl_exec.c:5030
> #9  0x7f763b88a746 in exec_stmts (estate=0x7ffcc6fd5190,
> stmts=0x180) at pl_exec.c:2116
> #10 0x7f763b88a247 in exec_stmt_block (estate=0x7ffcc6fd5190,
> block=0x18aabf8) at pl_exec.c:1935
> #11 0x7f763b889a49 in exec_toplevel_block
> (estate=0x7ffcc6fd5190, block=0x18aabf8) at pl_exec.c:1626
> #12 0x7f763b8879df in plpgsql_exec_function (func=0x18781b0,
> fcinfo=0x18be110, simple_eval_estate=0x0, simple_eval_resowner=0x0,
> procedure_resowner=0x0, atomic=true) at pl_exec.c:615
> #13 0x7f763b8a2320 in plpgsql_call_handler (fcinfo=0x18be110)
> at pl_handler.c:277
> #14 0x00721716 in ExecInterpExpr (state=0x18bde28,
> econtext=0x18bd3d0, isnull=0x7ffcc6fd56d7) at execExprInterp.c:730
> #15 0x00723642 in ExecInterpExprStillValid
> (state=0x18bde28, econtext=0x18bd3d0, isNull=0x7ffcc6fd56d7) at
> execExprInterp.c:1855
> #16 0x0077a78b in ExecEvalExprSwitchContext
> (state=0x18bde28, econtext=0x18bd3d0, isNull=0x7ffcc6fd56d7) at
> ../../../src/include/executor/executor.h:344
> #17 0x0077a7f4 in ExecProject (projInfo=0x18bde20) at
> ../../../src/include/executor/executor.h:378
> #18 0x0077a9dc in ExecResult (pstate=0x18bd2c0) at
> nodeResult.c:136
> #19 0x00738ea0 in ExecProcNodeFirst (node=0x18bd2c0) at
> execProcnode.c:464
> #20 0x0072c6e3 in ExecProcNode (node=0x18bd2c0) at
> ../../../src/include/executor/executor.h:262
> #21 0x0072f426 in ExecutePlan (estate=0x18bd098,
> planstate=0x18bd2c0, use_parallel_mode=false, operation=CMD_SELECT,
> sendTuples=true, numberTuples=0, direction=ForwardScanDirection,
> dest=0x18b3eb8, execute_once=true) at execMain.c:1691
> #22 0x0072cf76 in standard_ExecutorRun
> (queryDesc=0x189c688, direction=ForwardScanDirection, count=0,
> execute_once=true) at execMain.c:423
> #23 0x0072cdb3 in ExecutorRun (queryDesc=0x189c688,
> direction=ForwardScanDirection, count=0, execute_once=true) at
> execMain.c:367
> #24 0x0099afdc in PortalRunSelect (portal=0x1866648,
> forward=true, count=0, dest=0x18b3eb8) at pquery.c:927
> #25 0x0099ac99 in PortalRun (portal=0x1866648,
> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x18b3eb8,
> altdest=0x18b3eb8, qc=0x7ffcc6fd5a70) at pquery.c:771
> #26 0x0099487d in exec_simple_query
> (query_string=0x17edcc8 "SELECT inc(1);") at postgres.c:1238
>
> It seems that sync_sessionvars_all tries to remove a cell that either
> doesn't belong to xact_recheck_varids or is weird in some other way:
>
> +>>> p l - xact_recheck_varids->elements
> $81 = -3388
>

I am able to repeat this issue. I'll look at it.

>
> The second thing I wanted to ask about is a more strategic question. Does
> anyone have a clear understanding of where this patch is going? The thread is
> quite
> large, and it's hard to catch up with all the details -- it would be great
> if
> someone could summarize the current state of things, are there any
> outstanding
> design issues or not addressed concerns?
>

I hope I fixed the issues found by Julian and Tomas. Transaction support
related to values is not implemented yet, and I pla

Re: checking snapshot argument for index_beginscan

2022-12-22 Thread Ted Yu
>
> Hi,
> I was looking at commit 941aa6a6268a6a66f6895401aad6b5329111d412 .
>
> I think it would be better to move the assertion into
> `index_beginscan_internal` because it is called by index_beginscan
> and index_beginscan_bitmap
>
> In the future, when a new variant of `index_beginscan_XX` is added, the
> assertion would be effective for the new variant.
>
> Please let me know what you think.
>
> Cheers
>


index-begin-scan-check-snapshot.patch
Description: Binary data


Re: checking snapshot argument for index_beginscan

2022-12-22 Thread Pavel Borisov
On Fri, 23 Dec 2022 at 00:36, Ted Yu  wrote:
>>
>> Hi,
>> I was looking at commit 941aa6a6268a6a66f6895401aad6b5329111d412 .
>>
>> I think it would be better to move the assertion into 
>> `index_beginscan_internal` because it is called by index_beginscan and 
>> index_beginscan_bitmap
>>
>> In the future, when a new variant of `index_beginscan_XX` is added, the 
>> assertion would be effective for the new variant.
>>
>> Please let me know what you think.

Hi, Ted!

The two asserts out of four could be combined as you proposed in the patch.
The assert added by 941aa6a6268a6a66f6 to index_parallelscan_initialize
should remain anyway, as otherwise we could get a segfault in
SerializeSnapshot, which is called from this function.
The assert in index_parallelscan_estimate can be omitted, as there is the
same assert inside EstimateSnapshotSpace, which is called from this function.
I've included this in version 2 of the patch.

Not sure it's worth the effort but IMO the patch is right and can be committed.

Kind regards,
Pavel Borisov,
Supabase.
From ea3f6ca585156d0b90eb56516d3f62102b939932 Mon Sep 17 00:00:00 2001
From: Pavel Borisov 
Date: Fri, 23 Dec 2022 01:04:40 +0400
Subject: [PATCH v2] Optimize asserts introduced in 941aa6a6268a6a66f68

Two asserts combined, one unnecessary removed.

Authors: Ted Yu , Pavel Borisov

Reviewed-by: Pavel Borisov 
---
 src/backend/access/index/indexam.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index dc303995e5b..673395893ba 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -209,8 +209,6 @@ index_beginscan(Relation heapRelation,
 {
 	IndexScanDesc scan;
 
-	Assert(snapshot != InvalidSnapshot);
-
 	scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot, NULL, false);
 
 	/*
@@ -239,8 +237,6 @@ index_beginscan_bitmap(Relation indexRelation,
 {
 	IndexScanDesc scan;
 
-	Assert(snapshot != InvalidSnapshot);
-
 	scan = index_beginscan_internal(indexRelation, nkeys, 0, snapshot, NULL, false);
 
 	/*
@@ -265,6 +261,8 @@ index_beginscan_internal(Relation indexRelation,
 	RELATION_CHECKS;
 	CHECK_REL_PROCEDURE(ambeginscan);
 
+	Assert(snapshot != InvalidSnapshot);
+
 	if (!(indexRelation->rd_indam->ampredlocks))
 		PredicateLockRelation(indexRelation, snapshot);
 
@@ -407,8 +405,6 @@ index_parallelscan_estimate(Relation indexRelation, Snapshot snapshot)
 {
 	Size		nbytes;
 
-	Assert(snapshot != InvalidSnapshot);
-
 	RELATION_CHECKS;
 
 	nbytes = offsetof(ParallelIndexScanDescData, ps_snapshot_data);
-- 
2.24.3 (Apple Git-128)



Re: Schema variables - new implementation for Postgres 15 (typo)

2022-12-22 Thread Dmitry Dolgov
> On Thu, Dec 22, 2022 at 08:45:57PM +0100, Pavel Stehule wrote:
> > From a first look, it seems the major topics the discussion has evolved
> > around are:
> >
> > * Validity of the use case. Seems to be quite convincingly addressed in
> > [1] and
> > [2].
> >
> > * Complicated logic around invalidation, concurrent create/drop etc. (I
> > guess
> > the issue above is falling into the same category).
> >
> > * Concerns that session variables could repeat some problems of temporary
> > tables.
> >
>
> Why do you think so? The variable has no MVCC support - it is just a stored
> value with local visibility. There can be issues a little bit similar to
> those with global temporary tables.

Yeah, sorry for not being precise, I meant global temporary tables. This
is not my analysis; I've simply picked up that it was mentioned a couple of
times here. The points above are not meant to serve as an objection
against the patch, but rather to figure out if there are any gaps left
to address and come up with some sort of plan with "committed" as a
final destination.




Re: Array initialisation notation in syscache.c

2022-12-22 Thread Thomas Munro
On Thu, Dec 22, 2022 at 5:36 AM Peter Eisentraut
 wrote:
> This looks like a good improvement to me.

Thanks both.  Pushed.

> (I have also thought about having this generated from the catalog
> definition files somehow, but one step at a time ...)

Good plan.




Re: ARRNELEMS Out-of-bounds possible errors

2022-12-22 Thread Ranier Vilela
On Thu, Dec 22, 2022 at 3:45 PM Nikita Malakhov wrote:

> Hi,
>
> Actually, there would be much more sources affected, like
>  nbytes += subbytes[outer_nelems];
>  subnitems[outer_nelems] = ArrayGetNItems(this_ndims,
> ARR_DIMS(array));
>  nitems += subnitems[outer_nelems];
>  havenulls |= ARR_HASNULL(array);
>  outer_nelems++;
>   }
>
> Maybe it is better for most calls like this to keep old behavior, by
> passing a flag
> that says which behavior is expected by caller?
>
I agree that it is better to keep the old behavior.
Even the value 0 is problematic, with calls like this:

nel = ARRNELEMS(ent);
memcpy(ptr, ARRPTR(ent), nel * sizeof(int32));

regards,
Ranier Vilela


Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options

2022-12-22 Thread David Rowley
On Wed, 26 Oct 2022 at 14:38, David Rowley  wrote:
>
> On Sun, 23 Oct 2022 at 03:03, Vik Fearing  wrote:
> > Shouldn't it be able to detect that these two windows are the same and
> > only do one WindowAgg pass?
> >
> >
> > explain (verbose, costs off)
> > select row_number() over w1,
> > lag(amname) over w2
> > from pg_am
> > window w1 as (order by amname),
> > w2 as (w1 rows unbounded preceding)
> > ;
>
> Good thinking. I think the patch should also optimise that case. It
> requires re-doing a de-duplication phase similar to what's
> done in transformWindowFuncCall().  I've added code to do that in the
> attached version.

I've spent a bit more time on this now and added a few extra
regression tests.  The previous version had nothing to test to ensure
that an aggregate function being used as a window function does not
have its frame options changed when it's sharing the same WindowClause
as a WindowFunc which can have the frame options changed.

I've now pushed the final result.  Thank you to everyone who provided
input on this.

David




Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum

2022-12-22 Thread Imseih (AWS), Sami
>I adjusted the FAILSAFE_EVERY_PAGES comments, which now point out that
>FAILSAFE_EVERY_PAGES is a power-of-two. The implication is that the
>compiler is all but guaranteed to be able to reduce the modulo
>division into a shift in the lazy_scan_heap loop, at the point of the
>failsafe check. I doubt that it would really matter if the compiler
>had to generate a DIV instruction, but it seems like a good idea to
>avoid it on general principle, at least in performance sensitive code.

Thank you! 

Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-22 Thread Michael Paquier
On Thu, Dec 22, 2022 at 05:03:35PM +0530, Bharath Rupireddy wrote:
> On Thu, Dec 22, 2022 at 4:57 PM Michael Paquier  wrote:
>> As in using "sequence number" removing "file" from the docs and
>> changing the OUT parameter name to segment_number rather than segno?
>> Fine by me.
> 
> +1.

Okay, done this way.
--
Michael


signature.asc
Description: PGP signature


Re: Small miscellaneus fixes (Part II)

2022-12-22 Thread Justin Pryzby
On Thu, Dec 22, 2022 at 02:29:11PM -0300, Ranier Vilela wrote:
> On Tue, Dec 20, 2022 at 9:51 PM Justin Pryzby wrote:
> 
> > On Fri, Nov 25, 2022 at 06:27:04PM -0300, Ranier Vilela wrote:
> > > 5. Use boolean operator with boolean operands
> > > (b/src/backend/commands/tablecmds.c)
> >
> > tablecmds.c: right.  Since 074c5cfbf
> >
> > pg_dump.c: right.  Since b08dee24a
> >
> > > 4. Fix dead code (src/backend/utils/adt/formatting.c)
> > > Np->sign == '+', is different than "!= '-'" and is different than "!=
> > '+'"
> > > So the else is never hit.
> >
> > formatting.c: I don't see the problem.
> >
> > if (Np->sign != '-')
> > ...
> > else if (Np->sign != '+' && IS_PLUS(Np->Num))
> > ...
> >
> > You said that the "else" is never hit, but why ?
>
> This is a Coverity report.
> 
> dead_error_condition: The condition Np->Num->flag & 0x200 cannot be true.
> 5671else if (Np->sign != '+' && IS_PLUS(Np->Num))
> 
> CID 1501076 (#1 of 1): Logically dead code (DEADCODE)dead_error_line: 
> Execution
> cannot reach this statement: Np->Num->flag &= 0x
> 
> So, the code is dead because IS_PLUS(Np->Num) is already tested and cannot
> be true in the else branch.

Makes sense now (in your first message, you said that the problem was
with "sign", and the patch didn't address the actual problem in
IS_PLUS()).

One can look and find that the unreachable code was introduced at
7a3e7b64a.

With your proposed change, the unreachable line is hit by regression
tests, which is an improvement.  As is the change to pg_dump.c.

-- 
Justin




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread Richard Guo
On Thu, Dec 22, 2022 at 5:22 PM David Rowley  wrote:

> On Thu, 22 Dec 2022 at 21:18, Richard Guo  wrote:
> > My best guess is that this function is intended to share the same code
> > pattern as in adjust_appendrel_attrs_multilevel.  The recursion is
> > needed as 'rel' can be more than one inheritance level below the top
> > parent.  I think we can keep the recursion, as in other similar
> > functions, as long as we make it right, as in attached patch.
>
> I still think we should have a test to ensure this is actually
> working. Do you want to write one?


I agree that we should have a test.  According to the code coverage
report, the recursion part of this function is never tested.  I will
try to see if I can come up with a proper test case.

Thanks
Richard


Re: Exit walsender before confirming remote flush in logical replication

2022-12-22 Thread Kyotaro Horiguchi
At Thu, 22 Dec 2022 17:29:34 +0530, Ashutosh Bapat 
 wrote in 
> On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
>  wrote:
> > In case of logical replication, however, we cannot support the use-case that
> > switches the role publisher <-> subscriber. Suppose same case as above, 
> > additional
..
> > Therefore, I think that we can ignore the condition for shutting down the
> > walsender in logical replication.
...
> > This change may be useful for time-delayed logical replication. The 
> > walsender
> > waits the shutdown until all changes are applied on subscriber, even if it 
> > is
> > delayed. This causes that publisher cannot be stopped if large delay-time is
> > specified.
> 
> I think the current behaviour is an artifact of using the same WAL
> sender code for both logical and physical replication.

Yeah, I don't think we do that for the reason of switchover. On the
other hand, I think the behavior was intentionally carried over since it
is thought to be sensible on its own. And I'm afraid that many people
already rely on that behavior.

> I agree with you that the logical WAL sender need not wait for all the
> WAL to be replayed downstream.

Thus I feel that it might be a bit outrageous to get rid of that
behavior altogether because of a new feature stumbling on it.  I'm
fine doing that only in the apply_delay case, though.  However, I have
another concern that we are introducing a second exception for
XLogSendLogical in the common path.

I doubt that anyone wants to use synchronous logical replication with
apply_delay, since the sender transaction is inevitably held up by
that delay.

Thus, how about having the logrep worker, before entering an apply_delay,
send a kind of crafted feedback which reports commit_data.end_lsn as
the flush position?  A little tweak is needed in send_feedback(), but it
seems to work.
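
A minimal sketch of that, assuming worker.c's
send_feedback(recvpos, force, requestReply) signature and a delay in
milliseconds (variable names illustrative):

    /* Before sleeping out the apply delay, report the transaction's
     * end LSN so a shutting-down walsender need not wait for the
     * delayed apply.  This placement is the proposed tweak, not
     * current behavior. */
    if (apply_delay_ms > 0)
    {
        send_feedback(commit_data->end_lsn, true /* force */ , false);
        pg_usleep((long) apply_delay_ms * 1000);
    }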

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Force streaming every change in logical decoding

2022-12-22 Thread Kyotaro Horiguchi
At Thu, 22 Dec 2022 14:53:34 +0530, Amit Kapila  wrote 
in 
> For this, I like your proposal for "buffered" as an explicit default value.

Good to hear.

> Okay, how about modifying it to: "When set to
> immediate, stream each change if
> streaming option (see optional parameters set by
> CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change.

Looks fine. Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread David Rowley
On Fri, 23 Dec 2022 at 15:21, Richard Guo  wrote:
>
> On Thu, Dec 22, 2022 at 5:22 PM David Rowley  wrote:
>> I still think we should have a test to ensure this is actually
>> working. Do you want to write one?
>
>
> I agree that we should have a test.  According to the code coverage
> report, the recursion part of this function is never tested.  I will
> try to see if I can come up with a proper test case.

I started having a go at writing one yesterday. I only got as far as
the following.
It looks like it'll need a trigger or something added to the foreign
table to hit the code path that would be needed to trigger the issue,
so it'll need more work. It might be a worthy starting point.

CREATE EXTENSION postgres_fdw;

-- this will need to work the same way as it does in postgres_fdw.sql
by using current_database()
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'postgres', port '5432');

create table t_gc (a int, b int, c int);
create table t_c (b int, c int, a int) partition by list(a);
create table t_tlp (c int, a int, b int) partition by list (a);

CREATE FOREIGN TABLE ft_tlp (
c int,
a int,
b int
) SERVER loopback OPTIONS (schema_name 'public', table_name 't_tlp');


alter table t_c attach partition t_gc for values in (1);
alter table t_tlp attach partition t_c for values in (1);

create role perm_check login;

CREATE USER MAPPING FOR perm_check SERVER loopback OPTIONS (user
'perm_check', password_required 'false');

grant update (b) on t_tlp to perm_check;
grant update (b) on ft_tlp to perm_check;

set session authorization perm_check;

-- should pass
update ft_tlp set b = 1;

-- should fail
update ft_tlp set a = 1;
update ft_tlp set c = 1;

-- cleanup

drop foreign table ft_tlp cascade;
drop table t_tlp cascade;
drop role perm_check;
drop server loopback cascade;
drop extension postgres_fdw;

David




Re: [BUG] pg_upgrade test fails from older versions.

2022-12-22 Thread Michael Paquier
On Thu, Dec 22, 2022 at 09:59:18AM +0300, Anton A. Melnikov wrote:
> 2) v2-0002-Additional-dumps-filtering.patch

+   # Replace specific privilegies with ALL
+   $dump_contents =~ s/^(GRANT\s|REVOKE\s)(\S*)\s/$1ALL /mgx;
This should not be in 0002, I guess..

> Yes. Made a hook that allows to proceed an external text file with additional
> filtering rules and example of such file. Please take a look on it.
> 
> With the best wishes,

Hmm.  0001 does a direct check on aclitem as data type used in an
attribute, but misses anything related to arrays, domains or even
composite types, not to mention that we'd miss uses of aclitems in
index expressions.

That's basically the kind of thing check_for_data_types_usage() does.
I am not sure that it is a good idea to provide limited coverage if
we do that for matviews and indexes, and the complexity induced in
upgrade_adapt.sql is not really appealing either.  For now, I have
fixed the most pressing part for tables to match with the buildfarm
code that just drops the aclitem column rather than doing that for all
the relations that could have one.

The part on WITH OIDS has been addressed in its own commit down to
v12, removing the handling for matviews but adding one for foreign
tables where the operation is supported.
--
Michael


signature.asc
Description: PGP signature


Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options

2022-12-22 Thread Vik Fearing

On 12/23/22 00:47, David Rowley wrote:

On Wed, 26 Oct 2022 at 14:38, David Rowley  wrote:

I've now pushed the final result.  Thank you to everyone who provided
input on this.


This is a very good improvement.  Thank you for working on it.
--
Vik Fearing





Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-22 Thread Amit Kapila
On Thu, Dec 22, 2022 at 6:18 PM Masahiko Sawada  wrote:
>
> On Thu, Dec 22, 2022 at 7:04 PM Amit Kapila  wrote:
> >
> > On Thu, Dec 22, 2022 at 11:39 AM Masahiko Sawada  
> > wrote:
> > >
> > > Thank you for updating the patch. Here are some comments on v64 patches:
> > >
> > > While testing the patch, I realized that if all streamed transactions
> > > are handled by parallel workers, there is no chance for the leader to
> > > call maybe_reread_subscription() except for when waiting for the next
> > > message. Due to this, the leader didn't stop for a while even if the
> > > subscription gets disabled. It's an extreme case since my test was
> > > that pgbench runs 30 concurrent transactions and logical_decoding_mode
> > > = 'immediate', but we might want to make sure to call
> > > maybe_reread_subscription() at least after committing/preparing a
> > > transaction.
> > >
> >
> > Won't it be better to call it only if we handle the transaction by the
> > parallel worker?
>
> Agreed. And we won't need to do that after handling stream_prepare as
> we don't do that now.
>

I think we do this for both prepare and non-prepare cases via
begin_replication_step(). Here, as the changes are sent to the parallel
apply worker, we missed it in both cases. So, I think it is better to do
it in both cases.
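
A sketch of where the call could go (the parallel-handling test is
illustrative; the actual patch may structure it differently):

        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit(s);
            /* the leader may have processed no changes itself, so give
             * it a chance to notice subscription parameter changes */
            if (handled_by_parallel_apply_worker)
                maybe_reread_subscription();
            break;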

> >
> > It seems currently we give a similar message when the logical
> > replication worker slots are finished "out of logical replication
> > worker slots" or when we are not able to register background workers
> > "out of background worker slots". Now, OTOH, when we exceed the limit
> > of sync workers "max_sync_workers_per_subscription", we don't display
> > any message. Personally, I think if any user has used the streaming
> > option as "parallel" she wants all large transactions to be performed
> > in parallel and if the system is not able to deal with it, displaying
> > a LOG message will be useful for users. This is because the
> > performance difference for large transactions between parallel and
> > non-parallel is big (30-40%) and it is better for users to know as
> > soon as possible instead of expecting them to run some monitoring
> > query to notice the same.
>
> I see your point. But looking at other parallel features such as
> parallel queries, parallel vacuum and parallel index creation, we
> don't give such messages even if the number of parallel workers
> actually launched is lower than the ideal. They also bring a big
> performance benefit.
>

Fair enough. Let's remove this LOG message.

-- 
With Regards,
Amit Kapila.




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread Amit Langote
Hi,

Thanks everyone for noticing this.  It is indeed very obviously broken. :(

On Fri, Dec 23, 2022 at 11:26 AM David Rowley  wrote:
> On Fri, 23 Dec 2022 at 15:21, Richard Guo  wrote:
> >
> > On Thu, Dec 22, 2022 at 5:22 PM David Rowley  wrote:
> >> I still think we should have a test to ensure this is actually
> >> working. Do you want to write one?
> >
> >
> > I agree that we should have a test.  According to the code coverage
> > report, the recursion part of this function is never tested.  I will
> > try to see if I can come up with a proper test case.
>
> I started having a go at writing one yesterday. I only got as far as
> the following.
> It looks like it'll need a trigger or something added to the foreign
> table to hit the code path that would be needed to trigger the issue,
> so it'll need more work. It might be a worthy starting point.

I was looking at this last night and found that having a generated
column in the table, but not a trigger, helps hit the buggy code.
Having a generated column in the foreign partition prevents a direct
update and thus hitting the following block of
postgresPlanForeignModify():

else if (operation == CMD_UPDATE)
{
int col;
RelOptInfo *rel = find_base_rel(root, resultRelation);
Bitmapset  *allUpdatedCols = get_rel_all_updated_cols(root, rel);

col = -1;
while ((col = bms_next_member(allUpdatedCols, col)) >= 0)
{
/* bit numbers are offset by FirstLowInvalidHeapAttributeNumber */
AttrNumber  attno = col + FirstLowInvalidHeapAttributeNumber;

if (attno <= InvalidAttrNumber) /* shouldn't happen */
elog(ERROR, "system-column update is not supported");
targetAttrs = lappend_int(targetAttrs, attno);
}
}

If you add a trigger, which does help with getting a non-direct
update, the code block above this one is executed, so
get_rel_all_updated_cols() isn't called.

Attached is a test case I was able to come up with that I can see
is broken by a61b1f74 but passes after applying Richard's patch.
What's broken is that deparseUpdateSql() outputs a remote UPDATE
statement with the wrong SET column list, because the wrong attribute
numbers would be added to the targetAttrs list by the above code block
after the buggy multi-level translation in get_rel_all_updated_cols().

Thanks for writing the patch, Richard.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v1-0001-postgres_fdw-test-update-of-multi-level-partition.patch
Description: Binary data


Re: [BUG] pg_upgrade test fails from older versions.

2022-12-22 Thread Justin Pryzby
On Fri, Dec 23, 2022 at 11:42:39AM +0900, Michael Paquier wrote:
> Hmm.  0001 does a direct check on aclitem as data type used in an
> attribute,

> For now, I have fixed the most pressing part for tables to match with
> the buildfarm

+DO $$
+  DECLARE
+rec text;
+   col text;
+  BEGIN
+  FOR rec in
+SELECT oid::regclass::text
+FROM pg_class
+WHERE relname !~ '^pg_'
+  AND relkind IN ('r')
+ORDER BY 1
+  LOOP
+FOR col in SELECT attname FROM pg_attribute
+  WHERE attrelid::regclass::text = rec
+  AND atttypid = 'aclitem'::regtype
+LOOP
+  EXECUTE 'ALTER TABLE ' || quote_ident(rec) || ' ALTER COLUMN ' ||
+quote_ident(col) || ' SET DATA TYPE text';
+END LOOP;
+  END LOOP;
+  END; $$;

This will do a seq scan of pg_attribute for each relation (currently
~600)...

Here, that takes a few seconds in a debug build, and I guess it'll be
more painful when running under valgrind/discard_caches/antiquated
hardware/etc.

This would do a single seqscan:
SELECT format('ALTER TABLE %I ALTER COLUMN %I TYPE TEXT', attrelid::regclass, 
attname) FROM pg_attribute WHERE atttypid='aclitem'::regtype; -- AND ...
\gexec

-- 
Justin




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread Amit Langote
On Fri, Dec 23, 2022 at 12:22 PM Amit Langote  wrote:
> Attached shows a test case I was able to come up with that I can see
> is broken by a61b1f74 though passes after applying Richard's patch.

BTW, I couldn't help but notice in the output of the test case I wrote
that a generated column of a foreign table is not actually generated
locally, either when inserting into the foreign table or when
updating it, so it is left NULL in the NEW row passed to the remote
server.  The behavior is the same irrespective of whether the
insert/update is performed directly on the foreign table or indirectly
via an insert/update on the parent.
postgres_fdw, maybe we are fine, but just wanted to mention that it's
not related to the bug being discussed here.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread David Rowley
On Fri, 23 Dec 2022 at 16:22, Amit Langote  wrote:
> Attached shows a test case I was able to come up with that I can see
> is broken by a61b1f74 though passes after applying Richard's patch.

Thanks for the test case.  I'll look at this now.

+UPDATE rootp SET b = b || 'd' RETURNING a, b, c, d;
+ a |  b   |  c  | d
+---+--+-+---
+ 1 | food | 1.1 |

Coding on an empty stomach I see! :)

David




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-22 Thread Masahiko Sawada
On Fri, Dec 23, 2022 at 12:21 PM Amit Kapila  wrote:
>
> On Thu, Dec 22, 2022 at 6:18 PM Masahiko Sawada  wrote:
> >
> > On Thu, Dec 22, 2022 at 7:04 PM Amit Kapila  wrote:
> > >
> > > On Thu, Dec 22, 2022 at 11:39 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > Thank you for updating the patch. Here are some comments on v64 patches:
> > > >
> > > > While testing the patch, I realized that if all streamed transactions
> > > > are handled by parallel workers, there is no chance for the leader to
> > > > call maybe_reread_subscription() except for when waiting for the next
> > > > message. Due to this, the leader didn't stop for a while even if the
> > > > subscription gets disabled. It's an extreme case since my test was
> > > > that pgbench runs 30 concurrent transactions and logical_decoding_mode
> > > > = 'immediate', but we might want to make sure to call
> > > > maybe_reread_subscription() at least after committing/preparing a
> > > > transaction.
> > > >
> > >
> > > Won't it be better to call it only if we handle the transaction by the
> > > parallel worker?
> >
> > Agreed. And we won't need to do that after handling stream_prepare as
> > we don't do that now.
> >
>
> I think we do this for both prepare and non-prepare cases via
> begin_replication_step(). Here, in both cases, as the changes are sent
> to the parallel apply worker, we missed in both cases. So, I think it
> is better to do in both cases.

Agreed. I missed that we call maybe_reread_subscription() even in the
prepare case.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Force streaming every change in logical decoding

2022-12-22 Thread Masahiko Sawada
On Thu, Dec 22, 2022 at 6:19 PM Amit Kapila  wrote:
>
> On Thu, Dec 22, 2022 at 12:47 PM Masahiko Sawada  
> wrote:
> >
> > On Thu, Dec 22, 2022 at 4:05 PM Amit Kapila  wrote:
> > >
> >
> > >  I think
> > > instead of adding new tests with this patch, it may be better to
> > > change some of the existing tests related to streaming to use this
> > > parameter as that will clearly show one of the purposes of this patch.
> >
> > +1. I think test_decoding/sql/stream.sql and spill.sql are good
> > candidates and we change logical replication TAP tests in a separate
> > patch.
> >
>
> I prefer the other way, let's first do TAP tests because that will
> also help immediately with the parallel apply feature. We need to
> execute most of those tests in parallel mode.

Good point. Or we can do both if changes for test_decoding tests are not huge?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread Richard Guo
On Fri, Dec 23, 2022 at 11:22 AM Amit Langote 
wrote:

> Attached shows a test case I was able to come up with that I can see
> is broken by a61b1f74 though passes after applying Richard's patch.
> What's broken is that deparseUpdateSql() outputs a remote UPDATE
> statement with the wrong SET column list, because the wrong attribute
> numbers would be added to the targetAttrs list by the above code block
> after the buggy multi-level translation in ger_rel_all_updated_cols().


Thanks for the test!  I looked at this and found that with WCO
constraints we can also hit the buggy code.  Based on David's test case,
I came up with the following in the morning.

CREATE FOREIGN TABLE ft_gc (
a int,
b int,
c int
) SERVER loopback OPTIONS (schema_name 'public', table_name 't_gc');

alter table t_c attach partition ft_gc for values in (1);
alter table t_tlp attach partition t_c for values in (1);

CREATE VIEW rw_view AS SELECT * FROM t_tlp where a < b WITH CHECK OPTION;

explain (verbose, costs off) update rw_view set c = 42;

Currently on HEAD, we can see something wrong in the plan.

  QUERY PLAN
--
 Update on public.t_tlp
   Foreign Update on public.ft_gc t_tlp_1
     Remote SQL: UPDATE public.t_gc SET b = $2 WHERE ctid = $1 RETURNING a, b
   ->  Foreign Scan on public.ft_gc t_tlp_1
 Output: 42, t_tlp_1.tableoid, t_tlp_1.ctid, t_tlp_1.*
 Remote SQL: SELECT a, b, c, ctid FROM public.t_gc WHERE ((a < b))
FOR UPDATE
(6 rows)

Note that this is wrong: 'UPDATE public.t_gc SET b = $2'.

Thanks
Richard


Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-22 Thread Amit Langote
On Fri, Dec 23, 2022 at 1:04 PM Richard Guo  wrote:
> On Fri, Dec 23, 2022 at 11:22 AM Amit Langote  wrote:
>> Attached shows a test case I was able to come up with that I can see
>> is broken by a61b1f74 though passes after applying Richard's patch.
>> What's broken is that deparseUpdateSql() outputs a remote UPDATE
>> statement with the wrong SET column list, because the wrong attribute
>> numbers would be added to the targetAttrs list by the above code block
>> after the buggy multi-level translation in ger_rel_all_updated_cols().
>
> Thanks for the test!  I looked at this and found that with WCO
> constraints we can also hit the buggy code.

Ah, yes.

/*
 * Try to modify the foreign table directly if (1) the FDW provides
 * callback functions needed for that and (2) there are no local
 * structures that need to be run for each modified row: row-level
 * triggers on the foreign table, stored generated columns, WITH CHECK
 * OPTIONs from parent views.
 */
direct_modify = false;
if (fdwroutine != NULL &&
fdwroutine->PlanDirectModify != NULL &&
fdwroutine->BeginDirectModify != NULL &&
fdwroutine->IterateDirectModify != NULL &&
fdwroutine->EndDirectModify != NULL &&
withCheckOptionLists == NIL &&
!has_row_triggers(root, rti, operation) &&
!has_stored_generated_columns(root, rti))
direct_modify = fdwroutine->PlanDirectModify(root, node, rti, i);


>  Based on David's test case,
> I came up with the following in the morning.
>
> CREATE FOREIGN TABLE ft_gc (
> a int,
> b int,
> c int
> ) SERVER loopback OPTIONS (schema_name 'public', table_name 't_gc');
>
> alter table t_c attach partition ft_gc for values in (1);
> alter table t_tlp attach partition t_c for values in (1);
>
> CREATE VIEW rw_view AS SELECT * FROM t_tlp where a < b WITH CHECK OPTION;
>
> explain (verbose, costs off) update rw_view set c = 42;
>
> Currently on HEAD, we can see something wrong in the plan.
>
>   QUERY PLAN
> --
>  Update on public.t_tlp
>Foreign Update on public.ft_gc t_tlp_1
>  Remote SQL: UPDATE public.t_gc SET b = $2 WHERE ctid = $1 RETURNING a, b
>->  Foreign Scan on public.ft_gc t_tlp_1
>  Output: 42, t_tlp_1.tableoid, t_tlp_1.ctid, t_tlp_1.*
>  Remote SQL: SELECT a, b, c, ctid FROM public.t_gc WHERE ((a < b)) 
> FOR UPDATE
> (6 rows)
>
> Note that this is wrong: 'UPDATE public.t_gc SET b = $2'.

Right, because of the mangled targetAttrs in postgresPlanForeignModify().

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Windows default locale vs initdb

2022-12-22 Thread Thomas Munro
On Fri, Jul 29, 2022 at 3:33 PM Thomas Munro  wrote:
> On Fri, Jul 22, 2022 at 11:59 PM Juan José Santamaría Flecha
>  wrote:
> > TL;DR; What I want to show through this example is that Windows ACP is not 
> > modified by setlocale(), it can only be done through the Windows registry 
> > and only in recent releases.
>
> Thanks, that was helpful, and so was that SO link.
>
> So it sounds like I should forget about the v3-0002 patch, but the
> v3-0001 and v3-0003 patches might have a future.  And it sounds like
> we might need to investigate maybe defending ourselves against the ACP
> being different than what we expect (ie not matching the database
> encoding)?  Did I understand correctly that you're looking into that?

I'm going to withdraw this entry.  The sooner we get something like
0001 into a release, the sooner the world will be rid of PostgreSQL
clusters initialised with the bad old locale names that the manual
very clearly tells you not to use for databases.  But I don't
understand this ACP/registry vs database encoding stuff and how it
relates to the use of BCP47 locale names, which puts me off changing
anything until we do.




Re: Force streaming every change in logical decoding

2022-12-22 Thread Masahiko Sawada
On Thu, Dec 22, 2022 at 9:48 PM shiy.f...@fujitsu.com
 wrote:
>
> On Thu, Dec 22, 2022 5:24 PM Amit Kapila  wrote:
> >
> > On Thu, Dec 22, 2022 at 1:15 PM Kyotaro Horiguchi
> >  wrote:
> > >
> > > At Thu, 22 Dec 2022 12:35:46 +0530, Amit Kapila
> >  wrote in
> > > > I have addressed these comments in the attached. Additionally, I have
> > > > modified the docs and commit messages to make those clear. I think
> > > > instead of adding new tests with this patch, it may be better to
> > > > change some of the existing tests related to streaming to use this
> > > > parameter as that will clearly show one of the purposes of this patch.
> > >
> > > Being late but I'm warried that we might sometime regret that the lack
> > > of the explicit default. Don't we really need it?
> > >
> >
> > For this, I like your proposal for "buffered" as an explicit default value.
> >
> > > +Allows streaming or serializing changes immediately in logical
> > decoding.
> > > +The allowed values of logical_decoding_mode
> > are the
> > > +empty string and immediate. When set to
> > > +immediate, stream each change if
> > > +streaming option is enabled, otherwise, 
> > > serialize
> > > +each change.  When set to an empty string, which is the default,
> > > +decoding will stream or serialize changes when
> > > +logical_decoding_work_mem is reached.
> > >
> > > With (really) fresh eyes, I took a bit long time to understand what
> > > the "streaming" option is. Couldn't we augment the description by a
> > > bit?
> > >
> >
> > Okay, how about modifying it to: "When set to
> > immediate, stream each change if
> > streaming option (see optional parameters set by
> > CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change.
> >
>
> I updated the patch to use "buffered" as the explicit default value, and 
> include
> Amit's changes about document.
>
> Besides, I tried to reduce data size in streaming subscription tap tests by 
> this
> new GUC (see 0002 patch). But I didn't covert all streaming tap tests because 
> I
> think we also need to cover the case that there are lots of changes. So, 015* 
> is
> not modified.

If we want to eventually convert 015 some time, isn't it better to
include it even if it requires many changes? Is there any reason we
want to change 017 in a separate patch?

> And 017* is not modified because streaming transactions and
> non-streaming transactions are tested alternately in this test.

How about 029_on_error.pl? It also sets logical_decoding_work_mem to
64kb to test the STREAM COMMIT case.

>
> I collected the time to run these tests before and after applying the patch 
> set
> on my machine. In debug version, it saves about 5.3 s; and in release version,
> it saves about 1.8 s. The time of each test is attached.

Nice improvements.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: Force streaming every change in logical decoding

2022-12-22 Thread Hayato Kuroda (Fujitsu)
Dear Shi,

Thanks for updating the patch! Followings are comments.

ReorderBufferCheckMemoryLimit()

```
+   /*
+* Bail out if logical_decoding_mode is disabled and we haven't exceeded
+* the memory limit.
+*/
```

I think 'disabled' should be '"buffered"'.

```
+* If logical_decoding_mode is immediate, loop until there's no change.
+* Otherwise, loop until we reach under the memory limit. One might 
think
+* that just by evicting the largest (sub)transaction we will come under
+* the memory limit based on assumption that the selected transaction is
+* at least as large as the most recent change (which caused us to go 
over
+* the memory limit). However, that is not true because a user can 
reduce
+* the logical_decoding_work_mem to a smaller value before the most 
recent
 * change.
 */
```

Do we need to pick the largest (sub)transaction even if we are in the
immediate mode? It seems that a linear search is done in
ReorderBufferLargestStreamableTopTXN() to find the largest transaction,
but in this case we can choose an arbitrary one.
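
To make the two behaviors concrete, here is a rough, hypothetical sketch
of the control flow under discussion (the real function in
reorderbuffer.c differs in structure and detail):

```
/* Hypothetical sketch only; not the actual reorderbuffer.c code. */
static void
ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
{
	/* In "buffered" mode, bail out while still under the memory limit. */
	if (logical_decoding_mode == LOGICAL_DECODING_MODE_BUFFERED &&
		rb->size < logical_decoding_work_mem * 1024L)
		return;

	/*
	 * In "immediate" mode, loop until no change remains; otherwise, loop
	 * until we are back under the memory limit.
	 */
	while (rb->size > 0 &&
		   (logical_decoding_mode == LOGICAL_DECODING_MODE_IMMEDIATE ||
			rb->size >= logical_decoding_work_mem * 1024L))
	{
		ReorderBufferTXN *txn = ReorderBufferLargestStreamableTopTXN(rb);

		if (txn != NULL)
			ReorderBufferStreamTXN(rb, txn);	/* stream this transaction */
		else
			ReorderBufferSerializeTXN(rb, ReorderBufferLargestTXN(rb));
	}
}
```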

reorderbuffer.h

+/* possible values for logical_decoding_mode */
+typedef enum
+{
+   LOGICAL_DECODING_MODE_BUFFERED,
+   LOGICAL_DECODING_MODE_IMMEDIATE
+}  LogicalDecodingMode;

I'm not sure, but do we have to modify typedefs.list?



Moreover, I have also checked the improvement in elapsed time.
All builds were made with the meson build system, and each cell is in seconds.
The results seem to show the same trend as Shi's.

Debug build:

Test    HEAD    PATCH   Delta
16       6.44    5.96   0.48
18       6.92    6.47   0.45
19       5.93    5.91   0.02
22      14.98   13.7    1.28
23      12.01    8.22   3.79

SUM of delta: 6.02s

Release build:

Test    HEAD    PATCH   Delta
16       3.51    3.22   0.29
17       4.04    3.48   0.56
19       3.43    3.3    0.13
22      10.06    8.46   1.6
23       6.74    5.39   1.35

SUM of delta: 3.93s


I will check and report the test coverage if I can.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Force streaming every change in logical decoding

2022-12-22 Thread Amit Kapila
On Fri, Dec 23, 2022 at 10:32 AM Masahiko Sawada  wrote:
>
> >
> > Besides, I tried to reduce the data size in the streaming subscription
> > TAP tests using this new GUC (see 0002 patch). But I didn't convert all
> > streaming TAP tests because I think we also need to cover the case
> > where there are many changes. So, 015* is not modified.
>
> If we want to convert 015 eventually, isn't it better to
> include it even if it requires many changes?
>

I think there is some confusion here because the point is that we
don't want to convert all the tests. It would be good to have some
tests which follow the regular path of logical_decoding_work_mem.

> Is there any reason we
> want to change 017 in a separate patch?
>

Here also, the idea is to leave it as it is. It has a mix of streamed and
non-streamed cases, and it would be tricky to convert because we would
need to change the GUC before each streamed case, reload the config, and
ensure the reloaded config is in effect.

> > And 017* is not modified because streaming transactions and
> > non-streaming transactions are tested alternately in this test.
>
> How about 029_on_error.pl? It also sets logical_decoding_work_mem to
> 64kb to test the STREAM COMMIT case.
>

How would you like to change this? Do we want to enable the GUC or
streaming option just before that case and wait for it? If so, that
might take more time than we save.

-- 
With Regards,
Amit Kapila.




Re: Force streaming every change in logical decoding

2022-12-22 Thread Amit Kapila
On Fri, Dec 23, 2022 at 10:48 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> ```
> +* If logical_decoding_mode is immediate, loop until there's no 
> change.
> +* Otherwise, loop until we reach under the memory limit. One might 
> think
> +* that just by evicting the largest (sub)transaction we will come 
> under
> +* the memory limit based on assumption that the selected transaction 
> is
> +* at least as large as the most recent change (which caused us to go 
> over
> +* the memory limit). However, that is not true because a user can 
> reduce
> +* the logical_decoding_work_mem to a smaller value before the most 
> recent
>  * change.
>  */
> ```
>
> Do we need to pick the largest (sub)transaction even if we are in the
> immediate mode? It seems that a linear search is done in
> ReorderBufferLargestStreamableTopTXN() to find the largest transaction,
> but in this case we can choose an arbitrary one.
>

In immediate mode, we will stream/spill each change, so ideally, we don't
need to perform any search. Even otherwise, I think changing those
functions would complicate the code without serving any purpose.
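
For clarity, the alternative being suggested would amount to something
like this hypothetical, untested snippet:

```
/*
 * Hypothetical: in immediate mode any top-level transaction would do,
 * so the linear search for the largest one could be skipped.
 */
if (logical_decoding_mode == LOGICAL_DECODING_MODE_IMMEDIATE)
	txn = dlist_container(ReorderBufferTXN, node,
						  dlist_head_node(&rb->toplevel_by_lsn));
else
	txn = ReorderBufferLargestStreamableTopTXN(rb);
```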

-- 
With Regards,
Amit Kapila.




Re: Force streaming every change in logical decoding

2022-12-22 Thread Amit Kapila
On Thu, Dec 22, 2022 at 6:18 PM shiy.f...@fujitsu.com
 wrote:
>
>
> Besides, I tried to reduce the data size in the streaming subscription
> TAP tests using this new GUC (see 0002 patch). But I didn't convert all
> streaming TAP tests because I think we also need to cover the case where
> there are many changes. So, 015* is not modified. And 017* is not
> modified because streaming transactions and non-streaming transactions
> are tested alternately in this test.
>

I think we can remove the newly added test from the patch and instead
combine the 0001 and 0002 patches. I think we should leave
022_twophase_cascade as it is because it can impact code coverage,
especially the below part of the test:
# 2PC PREPARE with a nested ROLLBACK TO SAVEPOINT
$node_A->safe_psql(
'postgres', "
BEGIN;
INSERT INTO test_tab VALUES (, 'foobar');
SAVEPOINT sp_inner;
INSERT INTO test_tab SELECT i, md5(i::text) FROM
generate_series(3, 5000) s(i);

Here, we will stream for the first time after the subtransaction, so it
can impact the below part of the code in ReorderBufferStreamTXN:
if (txn->snapshot_now == NULL)
{
...
dlist_foreach(subxact_i, &txn->subtxns)
{
ReorderBufferTXN *subtxn;

subtxn = dlist_container(ReorderBufferTXN, node, subxact_i.cur);
ReorderBufferTransferSnapToParent(txn, subtxn);
}
...

-- 
With Regards,
Amit Kapila.




Re: Exit walsender before confirming remote flush in logical replication

2022-12-22 Thread Amit Kapila
On Fri, Dec 23, 2022 at 7:51 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 22 Dec 2022 17:29:34 +0530, Ashutosh Bapat 
>  wrote in
> > On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
> >  wrote:
> > > In case of logical replication, however, we cannot support the
> > > use-case that switches the role publisher <-> subscriber. Suppose the
> > > same case as above, additional
> ..
> > > Therefore, I think that we can ignore the condition for shutting down the
> > > walsender in logical replication.
> ...
> > > This change may be useful for time-delayed logical replication. The
> > > walsender waits to shut down until all changes are applied on the
> > > subscriber, even if applying is delayed. This means the publisher
> > > cannot be stopped if a large delay time is specified.
> >
> > I think the current behaviour is an artifact of using the same WAL
> > sender code for both logical and physical replication.
>
> Yeah, I don't think we do that for the sake of switchover. On the other
> hand, I think the behavior was intentionally carried over since it was
> thought to be sensible on its own.
>

Do you see it was discussed somewhere? If so, can you please point to
that discussion?

> And I'm afraid that many people already rely on that behavior.
>

But OTOH, it can also be annoying for users to see a wait during shutdown
that is actually not required.

> > I agree with you that the logical WAL sender need not wait for all the
> > WAL to be replayed downstream.
>
> Thus I feel that it might be a bit outrageous to get rid of that
> behavior altogether because of a new feature stumbling on it.  I'm
> fine doing that only in the apply_delay case, though.  However, I have
> another concern that we are introducing a second exception for
> XLogSendLogical in the common path.
>
> I doubt that anyone wants to use synchronous logical replication with
> apply_delay, since the sender's transaction is inevitably affected
> back by that delay.
>
> Thus, how about the logrep worker sending a kind of crafted feedback
> before entering an apply_delay, reporting commit_data.end_lsn as the
> flush position? A little tweak is needed in send_feedback(), but it
> seems to work.
>

How can we send commit_data.end_lsn before actually committing the xact?
I think this can lead to a problem, because next time (say, after a
restart of the walsender) the server can skip sending the xact even
though the client has not committed it.


-- 
With Regards,
Amit Kapila.




RE: Support logical replication of DDLs

2022-12-22 Thread Takamichi Osumi (Fujitsu)
On Thursday, December 22, 2022 2:22 AM vignesh C  wrote:
> I have handled most of the comments for [1] in the v55 version patch attached.
> I will handle the pending comments in the upcoming version and reply to it.
> [1] -
> https://www.postgresql.org/message-id/20221207122041.hbfj4hen3ibhdzgn%
> 40alvherre.pgsql
Hi, Vignesh


FYI, cfbot reports a failure for v55 in [1].
Could you check it?


[1] - https://cirrus-ci.com/task/6366800263249920


Best Regards,
Takamichi Osumi



Re: Apply worker fails if a relation is missing on subscriber even if refresh publication has not been refreshed yet

2022-12-22 Thread Amit Kapila
On Thu, Dec 22, 2022 at 7:16 PM Melih Mutlu  wrote:
>
> Hi hackers,
>
> I noticed a behaviour of logical replication that seems unexpected to
> me, though I'm not totally sure.
>
> Let's say a new table is created and added to a publication but not
> created on the subscriber yet. Also, "ALTER SUBSCRIPTION ... REFRESH
> PUBLICATION" has not been called yet.
> What I would expect in that case is that logical replication continues
> to work as it did before the new table was created. The new table does
> not get replicated until "REFRESH PUBLICATION", as stated here [1].
> This is indeed how it seems to work, until we insert a row into the
> new table.
>
> After a new row is inserted into the new table, the apply worker gets
> this change and tries to apply it. As expected, it fails since the table
> does not exist on the subscriber yet. And the worker keeps crashing and
> can't apply any changes for any table.
> The obvious way to resolve this is to create the table on the subscriber
> as well. After that, the apply worker will be back to work, skip changes
> for the new table, and move on to other changes. Since REFRESH
> PUBLICATION has not been called yet, any change for the new table will
> not be replicated.
>
> If replication of the new table will not start anyway (until REFRESH
> PUBLICATION), do we really need to have that table on the subscriber for
> the apply worker to work? AFAIU, any change to a publication does not
> affect the logical replication setup until the publication gets
> refreshed on the subscriber.
>

I also have the same understanding, but I think that if we skip
replicating some table because the corresponding publication has not been
refreshed, it is better to LOG that information instead of silently
skipping it. Along similar lines, personally, I don't see a very strong
reason not to throw the ERROR in the case you mentioned. Do you have any
use case in mind where the user has added a table to the publication even
though she doesn't want it to be replicated? One thing that came to my
mind is that, for some reason, after adding a table to the publication
there is some delay in creating the table on the subscriber and then
refreshing the publication, and during that time the user expects
replication to proceed smoothly. But for that, isn't it better that the
user completes the setup on the subscriber before performing operations
on such a table? Because, say, there is some error in the subscriber-side
setup that the user misses; then it would be a surprise for the user to
not see the table data. In such a case, an ERROR or LOG message could be
helpful for users.

-- 
With Regards,
Amit Kapila.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-12-22 Thread John Naylor
On Thu, Dec 22, 2022 at 10:00 PM Masahiko Sawada 
wrote:

> If the value is a power of 2, it seems to work perfectly fine. But for
> example if it's 700MB, the total memory exceeds the limit:
>
> 2*(1+2+4+8+16+32+64+128) = 510MB (72.8% of 700MB) -> keep going
> 510 + 256 = 766MB -> stop but it exceeds the limit.
>
> In a bigger case, if it's 11000MB,
>
> 2*(1+2+...+2048) = 8190MB (74.4%)
> 8190 + 4096 = 12286MB
>
> That being said, I don't think these are common cases. So the 75%
> threshold seems to work fine in most cases.

Thinking some more, I agree this doesn't have large practical risk, but
thinking from the point of view of the community, being loose with memory
limits by up to 10% is not a good precedent.

Perhaps we could be clever and use 75% when the limit is a power of two
and 50% otherwise. But I'm skeptical of trying to be clever, and I just
thought of an additional concern: we're assuming the growth behavior of
new DSA segments, which could possibly change. Given how allocators are
typically coded, though, it seems safe to assume that they'll at most
double in size.
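
As a toy illustration of that assumption, here is a self-contained
simulation of the accounting from the example upthread. The 700MB limit,
the doubling-every-two-segments pattern, and the 75% stop rule are all
assumptions taken from this thread, not dsa.c's actual constants:

```
#include <stdio.h>

int
main(void)
{
	const double limit_mb = 700.0;	/* example limit from upthread */
	double		seg_mb = 1.0;		/* first segment size, in MB */
	double		total_mb = 0.0;
	int			nsame = 0;			/* segments allocated at this size */

	for (;;)
	{
		total_mb += seg_mb;			/* "allocate" one more segment */
		if (total_mb >= 0.75 * limit_mb)	/* the proposed 75% rule */
			break;
		if (++nsame == 2)			/* assume size doubles every 2 segments */
		{
			nsame = 0;
			seg_mb *= 2;
		}
	}

	/* prints: stopped at 766 MB = 109.4% of the 700 MB limit */
	printf("stopped at %.0f MB = %.1f%% of the %.0f MB limit\n",
		   total_mb, 100.0 * total_mb / limit_mb, limit_mb);
	return 0;
}
```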

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Using WaitEventSet in the postmaster

2022-12-22 Thread Thomas Munro
I pushed the small preliminary patches.  Here's a rebase of the main patch.
From d23fba75cf693ffabc068a36424b7be22342c1b2 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v7] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch() creating a new reason to wake
   up.

Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c  |  40 ---
 src/backend/postmaster/fork_process.c |  12 +-
 src/backend/postmaster/postmaster.c   | 379 ++
 src/backend/tcop/postgres.c   |   1 -
 src/backend/utils/init/miscinit.c |  13 +-
 src/include/libpq/pqsignal.h  |   3 -
 src/include/miscadmin.h   |   1 +
 7 files changed, 225 insertions(+), 224 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index ec67761487..e1e7d91c52 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,7 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +108,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f459dab360..e107c18ff7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -3

Re: ARRNELEMS Out-of-bounds possible errors

2022-12-22 Thread Nikita Malakhov
Hi,

The most obvious solution I see is to check all call sites and, for cases
like the ones we both mentioned, pass a flag indicating whether safe or
unsafe behavior is expected, like:

#define ARRNELEMS(x)  ArrayGetNItems(ARR_NDIM(x), ARR_DIMS(x), false)

...

int
ArrayGetNItems(int ndim, const int *dims, bool issafe)
{
    return ArrayGetNItemsSafe(ndim, dims, NULL, issafe);
}

int
ArrayGetNItemsSafe(int ndim, const int *dims, struct Node *escontext,
                   bool issafe)
{
...

Another solution is to revise the wrapping code at all call sites of
ArrayGetNItems(). Safe functions are a good idea overall, but a lot of
code would need to be revised.
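
If the flag route were taken, a minimal sketch of the safe variant might
look like the following. This is purely illustrative: the error message,
the overflow check, and the decision to return -1 (leaving escontext
unused here) are my assumptions, not the actual implementation:

```
int
ArrayGetNItemsSafe(int ndim, const int *dims, struct Node *escontext,
				   bool issafe)
{
	int64		ret = 1;

	for (int i = 0; i < ndim; i++)
	{
		/* reject negative dimensions and int overflow of the product */
		if (dims[i] < 0 || (dims[i] != 0 && ret > INT_MAX / dims[i]))
		{
			if (!issafe)
				ereport(ERROR,
						(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
						 errmsg("array size exceeds the maximum allowed")));
			return -1;			/* soft failure: caller must check */
		}
		ret *= dims[i];
	}

	return (int) ret;
}
```

Callers opting into the safe behavior would then have to test for a
negative return value before using the result, which is exactly the
revision of the wrapping code mentioned above.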

On Fri, Dec 23, 2022 at 1:20 AM Ranier Vilela  wrote:

> On Thu, Dec 22, 2022 at 15:45, Nikita Malakhov
> wrote:
>
>> Hi,
>>
>> Actually, there would be many more call sites affected, like
>>  nbytes += subbytes[outer_nelems];
>>  subnitems[outer_nelems] = ArrayGetNItems(this_ndims,
>> ARR_DIMS(array));
>>  nitems += subnitems[outer_nelems];
>>  havenulls |= ARR_HASNULL(array);
>>  outer_nelems++;
>>   }
>>
>> Maybe it is better for most calls like this to keep the old behavior,
>> by passing a flag that says which behavior is expected by the caller?
>>
> I agree that it is better to keep the old behavior.
> Even the value 0 is problematic, with calls like this:
>
> nel = ARRNELEMS(ent);
> memcpy(ptr, ARRPTR(ent), nel * sizeof(int32));
>
> regards,
> Ranier Vilela
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/