Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-02-11 Thread Ashutosh Bapat
On Thu, Feb 11, 2021 at 8:22 PM Tom Lane  wrote:
>
> Ashutosh Bapat  writes:
> > Can this information be part of PathTarget structure and hence part of
> > RelOptInfo::reltarget, so that it can be extended to join, group and
> > other kinds of RelOptInfo in future?
>
> Why would that be better than keeping it in RelOptInfo?
>
> regards, tom lane

We have all the expressions relevant to a given relation (simple,
join, group, whatever) in PathTarget. We could remember the
not-nullness of attributes of a simple relation in RelOptInfo. But IMO
the nullability of a relation's TLEs is more useful than that of its
attributes, so I'd associate the information with the PathTarget,
which is what produces the TLEs. That way we could use this
infrastructure in more general ways.
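
As a rough illustration of that idea (a hypothetical field, not part of any
posted patch), the nullability information could live next to the expressions
the target produces, so the same mechanism would work for base, join, and
grouped relations:

#include "postgres.h"
#include "nodes/bitmapset.h"
#include "nodes/pg_list.h"

/* Hypothetical sketch only, not a proposed struct definition. */
typedef struct PathTargetSketch
{
	List	   *exprs;			/* expressions producing the output columns */
	Bitmapset  *notnull_exprs;	/* 1-based positions in 'exprs' whose result
								 * can never be NULL */
} PathTargetSketch;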

-- 
Best Wishes,
Ashutosh Bapat




RE: [POC] Fast COPY FROM command for the table with foreign partitions

2021-02-11 Thread tsunakawa.ta...@fujitsu.com
From: Andrey V. Lepikhov 
> On 2/9/21 12:47 PM, tsunakawa.ta...@fujitsu.com wrote:
> >  As Tang-san and you showed, the basic part already demonstrated
> impressive improvement.  If there's no objection, I'd like to make this ready 
> for
> committer in a few days.
> Good.

I've marked this as ready for committer.  Good luck.


Regards
Takayuki Tsunakawa




RE: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread tsunakawa.ta...@fujitsu.com
From: Hou, Zhijie/侯 志杰 
> If we disable bitmapscan, the performance degradation seems not to happen.

Yes, but that's because the hundreds of times slower sequential scan hides the 
insert time.  Furthermore, as an aside, Worker 3 does much of the work in the 
parallel sequential scan + parallel insert case, while the load is well 
balanced in the parallel bitmap scan + parallel insert case.

Oracle and SQL Server execute parallel DML by holding an exclusive lock on the 
target table.  They might use some special path for parallel DML to mitigate 
contention.


[serial bitmap scan + serial insert]
   ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918 
width=8) (actual time=8.096..41.005 rows=12 loops=1)
...
 Execution Time: 360.547 ms

[parallel bitmap scan + parallel insert]
 ->  Parallel Bitmap Heap Scan on public.x  (cost=3272.20..1260119.35 
rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)
...
 Execution Time: 382.776 ms


[serial sequential scan + serial insert]
   ->  Seq Scan on public.x  (cost=0.00..5081085.52 rows=81338 width=8) (actual 
time=0.010..18997.317 rows=12 loops=1)
...
 Execution Time: 19311.700 ms

[parallel sequential scan + parallel insert]
 ->  Parallel Seq Scan on public.x  (cost=0.00..2081082.88 rows=20334 
width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
...
 Execution Time: 5488.493 ms


Regards
Takayuki Tsunakawa




RE: [HACKERS] logical decoding of two-phase transactions

2021-02-11 Thread osumi.takami...@fujitsu.com
Hi


On Thursday, February 11, 2021 5:10 PM Peter Smith  
wrote:
> Please find attached the new 2PC patch set v39*
I started to review the patch set, so let me give some comments I have at this moment.

(1)

File : v39-0007-Support-2PC-txn-tests-for-concurrent-aborts.patch
Modification :

@@ -620,6 +666,9 @@ pg_decode_change(LogicalDecodingContext *ctx, 
ReorderBufferTXN *txn,
}
txndata->xact_wrote_changes = true;

+   /* For testing concurrent  aborts */
+   test_concurrent_aborts(data);
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);

Comment : There are unnecessary whitespaces in comments like the above in v37-007.
Please check pg_decode_change(), pg_decode_truncate(), and
pg_decode_stream_truncate() as well.
I suggest aligning the code format with pgindent.

(2)

File : v39-0006-Support-2PC-txn-Subscription-option.patch

@@ -213,6 +219,15 @@ parse_subscription_options(List *options,
*streaming_given = true;
*streaming = defGetBoolean(defel);
}
+   else if (strcmp(defel->defname, "two_phase") == 0 && twophase)
+   {
+   if (*twophase_given)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting or 
redundant options")));
+   *twophase_given = true;
+   *twophase = defGetBoolean(defel);
+   }

You could easily add a test for this in subscription.sql using a duplicated two_phase option.

When I find something else, I'll let you know.

Best Regards,
Takamichi Osumi



Re: Possible dereference null return (src/backend/replication/logical/reorderbuffer.c)

2021-02-11 Thread Kyotaro Horiguchi
At Wed, 10 Feb 2021 20:12:38 -0300, Ranier Vilela  wrote 
in 
> Hi,
> 
> Per Coverity.
> 
> If xid is a subtransaction, the setup of the base snapshot on the top-level
> transaction cannot be optional; otherwise a Dereference null return value
> (NULL_RETURNS) warning can be raised.
> 
> Patch suggestion to fix this.
> 
> diff --git a/src/backend/replication/logical/reorderbuffer.c
> b/src/backend/replication/logical/reorderbuffer.c
> index 5a62ab8bbc..3c6a81f716 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2993,8 +2993,8 @@ ReorderBufferSetBaseSnapshot(ReorderBuffer *rb,
> TransactionId xid,
>   */
>   txn = ReorderBufferTXNByXid(rb, xid, true, &is_new, lsn, true);
>   if (rbtxn_is_known_subxact(txn))
> - txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
> - NULL, InvalidXLogRecPtr, false);
> + txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, true,
> + NULL, InvalidXLogRecPtr, true);
>   Assert(txn->base_snapshot == NULL);

If the return from the first call is a subtransaction, the second call
always obtains the top transaction.  If the top transaction actually did
not exist, causing a SEGV would be more correct behavior than creating a
bogus rbtxn.  Thus it is wrong to set create=true and create_as_top=true.
We could change the assertion to something like Assert(txn &&
txn->base_snapshot == NULL) to make things clearer.
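
For illustration, a minimal sketch of that suggestion (keeping the strict
second lookup exactly as it is today and only making the assertion explicit;
a sketch, not a committed change):

	txn = ReorderBufferTXNByXid(rb, xid, true, &is_new, lsn, true);
	if (rbtxn_is_known_subxact(txn))
		txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
									NULL, InvalidXLogRecPtr, false);

	/* A missing top-level transaction would be a bug; crash loudly here. */
	Assert(txn != NULL && txn->base_snapshot == NULL);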

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread tsunakawa.ta...@fujitsu.com
From: Hou, Zhijie/侯 志杰 
> > What would the result look like if you turn off
> > parallel_leader_participation?  If the leader is freed from
> > reading/writing the table and index, the index page splits and
> > internal lock contention may decrease enough to recover part of the loss.
> >
> > https://www.postgresql.org/docs/devel/parallel-plans.html
> >
> > "In a parallel bitmap heap scan, one process is chosen as the leader.
> > That process performs a scan of one or more indexes and builds a
> > bitmap indicating which table blocks need to be visited. These blocks
> > are then divided among the cooperating processes as in a parallel
> > sequential scan. In other words, the heap scan is performed in
> > parallel, but the underlying index scan is not."
> 
> If I disable parallel_leader_participation:
> 
> For max_parallel_workers_per_gather = 4, there is still a performance
> degradation.
> 
> For max_parallel_workers_per_gather = 2, the performance degradation does
> not happen in most cases.
> There is sometimes noise (a performance degradation), but most of the
> results (about 80%) are good.

Thank you.  The results indicate that whether the gain from parallelism 
outweighs the loss caused by parallel insert depends on the degree of 
parallelism, at least in the bitmap scan case.

But can we conclude that this is limited to bitmap scan?  Even if that's the 
case, the planner does not have information about the insert operation that 
would let it choose other plans such as serial execution or a parallel 
sequential scan.  Should we encourage the user in the manual to tune parameters 
and find the fastest plan?


Regards
Takayuki Tsunakawa




Re: Possible dereference after null check (src/backend/executor/ExecUtils.c)

2021-02-11 Thread Kyotaro Horiguchi
At Wed, 10 Feb 2021 19:54:46 -0300, Ranier Vilela  wrote 
in 
> Hi,
> 
> Per Coverity.
> 
> The functions ExecGetInsertedCols and ExecGetUpdatedCols at ExecUtils.c
> only are safe to call if the variable "ri_RangeTableIndex" is  != 0.
> 
> Otherwise a possible Dereference after null check (FORWARD_NULL) can be
> raised.

As it turns out, it's a false positive. And perhaps we don't want to
take action just to satisfy the static code analyzer.


The comment in ExecGetInsertedCols says:

> /*
>  * The columns are stored in the range table entry. If this ResultRelInfo
>  * doesn't have an entry in the range table (i.e. if it represents a
>  * partition routing target), fetch the parent's RTE and map the columns
>  * to the order they are in the partition.
>  */
> if (relinfo->ri_RangeTableIndex != 0)
> {

This means that one of the two is always usable here.  AFAICS,
actually, ri_RangeTableIndex is non-zero for partitioned (=leaf) and
non-partitioned relations, and ri_RootResultRelInfo is non-null for
partitioned (parent or intermediate) relations (since they don't have
a corresponding range table entry).

The only cases where both are 0 and NULL are trigger use, which is
unrelated to this code path.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-11 Thread Kyotaro Horiguchi
At Fri, 12 Feb 2021 13:33:32 +0900, Yugo NAGATA  wrote in 
> I don't think that we need to include the waiting approach in the
> pg_get_wal_replay_pause_state patch. However, Horiguchi-san's patch may be
> useful for some users who want pg_wal_replay_pause to wait until recovery
> gets paused instead of polling the state from applications. So, I think we
> could discuss that patch in another thread as another commitfest entry
> independent from pg_get_wal_replay_pause_state.

Since what I'm proposing is not making pg_wal_replay_pause() to wait,
and no one seems on my side, I withdraw the proposal.

> I have no further comments on the v13 patch either.  Also, I agree with
> Robert Haas's suggestions.

Yeah, they look reasonable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_cryptohash_final possible out-of-bounds access (per Coverity)

2021-02-11 Thread Kyotaro Horiguchi
At Thu, 11 Feb 2021 19:55:45 -0300, Ranier Vilela  wrote 
in 
> Em qui., 11 de fev. de 2021 às 09:47, Michael Paquier 
> escreveu:
> 
> > On Wed, Feb 10, 2021 at 09:14:46AM -0300, Ranier Vilela wrote:
> > > It is necessary to correct the interfaces. To caller, inform the size of
> > > the buffer it created.
> >
> > Now, the patch you sent has no need to be that complicated, and it
> > partially works while not actually solving at all the problem you are
> > trying to solve (nothing done for MD5 or OpenSSL).  Attached is an
> > example of what I finish with while poking at this issue. There is IMO
> > no point to touch the internals of SCRAM that all rely on the same
> > digest lengths for the proof generation with SHA256.
> >
> OK, I took a look at your patch and I have comments:
> 
> 1. It looks like contrib/pgcrypto was missed.
> 2. The scram_HMAC_final function still has exchanged parameters,
> which may impair maintenance in the future.

The v3 patch drops the changes to the uuid-ossp contrib module.  I'm not sure
the change to scram_HMAC_final is needed.

In v2, int_md5_finish() calls pg_cryptohash_final() with
h->block_size(h) (64), but it should be h->result_size(h)
(16).  int_sha1_finish() is wrong in the same way.  (And v3 seems to fix
them in the wrong way.)

Although I don't oppose making things defensive, I think the derived
interfaces should be defensive to the same extent if we do this.  In
particular, the calls to the function in checksum_helper.c just nullify
the protection.

For now, we can actually protect against too-short buffers in the
following places.  In the other places, pg_cryptohash_final receives a
buffer length that is unrelated to the actual length.

0/3 places in pgcrypto.
2/2 places in uuid-ossp.
1/1 place in auth-scram.c
1/1 place in backup_manifest.c
1/1 place in cryptohashfuncs.c
1/1 place in parse_manifest.c
0/4 places in checksum_helper.c
1/2 place in md5_common.c
2/4 places in scram-common.c (The two places are claimed not to need the 
protection.)

Total 9/19 places.  I think at least pg_checksum_final() should take
the buffer length.  I'm not sure about px_md_finish() and
hmac_md_finish()..
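
To illustrate the kind of defensiveness being discussed, here is a sketch with
made-up names (not the proposed patch): a "final" function that is told the
size of the caller's destination buffer and refuses to overrun it.

#include <stddef.h>
#include <string.h>

/* Hypothetical context type, for illustration only. */
typedef struct demo_hash_ctx
{
	unsigned char digest[32];	/* e.g. a SHA-256-sized result */
} demo_hash_ctx;

/*
 * Defensive "final" sketch: the caller passes the size of its destination
 * buffer, and we fail instead of writing past the end of it.
 */
static int
demo_hash_final(const demo_hash_ctx *ctx, unsigned char *dest, size_t destlen)
{
	if (destlen < sizeof(ctx->digest))
		return -1;				/* caller's buffer is too short */
	memcpy(dest, ctx->digest, sizeof(ctx->digest));
	return 0;
}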

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Single transaction in the tablesync worker?

2021-02-11 Thread Amit Kapila
On Fri, Feb 12, 2021 at 10:08 AM Ajin Cherian  wrote:
>
> On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila  wrote:
>
> >
> > Thanks, I have pushed the patch but getting one failure:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
> >
> > The reason seems to be that we are trying to connect and
> > max_wal_senders is set to zero. I think we can write this without
> > trying to connect. The attached patch fixes the problem for me. What
> > do you think?
>
> Verified this with installcheck and modified configuration to have
> wal_level = minimal and max_wal_senders = 0.
> Tests passed. The changes look good  to me.
>

Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.

-- 
With Regards,
Amit Kapila.




Re: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread Greg Nancarrow
On Fri, Feb 12, 2021 at 3:21 PM Zhihong Yu  wrote:
>
> Greg:
> bq. we should just return parsetree->hasModifyingCTE at this point,
>
> Maybe you can clarify a bit.
> The if (parsetree->hasModifyingCTE) check is followed by if 
> (!hasModifyingCTE).
> When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, 
> resulting in the execution of the if (!hasModifyingCTE) block.
>
> In your reply, did you mean that the if (!hasModifyingCTE) block is no longer 
> needed ? (I guess not)
>

Sorry for not making it clear. What I meant was that instead of:

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  if (parsetree->hasModifyingCTE)
hasModifyingCTE = true;
}

I thought I should be able to use the following (if the setting for
QSRC_ORIGINAL can really be trusted):

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  return parsetree->hasModifyingCTE;
}

(and then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed)


BUT - after testing that change, the problem test case (in the "with"
tests) STILL fails.
I then checked if hasModifyingCTE is always false in the QSRC_ORIGINAL
case (by adding an Assert), and it always is false.
So actually, there is no point in having the "if
(parsetree->querySource == QSRC_ORIGINAL)" code block - even the
so-called "original" query doesn't maintain the setting correctly (even
though the actual original query sent into the query rewriter does).
And then the "if (!hasModifyingCTE)" test on the code immediately
following it can also be removed.

Regards,
Greg Nancarrow
Fujitsu Australia




RE: libpq debug log

2021-02-11 Thread tsunakawa.ta...@fujitsu.com
(48)
 
 void PQtrace(PGconn *conn, FILE *stream);
 
  
 
+ 
+  Calls PQtraceSetFlags to output with or without a 
timestamp.
+ 
+
  

Why is this necessary?  Even if you decide to remove this change, can you share 
your idea on why you added this just in case?


(49)
+  Determine to output tracing with or without a timestamp to a debugging 
file stream.

The description should cover the overall functionality of this function, 
considering future additions of flags.  Something like:

Controls the tracing behavior of client/server communication.

+  then timestamp is not printed with each message. The default is output 
with timestamp.

The default value for the flags argument (0) should be described here.

Also, it should be added that PQsetTraceFlags() can be called before or after 
the PQtrace() call.


(50)
I'd like you to consider skipLogging again, because it seems that such a 
variable is for the logging machinery to control state transitions internally 
in fe-logging.c.
What concerns me is that those who modify libpq would have to be aware of 
skipLogging and might fail to handle it.


(51)
>> +typedef enum PGLogState
>> +{
>> 
>> This is a libpq-logging.c internal type. It does not need to be exposed.

> I fixed it.

How did you fix it?  The typedef is still in the .h file.  It shouldn't be in 
the .h file, should it?  How about the other typedefs?



I won't comment on the other stuff on function naming.


Regards
Takayuki Tsunakawa






Re: WIP: WAL prefetch (another approach)

2021-02-11 Thread Andres Freund
Hi,

On 2021-02-12 00:42:04 +0100, Tomas Vondra wrote:
> Yeah, that's a good point. I think it'd make sense to keep track of recent
> FPIs and skip prefetching such blocks. But how exactly should we implement
> that, how many blocks do we need to track? If you get an FPI, how long
> should we skip prefetching of that block?
> 
> I don't think the history needs to be very long, for two reasons. Firstly,
> the usual pattern is that we have FPI + several changes for that block
> shortly after it. Secondly, maintenance_io_concurrency limits this naturally
> - after crossing that, redo should place the FPI into shared buffers,
> allowing us to skip the prefetch.
> 
> So I think using maintenance_io_concurrency is sufficient. We might track
> more buffers to allow skipping prefetches of blocks that were evicted from
> shared buffers, but that seems like an overkill.
> 
> However, maintenance_io_concurrency can be quite high, so just a simple
> queue is not very suitable - searching it linearly for each block would be
> too expensive. But I think we can use a simple hash table, tracking
> (relfilenode, block, LSN), over-sized to minimize collisions.
> 
> Imagine it's a simple array with (2 * maintenance_io_concurrency) elements,
> and whenever we prefetch a block or find an FPI, we simply add the block to
> the array as determined by hash(relfilenode, block)
> 
> hashtable[hash(...)] = {relfilenode, block, LSN}
> 
> and then when deciding whether to prefetch a block, we look at that one
> position. If the (relfilenode, block) match, we check the LSN and skip the
> prefetch if it's sufficiently recent. Otherwise we prefetch.

I'm a bit doubtful this is really needed at this point. Yes, the
prefetching will do a buffer table lookup - but it's a lookup that
already happens today. And the patch already avoids doing a second
lookup after prefetching (by optimistically caching the last Buffer id,
and re-checking).

I think there's potential for some significant optimization going
forward, but I think it's basically optimization over what we're doing
today. As this is already a nontrivial patch, I'd argue for doing so
separately.

Regards,

Andres




Re: Single transaction in the tablesync worker?

2021-02-11 Thread Ajin Cherian
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila  wrote:

>
> Thanks, I have pushed the patch but getting one failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
>
> The reason seems to be that we are trying to connect and
> max_wal_senders is set to zero. I think we can write this without
> trying to connect. The attached patch fixes the problem for me. What
> do you think?

Verified this with installcheck and modified configuration to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good  to me.

regards,
Ajin Cherian
Fujitsu Australia




Re: Is Recovery actually paused?

2021-02-11 Thread Yugo NAGATA
On Thu, 11 Feb 2021 16:36:55 +0530
Dilip Kumar  wrote:

> On Thu, Feb 11, 2021 at 3:20 PM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar  
> > > wrote:
> > > >
> > > > I don't find any problem with this approach as well, but I personally
> > > > feel that the other approach where we don't wait in any API and just
> > > > return the recovery pause state is much simpler and more flexible.  So
> > > > I will make the pending changes in that patch and let's see what are
> > > > the other opinion and based on that we can conclude.  Thanks for the
> > > > patch.

I don't think that we need to include the waiting approach in the
pg_get_wal_replay_pause_state patch. However, Horiguchi-san's patch may be
useful for some users who want pg_wal_replay_pause to wait until recovery gets
paused instead of polling the state from applications. So, I think we could
discuss that patch in another thread as another commitfest entry independent
from pg_get_wal_replay_pause_state.

> > > Here is an updated version of the patch which fixes the last two open 
> > > problems
> > > 1. In RecoveryRequiresIntParameter set the recovery pause state in the
> > > loop so that if recovery resumed and pause requested again we can set
> > > to pause again.
> > > 2. If the recovery state is already 'paused' then don't set it back to
> > > the 'pause requested'.
> > >
> > > One more point is that in 'pg_wal_replay_pause' even if we don't
> > > change the state because it was already set to the 'paused' then also
> > > we call the WakeupRecovery.  But I don't think there is any problem
> > > with that, if we think that this should be changed then we can make
> > > SetRecoveryPause return a bool such that if it doesn't do state change
> > > then it returns false and in that case we can avoid calling
> > > WakeupRecovery, but I felt that is unnecessary.  Any other thoughts on
> > > this?
> >
> > IMO, that WakeupRecovery should not be a problem, because even now, if
> > we issue a simple select pg_reload_conf(); (without even changing any
> > config parameter), WakeupRecovery gets called.
> >
> > Thanks for the patch. I tested the new function and it works as
> > expected. I have no further comments on the v13 patch.
> 
> Thanks for the review and testing.

I have no further comments on the v13 patch either.  Also, I agree with
Robert Haas's suggestions.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread Zhihong Yu
Greg:
bq. we should just return parsetree->hasModifyingCTE at this point,

Maybe you can clarify a bit.
The if (parsetree->hasModifyingCTE) check is followed by if
(!hasModifyingCTE).
When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true,
resulting in the execution of the if (!hasModifyingCTE) block.

In your reply, did you mean that the if (!hasModifyingCTE) block is no
longer needed ? (I guess not)

Cheers

On Thu, Feb 11, 2021 at 8:14 PM Greg Nancarrow  wrote:

> On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu  wrote:
> >
> > For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
> >
> > +   /* Assume original queries have hasModifyingCTE set correctly */
> > +   if (parsetree->hasModifyingCTE)
> > +   hasModifyingCTE = true;
> >
> > Since hasModifyingCTE is false by the time the above is run, it can be
> simplified as:
> > hasModifyingCTE = parsetree->hasModifyingCTE
> >
>
> Actually, we should just return parsetree->hasModifyingCTE at this
> point, because if it's false, we shouldn't need to continue the search
> (as we're assuming it has been set correctly for QSRC_ORIGINAL case).
>
> > +   if (!hasSubQuery)
> > +   return false;
> > +
> > +   return true;
> >
> > The above can be simplified as:
> > return hasSubQuery;
> >
>
> Yes, absolutely right, silly miss on that one!
> Thanks.
>
> This was only ever meant to be a temporary fix for this bug that
> affects this patch.
>
> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread Greg Nancarrow
On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu  wrote:
>
> For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> +   /* Assume original queries have hasModifyingCTE set correctly */
> +   if (parsetree->hasModifyingCTE)
> +   hasModifyingCTE = true;
>
> Since hasModifyingCTE is false by the time the above is run, it can be 
> simplified as:
> hasModifyingCTE = parsetree->hasModifyingCTE
>

Actually, we should just return parsetree->hasModifyingCTE at this
point, because if it's false, we shouldn't need to continue the search
(as we're assuming it has been set correctly for QSRC_ORIGINAL case).

> +   if (!hasSubQuery)
> +   return false;
> +
> +   return true;
>
> The above can be simplified as:
> return hasSubQuery;
>

Yes, absolutely right, silly miss on that one!
Thanks.

This was only ever meant to be a temporary fix for this bug that
affects this patch.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: parse mistake in ecpg connect string

2021-02-11 Thread Kyotaro Horiguchi
At Thu, 11 Feb 2021 15:23:31 -0500, Tom Lane  wrote in 
> "kuroda.hay...@fujitsu.com"  writes:
> > Dear Wang, Horiguchi-san,
> >>> How about the attached?
> 
> >> I think, this patch is good.
> 
> > I agree. Backward compatibility is violated in the doc, but maybe no one
> > cares.
> 
> Pushed with a little more work on the documentation.

Thanks for committing this (and further update of the document).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Single transaction in the tablesync worker?

2021-02-11 Thread Amit Kapila
On Fri, Feb 12, 2021 at 7:18 AM Ajin Cherian  wrote:
>
> On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila  wrote:
>
> > Okay, attached an updated patch with only that change.
>
> I ran Erik's test suite [1] on this patch overnight and found no
> errors. No more comments from me. The patch looks good.
>

Thanks, I have pushed the patch but getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12

The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?


-- 
With Regards,
Amit Kapila.


fix_subs_test_1.patch
Description: Binary data


Re: Libpq support to connect to standby server as priority

2021-02-11 Thread vignesh C
On Fri, Feb 12, 2021 at 7:07 AM Greg Nancarrow  wrote:
>
> On Wed, Feb 10, 2021 at 5:09 PM vignesh C  wrote:
> >
> > Modified.
> > These comments are handled in v22 patch posted in my earlier mail.
> >
>
> Thanks, just one minor thing I missed in doc/src/sgml/libpq.sgml.
>
> +The support of read-write transactions is determined by the
> value of the
> +default_transaction_read_only and
> +in_hot_standby configuration parameters, that is
> +reported by the server (if supported) upon successful connection.  If
>
>
> should be:
>
> +The support of read-write transactions is determined by the
> values of the
> +default_transaction_read_only and
> +in_hot_standby configuration parameters, that are
> +reported by the server (if supported) upon successful connection.  If
>
>
> (i.e. "value" -> "values" and "is" -> "are")

Thanks for the comments, this is handled in the v23 patch attached.
Thoughts?

Regards,
Vignesh
From 45d5680adae88a5c6f9d81a5077601f036e487c5 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Mon, 8 Feb 2021 11:23:31 +0530
Subject: [PATCH v23] Enhance the libpq "target_session_attrs" connection
 parameter.

Enhance the connection parameter "target_session_attrs" to support new values:
read-only/primary/standby/prefer-standby.
Add a new "read-only" target_session_attrs option value, to support connecting
to a read-only server if available from the list of hosts (otherwise the
connection attempt fails).
Add a new "primary" target_session_attrs option value, to support connecting to
a server which is not in hot-standby mode, if available from the list of hosts
(otherwise the connection attempt fails).
Add a new "standby" target_session_attrs option value, to support connecting to
a server which is in hot-standby mode, if available from the list of hosts
(otherwise the connection attempt fails).
Add a new "prefer-standby" target_session_attrs option value, to support
connecting to a server which is in hot-standby mode, if available from the list
of hosts (otherwise connect to a server which is not in hot-standby mode).

Discussion: https://www.postgresql.org/message-id/flat/caf3+xm+8-ztokav9ghij3wfgentq97qcjxqt+rbfq6f7onz...@mail.gmail.com
---
 doc/src/sgml/high-availability.sgml   |  16 +-
 doc/src/sgml/libpq.sgml   |  73 +-
 doc/src/sgml/protocol.sgml|   5 +-
 src/backend/utils/misc/guc.c  |   3 +-
 src/interfaces/libpq/fe-connect.c | 432 ++
 src/interfaces/libpq/fe-exec.c|  18 +-
 src/interfaces/libpq/libpq-fe.h   |   3 +-
 src/interfaces/libpq/libpq-int.h  |  52 +++-
 src/test/recovery/t/001_stream_rep.pl |  79 ++-
 src/tools/pgindent/typedefs.list  |   2 +
 10 files changed, 602 insertions(+), 81 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index f49f5c0..2bbd52c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1700,14 +1700,14 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 

-During hot standby, the parameter transaction_read_only is always
-true and may not be changed.  But as long as no attempt is made to modify
-the database, connections during hot standby will act much like any other
-database connection.  If failover or switchover occurs, the database will
-switch to normal processing mode.  Sessions will remain connected while the
-server changes mode.  Once hot standby finishes, it will be possible to
-initiate read-write transactions (even from a session begun during
-hot standby).
+During hot standby, the parameter in_hot_standby and
+transaction_read_only are always true and may not be
+changed.  But as long as no attempt is made to modify the database,
+connections during hot standby will act much like any other database
+connection.  If failover or switchover occurs, the database will switch to
+normal processing mode.  Sessions will remain connected while the server
+changes mode.  Once hot standby finishes, it will be possible to initiate
+read-write transactions (even from a session begun during hot standby).

 

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index b7a8245..08a1b8e 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1836,18 +1836,66 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   target_session_attrs
   

-If this parameter is set to read-write, only a
-connection in which read-write transactions are accepted by default
-is considered acceptable.  The query
-SHOW transaction_read_only will be sent upon any
-successful connection; if it returns on, the connection
-will be closed.  If multiple hosts were specified in the connection
-string, any remaining servers will be tried just as if the connection
-atte

Re: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread Zhihong Yu
Hi,
For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :

+   /* Assume original queries have hasModifyingCTE set correctly */
+   if (parsetree->hasModifyingCTE)
+   hasModifyingCTE = true;

Since hasModifyingCTE is false by the time the above is run, it can be
simplified as:
hasModifyingCTE = parsetree->hasModifyingCTE

+   if (!hasSubQuery)
+   return false;
+
+   return true;

The above can be simplified as:
return hasSubQuery;

Cheers

On Thu, Feb 11, 2021 at 7:02 PM Greg Nancarrow  wrote:

> On Thu, Feb 11, 2021 at 11:17 PM Greg Nancarrow 
> wrote:
> >
> > Posting an updated set of patches. Changes are based on feedback, as
> > detailed below:
> >
>
> Oops, looks like I forgot "COSTS OFF" on some added EXPLAINs in the
> tests, and it caused some test failures in the PostgreSQL Patch Tester
> (cfbot).
> Also, I think that perhaps the localized temporary fix included in the
> patch for the hasModifyingCTE bug should be restricted to INSERT, even
> though the bug actually exists for SELECT too.
> Posting an updated set of patches to address these.
>
> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-02-11 Thread Andy Fan
On Thu, Feb 11, 2021 at 9:09 PM Ashutosh Bapat 
wrote:

> Can this information be part of PathTarget structure and hence part of
> RelOptInfo::reltarget, so that it can be extended to join, group and
> other kinds of RelOptInfo in future?


I think you want to expand this field in a more generic way.  For example,
with SELECT udf(a) FROM t WHERE a IS NOT NULL, in the current implementation
I only know that a is not null, nothing about whether udf(a) is null or not.
And we can't represent anything for a joinrel either, since the field is
just attnos.

At the same time, it looks like we can't tell whether udf(a) is null even if
the udf is strict and a is not null?


> In fact, it might be easy to do that in this patch itself.
>

Actually, I can't think of a way to do that. :)


> On Wed, Feb 10, 2021 at 8:57 AM Andy Fan  wrote:
> >
> >
> > On Wed, Feb 10, 2021 at 11:18 AM Andy Fan 
> wrote:
> >>
> >> Hi:
> >>
> >> This patch is the first patch in UniqueKey patch series[1], since I
> need to revise
> >> that series many times but the first one would be not that often, so
> I'd like to
> >> submit this one for review first so that I don't need to maintain it
> again and again.
> >>
> >> v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch
> >>
> >> Introduce notnullattrs field in RelOptInfo to indicate which attr are
> not null
> >> in current query. The not null is judged by checking pg_attribute and
> query's
> >> restrictinfo. The info is only maintained at Base RelOptInfo and
> Partition's
> >> RelOptInfo level so far.
> >>
> >>
> >> Any thoughts?
> >>
> >> [1]
> https://www.postgresql.org/message-id/CAKU4AWr1BmbQB4F7j22G%2BNS4dNuem6dKaUf%2B1BK8me61uBgqqg%40mail.gmail.com
> >> --
> >> Best Regards
> >> Andy Fan (https://www.aliyun.com/)
> >
> >
> > Add the missed patch..
> >
> > --
> > Best Regards
> > Andy Fan (https://www.aliyun.com/)
>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
>


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-02-11 Thread Andy Fan
Thank you all, friends!

On Fri, Feb 12, 2021 at 9:02 AM David Rowley  wrote:

> On Wed, 10 Feb 2021 at 16:18, Andy Fan  wrote:
> > v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch
> >
> > Introduce notnullattrs field in RelOptInfo to indicate which attr are
> not null
> > in current query. The not null is judged by checking pg_attribute and
> query's
> > restrictinfo. The info is only maintained at Base RelOptInfo and
> Partition's
> > RelOptInfo level so far.
> >
> >
> > Any thoughts?
>
> I'm not that happy with what exactly the definition is of
> RelOptInfo.notnullattrs.
>
> The comment for the field says:
> + /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
>
> So you could expect someone to assume that these are a Bitmapset of
> attnums for all columns in the relation marked as NOT NULL.  However,
> that's not true since you use find_nonnullable_vars() to chase down
> quals that filter out NULL values and you mark those too.
>
>
The comment might be unclear, but the behavior is intentional. I want
to find more cases in which an attr can be shown to be NOT NULL; all of
them are useful for the UniqueKey stuff.



> The reason I don't really like this is that it really depends where
> you want to use RelOptInfo.notnullattrs.  If someone wants to use it
> to optimise something before the base quals are evaluated then they
> might be unhappy that they found some NULLs.
>
>
Do you mean that notnullattrs is not set correctly before the base quals are
evaluated?  We have lots of data structures which are only valid after some
stage, but notnullattrs is special because it is set at more than one stage.
However, I doubt that makes it unacceptable: some fields even change their
meaning at different stages, like Var->varno.  If a user misunderstands it,
they will probably find out at the testing stage.


> I think you either need to explain in detail what the field means or
> separate out the two meanings somehow.
>
>
Agreed.  Besides the fact that the not-null information comes from two places
(metadata and quals), it also applies to the base relation's attributes rather
than to the RelTarget.  For example, A may be not null, but in SELECT
return_null_udf(A) FROM t, return_null_udf(A) can be NULL.  I think this is
not well documented either.  How about just changing the comment to:

1.  /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber; the
  * NOT NULL comes from pg_attribute and quals at different planning stages.
  */

or

2.  /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber; the
  * NOT NULL comes from pg_attribute and quals at different planning stages.
  * And it refers to the base attrs rather than RelOptInfo->reltarget.
  */

I don't like separating them into two fields because that may also make the
usage a bit harder.
-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: Single transaction in the tablesync worker?

2021-02-11 Thread Ajin Cherian
On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila  wrote:

> Okay, attached an updated patch with only that change.

I ran Erik's test suite [1] on this patch overnight and found no
errors. No more comments from me. The patch looks good.

regards,
Ajin Cherian
Fujitsu Australia

[1]- 
https://www.postgresql.org/message-id/93d02794068482f96d31b002e0eb248d%40xs4all.nl




pg13.2: invalid memory alloc request size NNNN

2021-02-11 Thread Justin Pryzby
ts=# \errverbose 
ERROR:  XX000: invalid memory alloc request size 18446744073709551613

#0  pg_re_throw () at elog.c:1716
#1  0x00a33b12 in errfinish (filename=0xbff20e "mcxt.c", lineno=959, 
funcname=0xbff2db <__func__.6684> "palloc") at elog.c:502
#2  0x00a6760d in palloc (size=18446744073709551613) at mcxt.c:959
#3  0x009fb149 in text_to_cstring (t=0x2aaae8023010) at varlena.c:212
#4  0x009fbf05 in textout (fcinfo=0x2094538) at varlena.c:557
#5  0x006bdd50 in ExecInterpExpr (state=0x2093990, econtext=0x20933d8, 
isnull=0x7fff5bf04a87) at execExprInterp.c:1112
#6  0x006d4f18 in ExecEvalExprSwitchContext (state=0x2093990, 
econtext=0x20933d8, isNull=0x7fff5bf04a87) at 
../../../src/include/executor/executor.h:316
#7  0x006d4f81 in ExecProject (projInfo=0x2093988) at 
../../../src/include/executor/executor.h:350
#8  0x006d5371 in ExecScan (node=0x20932c8, accessMtd=0x7082e0 
, recheckMtd=0x708385 ) at execScan.c:238
#9  0x007083c2 in ExecSeqScan (pstate=0x20932c8) at nodeSeqscan.c:112
#10 0x006d1b00 in ExecProcNodeInstr (node=0x20932c8) at 
execProcnode.c:466
#11 0x006e742c in ExecProcNode (node=0x20932c8) at 
../../../src/include/executor/executor.h:248
#12 0x006e77de in ExecAppend (pstate=0x2089208) at nodeAppend.c:267
#13 0x006d1b00 in ExecProcNodeInstr (node=0x2089208) at 
execProcnode.c:466
#14 0x0070964f in ExecProcNode (node=0x2089208) at 
../../../src/include/executor/executor.h:248
#15 0x00709795 in ExecSort (pstate=0x2088ff8) at nodeSort.c:108
#16 0x006d1b00 in ExecProcNodeInstr (node=0x2088ff8) at 
execProcnode.c:466
#17 0x006d1ad1 in ExecProcNodeFirst (node=0x2088ff8) at 
execProcnode.c:450
#18 0x006dec36 in ExecProcNode (node=0x2088ff8) at 
../../../src/include/executor/executor.h:248
#19 0x006df079 in fetch_input_tuple (aggstate=0x2088a20) at 
nodeAgg.c:589
#20 0x006e1fad in agg_retrieve_direct (aggstate=0x2088a20) at 
nodeAgg.c:2368
#21 0x006e1bfd in ExecAgg (pstate=0x2088a20) at nodeAgg.c:2183
#22 0x006d1b00 in ExecProcNodeInstr (node=0x2088a20) at 
execProcnode.c:466
#23 0x006d1ad1 in ExecProcNodeFirst (node=0x2088a20) at 
execProcnode.c:450
#24 0x006c6ffa in ExecProcNode (node=0x2088a20) at 
../../../src/include/executor/executor.h:248
#25 0x006c966b in ExecutePlan (estate=0x2032f48, planstate=0x2088a20, 
use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, 
direction=ForwardScanDirection, dest=0xbb3400 , 
execute_once=true) at execMain.c:1632

#3  0x009fb149 in text_to_cstring (t=0x2aaae8023010) at varlena.c:212
212 result = (char *) palloc(len + 1);

(gdb) l
207 /* must cast away the const, unfortunately */
208 text   *tunpacked = pg_detoast_datum_packed(unconstify(text 
*, t));
209 int len = VARSIZE_ANY_EXHDR(tunpacked);
210 char   *result;
211
212 result = (char *) palloc(len + 1);

(gdb) p len
$1 = -4

This VM had some issue earlier today and I killed it, causing PG to execute
crash recovery.  I'm tentatively blaming that on ZFS, so this could conceivably
be a data error (although recovery supposedly would have resolved it).  I just
checked, and data_checksums=off.
The query has mode(), string_agg(), distinct.

Here's a redacted plan for the query:

 GroupAggregate  (cost=15681340.44..20726393.56 rows=908609 width=618)
   Group Key: (((COALESCE(a.ii, $0) || lpad(a.ii, 5, '0'::text)) || lpad(a.ii, 
5, '0'::text))), a.ii, (COALESCE(a.ii, $2)), (CASE (a.ii)::integer WHEN 1 THEN 
'qq'::text WHEN 2 THEN 'qq'::text WHEN 3 THEN 'qq'::text WHEN 4 THEN 'qq'::text 
WHEN 5 THEN 'qq qq'::text WHEN 6 THEN 'qq-qq'::text ELSE a.ii END), (CASE WHEN 
(COALESCE(a.ii, $3) = substr(a.ii, 1, length(COALESCE(a.ii, $4 THEN 'qq 
qq'::text WHEN (hashed SubPlan 7) THEN 'qq qq'::text ELSE 'qq qq qq'::text END)
   InitPlan 1 (returns $0)
 ->  Seq Scan on d
   InitPlan 3 (returns $2)
 ->  Seq Scan on d d
   InitPlan 4 (returns $3)
 ->  Seq Scan on d d
   InitPlan 5 (returns $4)
 ->  Seq Scan on d d
   InitPlan 6 (returns $5)
 ->  Seq Scan on d d
   ->  Sort  (cost=15681335.39..15704050.62 rows=9086093 width=313)
 Sort Key: (((COALESCE(a.ii, $0) || lpad(a.ii, 5, '0'::text)) || 
lpad(a.ii, 5, '0'::text))), a.ii, (COALESCE(a.ii, $2)), (CASE (a.ii)::integer 
WHEN 1 THEN 'qq'::text WHEN 2 THEN 'qq'::text WHEN 3 THEN 'qq'::text WHEN 4 
THEN 'qq'::text WHEN 5 THEN 'qq qq'::text WHEN 6 THEN 'qq-qq'::text ELSE a.ii 
END), (CASE WHEN (COALESCE(a.ii, $3) = substr(a.ii, 1, length(COALESCE(a.ii, 
$4 THEN 'qq qq'::text WHEN (hashed SubPlan 7) THEN 'qq qq'::text ELSE 'qq 
qq qq'::text END)
 ->  Append  (cost=1.01..13295792.30 rows=9086093 width=313)
   ->  Seq Scan on a a  (cost=1.01..5689033.34 rows=3948764 
width=328)
 F

Re: Libpq support to connect to standby server as priority

2021-02-11 Thread Greg Nancarrow
On Wed, Feb 10, 2021 at 5:09 PM vignesh C  wrote:
>
> Modified.
> These comments are handled in v22 patch posted in my earlier mail.
>

Thanks, just one minor thing I missed in doc/src/sgml/libpq.sgml.

+The support of read-write transactions is determined by the
value of the
+default_transaction_read_only and
+in_hot_standby configuration parameters, that is
+reported by the server (if supported) upon successful connection.  If


should be:

+The support of read-write transactions is determined by the
values of the
+default_transaction_read_only and
+in_hot_standby configuration parameters, that are
+reported by the server (if supported) upon successful connection.  If


(i.e. "value" -> "values" and "is" -> "are")


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Detecting pointer misalignment (was Re: pgsql: Implementation of subscripting for jsonb)

2021-02-11 Thread Alexander Korotkov
On Thu, Feb 11, 2021 at 9:46 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Mon, Feb 8, 2021 at 7:49 PM Tom Lane  wrote:
> >> Ugh, no it isn't: even pretty recent clang releases only define
> >> __GNUC__ as 4.  It looks like we need a separate test on clang's
> >> version.  I looked at their version history and sanitizers seem
> >> to have come in around clang 7, so I propose the attached (where
> >> I worked a bit harder on the comment, too).
>
> > Looks good to me.  Thank you for revising!
>
> Were you going to push this, or did you expect me to?

Thank you for noticing.  I'll commit this today.

--
Regards,
Alexander Korotkov




[DOC] add missing "[ NO ]" to various "DEPENDS ON" synopses

2021-02-11 Thread Ian Lawrence Barwick
Greetings

We have the following syntax:

ALTER THING name [ NO ] DEPENDS ON EXTENSION name

for the following THINGs:

- ALTER TRIGGER
- ALTER FUNCTION
- ALTER PROCEDURE
- ALTER ROUTINE
- ALTER MATERIALIZED VIEW
- ALTER INDEX

In the documentation, the "[ NO ]" option is listed in the synopsis for
ALTER TRIGGER and ALTER FUNCTION, but not the others.
Trivial patch attached.

Will add to next CF.

Regards

Ian Barwick


-- 
EnterpriseDB: https://www.enterprisedb.com
diff --git a/doc/src/sgml/ref/alter_index.sgml b/doc/src/sgml/ref/alter_index.sgml
index 214005a86c..4b446384c2 100644
--- a/doc/src/sgml/ref/alter_index.sgml
+++ b/doc/src/sgml/ref/alter_index.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
 ALTER INDEX [ IF EXISTS ] name RENAME TO new_name
 ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name
 ALTER INDEX name ATTACH PARTITION index_name
-ALTER INDEX name DEPENDS ON EXTENSION extension_name
+ALTER INDEX name [ NO ] DEPENDS ON EXTENSION extension_name
 ALTER INDEX name ALTER COLLATION collation_name REFRESH VERSION
 ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter [= value] [, ... ] )
 ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )
diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index bf379db77e..27f60f6df5 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
 ALTER MATERIALIZED VIEW [ IF EXISTS ] name
 action [, ... ]
 ALTER MATERIALIZED VIEW name
-DEPENDS ON EXTENSION extension_name
+[ NO ] DEPENDS ON EXTENSION extension_name
 ALTER MATERIALIZED VIEW [ IF EXISTS ] name
 RENAME [ COLUMN ] column_name TO new_column_name
 ALTER MATERIALIZED VIEW [ IF EXISTS ] name
diff --git a/doc/src/sgml/ref/alter_procedure.sgml b/doc/src/sgml/ref/alter_procedure.sgml
index 5c176fb5d8..9cbe2c7cea 100644
--- a/doc/src/sgml/ref/alter_procedure.sgml
+++ b/doc/src/sgml/ref/alter_procedure.sgml
@@ -30,7 +30,7 @@ ALTER PROCEDURE name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
 SET SCHEMA new_schema
 ALTER PROCEDURE name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
-DEPENDS ON EXTENSION extension_name
+[ NO ] DEPENDS ON EXTENSION extension_name
 
 where action is one of:
 
diff --git a/doc/src/sgml/ref/alter_routine.sgml b/doc/src/sgml/ref/alter_routine.sgml
index 36acaff319..aab4a49793 100644
--- a/doc/src/sgml/ref/alter_routine.sgml
+++ b/doc/src/sgml/ref/alter_routine.sgml
@@ -30,7 +30,7 @@ ALTER ROUTINE name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
 SET SCHEMA new_schema
 ALTER ROUTINE name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
-DEPENDS ON EXTENSION extension_name
+[ NO ] DEPENDS ON EXTENSION extension_name
 
 where action is one of:
 


RE: libpq debug log

2021-02-11 Thread iwata....@fujitsu.com
Hi all,

Thank you for your review. I have updated the patch to v17.

> From: Tsunakawa, Takayuki/綱川 貴之 
> Sent: Tuesday, February 9, 2021 11:26 AM
> (45)
...
> The description of PQtrace() should be written independent of PQtraceEx().
> It is an unnecessary implementation detail to the user that PQtrace() calls
> PQtraceEx() internally.  Plus, a separate entry for PQtraceEx() needs to be
> added.

I added a PQtraceSetFlags() description instead of PQtraceEx(), in response to 
Horiguchi-san's suggestion.

> (46)
> 
> If skipLogging is intended for use with backend -> frontend messages only,
> shouldn't it be placed in conn->b_msg?

I moved the skip flag to be_msg.
 
> (47)
...
> I'm not completely sure if other places interpose a block comment like this
> between if/for/while conditions, but I think it's better to put the comment
> before if.

I moved this comment to before the if.

> From: Kyotaro Horiguchi 
> Sent: Tuesday, February 9, 2021 5:26 PM

> +typedef enum
> +{
> + MSGDIR_FROM_BACKEND,
> + MSGDIR_FROM_FRONTEND
> +} PGCommSource;
> 
> This is only partially exposed to other parts of libpq. Specifically, only
> MSGDIR_FROM_BACKEND is used in fe-misc.c, and only for
> pqLogMessageString and pqLogMessagenchar. I would suggest hiding this
> enum from fe-misc.c.
> 
> Looking into pqLogMessageString,
> 
> +pqLogMessageString(PGconn *conn, const char *v, int length,
> PGCommSource source)
> +{
> + if (source == MSGDIR_FROM_BACKEND &&
> conn->be_msg->state != LOG_CONTENTS)
> + {
> + pqLogInvalidProtocol(conn, MSGDIR_FROM_BACKEND);
> + return; /* XXX ??? */
> + }
> +
> + fprintf(conn->Pfdebug, "\"%s\"", v);
> + if (source == MSGDIR_FROM_BACKEND)
> + pqTraceMaybeBreakLine(length, conn);
> +}
> 
> The only part that is shared by both be and fe is the fprintf().  I think
> it can be naturally split into separate functions for backend and
> frontend messages.
> 
> Looking into pqLogMessagenchar,
> 
> +/*
> + * pqLogMessagenchar: output a string of exactly len bytes message to the
> log
> + */
> +void
> +pqLogMessagenchar(PGconn *conn, const char *v, int len, PGCommSource
> commsource)
> +{
> + fprintf(conn->Pfdebug, "\'");
> + pqLogBinaryMsg(conn, v, len, commsource);
> + fprintf(conn->Pfdebug, "\'");
> + pqTraceMaybeBreakLine(len, conn);
> +}
> 
> +static void
> +pqLogBinaryMsg(PGconn *conn, const char *v, int length, PGCommSource
> source)
> +{
> + int i,
> + pin;
> +
> + if (source == MSGDIR_FROM_BACKEND &&
> conn->be_msg->state != LOG_CONTENTS)
> + {
> + pqLogInvalidProtocol(conn, MSGDIR_FROM_BACKEND);
> + return; /* XXX ??? */
> 
> # What is this???
> 
> + }
> .. shared part
> +}
> 
> pqLogMessagenchar is the sole caller of pqLogBinaryMsg. So we can
> refactor the two functions and have pqLogMessagenchar_for_be and
> _for_fe without fear of side effects on other callers.  (The
> names are discussed below.)

Thank you for your advice on refactoring.
Separating pqLogMessagenchar() into pqLogMessagenchar_for_be and 
pqLogMessagenchar_for_fe seemed like it would add more similar code, so I 
didn't work on it.

> +typedef enum PGLogState
> +{
> 
> This is a libpq-logging.c internal type. It does not need to be exposed.

I fixed it.

> +extern void pqTraceForcelyBreakLine(int size, PGconn *conn);
> +extern void pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type,
> int length)+extern void pqStoreFeMsgStart(PGconn *conn, char type);
> +extern void pqLogFrontendMsg(PGconn *conn, int msgLen);
> +extern void pqLogMessageByte1(PGconn *conn, char v);
> +extern void pqLogMessageInt(PGconn *conn, int v, int length);
> +extern void pqLogMessageString(PGconn *conn, const char *v, int length,
> +PGCommSource
> commsource);
> +extern void pqLogMessagenchar(PGconn *conn, const char *v, int length,
> +   PGCommSource
> commsource);
> 
> The API functions look randomly/inconsistently named and
> designed.  I think the API should be designed more structurally.
> 
> The comments about individual function names follow.

Thank you for your advice. I changed these functions name.
> 
> +/*
> + * pqTraceForcelyBreakLine:
> + *   If message is not completed, print a line break and reset.
> + */
> +void
> +pqTraceForcelyBreakLine(int size, PGconn *conn)
> +{
> + fprintf(conn->Pfdebug, "\n");
> + pqTraceResetBeMsg(conn);
> +}
> 
> Contrary to the comment, this function works neither conditionally nor
> forcefully. It just prints a newline and resets the backend message
> variables.  I would name this as
> pqTrace(Finish|Close)BeMsgLog().

This function can be used when a connection is lost or when the copyData result 
is ignored according to the original code.
It is called to reset halfway logging. So I want to know that it will be forced 
to 

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-02-11 Thread David Rowley
On Wed, 10 Feb 2021 at 16:18, Andy Fan  wrote:
> v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch
>
> Introduce notnullattrs field in RelOptInfo to indicate which attr are not null
> in current query. The not null is judged by checking pg_attribute and query's
> restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
> RelOptInfo level so far.
>
>
> Any thoughts?

I'm not that happy with what exactly the definition is of
RelOptInfo.notnullattrs.

The comment for the field says:
+ /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */

So you could expect someone to assume that these are a Bitmapset of
attnums for all columns in the relation marked as NOT NULL.  However,
that's not true since you use find_nonnullable_vars() to chase down
quals that filter out NULL values and you mark those too.

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs.  If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

I think you either need to explain in detail what the field means or
separate out the two meanings somehow.

David




Re: WIP: WAL prefetch (another approach)

2021-02-11 Thread Tomas Vondra

On 2/10/21 10:50 PM, Stephen Frost wrote:
>

...

>

+/*
+ * Scan the current record for block references, and consider prefetching.
+ *
+ * Return true if we processed the current record to completion and still have
+ * queue space to process a new record, and false if we saturated the I/O
+ * queue and need to wait for recovery to advance before we continue.
+ */
+static bool
+XLogPrefetcherScanBlocks(XLogPrefetcher *prefetcher)
+{
+   DecodedXLogRecord *record = prefetcher->record;
+
+   Assert(!XLogPrefetcherSaturated(prefetcher));
+
+   /*
+* We might already have been partway through processing this record 
when
+* our queue became saturated, so we need to start where we left off.
+*/
+   for (int block_id = prefetcher->next_block_id;
+block_id <= record->max_block_id;
+++block_id)
+   {
+   DecodedBkpBlock *block = &record->blocks[block_id];
+   PrefetchBufferResult prefetch;
+   SMgrRelation reln;
+
+   /* Ignore everything but the main fork for now. */
+   if (block->forknum != MAIN_FORKNUM)
+   continue;
+
+   /*
+* If there is a full page image attached, we won't be reading 
the
+* page, so you might think we should skip it.  However, if the
+* underlying filesystem uses larger logical blocks than us, it
+* might still need to perform a read-before-write some time 
later.
+* Therefore, only prefetch if configured to do so.
+*/
+   if (block->has_image && !recovery_prefetch_fpw)
+   {
+   pg_atomic_unlocked_add_fetch_u64(&Stats->skip_fpw, 1);
+   continue;
+   }


FPIs in the stream aren't going to just avoid reads when the
filesystem's block size matches PG's - they're also going to avoid
subsequent modifications to the block, provided we don't end up pushing
that block out of shared buffers, right?

That is, if you have an empty shared buffers and see:

Block 5 FPI
Block 6 FPI
Block 5 Update
Block 6 Update

it seems like, with this patch, we're going to Prefetch Block 5 & 6,
even though we almost certainly won't actually need them.



Yeah, that's a good point. I think it'd make sense to keep track of 
recent FPIs and skip prefetching such blocks. But how exactly should we 
implement that, how many blocks do we need to track? If you get an FPI, 
how long should we skip prefetching of that block?


I don't think the history needs to be very long, for two reasons. 
Firstly, the usual pattern is that we have FPI + several changes for 
that block shortly after it. Secondly, maintenance_io_concurrency limits 
this naturally - after crossing that, redo should place the FPI into 
shared buffers, allowing us to skip the prefetch.


So I think using maintenance_io_concurrency is sufficient. We might 
track more buffers to allow skipping prefetches of blocks that were 
evicted from shared buffers, but that seems like an overkill.


However, maintenance_io_concurrency can be quite high, so just a simple 
queue is not very suitable - searching it linearly for each block would 
be too expensive. But I think we can use a simple hash table, tracking 
(relfilenode, block, LSN), over-sized to minimize collisions.


Imagine it's a simple array with (2 * maintenance_io_concurrency) 
elements, and whenever we prefetch a block or find an FPI, we simply add 
the block to the array as determined by hash(relfilenode, block)


hashtable[hash(...)] = {relfilenode, block, LSN}

and then when deciding whether to prefetch a block, we look at that one 
position. If the (relfilenode, block) match, we check the LSN and skip 
the prefetch if it's sufficiently recent. Otherwise we prefetch.


We may issue some extra prefetches due to collisions, but that's fine I 
think. There should not be very many of them, thanks to having the hash 
table oversized.


The good thing is this is quite simple, fixed-sized data structure, 
there's no need for allocations etc.
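
To make that concrete, here is a minimal sketch of the fixed-size structure 
described above; all names, the hash function, and the "recent enough" test 
are illustrative only and not taken from the patch:

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/relfilenode.h"

#define RECENT_BLOCK_SLOTS 128	/* would be ~2 * maintenance_io_concurrency */

typedef struct RecentBlock
{
	RelFileNode rnode;
	BlockNumber blkno;
	XLogRecPtr	lsn;			/* LSN at which we saw the FPI / prefetched */
} RecentBlock;

static RecentBlock recent_blocks[RECENT_BLOCK_SLOTS];

static inline uint32
recent_block_slot(const RelFileNode *rnode, BlockNumber blkno)
{
	return (rnode->relNode ^ blkno) % RECENT_BLOCK_SLOTS;
}

/* Remember an FPI (or a prefetch we just issued) for this block. */
static void
recent_block_remember(const RelFileNode *rnode, BlockNumber blkno,
					  XLogRecPtr lsn)
{
	RecentBlock *slot = &recent_blocks[recent_block_slot(rnode, blkno)];

	slot->rnode = *rnode;
	slot->blkno = blkno;
	slot->lsn = lsn;
}

/* Skip the prefetch if we saw this block recently enough. */
static bool
recent_block_seen_since(const RelFileNode *rnode, BlockNumber blkno,
						XLogRecPtr horizon_lsn)
{
	RecentBlock *slot = &recent_blocks[recent_block_slot(rnode, blkno)];

	return RelFileNodeEquals(slot->rnode, *rnode) &&
		slot->blkno == blkno &&
		slot->lsn >= horizon_lsn;
}

In this sketch a collision simply overwrites the slot, which can only cause an 
extra prefetch, never a wrongly skipped one, matching the "a few extra 
prefetches are fine" reasoning above.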





+   /* Fast path for repeated references to the same relation. */
+   if (RelFileNodeEquals(block->rnode, prefetcher->last_rnode))
+   {
+   /*
+* If this is a repeat access to the same block, then skip it.
+*
+* XXX We could also check for last_blkno + 1 too, and also update
+* last_blkno; it's not clear if the kernel would do a better job
+* of sequential prefetching.
+*/
+   if (block->blkno == prefetcher->last_blkno)
+   {
+       pg_atomic_unlocked_add_fetch_u64(&Stats->skip_seq, 1);
+   continue;
+   }


I'm sure t

Re: parse_slash_copy doesn't support psql variables substitution

2021-02-11 Thread Corey Huinker
On Wed, Feb 10, 2021 at 8:33 AM Pavel Stehule 
wrote:

> Hi
>
> Is there some reason why \copy statement (parse_slash_copy parser) doesn't
> support psql variables?
>
> Regards
>
> Pavel
>

I remember wondering about that when I was working on the \if stuff. I dug
into it a bit, but the problem was out of scope for my goals.

The additional options recently added to \g reduced my need for \copy, and
it seemed like there was some effort to have input pipes as well, which
would eliminate the need for \copy altogether.
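
For anyone unfamiliar with the limitation being discussed: psql expands
variables in ordinary SQL, but parse_slash_copy takes the rest of the \copy
line literally, so something like the following (hypothetical session) does
not do what one might expect:

-- Variables are substituted in plain SQL ...
\set tbl 'mytable'
SELECT count(*) FROM :tbl;          -- queries mytable

-- ... but not in \copy: ":tbl" is passed through verbatim, so this fails
-- instead of copying into mytable.
\copy :tbl FROM 'data.csv' CSV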


Re: pg_cryptohash_final possible out-of-bounds access (per Coverity)

2021-02-11 Thread Ranier Vilela
On Thu, Feb 11, 2021 at 09:47, Michael Paquier  wrote:

> On Wed, Feb 10, 2021 at 09:14:46AM -0300, Ranier Vilela wrote:
> > It is necessary to correct the interfaces. To caller, inform the size of
> > the buffer it created.
>
> Now, the patch you sent has no need to be that complicated, and it
> partially works while not actually solving at all the problem you are
> trying to solve (nothing done for MD5 or OpenSSL).  Attached is an
> example of what I finish with while poking at this issue. There is IMO
> no point to touch the internals of SCRAM that all rely on the same
> digest lengths for the proof generation with SHA256.
>
Ok, I took a look at your patch and I have comments:

1. It looks like contrib/pgcrypto was missed.
2. The scram_HMAC_final function still has exchanged parameters,
which may impair maintenance in the future.

Attached the v3 same patch.

regards,
Ranier Vilela


pg_cryptohash_v3.patch
Description: Binary data


Re: Parallel Full Hash Join

2021-02-11 Thread Melanie Plageman
On Tue, Dec 29, 2020 at 03:28:12PM +1300, Thomas Munro wrote:
> I had some feedback I meant to
> post in November but didn't get around to:
> 
>   *  PHJ_BATCH_PROBING-- all probe
> - *  PHJ_BATCH_DONE   -- end
> +
> + *  PHJ_BATCH_DONE   -- queries not requiring inner fill done
> + *  PHJ_BATCH_FILL_INNER_DONE -- inner fill completed, all queries done
> 
> Would it be better/tidier to keep _DONE as the final phase?  That is,
> to switch around these two final phases.  Or does that make it too
> hard to coordinate the detach-and-cleanup logic?

I updated this to use your suggestion. My rationale for having
PHJ_BATCH_DONE and then PHJ_BATCH_FILL_INNER_DONE was that, for a worker
attaching to the batch for the first time, it might be confusing that it
is in the PHJ_BATCH_FILL_INNER state (not the DONE state) and yet that
worker still just detaches and moves on. It didn't seem intuitive.
Anyway, I think that is all sort of confusing and unnecessary. I changed
it to PHJ_BATCH_FILLING_INNER -- then when a worker who hasn't ever been
attached to this batch before attaches, it will be in the
PHJ_BATCH_FILLING_INNER phase, which it cannot help with and it will
detach and move on.

> 
> +/*
> + * ExecPrepHashTableForUnmatched
> + * set up for a series of ExecScanHashTableForUnmatched calls
> + * return true if this worker is elected to do the
> unmatched inner scan
> + */
> +bool
> +ExecParallelPrepHashTableForUnmatched(HashJoinState *hjstate)
> 
> Comment name doesn't match function name.

Updated -- and a few other comment updates too.

I just attached the diff.
From 213c36f9e125f52eb6731005d5dcdadca73a Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 11 Feb 2021 16:31:37 -0500
Subject: [PATCH v4 2/2] Update comments and phase naming

---
 src/backend/executor/nodeHash.c | 19 +--
 src/backend/executor/nodeHashjoin.c |  4 ++--
 src/include/executor/hashjoin.h |  4 ++--
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index dd8d12203a..6305688efd 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -2047,11 +2047,10 @@ void
 ExecPrepHashTableForUnmatched(HashJoinState *hjstate)
 {
 	/*--
-	 * During this scan we use the HashJoinState fields as follows:
+	 * During this scan we use the HashJoinTable fields as follows:
 	 *
-	 * hj_CurBucketNo: next regular bucket to scan
-	 * hj_CurSkewBucketNo: next skew bucket (an index into skewBucketNums)
-	 * hj_CurTuple: last tuple returned, or NULL to start next bucket
+	 * current_chunk: current HashMemoryChunk to scan
+	 * current_chunk_idx: index in current HashMemoryChunk
 	 *--
 	 */
 	HashJoinTable hashtable = hjstate->hj_HashTable;
@@ -2064,13 +2063,21 @@ ExecPrepHashTableForUnmatched(HashJoinState *hjstate)
 }
 
 /*
- * ExecPrepHashTableForUnmatched
- *		set up for a series of ExecScanHashTableForUnmatched calls
+ * ExecParallelPrepHashTableForUnmatched
+ *		set up for a series of ExecParallelScanHashTableForUnmatched calls
  *		return true if this worker is elected to do the unmatched inner scan
  */
 bool
 ExecParallelPrepHashTableForUnmatched(HashJoinState *hjstate)
 {
+	/*--
+	 * During this scan we use the ParallelHashJoinBatchAccessor fields for the
+	 * current batch as follows:
+	 *
+	 * current_chunk: current HashMemoryChunk to scan
+	 * current_chunk_idx: index in current HashMemoryChunk
+	 *--
+	 */
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	int			curbatch = hashtable->curbatch;
 	ParallelHashJoinBatchAccessor *batch_accessor = &hashtable->batches[curbatch];
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 37b49369aa..40c483cd0c 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1183,10 +1183,10 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
 	ExecParallelHashTableSetCurrentBatch(hashtable, batchno);
 	sts_begin_parallel_scan(hashtable->batches[batchno].outer_tuples);
 	return true;
-case PHJ_BATCH_DONE:
+case PHJ_BATCH_FILLING_INNER:
 	/* Fall through. */
 
-case PHJ_BATCH_FILL_INNER_DONE:
+case PHJ_BATCH_DONE:
 
 	/*
 	 * Already done.  Detach and go around again (if any
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 634a212142..66fea4ac58 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -274,8 +274,8 @@ typedef struct ParallelHashJoinState
 #define PHJ_BATCH_ALLOCATING			1
 #define PHJ_BATCH_LOADING2
 #define PHJ_BATCH_PROBING3
-#define PHJ_BATCH_DONE			4
-#define PHJ_BATCH_FILL_INNER_DONE		5
+#define PHJ_BATCH_FILLING_INNER			4
+#define PHJ_BATCH_DONE		5
 
 /* The phases of batch growth while hashing, for grow_batches_barrier. */
 #define PHJ_GROW_BATCHES_ELECTING		0
-- 
2.25.

Re: Is Recovery actually paused?

2021-02-11 Thread Robert Haas
On Thu, Feb 11, 2021 at 6:07 AM Dilip Kumar  wrote:
> > Thanks for the patch. I tested the new function and it works as
> > expected. I have no further comments on the v13 patch.
>
> Thanks for the review and testing.

I don't see a whole lot wrong with this patch, but I think there are
some things that could make it a little clearer:

- I suggest renaming CheckAndSetRecoveryPause() to ConfirmRecoveryPaused().

- I suggest moving the definition of that function to just after
SetRecoveryPause().

- I suggest changing the argument to SetRecoveryPause() back to bool.
In the one place where you call SetRecoveryPause(RECOVERY_PAUSED),
just call SetRecoveryPause(true) and ConfirmRecoveryPaused() back to
back. This in turn means that the "if" statement in
SetRecoveryPause() can be rewritten as: if (!recoveryPaused)
XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED; else if
(XLogCtl->recoveryPauseState == RECOVERY_NOT_PAUSED)
XLogCtl->recoveryPauseState = RECOVERY_PAUSE_REQUESTED; (see the
sketch after this list). This is slightly less efficient, but I don't
think it matters, and I think it will be a lot clearer what the job of
SetRecoveryPause() is (say whether we're trying to pause or not) and
what the job of ConfirmRecoveryPaused() is (say whether we've
succeeded in pausing).

- Since the numeric values of RecoveryPauseState don't matter and the
values are never visible to anything outside the server nor stored on
disk, I would be inclined to (a) not specify particular values in
xlog.h and (b) remove the test-and-elog in SetRecoveryPause().

- In the places where you say:

-if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
+if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
+RECOVERY_PAUSE_REQUESTED)

...I would suggest instead testing for != RECOVERY_NOT_PAUSED. Perhaps
we don't think RECOVERY_PAUSED can happen here. But if somehow it did,
calling recoveryPausesHere() would be right.
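
A rough sketch of how the two functions might end up looking under that
suggestion (illustration only, assuming the XLogCtl/info_lck structures from
xlog.c; not the actual patch):

void
SetRecoveryPause(bool recoveryPause)
{
	SpinLockAcquire(&XLogCtl->info_lck);

	if (!recoveryPause)
		XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED;
	else if (XLogCtl->recoveryPauseState == RECOVERY_NOT_PAUSED)
		XLogCtl->recoveryPauseState = RECOVERY_PAUSE_REQUESTED;

	SpinLockRelease(&XLogCtl->info_lck);
}

/* Confirm that a previously requested pause has actually taken effect. */
static void
ConfirmRecoveryPaused(void)
{
	SpinLockAcquire(&XLogCtl->info_lck);
	if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
		XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
	SpinLockRelease(&XLogCtl->info_lck);
}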

There might be some more to say here, but those are things I notice on
a first read-through.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Operands don't affect result (CONSTANT_EXPRESSION_RESULT) (src/backend/utils/adt/jsonfuncs.c)

2021-02-11 Thread Ranier Vilela
On Thu, Feb 11, 2021 at 14:51, Tom Lane  wrote:

> Ranier Vilela  writes:
> > On Thu, Feb 11, 2021 at 01:46, Tom Lane  wrote:
> >> At the same time, I think this code could be improved; but the way
> >> to do that is to use strtoint(), rather than kluging the choice of
> >> datatype even further.
>
> > No matter the function used strtol or strtoint, push_path will remain
> > broken with Windows 64bits.
>
> There is quite a lot of difference between "broken" and "my compiler
> generates pointless warnings".  Still, I changed it to use strtoint(),
> because that's simpler and better style.
>
Thanks Tom, for fixing this.

regards,
Ranier Vilela


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-11 Thread Joel Jacobson
>On Thu, Feb 11, 2021, at 16:50, Mark Rofail wrote:
>> Here comes a first review of the anyarray_anyelement_operators-v1.patch.
>Great, thanks! I’ll start applying your comments today and release a new patch.

Here comes some more feedback:

I was surprised to see <<@ and @>> don't complain when trying to compare 
incompatible types:

regression=# select '1'::text <<@ ARRAY[1::integer,2::integer];
 ?column? 
----------
 f
(1 row)

I would expect the same result as if using the <@ and @> operators,
and wrapping the value in an array:

regression=# select ARRAY['1'::text] <@ ARRAY[1::integer,2::integer];
ERROR:  operator does not exist: text[] <@ integer[]
LINE 1: select ARRAY['1'::text] <@ ARRAY[1::integer,2::integer];
^
HINT:  No operator matches the given name and argument types. You might need to 
add explicit type casts.

The error above for the existing <@ operator is expected,
and I think the <<@ should give a similar error.

Even worse, when using integer on the left side and text in the array,
the <<@ operator causes a seg fault:

regression=# select 1::integer <<@ ARRAY['1'::text,'2'::text];
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

2021-02-11 22:18:53.083 CET [91680] LOG:  server process (PID 1666) was 
terminated by signal 11: Segmentation fault: 11
2021-02-11 22:18:53.083 CET [91680] DETAIL:  Failed process was running: select 
1::integer <<@ ARRAY['1'::text,'2'::text];
2021-02-11 22:18:53.083 CET [91680] LOG:  terminating any other active server 
processes
2021-02-11 22:18:53.084 CET [1735] FATAL:  the database system is in recovery 
mode
2021-02-11 22:18:53.084 CET [91680] LOG:  all server processes terminated; 
reinitializing
2021-02-11 22:18:53.092 CET [1736] LOG:  database system was interrupted; last 
known up at 2021-02-11 22:14:41 CET
2021-02-11 22:18:53.194 CET [1736] LOG:  database system was not properly shut 
down; automatic recovery in progress
2021-02-11 22:18:53.197 CET [1736] LOG:  redo starts at 0/2BCA5520
2021-02-11 22:18:53.197 CET [1736] LOG:  invalid record length at 0/2BCA5558: 
wanted 24, got 0
2021-02-11 22:18:53.197 CET [1736] LOG:  redo done at 0/2BCA5520 system usage: 
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2021-02-11 22:18:53.207 CET [91680] LOG:  database system is ready to accept 
connections

Maybe it's the combination of "anyarray" and "anycompatiblenonarray" that is 
the problem here?

Some more comments on the code:

array_contains_elem(AnyArrayType *array, Datum elem,
+   /*
+* Apply the comparison operator to each pair of array elements.
+*/
This comment has been copy/pasted from array_contain_compare().
Maybe the wording should clarify there is only one array in this function,
the word "pair" seems to imply working with two arrays.


+   for (i = 0; i < nelems; i++)
+   {
+   Datum elt1;

The name `elt1` originates from the array_contain_compare() function.
But since this function, array_contains_elem(), doesn't have an `elt2`,
it would be better to use `elt` as a name here. The same goes for `it1`.

/Joel

Re: Detecting pointer misalignment (was Re: pgsql: Implementation of subscripting for jsonb)

2021-02-11 Thread Thomas Munro
On Tue, Feb 9, 2021 at 1:34 PM Alexander Korotkov  wrote:
> Could we have both cfbot + buildfarm animals?

Hi Alexander,

For cfbot, yeah it does seem like a good idea to throw whatever code
sanitiser stuff we can into the automated tests, especially stuff that
isn't prone to false alarms.  Can you please recommend an exact change
to apply to:

https://github.com/macdice/cfbot/blob/master/cirrus/.cirrus.yml

Note that FreeBSD and macOS are using clang (though you might think
the latter is using gcc from its configure output...), and Linux is
using gcc.




Re: Proposal: Save user's original authenticated identity for logging

2021-02-11 Thread Jacob Champion
On Mon, 2021-02-08 at 23:35 +, Jacob Champion wrote:
> Note that I haven't compiled or tested on
> Windows and BSD yet, so the SSPI and BSD auth changes are eyeballed for
> now.

I've now tested on both.

> - For the SSPI auth method, I pick the format of the identity string
> based on the compatibility mode: "DOMAIN\user" when using compat_realm,
> and "user@DOMAIN" otherwise. For Windows DBAs, is this a helpful way to
> visualize the identity, or should I just stick to one format?

After testing on Windows, I think switching formats based on
compat_realm is a good approach. For users not on a domain, the
MACHINE\user format is probably more familiar than user@MACHINE.
Inversely, users on a domain probably want to see the modern 
user@DOMAIN instead.

v2 just updates the patchset to remove the Windows TODO and fill in the
patch notes; no functional changes. The question about escaping log
contents remains.

--Jacob
From 5dcbe2b90781ec833a1886cc800eed18b0352dcf Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 3 Feb 2021 11:04:42 -0800
Subject: [PATCH v2 1/3] prep: test/kerberos: only search forward in logs

See

https://www.postgresql.org/message-id/fe7a46f8d46ebb074ba1572d4b5e4af72dc95420.camel%40vmware.com
---
 src/test/kerberos/t/001_auth.pl | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 079321bbfc..c721d24dbd 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -182,6 +182,9 @@ $node->safe_psql('postgres', 'CREATE USER test1;');
 
 note "running tests";
 
+my $current_log_name = '';
+my $current_log_position;
+
 # Test connection success or failure, and if success, that query returns true.
 sub test_access
 {
@@ -221,18 +224,37 @@ sub test_access
 		$lfname =~ s/^stderr //;
 		chomp $lfname;
 
+		if ($lfname ne $current_log_name)
+		{
+			$current_log_name = $lfname;
+
+			# Search from the beginning of this new file.
+			$current_log_position = 0;
+			note "current_log_position = $current_log_position";
+		}
+
 		# might need to retry if logging collector process is slow...
 		my $max_attempts = 180 * 10;
 		my $first_logfile;
 		for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 		{
 			$first_logfile = slurp_file($node->data_dir . '/' . $lfname);
-			last if $first_logfile =~ m/\Q$expect_log_msg\E/;
+
+			# Don't include previously matched text in the search.
+			$first_logfile = substr $first_logfile, $current_log_position;
+			if ($first_logfile =~ m/\Q$expect_log_msg\E/g)
+			{
+$current_log_position += pos($first_logfile);
+last;
+			}
+
 			usleep(100_000);
 		}
 
 		like($first_logfile, qr/\Q$expect_log_msg\E/,
 			 'found expected log file content');
+
+		note "current_log_position = $current_log_position";
 	}
 
 	return;
-- 
2.25.1

From 99d36c0d486e42f5f8a7fadf884efa00c9365333 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 8 Feb 2021 10:53:20 -0800
Subject: [PATCH v2 2/3] prep: add port->peer_dn

Original patch by Andrew Dunstan:

https://www.postgresql.org/message-id/fd96ae76-a8e3-ef8e-a642-a592f5b76771%40dunslane.net
---
 src/backend/libpq/be-secure-openssl.c | 53 +++
 src/include/libpq/libpq-be.h  |  1 +
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1e2ecc6e7a..83c0eb5006 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -523,22 +523,25 @@ aloop:
 	/* Get client certificate, if available. */
 	port->peer = SSL_get_peer_certificate(port->ssl);
 
-	/* and extract the Common Name from it. */
+	/* and extract the Common Name / Distinguished Name from it. */
 	port->peer_cn = NULL;
+	port->peer_dn = NULL;
 	port->peer_cert_valid = false;
 	if (port->peer != NULL)
 	{
 		int			len;
+		X509_NAME  *x509name = X509_get_subject_name(port->peer);
+		char	   *peer_cn;
+		char	   *peer_dn;
+		BIO		   *bio = NULL;
+		BUF_MEM*bio_buf = NULL;
 
-		len = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-		NID_commonName, NULL, 0);
+		len = X509_NAME_get_text_by_NID(x509name, NID_commonName, NULL, 0);
 		if (len != -1)
 		{
-			char	   *peer_cn;
-
 			peer_cn = MemoryContextAlloc(TopMemoryContext, len + 1);
-			r = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-		  NID_commonName, peer_cn, len + 1);
+			r = X509_NAME_get_text_by_NID(x509name, NID_commonName, peer_cn,
+		  len + 1);
 			peer_cn[len] = '\0';
 			if (r != len)
 			{
@@ -562,6 +565,36 @@ aloop:
 
 			port->peer_cn = peer_cn;
 		}
+
+		bio = BIO_new(BIO_s_mem());
+		if (!bio)
+		{
+			pfree(port->peer_cn);
+			port->peer_cn = NULL;
+			return -1;
+		}
+		/* use commas instead of slashes */
+		X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_RFC2253);
+		BIO_get_mem_ptr(bio, &bio_buf);
+		peer_dn = MemoryContextAl

Re: parse mistake in ecpg connect string

2021-02-11 Thread Tom Lane
"kuroda.hay...@fujitsu.com"  writes:
> Dear Wang, Horiguchi-san,
>>> How about the attached?

>> I think, this patch is good.

> I agree. The backward compatibility is violated in the doc, but maybe no one
> will care.

Pushed with a little more work on the documentation.

regards, tom lane




Re: Tightening up allowed custom GUC names

2021-02-11 Thread Andrew Dunstan


On 2/11/21 1:32 PM, Tom Lane wrote:
> Noah Misch  writes:
>> On Tue, Feb 09, 2021 at 05:34:37PM -0500, Tom Lane wrote:
>>> * A case could be made for tightening things up a lot more, and not
>>> allowing anything that doesn't look like an identifier.  I'm not
>>> pushing for that, as it seems more likely to break existing
>>> applications than the narrow restriction proposed here.  But I could
>>> live with it if people prefer that way.
>> I'd prefer that.  Characters like backslash, space, and double quote have
>> significant potential to reveal bugs, while having negligible application
>> beyond revealing bugs.
> Any other opinions here?  I'm hesitant to make such a change on the
> basis of just one vote.
>
>   



That might be a bit restrictive. I could at least see allowing '-' as
reasonable, and maybe ':'. Not sure about other punctuation characters.
OTOH I'd be surprised if the identifier restriction would burden a large
number of people.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Tightening up allowed custom GUC names

2021-02-11 Thread Robert Haas
On Tue, Feb 9, 2021 at 6:02 PM Noah Misch  wrote:
> > * A case could be made for tightening things up a lot more, and not
> > allowing anything that doesn't look like an identifier.  I'm not
> > pushing for that, as it seems more likely to break existing
> > applications than the narrow restriction proposed here.  But I could
> > live with it if people prefer that way.
>
> I'd prefer that.  Characters like backslash, space, and double quote have
> significant potential to reveal bugs, while having negligible application
> beyond revealing bugs.

I'm not sure exactly what the rule should be here, but in general I
agree that a broader prohibition might be better. It's hard to
understand the rationale behind a system that doesn't allow
robert.max-workers as a GUC name, but does permit rob"ert.max^Hworkers.

+1 for not back-patching whatever we do here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: repeated decoding of prepared transactions

2021-02-11 Thread Robert Haas
On Thu, Feb 11, 2021 at 5:37 AM Amit Kapila  wrote:
> to explain the exact case to you which is not very apparent. The basic
> idea is that we ship/replay all transactions where commit happens
> after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
> atop snapbuild.c for details. Now, for transactions where prepare is
> before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
> after SNAPBUILD_CONSISTENT, we need to send the entire transaction
> including prepare at the commit time.

This might be a dumb question, but: why?

Is this because the effects of the prepared transaction might
otherwise be included neither in the initial synchronization of the
data nor in any subsequently decoded transaction, thus leaving the
replica out of sync?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Tightening up allowed custom GUC names

2021-02-11 Thread Komяpa
On Thu, Feb 11, 2021 at 21:33, Tom Lane  wrote:

> Noah Misch  writes:
> > On Tue, Feb 09, 2021 at 05:34:37PM -0500, Tom Lane wrote:
> >> * A case could be made for tightening things up a lot more, and not
> >> allowing anything that doesn't look like an identifier.  I'm not
> >> pushing for that, as it seems more likely to break existing
> >> applications than the narrow restriction proposed here.  But I could
> >> live with it if people prefer that way.
>
> > I'd prefer that.  Characters like backslash, space, and double quote have
> > significant potential to reveal bugs, while having negligible application
> > beyond revealing bugs.
>
> Any other opinions here?  I'm hesitant to make such a change on the
> basis of just one vote.
>

+1 for the change. I have not seen usage of = and - in the wild in GUC
names, but I can see the harm of misinterpreting these.




> regards, tom lane
>
>
>


Re: Allowing create database within transaction block

2021-02-11 Thread Tom Lane
Hari Sankar  writes:
> I wanted to understand why we do not allow the following statements
> within a transaction block:
> 1. Create database
> 2. Drop Database
> Is there a detailed reasoning behind disallowing the above statements as
> part of the design? I would appreciate it if someone could share why
> Postgres does not allow these statements inside a transaction block.

Mostly it's that create and drop database consist of a filesystem tree
copy and a filesystem recursive delete respectively.  So there isn't any
detailed WAL log entry for them, and no way to roll back at transaction
abort.

It might be possible to convert these to roll-back-able operations by
remembering that a recursive delete has to be done during transaction
abort (for the create case) or commit (for the delete case), much as
we do for table create/drop cases.  That's a bit scary however,
remembering that it's totally not acceptable to throw any sort of
error at that stage of a transaction commit.  Any failure during the
recursive delete would likely end up in leaking a lot of disk space
from files we failed to delete.

Short answer is that it could probably be done if someone wanted to
put enough effort into it, but the cost/benefit ratio hasn't seemed
attractive.
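
(For reference, the user-visible behavior today is simply a hard error, e.g.:)

BEGIN;
CREATE DATABASE testdb;
ERROR:  CREATE DATABASE cannot run inside a transaction block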

regards, tom lane




Re: Detecting pointer misalignment (was Re: pgsql: Implementation of subscripting for jsonb)

2021-02-11 Thread Tom Lane
Alexander Korotkov  writes:
> On Mon, Feb 8, 2021 at 7:49 PM Tom Lane  wrote:
>> Ugh, no it isn't: even pretty recent clang releases only define
>> __GNUC__ as 4.  It looks like we need a separate test on clang's
>> version.  I looked at their version history and sanitizers seem
>> to have come in around clang 7, so I propose the attached (where
>> I worked a bit harder on the comment, too).

> Looks good to me.  Thank you for revising!

Were you going to push this, or did you expect me to?

regards, tom lane




Re: Tightening up allowed custom GUC names

2021-02-11 Thread Tom Lane
Noah Misch  writes:
> On Tue, Feb 09, 2021 at 05:34:37PM -0500, Tom Lane wrote:
>> * A case could be made for tightening things up a lot more, and not
>> allowing anything that doesn't look like an identifier.  I'm not
>> pushing for that, as it seems more likely to break existing
>> applications than the narrow restriction proposed here.  But I could
>> live with it if people prefer that way.

> I'd prefer that.  Characters like backslash, space, and double quote have
> significant potential to reveal bugs, while having negligible application
> beyond revealing bugs.

Any other opinions here?  I'm hesitant to make such a change on the
basis of just one vote.

regards, tom lane




Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Joshua Drake
On Wed, Feb 10, 2021 at 11:04 AM Tom Lane  wrote:

> "Jonah H. Harris"  writes:
> > On Wed, Feb 10, 2021 at 1:10 PM Tom Lane  wrote:
> >> ...  If we start having
> >> modes for MySQL identifier quoting, Oracle outer join syntax, yadda
> >> yadda, it's going to be way more of a maintenance nightmare than some
> >> hook functions.  So if we accept any patch along this line, I want to
> >> drive a hard stake in the ground that the answer to that sort of thing
> >> will be NO.
>
> > Actually, a substantial amount can be done with hooks. For Oracle, which is
> > substantially harder than MySQL, I have a completely separate parser that
> > generates a PG-compatible parse tree packaged up as an extension. To handle
> > autonomous transactions, database links, hierarchical query conversion,
> > hints, and some execution-related items requires core changes.
>
> That is a spot-on definition of where I do NOT want to end up.  Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core.  It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.
>
> My feeling about this is if you want to use Oracle, go use Oracle.
> Don't ask PG to take on a ton of maintenance issues so you can have
> a frankenOracle.
>

Over the last decade, PostgreSQL has spent a considerable amount of effort
on becoming extensible outside of core. We are now useful in workloads
nobody would have considered in 2004 or 2008.

The more extensibility we add, the LESS we maintain. It is a lot easier to
maintain an API than it is an entire kernel. When I look at all the
interesting features coming from the ecosystem, they are all built on the
hooks that this community worked so hard to create. This idea is an
extension of that and a result of the community's success.

The more extensible we make PostgreSQL, the more the hacker community can
innovate without damaging the PostgreSQL reputation as a rock solid
database system.

Features like these only enable the entire community to innovate. Is the
real issue that the more extensible PostgreSQL is, the more boring it will
become?

JD



>
> regards, tom lane
>
>
>


Re: Operands don't affect result (CONSTANT_EXPRESSION_RESULT) (src/backend/utils/adt/jsonfuncs.c)

2021-02-11 Thread Tom Lane
Ranier Vilela  writes:
> On Thu, Feb 11, 2021 at 01:46, Tom Lane  wrote:
>> At the same time, I think this code could be improved; but the way
>> to do that is to use strtoint(), rather than kluging the choice of
>> datatype even further.

> No matter the function used strtol or strtoint, push_path will remain
> broken with Windows 64bits.

There is quite a lot of difference between "broken" and "my compiler
generates pointless warnings".  Still, I changed it to use strtoint(),
because that's simpler and better style.

regards, tom lane




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-11 Thread Mark Rofail
Hey Joel,

> Here comes a first review of the anyarray_anyelement_operators-v1.patch.
Great, thanks! I’ll start applying your comments today and release a new
patch.

/Mark


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Jim Mlodgenski
On Thu, Feb 11, 2021 at 10:29 AM Andrew Dunstan  wrote:

>
> On 2/11/21 10:06 AM, Tom Lane wrote:
> > Robert Haas  writes:
> >> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris 
> wrote:
> >>> As Jan said in his last email, they're not proposing all the different
> >>> aspects needed. In fact, nothing has actually been proposed yet. This
> >>> is an entirely philosophical debate. I don't even know what's being
> >>> proposed at this point - I just know it *could* be useful. Let's just
> >>> wait and see what is actually proposed before shooting it down, yes?
> >> I don't think I'm trying to shoot anything down, because as I said, I
> >> like extensibility and am generally in favor of it. Rather, I'm
> >> expressing a concern which seems to me to be justified, based on what
> >> was posted. I'm sorry that my tone seems to have aggravated you, but
> >> it wasn't intended to do so.
> > Likewise, the point I was trying to make is that a "pluggable wire
> > protocol" is only a tiny part of what would be needed to have a credible
> > MySQL, Oracle, or whatever clone.  There are large semantic differences
> > from those products; there are maintenance issues arising from the fact
> > that we whack structures like parse trees around all the time; and so on.
> > Maybe there is some useful thing that can be accomplished here, but we
> > need to consider the bigger picture rather than believing (without proof)
> > that a few hook variables will be enough to do anything.
>
>
>
> Yeah. I think we'd need a fairly fully worked implementation to see
> where it goes. Is Amazon going to release (under TPL) its TDS
> implementation of this? That might go a long way to convincing me this
> is worth considering.
>
Everything is planned to be released under the Apache 2.0 license so
people are free to do with it as they choose.


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Andrew Dunstan


On 2/11/21 10:06 AM, Tom Lane wrote:
> Robert Haas  writes:
>> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris  
>> wrote:
>>> As Jan said in his last email, they're not proposing all the different
>>> aspects needed. In fact, nothing has actually been proposed yet. This
>>> is an entirely philosophical debate. I don't even know what's being
>>> proposed at this point - I just know it *could* be useful. Let's just
>>> wait and see what is actually proposed before shooting it down, yes?
>> I don't think I'm trying to shoot anything down, because as I said, I
>> like extensibility and am generally in favor of it. Rather, I'm
>> expressing a concern which seems to me to be justified, based on what
>> was posted. I'm sorry that my tone seems to have aggravated you, but
>> it wasn't intended to do so.
> Likewise, the point I was trying to make is that a "pluggable wire
> protocol" is only a tiny part of what would be needed to have a credible
> MySQL, Oracle, or whatever clone.  There are large semantic differences
> from those products; there are maintenance issues arising from the fact
> that we whack structures like parse trees around all the time; and so on.
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.



Yeah. I think we'd need a fairly fully worked implementation to see
where it goes. Is Amazon going to release (under TPL) its TDS
implementation of this? That might go a long way to convincing me this
is worth considering.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-11 Thread Joel Jacobson
Hi Mark,

On Mon, Feb 8, 2021, at 09:40, Mark Rofail wrote:
> anyarray_anyelement_operators-v1.patch

Here comes a first review of the anyarray_anyelement_operators-v1.patch.

doc/src/sgml/func.sgml
+Does the array contain specified element ?

* Maybe remove the extra blank space before question mark?

doc/src/sgml/indices.sgml
-<@   @>   =   &&
+<@ @>   <<@ @>>   =   &&

* To me it looks like the pattern is to insert three spaces between each
operator, in which case this should be written:

<@   @>   <<@ @>>   =   &&

I.e., the extra spaces are missing between <@ and @>.

src/backend/access/gin/ginarrayproc.c
/* Make copy of array input to ensure it doesn't disappear while in use */
-   ArrayType  *array = PG_GETARG_ARRAYTYPE_P_COPY(0);
+   ArrayType  *array;

I think the comment above should be changed/moved since the copy has been moved 
and isn't always performed due to the if. Since array variable is only used in 
the else block, couldn't both the comment and the declaration of array be moved 
to the else block as well?

src/backend/utils/adt/arrayfuncs.c
+   /*
+* We assume that the comparison operator is strict, so a NULL can't
+* match anything.  XXX this diverges from the "NULL=NULL" behavior of
+* array_eq, should we act like that?
+*/

The comment above is copy/pasted from array_contain_compare(). It seems
unnecessary to have this open question, word for word, in two different places.
I think a reference from here to the existing similar code would be better,
and it would also be good to add a comment in array_contain_compare() about
the existence of this new code where the same question is discussed.

If this were new code, this question should probably be answered before
committing, but since the question concerns old code, maybe the behaviour
can't be changed now anyway: the old code in array_contain_compare() has been
around since 2006, when it was introduced in commit
f5b4d9a9e08199e6bcdb050ef42ea7ec0f7525ca.

/Joel

Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris  
> wrote:
>> As Jan said in his last email, they're not proposing all the different
>> aspects needed. In fact, nothing has actually been proposed yet. This
>> is an entirely philosophical debate. I don't even know what's being
>> proposed at this point - I just know it *could* be useful. Let's just
>> wait and see what is actually proposed before shooting it down, yes?

> I don't think I'm trying to shoot anything down, because as I said, I
> like extensibility and am generally in favor of it. Rather, I'm
> expressing a concern which seems to me to be justified, based on what
> was posted. I'm sorry that my tone seems to have aggravated you, but
> it wasn't intended to do so.

Likewise, the point I was trying to make is that a "pluggable wire
protocol" is only a tiny part of what would be needed to have a credible
MySQL, Oracle, or whatever clone.  There are large semantic differences
from those products; there are maintenance issues arising from the fact
that we whack structures like parse trees around all the time; and so on.
Maybe there is some useful thing that can be accomplished here, but we
need to consider the bigger picture rather than believing (without proof)
that a few hook variables will be enough to do anything.

regards, tom lane




Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Robert Haas
On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris  wrote:
> I'm quite sure I said I'd open source my MySQL implementation, which allows 
> Postgres to appear to MySQL clients as a MySQL/MariaDB server. This is 
> neither proprietary nor Amazon-related and makes Postgres substantially more 
> useful for a large number of applications.

OK. There's stuff to think about there, too: do we want that in
contrib? Is it in good enough shape to be in contrib even if we did?
If it's not in contrib, how do we incorporate it into, say, the
buildfarm, so that we know if we break something? Is it actively
maintained and stable, so that if it needs adjustment for upstream
changes we can count on that getting addressed in a timely fashion? I
don't know the answers to these questions and am not trying to
prejudge, but I think they are important and relevant questions.

> As Jan said in his last email, they're not proposing all the different 
> aspects needed. In fact, nothing has actually been proposed yet. This is an 
> entirely philosophical debate. I don't even know what's being proposed at 
> this point - I just know it *could* be useful. Let's just wait and see what 
> is actually proposed before shooting it down, yes?

I don't think I'm trying to shoot anything down, because as I said, I
like extensibility and am generally in favor of it. Rather, I'm
expressing a concern which seems to me to be justified, based on what
was posted. I'm sorry that my tone seems to have aggravated you, but
it wasn't intended to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-02-11 Thread Tom Lane
Ashutosh Bapat  writes:
> Can this information be part of PathTarget structure and hence part of
> RelOptInfo::reltarget, so that it can be extended to join, group and
> other kinds of RelOptInfo in future?

Why would that be better than keeping it in RelOptInfo?

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-02-11 Thread Robert Haas
On Thu, Feb 11, 2021 at 7:36 AM Dilip Kumar  wrote:
> W.r.t. the attached patch: in HeapTupleHeaderGetDatum, we don't even
> attempt to detoast if there is no external field in the tuple. In the POC
> I have got rid of that check, but I think we might need to do better.
> Maybe we can add a flag in infomask to detect whether the tuple has
> any compressed data or not, as we have for detecting external data
> (HEAP_HASEXTERNAL).

No. This feature isn't close to being important enough to justify
consuming an infomask bit.

I don't really see why we need it anyway. If array construction
already categorically detoasts, why can't we do the same thing here?
Would it really cost that much? In what case? Having compressed values
in a record we're going to store on disk actually seems like a pretty
dumb idea. We might end up trying to recompress something parts of
which have already been compressed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-11 Thread Bharath Rupireddy
On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek  wrote:

> On Thu, 11 Feb 2021 at 15:27, Matthias van de Meent
>  wrote:
> >
> > On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
> >  wrote:
> > >
> > > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> > >  wrote:
> > > > > Also, you can add this to the current commitfest.
> > > >
> > > > See https://commitfest.postgresql.org/32/2977/
> > > >
> > > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek 
> wrote:
> > > > >
> > > > > OK, would you mind to integrate my regression test initial patch as
> > > > > well in v3 or should I submit it later in a separate way?
> > > >
> > > > Attached, with minor fixes
> > >
> > > Why do we need to have a new test file progress.sql for the test
> > > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > > you have a plan to add test cases into progress.sql for other progress
> > > reporting commands?
> >
> > I don't mind moving the test into copy or copy2, but the main reason
> > to put it in a separate file is to test the 'copy' component of the
> > feature called 'progress reporting'. If the feature instead is 'copy'
> > and 'progress reporting' is part of that feature, then I'd put it in
> > the copy-tests, but because the documentation of this has its own
> > docs page 'progress reporting', and because 'copy' is a subsection of
> > that, I do think that this feature warrants its own regression test
> > file.
> >
> > There are no other tests for the progress reporting feature yet,
> > because COPY ... FROM is the only command that is progress reported
> > _and_ that can fire triggers while running the command, so checking
> > the progress view during the progress reported command is only
> > feasible in COPY progress reporting. To test the other progress
> > reporting views, we would need multiple sessions, which I believe is
> > impossible in this test format. Please correct me if I'm wrong; I'd
> > love to add tests for the other components. That will not be in this
> > patchset, though.
> >
> > > IMO, it's better not add any new test file but add the tests to
> existing files.
> >
> > In general I agree, but in some cases (e.g. new system component, new
> > full-fledged feature), new test files are needed. I think that this
> > could be one of those cases.
>
> I split it out since it was meant to be the start of progress reporting
> testing in general. If you consider this better treated as part of COPY
> testing, feel free to move it to the already existing copy test files.
> There's no real reason to keep it separate if that's not needed.
>

+1 to move those test cases to existing copy test files.


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Jonah H. Harris
On Thu, Feb 11, 2021 at 9:28 AM Robert Haas  wrote:

> That being said, I'm not in favor of transferring maintenance work to
> the community for this set of hooks any more than I am for something
> on the parsing side. In general, I'm in favor of as much extensibility
> as we can reasonably create, but with a complicated proposal like this
> one, the community should expect to be able to get something out of
> it. And so far what I hear Jan saying is that these hooks could in
> theory be used for things other than Amazon's proprietary efforts and
> those things could in theory bring benefits to the community, but
> there are no actual plans to do anything with this that would benefit
> anyone other than Amazon. Which seems to bring us right back to
> expecting the community to maintain things for the benefit of
> third-party forks.
>

I'm quite sure I said I'd open source my MySQL implementation, which allows
Postgres to appear to MySQL clients as a MySQL/MariaDB server. This is
neither proprietary nor Amazon-related and makes Postgres substantially
more useful for a large number of applications.

As Jan said in his last email, they're not proposing all the different
aspects needed. In fact, nothing has actually been proposed yet. This is an
entirely philosophical debate. I don't even know what's being proposed at
this point - I just know it *could* be useful. Let's just wait and see what
is actually proposed before shooting it down, yes?

-- 
Jonah H. Harris


Re: Improvements and additions to COPY progress reporting

2021-02-11 Thread Josef Šimánek
On Thu, 11 Feb 2021 at 15:27, Matthias van de Meent
 wrote:
>
> On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
>  wrote:
> >
> > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> >  wrote:
> > > > Also, you can add this to the current commitfest.
> > >
> > > See https://commitfest.postgresql.org/32/2977/
> > >
> > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek  
> > > wrote:
> > > >
> > > > OK, would you mind to integrate my regression test initial patch as
> > > > well in v3 or should I submit it later in a separate way?
> > >
> > > Attached, with minor fixes
> >
> > Why do we need to have a new test file progress.sql for the test
> > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > you have a plan to add test cases into progress.sql for other progress
> > reporting commands?
>
> I don't mind moving the test into copy or copy2, but the main reason
> to put it in a separate file is to test the 'copy' component of the
> feature called 'progress reporting'. If the feature instead is 'copy'
> and 'progress reporting' is part of that feature, then I'd put it in
> the copy-tests, but because the documentation of this has its own
> docs page 'progress reporting', and because 'copy' is a subsection of
> that, I do think that this feature warrants its own regression test
> file.
>
> There are no other tests for the progress reporting feature yet,
> because COPY ... FROM is the only command that is progress reported
> _and_ that can fire triggers while running the command, so checking
> the progress view during the progress reported command is only
> feasible in COPY progress reporting. To test the other progress
> reporting views, we would need multiple sessions, which I believe is
> impossible in this test format. Please correct me if I'm wrong; I'd
> love to add tests for the other components. That will not be in this
> patchset, though.
>
> > IMO, it's better not add any new test file but add the tests to existing 
> > files.
>
> In general I agree, but in some cases (e.g. new system component, new
> full-fledged feature), new test files are needed. I think that this
> could be one of those cases.

I split it out since it was meant to be the start of progress reporting
testing in general. If you consider this better treated as part of COPY
testing, feel free to move it to the already existing copy test files.
There's no real reason to keep it separate if that's not needed.

>
> With regards,
>
> Matthias van de Meent




Re: ERROR: invalid spinlock number: 0

2021-02-11 Thread Fujii Masao



On 2021/02/11 21:55, Michael Paquier wrote:

Hi Fujii-san,

On Tue, Feb 09, 2021 at 11:17:04PM +0900, Fujii Masao wrote:

ISTM that the commit 2c8dd05d6c caused this issue. The commit changed
pg_stat_get_wal_receiver() so that it reads "writtenUpto" by using
pg_atomic_read_u64(). But since "writtenUpto" is initialized only when
walreceiver starts up, reading "writtenUpto" before the startup of
walreceiver can cause the error.


Indeed, that's a problem.  We should at least move that out of the
spin lock area.


Yes, so what about the attached patch?

We didn't notice this issue for a long time because no regression test checks
pg_stat_wal_receiver, so I included such a test in the patch.
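
(For reference, the reported failure looks roughly like this -- in builds where
atomics fall back to emulated spinlocks, reading the uninitialized atomic trips
the check:)

postgres=# SELECT * FROM pg_stat_wal_receiver;
ERROR:  invalid spinlock number: 0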

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 723f513d8b..9b32941a76 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1325,7 +1325,6 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
state = WalRcv->walRcvState;
receive_start_lsn = WalRcv->receiveStart;
receive_start_tli = WalRcv->receiveStartTLI;
-   written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
flushed_lsn = WalRcv->flushedUpto;
received_tli = WalRcv->receivedTLI;
last_send_time = WalRcv->lastMsgSendTime;
@@ -1345,6 +1344,19 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
if (pid == 0 || !ready_to_display)
PG_RETURN_NULL();
 
+   /*
+* We reach here if WAL receiver is ready to display (i.e.,
+* ready_to_display == true). In this case we can ensure that the atomic
+* variable "writtenUpto" has already been initialized by WAL receiver, 
so
+* we can safely read it here.
+*
+* Read "writtenUpto" without holding a spinlock. So it may not be
+* consistent with other WAL receiver's shared variables protected by a
+* spinlock. This is OK because that variable is used only for
+* informational purpose and should not be used for data integrity 
checks.
+*/
+   written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
+
/* determine result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
elog(ERROR, "return type must be a row type");
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..6d048e309c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -83,6 +83,13 @@ select count(*) = 1 as ok from pg_stat_wal;
  t
 (1 row)
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b9b875bc6a..dc8c9a3ac2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -40,6 +40,9 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
 -- There must be only one record
 select count(*) = 1 as ok from pg_stat_wal;
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Robert Haas
On Wed, Feb 10, 2021 at 2:04 PM Tom Lane  wrote:
> That is a spot-on definition of where I do NOT want to end up.  Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core.  It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.

+1.

Making the lexer and parser extensible seems desirable to me. It would
be beneficial not only for companies like EDB and Amazon that might
want to extend the grammar in various ways, but also for extension
authors. However, it's vastly harder than Jan's proposal to make the
wire protocol pluggable. The wire protocol is pretty well-isolated
from the rest of the system. As long as you can get queries out of the
packets the client sends and package up the results to send back, it's
all good. The parser, on the other hand, is not at all well-isolated
from the rest of the system. There's a LOT of code that knows a whole
lot of stuff about the structure of parse trees, so your variant
parser can't produce parse trees for new kinds of DDL, or for new
query constructs. And if it parsed some completely different syntax
where, say, joins were not explicit, it would still have to figure out
how to represent them in a way that looked just like it came out of
the regular parser -- otherwise, parse analysis and query planning and
so forth are not going to work, unless you go and change a lot of
other code too, and I don't really have any idea how we could solve
that, even in theory. But that kind of thing just isn't a problem for
the proposal on this thread.

That being said, I'm not in favor of transferring maintenance work to
the community for this set of hooks any more than I am for something
on the parsing side. In general, I'm in favor of as much extensibility
as we can reasonably create, but with a complicated proposal like this
one, the community should expect to be able to get something out of
it. And so far what I hear Jan saying is that these hooks could in
theory be used for things other than Amazon's proprietary efforts and
those things could in theory bring benefits to the community, but
there are no actual plans to do anything with this that would benefit
anyone other than Amazon. Which seems to bring us right back to
expecting the community to maintain things for the benefit of
third-party forks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-11 Thread Matthias van de Meent
On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
 wrote:
>
> On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
>  wrote:
> > > Also, you can add this to the current commitfest.
> >
> > See https://commitfest.postgresql.org/32/2977/
> >
> > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek  wrote:
> > >
> > > OK, would you mind to integrate my regression test initial patch as
> > > well in v3 or should I submit it later in a separate way?
> >
> > Attached, with minor fixes
>
> Why do we need to have a new test file progress.sql for the test
> cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> you have a plan to add test cases into progress.sql for other progress
> reporting commands?

I don't mind moving the test into copy or copy2, but the main reason
to put it in a separate file is to test the 'copy' component of the
feature called 'progress reporting'. If the feature instead is 'copy'
and 'progress reporting' is part of that feature, then I'd put it in
the copy-tests, but because the documentation of this has its own
docs page 'progress reporting', and because 'copy' is a subsection of
that, I do think that this feature warrants its own regression test
file.

There are no other tests for the progress reporting feature yet,
because COPY ... FROM is the only command that is progress reported
_and_ that can fire triggers while running the command, so checking
the progress view during the progress reported command is only
feasible in COPY progress reporting. To test the other progress
reporting views, we would need multiple sessions, which I believe is
impossible in this test format. Please correct me if I'm wrong; I'd
love to add tests for the other components. That will not be in this
patchset, though.

> IMO, it's better not add any new test file but add the tests to existing 
> files.

In general I agree, but in some cases (e.g. new system component, new
full-fledged feature), new test files are needed. I think that this
could be one of those cases.


With regards,

Matthias van de Meent




Re: Transactions involving multiple postgres foreign servers, take 2

2021-02-11 Thread Masahiko Sawada
On Fri, Feb 5, 2021 at 2:45 PM Masahiko Sawada  wrote:
>
> On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao  
> wrote:
> >
> >
> >
> > On 2021/01/27 14:08, Masahiko Sawada wrote:
> > > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> > >  wrote:
> > >>
> > >>
> > >> You fixed some issues. But maybe you forgot to attach the latest patches?
> > >
> > > Yes, I've attached the updated patches.
> >
> > Thanks for updating the patch! I tried to review 0001 and 0002 as the 
> > self-contained change.
> >
> > + * An FDW that implements both commit and rollback APIs can request to register
> > + * the foreign transaction by FdwXactRegisterXact() to participate in a
> > + * group of distributed transactions.  The registered foreign transactions are
> > + * identified by OIDs of server and user.
> >
> > I'm afraid that the combination of OIDs of server and user is not unique. 
> > IOW, more than one foreign transactions can have the same combination of 
> > OIDs of server and user. For example, the following two SELECT queries 
> > start the different foreign transactions but their user OID is the same. 
> > OID of user mapping should be used instead of OID of user?
> >
> >  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
> >  CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 
> > 'postgres');
> >  CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 
> > 'postgres');
> >  CREATE TABLE t(i int);
> >  CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 
> > 't');
> >  BEGIN;
> >  SELECT * FROM ft;
> >  DROP USER MAPPING FOR postgres SERVER loopback ;
> >  SELECT * FROM ft;
> >  COMMIT;
>
> Good catch. I've considered using user mapping OID or a pair of user
> mapping OID and server OID as a key of foreign transactions but I
> think it also has a problem if an FDW caches the connection by pair of
> server OID and user OID whereas the core identifies them by user
> mapping OID. For instance, mysql_fdw manages connections by pair of
> server OID and user OID.
>
> For example, let's consider the following execution:
>
> BEGIN;
> SET ROLE user_A;
> INSERT INTO ft1 VALUES (1);
> SET ROLE user_B;
> INSERT INTO ft1 VALUES (1);
> COMMIT;
>
> Suppose that an FDW identifies the connections by {server OID, user
> OID} and the core GTM identifies the transactions by user mapping OID,
> and user_A and user_B use the public user mapping to connect server_X.
> In the FDW, there are two connections identified by {user_A, sever_X}
> and {user_B, server_X} respectively, and therefore opens two
> transactions on each connection, while GTM has only one FdwXact entry
> because the two connections refer to the same user mapping OID. As a
> result, at the end of the transaction, GTM ends only one foreign
> transaction, leaving another one.
>
> Using user mapping OID seems natural to me but I'm concerned that
> changing role in the middle of transaction is likely to happen than
> dropping the public user mapping but not sure. We would need to find
> more better way.

After more thought, I'm inclined to think it's better to identify
foreign transactions by user mapping OID. The main reason is that I
think FDWs that manage connection caches by a pair of user OID and
server OID potentially have a problem with the scenario Fujii-san
mentioned. If an FDW has to use another user mapping (i.e., other
connection information) because the currently used user mapping has
been removed, it would have to disconnect the previous connection
because it has to reuse the same connection cache entry. But at that
point it doesn't know whether the transaction will be committed or
aborted.

Also, such an FDW has the same problem that postgres_fdw used to have: a
backend establishes multiple connections with the same connection
information if multiple local users use the public user mapping. Even
from the perspective of foreign transaction management, it makes more
sense for foreign transactions to correspond to the connections to
foreign servers, not to the local connection information.

I can see that some FDW implementations such as mysql_fdw and
firebird_fdw identify connections by a pair of server OID and user OID,
but I think this is because they consulted the old postgres_fdw code. I
suspect that there is no use case where an FDW needs to identify
connections in that way. If the core GTM identifies them by user
mapping OID, we would be forcing those FDWs to change their scheme, but
I think that change would be the right improvement.
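
To make the direction concrete, here is a minimal sketch (not the actual
patch: the struct layout, field names, and helper are assumptions for
illustration) of registering participants keyed by user mapping OID, so that
two local roles resolving to the same public user mapping share one entry,
matching the single remote connection:

/*
 * Sketch only: a participant entry keyed by user mapping OID instead of
 * (server OID, user OID).  Field names are illustrative, not the patch.
 * Assumes the usual backend headers (postgres.h, utils/hsearch.h).
 */
typedef struct FdwXactEntrySketch
{
	Oid			umid;			/* user mapping OID -- the hash key */
	Oid			serverid;		/* kept only for logging/bookkeeping */
	bool		prepared;		/* has PREPARE TRANSACTION been sent? */
} FdwXactEntrySketch;

/*
 * Register (or find) the participant for a given user mapping.  Because the
 * key is the user mapping OID, a role change that still resolves to the same
 * public user mapping does not create a second entry, mirroring the single
 * remote connection/transaction.
 */
static FdwXactEntrySketch *
fdwxact_register_by_user_mapping(HTAB *participants, Oid umid, Oid serverid)
{
	bool		found;
	FdwXactEntrySketch *entry;

	entry = (FdwXactEntrySketch *) hash_search(participants, &umid,
											   HASH_ENTER, &found);
	if (!found)
	{
		/* dynahash has already copied the key (umid) into the new entry */
		entry->serverid = serverid;
		entry->prepared = false;
	}
	return entry;
}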

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: TRUNCATE on foreign table

2021-02-11 Thread Ashutosh Bapat
On Wed, Feb 10, 2021 at 10:58 PM Kazutaka Onishi  wrote:
>
> That's because using the foreign server is difficult for the user.
>
> For example, the user doesn't always have the permission to login to the 
> forein server.
> In some cases, the foreign table has been created by the administrator that 
> has permission to access the two servers and the user only uses the local 
> server.
> Then the user has to ask the administrator to run TRUNCATE every time.

That might actually be seen as a loophole but ...

>
> Furthermore,there are some fdw extensions which don't support SQL. mongo_fdw, 
> redis_fdw, etc...
> These extensions have been used to provide SQL interfaces to the users.
> It's hard for the user to run TRUNCATE after learning each database.

This has some appeal.

Thanks for sharing the usecases.
-- 
Best Wishes,
Ashutosh Bapat




Add tests for bytea LIKE operator

2021-02-11 Thread Peter Eisentraut

A few more easy tests for things not covered at all:

bytea LIKE bytea (bytealike)
bytea NOT LIKE bytea (byteanlike)
ESCAPE clause for the above (like_escape_bytea)

also

name NOT ILIKE text (nameicnlike)

See also 
.
From e6af4a7baa2703cc8239a3e9670d21ef36009e90 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 11 Feb 2021 14:21:06 +0100
Subject: [PATCH] Add tests for bytea LIKE operator

Add test coverage for the following operations, which were previously
not tested at all:

bytea LIKE bytea (bytealike)
bytea NOT LIKE bytea (byteanlike)
ESCAPE clause for the above (like_escape_bytea)

also

name NOT ILIKE text (nameicnlike)
---
 src/test/regress/expected/strings.out | 48 +++
 src/test/regress/sql/strings.sql  | 12 +++
 2 files changed, 60 insertions(+)

diff --git a/src/test/regress/expected/strings.out 
b/src/test/regress/expected/strings.out
index 7c91afa6e4..fb4573d85f 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -1035,6 +1035,30 @@ SELECT 'indio' NOT LIKE 'in_o' AS "true";
  t
 (1 row)
 
+SELECT 'abc'::name LIKE '_b_' AS "true";
+ true 
+--
+ t
+(1 row)
+
+SELECT 'abc'::name NOT LIKE '_b_' AS "false";
+ false 
+---
+ f
+(1 row)
+
+SELECT 'abc'::bytea LIKE '_b_'::bytea AS "true";
+ true 
+--
+ t
+(1 row)
+
+SELECT 'abc'::bytea NOT LIKE '_b_'::bytea AS "false";
+ false 
+---
+ f
+(1 row)
+
 -- unused escape character
 SELECT 'hawkeye' LIKE 'h%' ESCAPE '#' AS "true";
  true 
@@ -1158,6 +1182,18 @@ SELECT 'i_dio' NOT LIKE 'i$_d%o' ESCAPE '$' AS "false";
  f
 (1 row)
 
+SELECT 'a_c'::bytea LIKE 'a$__'::bytea ESCAPE '$'::bytea AS "true";
+ true 
+--
+ t
+(1 row)
+
+SELECT 'a_c'::bytea NOT LIKE 'a$__'::bytea ESCAPE '$'::bytea AS "false";
+ false 
+---
+ f
+(1 row)
+
 -- escape character same as pattern character
 SELECT 'maca' LIKE 'm%aca' ESCAPE '%' AS "true";
  true 
@@ -1271,6 +1307,18 @@ SELECT 'Hawkeye' NOT ILIKE 'h%' AS "false";
  f
 (1 row)
 
+SELECT 'ABC'::name ILIKE '_b_' AS "true";
+ true 
+--
+ t
+(1 row)
+
+SELECT 'ABC'::name NOT ILIKE '_b_' AS "false";
+ false 
+---
+ f
+(1 row)
+
 --
 -- test %/_ combination cases, cf bugs #4821 and #5478
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index ef4bfb008a..57a48c9d0b 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -300,6 +300,12 @@
 SELECT 'indio' LIKE 'in_o' AS "false";
 SELECT 'indio' NOT LIKE 'in_o' AS "true";
 
+SELECT 'abc'::name LIKE '_b_' AS "true";
+SELECT 'abc'::name NOT LIKE '_b_' AS "false";
+
+SELECT 'abc'::bytea LIKE '_b_'::bytea AS "true";
+SELECT 'abc'::bytea NOT LIKE '_b_'::bytea AS "false";
+
 -- unused escape character
 SELECT 'hawkeye' LIKE 'h%' ESCAPE '#' AS "true";
 SELECT 'hawkeye' NOT LIKE 'h%' ESCAPE '#' AS "false";
@@ -333,6 +339,9 @@
 SELECT 'i_dio' LIKE 'i$_d%o' ESCAPE '$' AS "true";
 SELECT 'i_dio' NOT LIKE 'i$_d%o' ESCAPE '$' AS "false";
 
+SELECT 'a_c'::bytea LIKE 'a$__'::bytea ESCAPE '$'::bytea AS "true";
+SELECT 'a_c'::bytea NOT LIKE 'a$__'::bytea ESCAPE '$'::bytea AS "false";
+
 -- escape character same as pattern character
 SELECT 'maca' LIKE 'm%aca' ESCAPE '%' AS "true";
 SELECT 'maca' NOT LIKE 'm%aca' ESCAPE '%' AS "false";
@@ -367,6 +376,9 @@
 SELECT 'Hawkeye' ILIKE 'h%' AS "true";
 SELECT 'Hawkeye' NOT ILIKE 'h%' AS "false";
 
+SELECT 'ABC'::name ILIKE '_b_' AS "true";
+SELECT 'ABC'::name NOT ILIKE '_b_' AS "false";
+
 --
 -- test %/_ combination cases, cf bugs #4821 and #5478
 --
-- 
2.30.0



Re: libpq compression

2021-02-11 Thread Konstantin Knizhnik




On 11.02.2021 16:09, Daniil Zakhlystov wrote:

Hi!


On 09.02.2021 09:06, Konstantin Knizhnik wrote:

Sorry, but my interpretation of your results is completely different:
permanent compression is faster than chunked compression (2m15 vs. 2m27)
and consumes less CPU (44 vs 48 sec).
Size of RX data is slightly larger - 0.5Mb but TX size is smaller - 5Mb.
So permanent compression is better from all points of view: it is
faster, consumes less CPU and reduces network traffic!

 From my point of view your results just prove my original opinion that
possibility to control compression on the fly and use different
compression algorithm for TX/RX data
just complicates implementation and given no significant advantages.

When I mentioned the lower CPU usage, I was referring to the pgbench test
results in the attached Google doc, where chunked compression demonstrated
lower CPU usage compared to permanent compression.

I made another (a little bit larger) pgbench test to demonstrate this:

Pgbench test parameters:

Data load
pgbench -i -s 100

Run configuration
pgbench --builtin tpcb-like -t 1500 --jobs=64 --client=600

Pgbench test results:

No compression
latency average = 247.793 ms
tps = 2421.380067 (including connections establishing)
tps = 2421.660942 (excluding connections establishing)

real6m11.818s
user1m0.620s
sys 2m41.087s
RX bytes diff, human: 703.9221M
TX bytes diff, human: 772.2580M

Chunked compression (compress only CopyData and DataRow messages)
latency average = 249.123 ms
tps = 2408.451724 (including connections establishing)
tps = 2408.719578 (excluding connections establishing)

real6m13.819s
user1m18.800s
sys 2m39.941s
RX bytes diff, human: 707.3872M
TX bytes diff, human: 772.1594M

Permanent compression
latency average = 250.312 ms
tps = 2397.005945 (including connections establishing)
tps = 2397.279338 (excluding connections establishing)

real6m15.657s
user1m54.281s
sys 2m37.187s
RX bytes diff, human: 610.6932M
TX bytes diff, human: 513.2225M


As you can see in the above results, user CPU time (1m18.800s vs 1m54.281s) is 
significantly smaller in
chunked compression because it doesn’t try to compress all of the packets.
Well, but permanent compression provides some (though not so large) reduction
in traffic, while with chunked compression the network traffic is almost the
same as with no compression, yet it consumes more CPU.


pgbench queries are definitely not a case where compression should be used:
both requests and responses are too short to make compression efficient.

So in this case compression should not be used at all.
From my point of view, "chunked compression" is not a good compromise between
the no-compression and permanent-compression cases; rather it combines the
drawbacks of both approaches: it doesn't reduce traffic but consumes more CPU.
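
Just to make the trade-off tangible, here is a minimal sketch (the threshold
and helper name are assumptions, not part of the patch) of the per-message
policy that "chunked" compression implies, using the protocol's CopyData ('d')
and DataRow ('D') type bytes:

/*
 * Sketch of a chunked-compression policy: compress only message types that
 * tend to carry bulk data, and only when the payload is large enough to be
 * worth the CPU.  The threshold value is an assumption for illustration.
 */
#include <stdbool.h>
#include <stddef.h>

#define COMPRESS_MIN_PAYLOAD 1024	/* hypothetical cut-off in bytes */

static bool
should_compress_message(char msgtype, size_t payload_len)
{
	if (payload_len < COMPRESS_MIN_PAYLOAD)
		return false;

	switch (msgtype)
	{
		case 'd':				/* CopyData */
		case 'D':				/* DataRow */
			return true;		/* bulk rows: usually compress well */
		default:
			return false;		/* short control messages: skip */
	}
}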


Here is the summary from my POV, according to these and previous tests results:

1. Permanent compression always brings the highest compression ratio
2. Permanent compression might not be worthwhile for loads different from COPY
data / Replication / BLOBs/JSON queries
3. Chunked compression allows compressing only well-compressible messages and
saving the CPU cycles by not compressing the others
4. Chunked compression introduces some traffic overhead compared to the 
permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB database dump, 
according to results in my previous message)
5. From the protocol point of view, chunked compression seems a little bit more 
flexible:
  - we can inject some uncompressed messages at any time without the need to 
decompress/compress the compressed data
  - we can potentially switch the compression algorithm at any time (but I 
think that this might be over-engineering)

Given the summary above, I think it’s time to make a decision on which path we 
should take and make the final list of goals that need to be reached in this 
patch to make it committable.

Thanks,

Daniil Zakhlystov






Re: pg_cryptohash_final possible out-of-bounds access (per Coverity)

2021-02-11 Thread Ranier Vilela
Em qui., 11 de fev. de 2021 às 09:47, Michael Paquier 
escreveu:

> On Wed, Feb 10, 2021 at 09:14:46AM -0300, Ranier Vilela wrote:
> > It is necessary to correct the interfaces. To caller, inform the size of
> > the buffer it created.
>
> Well, Coverity likes nannyism, so each one of its reports is to take
> with a pinch of salt, so there is no point to change something that
> does not make sense just to please a static analyzer.  The context
> of the code matters.
>
I do not agree. Coverity is a valuable tool that points out badly designed
functions.
As demonstrated in the first email, the current interface allows the user of
the functions to corrupt memory.
So it makes perfect sense to fix the interface, both to prevent misuse now and
to prevent future modifications from silently breaking the cryptohash API.


>
> Now, the patch you sent has no need to be that complicated, and it
> partially works while not actually solving at all the problem you are
> trying to solve (nothing done for MD5 or OpenSSL).  Attached is an
> example of what I finish with while poking at this issue. There is IMO
> no point to touch the internals of SCRAM that all rely on the same
> digest lengths for the proof generation with SHA256.
>
Too fast. I spent 30 minutes doing the patch.


>
> > I think it should be error-out, because the buffer can be malloc.
>
> I don't understand what you mean here, but cryptohash[_openssl].c
> should not issue an error directly, just return a status code that the
> caller can consume to generate an error.
>
I meant that it is not a case for an assertion, as suggested by Kyotaro,
because someone might want to create a dynamic buffer via malloc to store
the digest.
Anyway, the buffer creator needs to tell the functions what the actual
buffer size is, so they can decide what to do.
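
For illustration, a minimal sketch (the function name and error convention
here are assumptions; the real API change is the one in the patches discussed
in this thread) of a finalization step that refuses to overrun the caller's
buffer:

/*
 * Sketch: copy the computed digest only if the caller-supplied buffer is
 * large enough, returning -1 otherwise so the caller can raise its own
 * error.  "uint8" is PostgreSQL's typedef from c.h; <string.h> for memcpy.
 */
#include <string.h>

static int
cryptohash_final_checked(const uint8 *digest, size_t digest_len,
						 uint8 *dest, size_t dest_len)
{
	if (dest == NULL || dest_len < digest_len)
		return -1;				/* caller's buffer is too small */

	memcpy(dest, digest, digest_len);
	return 0;
}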

regards,
Ranier Vilela


Re: Online checksums patch - once again

2021-02-11 Thread Bruce Momjian
On Wed, Feb 10, 2021 at 01:26:18PM -0500, Bruce Momjian wrote:
> On Wed, Feb 10, 2021 at 03:25:58PM +0100, Magnus Hagander wrote:
> > A fairly large amount of this complexity comes out of the fact that it
> > now supports restarting and tracks checksums on a per-table basis. We
> > skipped this in the original patch for exactly this reason (that's not
> > to say there isn't a fair amount of complexity even without it, but it
> > did substantially i increase both the size and the complexity of the
> > patch), but in the review of that i was specifically asked for having
> > that added. I personally don't think it's worth that complexity but at
> > the time that seemed to be a pretty strong argument. So I'm not
> > entirely sure how to move forward with that...
> > 
> > is your impression that it would still be too complicated, even without 
> > that?
> 
> I was wondering why this feature has stalled for so long --- now I know.
> This does highlight the risk of implementing too many additions to a
> feature. I am working against this dynamic in the cluster file
> encryption feature I am working on.

Oh, I think another reason this patchset has had problems is related to
something I mentioned in 2018:

https://www.postgresql.org/message-id/20180801163613.ga2...@momjian.us

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running.  We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution.  Forcing a restart is probably part of that
primitive orchestration.  We will probably have similar challenges if we
ever allowed Postgres to change its data format on the fly.  These
challenges are one reason pg_upgrade only modifies the new cluster,
never the old one.

I don't think anyone has done anything wrong --- rather, it is what we
are _trying_ to do that is complex.  Adding restartability to this just
added to the challenge.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-02-11 Thread Ashutosh Bapat
Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future? In fact, it might be easy to do
that in this patch itself.
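
As a rough sketch of what carrying this in PathTarget could look like (the
field name and representation below are assumptions, not something from the
patch), the not-null marks could be indexed by the position of each output
expression, so the same infrastructure can apply to join, group and other
kinds of RelOptInfo:

/*
 * Sketch only: a PathTarget-level not-null marker.  Instead of attribute
 * numbers of a base relation, the bitmap is indexed by the 1-based position
 * of the expression in the target's expression list, so it also works for
 * relations whose targets are arbitrary expressions.  Assumes
 * nodes/bitmapset.h and nodes/pg_list.h.
 */
typedef struct PathTargetNotNullSketch
{
	List	   *exprs;			/* as in PathTarget: target expressions */
	Bitmapset  *notnull_exprs;	/* positions of provably non-null exprs */
} PathTargetNotNullSketch;

/* Example check: is the k-th output expression known to be non-null? */
static inline bool
target_expr_is_notnull(const PathTargetNotNullSketch *tgt, int k)
{
	return bms_is_member(k, tgt->notnull_exprs);
}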

On Wed, Feb 10, 2021 at 8:57 AM Andy Fan  wrote:
>
>
> On Wed, Feb 10, 2021 at 11:18 AM Andy Fan  wrote:
>>
>> Hi:
>>
>> This patch is the first patch in UniqueKey patch series[1], since I need to 
>> revise
>> that series many times but the first one would be not that often, so I'd 
>> like to
>> submit this one for review first so that I don't need to maintain it again 
>> and again.
>>
>> v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch
>>
>> Introduce notnullattrs field in RelOptInfo to indicate which attr are not 
>> null
>> in current query. The not null is judged by checking pg_attribute and query's
>> restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
>> RelOptInfo level so far.
>>
>>
>> Any thoughts?
>>
>> [1] 
>> https://www.postgresql.org/message-id/CAKU4AWr1BmbQB4F7j22G%2BNS4dNuem6dKaUf%2B1BK8me61uBgqqg%40mail.gmail.com
>> --
>> Best Regards
>> Andy Fan (https://www.aliyun.com/)
>
>
> Add the missed patch..
>
> --
> Best Regards
> Andy Fan (https://www.aliyun.com/)



-- 
Best Wishes,
Ashutosh Bapat




Re: libpq compression

2021-02-11 Thread Daniil Zakhlystov
Hi!

> On 09.02.2021 09:06, Konstantin Knizhnik wrote:
> 
> Sorry, but my interpretation of your results is completely different:
> permanent compression is faster than chunked compression (2m15 vs. 2m27) 
> and consumes less CPU (44 vs 48 sec).
> Size of RX data is slightly larger - 0.5Mb but TX size is smaller - 5Mb.
> So permanent compression is better from all points of view: it is 
> faster, consumes less CPU and reduces network traffic!
> 
> From my point of view your results just prove my original opinion that 
> possibility to control compression on the fly and use different 
> compression algorithm for TX/RX data
> just complicates implementation and given no significant advantages.

When I mentioned the lower CPU usage, I was referring to the pgbench test
results in the attached Google doc, where chunked compression demonstrated
lower CPU usage compared to permanent compression.

I made another (a little bit larger) pgbench test to demonstrate this:

Pgbench test parameters:

Data load
pgbench -i -s 100

Run configuration
pgbench --builtin tpcb-like -t 1500 --jobs=64 --client=600

Pgbench test results:

No compression
latency average = 247.793 ms
tps = 2421.380067 (including connections establishing)
tps = 2421.660942 (excluding connections establishing)

real6m11.818s
user1m0.620s
sys 2m41.087s
RX bytes diff, human: 703.9221M
TX bytes diff, human: 772.2580M

Chunked compression (compress only CopyData and DataRow messages)
latency average = 249.123 ms
tps = 2408.451724 (including connections establishing)
tps = 2408.719578 (excluding connections establishing)

real6m13.819s
user1m18.800s
sys 2m39.941s
RX bytes diff, human: 707.3872M
TX bytes diff, human: 772.1594M

Permanent compression
latency average = 250.312 ms
tps = 2397.005945 (including connections establishing)
tps = 2397.279338 (excluding connections establishing)

real6m15.657s
user1m54.281s
sys 2m37.187s
RX bytes diff, human: 610.6932M
TX bytes diff, human: 513.2225M


As you can see in the above results, user CPU time (1m18.800s vs 1m54.281s) is 
significantly smaller in
chunked compression because it doesn’t try to compress all of the packets. 

Here is the summary from my POV, according to these and previous tests results:

1. Permanent compression always brings the highest compression ratio
2. Permanent compression might not be worthwhile for loads different from COPY
data / Replication / BLOBs/JSON queries
3. Chunked compression allows compressing only well-compressible messages and
saving the CPU cycles by not compressing the others
4. Chunked compression introduces some traffic overhead compared to the 
permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB database dump, 
according to results in my previous message)
5. From the protocol point of view, chunked compression seems a little bit more 
flexible:
 - we can inject some uncompressed messages at any time without the need to 
decompress/compress the compressed data
 - we can potentially switch the compression algorithm at any time (but I think 
that this might be over-engineering)

Given the summary above, I think it’s time to make a decision on which path we 
should take and make the final list of goals that need to be reached in this 
patch to make it committable. 

Thanks,

Daniil Zakhlystov




Re: Operands don't affect result (CONSTANT_EXPRESSION_RESULT) (src/backend/utils/adt/jsonfuncs.c)

2021-02-11 Thread Ranier Vilela
Em qui., 11 de fev. de 2021 às 01:46, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > long (long int, signed long int): 4 bytes, -2,147,483,648 to 2,147,483,647
> > Therefore long never be > INT_MAX at Windows 64 bits.
> > Thus lindex is always false in this expression:
> > if (errno != 0 || badp == c || *badp != '\0' || lindex > INT_MAX ||
> lindex
> >  < INT_MIN)
>
> At the same time, I think this code could be improved; but the way
> to do that is to use strtoint(), rather than kluging the choice of
> datatype even further.
>
No matter which function is used, strtol or strtoint, push_path will remain
broken on 64-bit Windows unless the expression is also corrected.
Definitely, using long here is a bad idea.
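
For what it's worth, a minimal sketch of the parsing step if it used
strtoint() as suggested; the surrounding function shape is illustrative only,
not the actual jsonfuncs.c code:

/*
 * Sketch: parse an array subscript with strtoint(), which reports overflow
 * via ERANGE regardless of sizeof(long), so the range check also works on
 * 64-bit Windows where long is only 32 bits.  Assumes common/string.h for
 * strtoint() and the usual backend bool type.
 */
#include <errno.h>

static bool
parse_subscript_sketch(const char *c, int *result)
{
	char	   *badp;
	int			lindex;

	errno = 0;
	lindex = strtoint(c, &badp, 10);
	if (errno != 0 || badp == c || *badp != '\0')
		return false;			/* empty, trailing junk, or out of int range */

	*result = lindex;
	return true;
}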

regards,
Ranier Vilela


Re: ERROR: invalid spinlock number: 0

2021-02-11 Thread Michael Paquier
Hi Fujii-san,

On Tue, Feb 09, 2021 at 11:17:04PM +0900, Fujii Masao wrote:
> ISTM that the commit 2c8dd05d6c caused this issue. The commit changed
> pg_stat_get_wal_receiver() so that it reads "writtenUpto" by using
> pg_atomic_read_u64(). But since "writtenUpto" is initialized only when
> walreceiver starts up, reading "writtenUpto" before the startup of
> walreceiver can cause the error.

Indeed, that's a problem.  We should at least move that out of the
spin lock area.  I'll try to look at that in detail, and that's going
to take me a couple of days at least.  Sorry for the delay.
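
One possible direction (a sketch, not necessarily the final fix) is to
initialize the atomic when the shared walreceiver state is created, so a
reader can never see it uninitialized even before any walreceiver has
started.  This is modeled on the existing WalRcvShmemInit(); the
pg_atomic_init_u64() call is the only addition of interest here:

void
WalRcvShmemInit(void)
{
	bool		found;

	WalRcv = (WalRcvData *)
		ShmemInitStruct("Wal Receiver Ctl", WalRcvShmemSize(), &found);

	if (!found)
	{
		/* First time through, so initialize */
		MemSet(WalRcv, 0, WalRcvShmemSize());
		WalRcv->walRcvState = WALRCV_STOPPED;
		SpinLockInit(&WalRcv->mutex);
		pg_atomic_init_u64(&WalRcv->writtenUpto, 0);	/* sketch: init here */
		WalRcv->latch = NULL;
	}
}
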
--
Michael




Re: pg_cryptohash_final possible out-of-bounds access (per Coverity)

2021-02-11 Thread Michael Paquier
On Wed, Feb 10, 2021 at 09:14:46AM -0300, Ranier Vilela wrote:
> It is necessary to correct the interfaces. To caller, inform the size of
> the buffer it created.

Well, Coverity likes nannyism, so each one of its reports is to take
with a pinch of salt, so there is no point to change something that
does not make sense just to please a static analyzer.  The context
of the code matters.

Now, the patch you sent has no need to be that complicated, and it
partially works while not actually solving at all the problem you are
trying to solve (nothing done for MD5 or OpenSSL).  Attached is an
example of what I finish with while poking at this issue. There is IMO
no point to touch the internals of SCRAM that all rely on the same
digest lengths for the proof generation with SHA256.

> I think it should be error-out, because the buffer can be malloc.

I don't understand what you mean here, but cryptohash[_openssl].c
should not issue an error directly, just return a status code that the
caller can consume to generate an error.
--
Michael
diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h
index 32d7784ca5..541dc844c8 100644
--- a/src/include/common/cryptohash.h
+++ b/src/include/common/cryptohash.h
@@ -32,7 +32,7 @@ typedef struct pg_cryptohash_ctx pg_cryptohash_ctx;
 extern pg_cryptohash_ctx *pg_cryptohash_create(pg_cryptohash_type type);
 extern int	pg_cryptohash_init(pg_cryptohash_ctx *ctx);
 extern int	pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len);
-extern int	pg_cryptohash_final(pg_cryptohash_ctx *ctx, uint8 *dest);
+extern int	pg_cryptohash_final(pg_cryptohash_ctx *ctx, uint8 *dest, size_t len);
 extern void pg_cryptohash_free(pg_cryptohash_ctx *ctx);
 
 #endif			/* PG_CRYPTOHASH_H */
diff --git a/src/backend/libpq/auth-scram.c b/src/backend/libpq/auth-scram.c
index 8d857f39df..49386131c1 100644
--- a/src/backend/libpq/auth-scram.c
+++ b/src/backend/libpq/auth-scram.c
@@ -1429,7 +1429,7 @@ scram_mock_salt(const char *username)
 	if (pg_cryptohash_init(ctx) < 0 ||
 		pg_cryptohash_update(ctx, (uint8 *) username, strlen(username)) < 0 ||
 		pg_cryptohash_update(ctx, (uint8 *) mock_auth_nonce, MOCK_AUTH_NONCE_LEN) < 0 ||
-		pg_cryptohash_final(ctx, sha_digest) < 0)
+		pg_cryptohash_final(ctx, sha_digest, PG_SHA256_DIGEST_LENGTH) < 0)
 	{
 		pg_cryptohash_free(ctx);
 		return NULL;
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 0cefd181b5..18c7bca8d3 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -330,7 +330,8 @@ SendBackupManifest(backup_manifest_info *manifest)
 	 * twice.
 	 */
 	manifest->still_checksumming = false;
-	if (pg_cryptohash_final(manifest->manifest_ctx, checksumbuf) < 0)
+	if (pg_cryptohash_final(manifest->manifest_ctx, checksumbuf,
+			PG_SHA256_DIGEST_LENGTH) < 0)
 		elog(ERROR, "failed to finalize checksum of backup manifest");
 	AppendStringToManifest(manifest, "\"Manifest-Checksum\": \"");
 	dstlen = pg_hex_enc_len(PG_SHA256_DIGEST_LENGTH);
diff --git a/src/backend/utils/adt/cryptohashfuncs.c b/src/backend/utils/adt/cryptohashfuncs.c
index 152adcbfb4..6a0f0258e6 100644
--- a/src/backend/utils/adt/cryptohashfuncs.c
+++ b/src/backend/utils/adt/cryptohashfuncs.c
@@ -114,7 +114,8 @@ cryptohash_internal(pg_cryptohash_type type, bytea *input)
 		elog(ERROR, "could not initialize %s context", typestr);
 	if (pg_cryptohash_update(ctx, data, len) < 0)
 		elog(ERROR, "could not update %s context", typestr);
-	if (pg_cryptohash_final(ctx, (unsigned char *) VARDATA(result)) < 0)
+	if (pg_cryptohash_final(ctx, (unsigned char *) VARDATA(result),
+			digest_len) < 0)
 		elog(ERROR, "could not finalize %s context", typestr);
 	pg_cryptohash_free(ctx);
 
diff --git a/src/common/checksum_helper.c b/src/common/checksum_helper.c
index a895e2e285..54b684d5e8 100644
--- a/src/common/checksum_helper.c
+++ b/src/common/checksum_helper.c
@@ -198,25 +198,29 @@ pg_checksum_final(pg_checksum_context *context, uint8 *output)
 			memcpy(output, &context->raw_context.c_crc32c, retval);
 			break;
 		case CHECKSUM_TYPE_SHA224:
-			if (pg_cryptohash_final(context->raw_context.c_sha2, output) < 0)
+			if (pg_cryptohash_final(context->raw_context.c_sha2,
+	output, PG_SHA224_DIGEST_LENGTH) < 0)
 return -1;
 			pg_cryptohash_free(context->raw_context.c_sha2);
 			retval = PG_SHA224_DIGEST_LENGTH;
 			break;
 		case CHECKSUM_TYPE_SHA256:
-			if (pg_cryptohash_final(context->raw_context.c_sha2, output) < 0)
+			if (pg_cryptohash_final(context->raw_context.c_sha2,
+	output, PG_SHA256_DIGEST_LENGTH) < 0)
 return -1;
 			pg_cryptohash_free(context->raw_context.c_sha2);
 			retval = PG_SHA256_DIGEST_LENGTH;
 			break;
 		case CHECKSUM_TYPE_SHA384:
-			if (pg_cryptohash_final(context->raw_context.c_sha2, output) < 0)
+			if (pg_cryptohash_final(context->raw_context.c_sha2,
+	output, PG_SHA384_DIGEST_LENGTH) < 0)
 retur

Re: [HACKERS] Custom compression methods

2021-02-11 Thread Dilip Kumar
On Thu, Feb 11, 2021 at 3:26 AM Robert Haas  wrote:
>
> On Wed, Feb 10, 2021 at 3:06 PM Robert Haas  wrote:
> > I think you have a fairly big problem with row types. Consider this example:
> >
> > create table t1 (a int, b text compression pglz);
> > create table t2 (a int, b text compression lz4);
> > create table t3 (x t1);
> > insert into t1 values (1, repeat('foo', 1000));
> > insert into t2 values (1, repeat('foo', 1000));
> > insert into t3 select t1 from t1;
> > insert into t3 select row(a, b)::t1 from t2;
> >
> > rhaas=# select pg_column_compression((t3.x).b) from t3;
> >  pg_column_compression
> > ---
> >  pglz
> >  lz4
> > (2 rows)
> >
> > That's not good, because now

Yeah, that's really bad.

> ...because now I hit send too soon. Also, because now column b has
> implicit dependencies on both compression AMs and the rest of the
> system has no idea. I think we probably should have a rule that
> nothing except pglz is allowed inside of a record, just to keep it
> simple. The record overall can be toasted so it's not clear why we
> should also be toasting the original columns at all, but I think
> precedent probably argues for continuing to allow PGLZ, as it can
> already be like that on disk and pg_upgrade is a thing. The same kind
> of issue probably exists for arrays and range types.

While constructing a row type from the tuple, we flatten the external
data; I think that would be the place to decompress the data if it is
not compressed with PGLZ.  For the array type, we are already
detoasting/decompressing the source attribute (see "construct_md_array"),
so arrays don't have this problem.  I haven't yet checked the range
type.  Based on my analysis, the different data types are constructed
along different paths, so maybe we should find some centralized place,
or else add a function call in all such places, so that we can
decompress the attribute if required before forming the tuple for the
composite type.

I have quickly hacked the code and after that, your test case is working fine.

postgres[55841]=# select pg_column_compression((t3.x).b) from t3;
 pg_column_compression
---
 pglz

(2 rows)

-> now the attribute 'b' inside the second tuple is decompressed
(because it was not compressed with PGLZ) so the compression method of
b is NULL

postgres[55841]=# select pg_column_compression((t3.x)) from t3;
 pg_column_compression
---

 pglz
(2 rows)

--> but the second row itself is compressed back using the local
compression method of t3

W.r.t. the attached patch: in HeapTupleHeaderGetDatum, we don't even
attempt to detoast if there is no external field in the tuple; in the POC
I have got rid of that check, but I think we might need to do better.
Maybe we can add a flag in the infomask to detect whether the tuple has
any compressed data, as we have for detecting external data
(HEAP_HASEXTERNAL).
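
As a rough illustration of that idea (compression_method_of() and
decompress_attr() are placeholder helpers, not existing APIs), the flattening
step could expand any inline-compressed value that doesn't use PGLZ before
building the composite datum:

/*
 * Sketch only: while flattening a tuple into a composite datum, force
 * decompression of any varlena that is compressed with something other
 * than PGLZ, so a record never carries a hidden dependency on another
 * table's compression method.  Handles only inline-compressed values;
 * external (toasted) values would be detoasted separately.
 */
static void
flatten_values_for_rowtype(TupleDesc tupdesc, Datum *values, bool *isnull)
{
	for (int i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (isnull[i] || att->attlen != -1)
			continue;			/* only varlena attributes can be compressed */

		if (VARATT_IS_COMPRESSED(DatumGetPointer(values[i])) &&
			compression_method_of(values[i]) != COMPRESSION_PGLZ)
			values[i] = decompress_attr(values[i]);		/* hypothetical */
	}
}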

So I will do some more analysis in this area and try to come up with a
clean solution.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


POC_fix_compression_in_rowtype.patch


Re: repeated decoding of prepared transactions

2021-02-11 Thread Markus Wanner

Hello Amit,

thanks a lot for your extensive explanation and examples, I appreciate 
this very much.  I'll need to think this through and see how we can make 
this work for us.


Best Regards

Markus




Re: Dump public schema ownership & seclabels

2021-02-11 Thread Noah Misch
On Sun, Jan 17, 2021 at 12:00:06PM +0100, Vik Fearing wrote:
> On 1/17/21 10:41 AM, Noah Misch wrote:
> > On Sat, Jan 16, 2021 at 02:05:43PM +0100, Vik Fearing wrote:
> >> On 12/30/20 12:59 PM, Noah Misch wrote:
> >>> On Tue, Dec 29, 2020 at 05:49:24AM -0800, Noah Misch wrote:
>  https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave 
>  $SUBJECT as
>  one of the constituent projects for changing the public schema default 
>  ACL.
>  This ended up being simple.  Attached.
> >>>
> >>> This is defective; it fails to reproduce nspacl after "ALTER SCHEMA public
> >>> OWNER TO pg_write_server_files; REVOKE ALL ON SCHEMA public FROM
> >>> pg_write_server_files;".  I will try again later.

Fixed.  The comment added to getNamespaces() explains what went wrong.

Incidentally, --no-acl is fragile without --no-owner, because any REVOKE
statements assume a particular owner.  Since one can elect --no-owner at
restore time, we can't simply adjust the REVOKE statements constructed at dump
time.  That's not new with this patch or specific to initdb-created objects.

> >> Could I ask you to also include COMMENTs when you try again, please?
> > 
> > That may work.  I had not expected to hear of a person changing the comment 
> > on
> > schema public.  To what do you change it?
> 
> It was a while ago and I don't remember because it didn't appear in the
> dump so I stopped doing it. :(
> 
> Mine was an actual comment, but there are some tools out there that
> (ab)use COMMENTs as crude metadata for what they do.  For example:
> https://postgresql-anonymizer.readthedocs.io/en/stable/declare_masking_rules/#declaring-rules-with-comments

I've attached a separate patch for this, which applies atop the ownership
patch.  This makes more restores fail for non-superusers, which is okay.
Author: Noah Misch 
Commit: Noah Misch 

Dump public schema ownership and security labels.

As a side effect, this corrects dumps of public schema ACLs in databases
where the DBA dropped and recreated that schema.

Reviewed by FIXME.

Discussion: https://postgr.es/m/20201229134924.ga1431...@rfd.leadboat.com

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 60d306e..ea67e52 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -725,6 +725,7 @@ void
 buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery,
PQExpBuffer init_acl_subquery, PQExpBuffer 
init_racl_subquery,
const char *acl_column, const char *acl_owner,
+   const char *initprivs_expr,
const char *obj_kind, bool binary_upgrade)
 {
/*
@@ -765,23 +766,25 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer 
racl_subquery,
  "WITH ORDINALITY AS perm(acl,row_n) "
  "WHERE NOT EXISTS ( "
  "SELECT 1 FROM "
- 
"pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) "
+ 
"pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
  "AS init(init_acl) WHERE acl = 
init_acl)) as foo)",
  acl_column,
  obj_kind,
  acl_owner,
+ initprivs_expr,
  obj_kind,
  acl_owner);
 
printfPQExpBuffer(racl_subquery,
  "(SELECT pg_catalog.array_agg(acl 
ORDER BY row_n) FROM "
  "(SELECT acl, row_n FROM "
- 
"pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) "
+ 
"pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
  "WITH ORDINALITY AS initp(acl,row_n) "
  "WHERE NOT EXISTS ( "
  "SELECT 1 FROM "
  
"pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
  "AS permp(orig_acl) WHERE acl = 
orig_acl)) as foo)",
+ initprivs_expr,
  obj_kind,
  acl_owner,
  acl_column,
@@ -807,12 +810,13 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer 
racl_subquery,
printfPQExpBuffer(init_acl_subquery,
  "CASE WHEN privtype = 'e' 
THEN "
   

Re: Single transaction in the tablesync worker?

2021-02-11 Thread Amit Kapila
On Thu, Feb 11, 2021 at 3:32 PM Petr Jelinek
 wrote:
>
> On 11 Feb 2021, at 10:56, Amit Kapila  wrote:
> >
> >> Well, I was thinking it could be NOTICE or LOG to be honest, WARNING seems 
> >> unnecessarily scary for those usecases to me.
> >>
> >
> > I am fine with LOG and will make that change. Do you have any more
> > comments or want to spend more time on this patch before we call it
> > good?
>
> I am good, thanks!
>

Okay, attached an updated patch with only that change.

-- 
With Regards,
Amit Kapila.


v32-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch


Re: Is Recovery actually paused?

2021-02-11 Thread Dilip Kumar
On Thu, Feb 11, 2021 at 3:20 PM Bharath Rupireddy
 wrote:
>
> On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar  wrote:
> >
> > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar  wrote:
> > >
> > > I don't find any problem with this approach as well, but I personally
> > > feel that the other approach where we don't wait in any API and just
> > > return the recovery pause state is much simpler and more flexible.  So
> > > I will make the pending changes in that patch and let's see what are
> > > the other opinion and based on that we can conclude.  Thanks for the
> > > patch.
> >
> > Here is an updated version of the patch which fixes the last two open 
> > problems
> > 1. In RecoveryRequiresIntParameter set the recovery pause state in the
> > loop so that if recovery resumed and pause requested again we can set
> > to pause again.
> > 2. If the recovery state is already 'paused' then don't set it back to
> > the 'pause requested'.
> >
> > One more point is that in 'pg_wal_replay_pause' even if we don't
> > change the state because it was already set to the 'paused' then also
> > we call the WakeupRecovery.  But I don't think there is any problem
> > with that, if we think that this should be changed then we can make
> > SetRecoveryPause return a bool such that if it doesn't do state change
> > then it returns false and in that case we can avoid calling
> > WakeupRecovery, but I felt that is unnecessary.  Any other thoughts on
> > this?
>
> IMO, that WakeupRecovery should not be a problem, because even now, if
> we issue a simple select pg_reload_conf(); (without even changing any
> config parameter), WakeupRecovery gets called.
>
> Thanks for the patch. I tested the new function and it works as
> expected. I have no further comments on the v13 patch.

Thanks for the review and testing.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: repeated decoding of prepared transactions

2021-02-11 Thread Amit Kapila
On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
 wrote:
>
> I did not expect this, as any receiver that wants to have decoded 2PC is
> likely supporting some kind of two-phase commits itself.  And would
> therefore prepare the transaction upon its first reception.  Potentially
> receiving it a second time would require complicated filtering on every
> prepared transaction.
>

I would like to bring one other scenario to your notice where you
might want to handle things differently for prepared transactions on
the plugin side. Assume we have multiple publications (for simplicity
say 2) on publisher with corresponding subscriptions (say 2, each
corresponding to one publication on the publisher). When a user
performs a transaction on a publisher that involves the tables from
both publications, then on the subscriber side we do that work via two
different transactions, one corresponding to each subscription. But we
need some way to deal with prepared xacts, because they need a GID and we
can't use the same GID for both subscriptions. Please see the detailed
example and one idea to deal with the same in the main thread[1]. It
would be really helpful if you or others working on the plugin side
can share your opinion on the same.

Now, coming back to the restart case where the prepared transaction
can be sent again by the publisher. I understand yours and others'
point that we should not send the prepared transaction if there is a
restart between prepare and commit, but there are reasons why we have
done it that way, and I am open to your suggestions. I'll once again
try to explain the exact case, which is not very apparent. The basic
idea is that we ship/replay all transactions whose commit happens
after the snapshot has reached a consistent state (SNAPBUILD_CONSISTENT);
see the comments atop snapbuild.c for details. Now, for transactions
where the prepare is before the snapshot reaches SNAPBUILD_CONSISTENT
and the commit prepared is after SNAPBUILD_CONSISTENT, we need to send
the entire transaction including the prepare at commit time. One might
think it is quite easy to detect that: basically, if we skip a prepare
when the snapshot state was not SNAPBUILD_CONSISTENT, mark a flag in
ReorderBufferTxn and use it at commit time to decide whether to send
the prepare. Unfortunately, it is not that easy. There is always a
chance that on restart we reuse a snapshot serialized by some other
walsender at a location prior to the prepare, and if that happens,
this time the prepare won't be skipped due to the snapshot state
(SNAPBUILD_CONSISTENT) but due to the start_decoding_at point
(considering we have already shipped some of the later commits but not
the prepare). Now, this actually becomes the same situation as when
the restart happened after we sent the prepare but not the commit.
This is the reason we have to resend the prepare when the subscriber
restarts between prepare and commit.
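
On the plugin/subscriber side, a minimal defensive sketch
(already_prepared_locally() is a hypothetical lookup against the subscriber's
own bookkeeping of prepared GIDs) of tolerating a prepare that arrives a
second time after such a restart:

/*
 * Sketch: apply-side handling of a prepare callback that may be delivered
 * twice around a restart.  already_prepared_locally() is a hypothetical
 * lookup against the subscriber's own record of prepared GIDs.
 */
static void
apply_handle_prepare_sketch(const char *gid)
{
	if (already_prepared_locally(gid))
	{
		/*
		 * The upstream re-sent a prepare we already applied before the
		 * restart; just skip it instead of erroring out.
		 */
		elog(DEBUG1, "skipping already prepared transaction \"%s\"", gid);
		return;
	}

	/* ... begin a transaction, apply the changes, then ... */
	/* ... issue PREPARE TRANSACTION 'gid' on the subscriber side ... */
}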

You can reproduce the case where we can't distinguish between two
situations by using the test case in twophase_snapshot.spec and
additionally starting a separate session via the debugger. So, the
steps in the test case are as below:

"s2b" "s2txid" "s1init" "s3b" "s3txid" "s2c" "s2b" "s2insert" "s2p"
"s3c" "s1insert" "s1start" "s2cp" "s1start"

Define new steps as

"s4init" {SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot_1',
'test_decoding');}
"s4start" {SELECT data  FROM
pg_logical_slot_get_changes('isolation_slot_1', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'1');}

The first thing we need to do is s4init, and stop the debugger in
SnapBuildProcessRunningXacts. Now perform the steps from 's2b' up to the
first 's1start' in twophase_snapshot.spec. Then continue in the s4 session
and perform s4start. After this, if you debug (or add logging to) the
second s1start, you will notice that we are skipping the prepare not
because of an inconsistent snapshot but because of a forward location in
start_decoding_at. If you don't involve session 4, then it will always
skip the prepare due to an inconsistent snapshot state. This involves a
debugger, so it is not easy to write an automated test for it.

I have used a somewhat tricky scenario to explain this, but I am not sure
there was any simpler way.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BLvkeX%3DB3xon7RcBwD4CVaFSryPj3pTBAALrDxQVPDwA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-02-11 Thread Petr Jelinek
On 11 Feb 2021, at 10:56, Amit Kapila  wrote:
> 
> On Thu, Feb 11, 2021 at 3:20 PM Petr Jelinek
>  wrote:
>> 
>> On 11 Feb 2021, at 10:42, Amit Kapila  wrote:
>>> 
>>> On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
>>>  wrote:
 
 On 10 Feb 2021, at 06:32, Amit Kapila  wrote:
> 
> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith  wrote:
>> 
>> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith  
>> wrote:
>>> 
>> 
>> PSA v2 of this WalRcvExceResult patch (it is same as v1 but includes
>> some PG doc updates).
>> This applies OK on top of v30 of the main patch.
>> 
> 
> Thanks, I have integrated these changes into the main patch and
> additionally made some changes to comments and docs. I have also fixed
> the function name inconsistency issue you reported and ran pgindent.
 
 One thing:
 
> + else if (res->status == WALRCV_ERROR &&
> +  missing_ok &&
> +  res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
> + {
> + /* WARNING. Error, but missing_ok = true. */
> + ereport(WARNING,
> (errmsg("could not drop the 
> replication slot \"%s\" on publisher",
> slotname),
>  errdetail("The error was: %s", 
> res->err)));
 
 Hmm, why is this WARNING, we mostly call it with missing_ok = true when 
 the slot is not expected to be there, so it does not seem correct to 
 report it as warning?
 
>>> 
>>> WARNING is for the cases where we don't always expect slots to exist
>>> and we don't want to stop the operation due to it. For example, in
>>> DropSubscription, for some of the rel states like (SUBREL_STATE_INIT
>>> and SUBREL_STATE_DATASYNC), the slot won't exist. Similarly, say if we
>>> fail (due to network error) after removing some of the slots, next
>>> time, it will again try to drop already dropped slots and fail. For
>>> these reasons, we need to use WARNING. Similarly for tablesync workers
>>> when we are trying to initially drop the slot there is no certainty
>>> that it exists, so we can't throw ERROR and stop the operation there.
>>> There are other cases like when the table sync worker has finished
>>> syncing the table, there we will raise an ERROR if the slot doesn't
>>> exist. Does this make sense?
>> 
>> Well, I was thinking it could be NOTICE or LOG to be honest, WARNING seems 
>> unnecessarily scary for those usecases to me.
>> 
> 
> I am fine with LOG and will make that change. Do you have any more
> comments or want to spend more time on this patch before we call it
> good?

I am good, thanks!

—
Petr



Re: Single transaction in the tablesync worker?

2021-02-11 Thread Amit Kapila
On Thu, Feb 11, 2021 at 3:20 PM Petr Jelinek
 wrote:
>
> On 11 Feb 2021, at 10:42, Amit Kapila  wrote:
> >
> > On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
> >  wrote:
> >>
> >> On 10 Feb 2021, at 06:32, Amit Kapila  wrote:
> >>>
> >>> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith  wrote:
> 
>  On Tue, Feb 9, 2021 at 10:38 AM Peter Smith  
>  wrote:
> >
> 
>  PSA v2 of this WalRcvExceResult patch (it is same as v1 but includes
>  some PG doc updates).
>  This applies OK on top of v30 of the main patch.
> 
> >>>
> >>> Thanks, I have integrated these changes into the main patch and
> >>> additionally made some changes to comments and docs. I have also fixed
> >>> the function name inconsistency issue you reported and ran pgindent.
> >>
> >> One thing:
> >>
> >>> + else if (res->status == WALRCV_ERROR &&
> >>> +  missing_ok &&
> >>> +  res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
> >>> + {
> >>> + /* WARNING. Error, but missing_ok = true. */
> >>> + ereport(WARNING,
> >>>  (errmsg("could not drop the 
> >>> replication slot \"%s\" on publisher",
> >>>  slotname),
> >>>   errdetail("The error was: %s", 
> >>> res->err)));
> >>
> >> Hmm, why is this WARNING, we mostly call it with missing_ok = true when 
> >> the slot is not expected to be there, so it does not seem correct to 
> >> report it as warning?
> >>
> >
> > WARNING is for the cases where we don't always expect slots to exist
> > and we don't want to stop the operation due to it. For example, in
> > DropSubscription, for some of the rel states like (SUBREL_STATE_INIT
> > and SUBREL_STATE_DATASYNC), the slot won't exist. Similarly, say if we
> > fail (due to network error) after removing some of the slots, next
> > time, it will again try to drop already dropped slots and fail. For
> > these reasons, we need to use WARNING. Similarly for tablesync workers
> > when we are trying to initially drop the slot there is no certainty
> > that it exists, so we can't throw ERROR and stop the operation there.
> > There are other cases like when the table sync worker has finished
> > syncing the table, there we will raise an ERROR if the slot doesn't
> > exist. Does this make sense?
>
> Well, I was thinking it could be NOTICE or LOG to be honest, WARNING seems 
> unnecessarily scary for those usecases to me.
>

I am fine with LOG and will make that change. Do you have any more
comments or want to spend more time on this patch before we call it
good?

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-02-11 Thread Petr Jelinek
On 11 Feb 2021, at 10:42, Amit Kapila  wrote:
> 
> On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
>  wrote:
>> 
>> On 10 Feb 2021, at 06:32, Amit Kapila  wrote:
>>> 
>>> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith  wrote:
 
 On Tue, Feb 9, 2021 at 10:38 AM Peter Smith  wrote:
> 
 
 PSA v2 of this WalRcvExceResult patch (it is same as v1 but includes
 some PG doc updates).
 This applies OK on top of v30 of the main patch.
 
>>> 
>>> Thanks, I have integrated these changes into the main patch and
>>> additionally made some changes to comments and docs. I have also fixed
>>> the function name inconsistency issue you reported and ran pgindent.
>> 
>> One thing:
>> 
>>> + else if (res->status == WALRCV_ERROR &&
>>> +  missing_ok &&
>>> +  res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
>>> + {
>>> + /* WARNING. Error, but missing_ok = true. */
>>> + ereport(WARNING,
>>>  (errmsg("could not drop the 
>>> replication slot \"%s\" on publisher",
>>>  slotname),
>>>   errdetail("The error was: %s", 
>>> res->err)));
>> 
>> Hmm, why is this WARNING, we mostly call it with missing_ok = true when the 
>> slot is not expected to be there, so it does not seem correct to report it 
>> as warning?
>> 
> 
> WARNING is for the cases where we don't always expect slots to exist
> and we don't want to stop the operation due to it. For example, in
> DropSubscription, for some of the rel states like (SUBREL_STATE_INIT
> and SUBREL_STATE_DATASYNC), the slot won't exist. Similarly, say if we
> fail (due to network error) after removing some of the slots, next
> time, it will again try to drop already dropped slots and fail. For
> these reasons, we need to use WARNING. Similarly for tablesync workers
> when we are trying to initially drop the slot there is no certainty
> that it exists, so we can't throw ERROR and stop the operation there.
> There are other cases like when the table sync worker has finished
> syncing the table, there we will raise an ERROR if the slot doesn't
> exist. Does this make sense?

Well, I was thinking it could be NOTICE or LOG to be honest, WARNING seems 
unnecessarily scary for those usecases to me.

—
Petr





Re: Is Recovery actually paused?

2021-02-11 Thread Bharath Rupireddy
On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar  wrote:
>
> On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar  wrote:
> >
> > I don't find any problem with this approach as well, but I personally
> > feel that the other approach where we don't wait in any API and just
> > return the recovery pause state is much simpler and more flexible.  So
> > I will make the pending changes in that patch and let's see what are
> > the other opinion and based on that we can conclude.  Thanks for the
> > patch.
>
> Here is an updated version of the patch which fixes the last two open problems
> 1. In RecoveryRequiresIntParameter set the recovery pause state in the
> loop so that if recovery resumed and pause requested again we can set
> to pause again.
> 2. If the recovery state is already 'paused' then don't set it back to
> the 'pause requested'.
>
> One more point is that in 'pg_wal_replay_pause' even if we don't
> change the state because it was already set to the 'paused' then also
> we call the WakeupRecovery.  But I don't think there is any problem
> with that, if we think that this should be changed then we can make
> SetRecoveryPause return a bool such that if it doesn't do state change
> then it returns false and in that case we can avoid calling
> WakeupRecovery, but I felt that is unnecessary.  Any other thoughts on
> this?

IMO, that WakeupRecovery should not be a problem, because even now, if
we issue a simple select pg_reload_conf(); (without even changing any
config parameter), WakeupRecovery gets called.

Thanks for the patch. I tested the new function and it works as
expected. I have no further comments on the v13 patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Single transaction in the tablesync worker?

2021-02-11 Thread Amit Kapila
On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
 wrote:
>
> On 10 Feb 2021, at 06:32, Amit Kapila  wrote:
> >
> > On Wed, Feb 10, 2021 at 7:41 AM Peter Smith  wrote:
> >>
> >> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith  wrote:
> >>>
> >>
> >> PSA v2 of this WalRcvExceResult patch (it is same as v1 but includes
> >> some PG doc updates).
> >> This applies OK on top of v30 of the main patch.
> >>
> >
> > Thanks, I have integrated these changes into the main patch and
> > additionally made some changes to comments and docs. I have also fixed
> > the function name inconsistency issue you reported and ran pgindent.
>
> One thing:
>
> > + else if (res->status == WALRCV_ERROR &&
> > +  missing_ok &&
> > +  res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
> > + {
> > + /* WARNING. Error, but missing_ok = true. */
> > + ereport(WARNING,
> >   (errmsg("could not drop the 
> > replication slot \"%s\" on publisher",
> >   slotname),
> >errdetail("The error was: %s", 
> > res->err)));
>
> Hmm, why is this WARNING, we mostly call it with missing_ok = true when the 
> slot is not expected to be there, so it does not seem correct to report it as 
> warning?
>

WARNING is for the cases where we don't always expect slots to exist
and we don't want to stop the operation due to it. For example, in
DropSubscription, for some of the rel states like (SUBREL_STATE_INIT
and SUBREL_STATE_DATASYNC), the slot won't exist. Similarly, say if we
fail (due to network error) after removing some of the slots, next
time, it will again try to drop already dropped slots and fail. For
these reasons, we need to use WARNING. Similarly for tablesync workers
when we are trying to initially drop the slot there is no certainty
that it exists, so we can't throw ERROR and stop the operation there.
There are other cases like when the table sync worker has finished
syncing the table, there we will raise an ERROR if the slot doesn't
exist. Does this make sense?

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-02-11 Thread Petr Jelinek
On 10 Feb 2021, at 06:32, Amit Kapila  wrote:
> 
> On Wed, Feb 10, 2021 at 7:41 AM Peter Smith  wrote:
>> 
>> On Tue, Feb 9, 2021 at 10:38 AM Peter Smith  wrote:
>>> 
>> 
>> PSA v2 of this WalRcvExceResult patch (it is same as v1 but includes
>> some PG doc updates).
>> This applies OK on top of v30 of the main patch.
>> 
> 
> Thanks, I have integrated these changes into the main patch and
> additionally made some changes to comments and docs. I have also fixed
> the function name inconsistency issue you reported and ran pgindent.

One thing:

> + else if (res->status == WALRCV_ERROR &&
> +  missing_ok &&
> +  res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
> + {
> + /* WARNING. Error, but missing_ok = true. */
> + ereport(WARNING,
>   (errmsg("could not drop the replication 
> slot \"%s\" on publisher",
>   slotname),
>errdetail("The error was: %s", 
> res->err)));

Hmm, why is this WARNING, we mostly call it with missing_ok = true when the 
slot is not expected to be there, so it does not seem correct to report it as 
warning?

--
Petr



Re: SQL-standard function body

2021-02-11 Thread Peter Eisentraut

On 2020-11-20 08:25, Peter Eisentraut wrote:

On 2020-11-10 16:21, Georgios Kokolatos wrote:

Hi,

I noticed that this patch fails on the cfbot.
For this, I changed the status to: 'Waiting on Author'.

Cheers,
//Georgios

The new status of this patch is: Waiting on Author



Here is an updated patch to get it building again.


Another updated patch to get things building again.  I've also fixed the
last TODO I had in there, qualifying function arguments as necessary
in ruleutils.c.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
From 5204b882417e4a3c35f93bc8d22ea066c079a10e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 11 Feb 2021 08:57:29 +0100
Subject: [PATCH v7] SQL-standard function body

This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.

Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

CREATE FUNCTION add(a integer, b integer) RETURNS integer
LANGUAGE SQL
RETURN a + b;

or as a block

CREATE PROCEDURE insert_data(a integer, b integer)
LANGUAGE SQL
BEGIN ATOMIC
  INSERT INTO tbl VALUES (a);
  INSERT INTO tbl VALUES (b);
END;

The function body is parsed at function definition time and stored as
expression nodes in a new pg_proc column prosqlbody.  So at run time,
no further parsing is required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Also, per SQL standard, LANGUAGE SQL is the default, so it does not
need to be specified anymore.

Discussion: 
https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c0...@2ndquadrant.com
---
 doc/src/sgml/catalogs.sgml|  10 +
 doc/src/sgml/ref/create_function.sgml | 126 +--
 doc/src/sgml/ref/create_procedure.sgml|  62 -
 src/backend/catalog/pg_aggregate.c|   1 +
 src/backend/catalog/pg_proc.c | 116 ++
 src/backend/commands/aggregatecmds.c  |   2 +
 src/backend/commands/functioncmds.c   | 119 --
 src/backend/commands/typecmds.c   |   4 +
 src/backend/executor/functions.c  |  79 ---
 src/backend/nodes/copyfuncs.c |  15 ++
 src/backend/nodes/equalfuncs.c|  13 ++
 src/backend/nodes/outfuncs.c  |  12 +
 src/backend/nodes/readfuncs.c |   1 +
 src/backend/optimizer/util/clauses.c  | 126 +++
 src/backend/parser/analyze.c  |  35 +++
 src/backend/parser/gram.y | 129 ---
 src/backend/tcop/postgres.c   |   3 +-
 src/backend/utils/adt/ruleutils.c | 132 ++-
 src/bin/pg_dump/pg_dump.c |  45 +++-
 src/bin/psql/describe.c   |  15 +-
 src/fe_utils/psqlscan.l   |  23 +-
 src/include/catalog/pg_proc.dat   |   4 +
 src/include/catalog/pg_proc.h |   6 +-
 src/include/commands/defrem.h |   2 +
 src/include/executor/functions.h  |  15 ++
 src/include/fe_utils/psqlscan_int.h   |   2 +
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/parsenodes.h|  13 ++
 src/include/parser/kwlist.h   |   2 +
 src/include/tcop/tcopprot.h   |   1 +
 src/interfaces/ecpg/preproc/ecpg.addons   |   6 +
 src/interfaces/ecpg/preproc/ecpg.trailer  |   4 +-
 .../regress/expected/create_function_3.out| 212 +-
 .../regress/expected/create_procedure.out |  58 +
 src/test/regress/sql/create_function_3.sql|  99 
 src/test/regress/sql/create_procedure.sql |  26 +++
 36 files changed, 1315 insertions(+), 204 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..f5f8ee186b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -5980,6 +5980,16 @@ pg_proc Columns
   
  
 
+ 
+  
+   prosqlbody pg_node_tree
+  
+  
+   Pre-parsed SQL function body.  This will be used for language SQL
+   functions if the body is not specified as a string constant.
+   
+ 
+
  
   
proconfig text[]
diff --git a/doc/src/sgml/ref/create_function.sgml 
b/doc/s