RE: psql - factor out echo code

2021-06-13 Thread Fabien COELHO



Wouldn't it be better to comment it like any other function?


Sure. Attached.

--
Fabien.

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 9a00499510..00e5bf290b 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -523,6 +523,18 @@ PrintTiming(double elapsed_msec)
 		   elapsed_msec, days, (int) hours, (int) minutes, seconds);
 }
 
+/*
+ * Echo user query
+ */
+static void
+echoQuery(FILE *out, const char *query)
+{
+	fprintf(out,
+			_("* QUERY **\n"
+			  "%s\n"
+			  "**\n\n"), query);
+	fflush(out);
+}
 
 /*
  * PSQLexec
@@ -549,18 +561,9 @@ PSQLexec(const char *query)
 
 	if (pset.echo_hidden != PSQL_ECHO_HIDDEN_OFF)
 	{
-		printf(_("* QUERY **\n"
- "%s\n"
- "**\n\n"), query);
-		fflush(stdout);
+		echoQuery(stdout, query);
 		if (pset.logfile)
-		{
-			fprintf(pset.logfile,
-	_("* QUERY **\n"
-	  "%s\n"
-	  "**\n\n"), query);
-			fflush(pset.logfile);
-		}
+			echoQuery(pset.logfile, query);
 
 		if (pset.echo_hidden == PSQL_ECHO_HIDDEN_NOEXEC)
 			return NULL;
@@ -1226,13 +1229,7 @@ SendQuery(const char *query)
 	}
 
 	if (pset.logfile)
-	{
-		fprintf(pset.logfile,
-_("* QUERY **\n"
-  "%s\n"
-  "**\n\n"), query);
-		fflush(pset.logfile);
-	}
+		echoQuery(pset.logfile, query);
 
 	SetCancelConn(pset.db);
 


Re: Fix dropped object handling in pg_event_trigger_ddl_commands

2021-06-13 Thread Michael Paquier
On Fri, Jun 11, 2021 at 09:36:57PM +0900, Michael Paquier wrote:
> Hm, nope.  I think that we had better pass true as argument here.

The main patch has been applied as of 2d689ba.

> First, this is more consistent with the identity lookup (OK, it does
> not matter as we would have discarded the object after the identity
> lookup anyway, but any future shuffling of this code may not be that
> wise).  Second, now that I look at it, getObjectTypeDescription() can
> never be NULL as we have fallback names for relations, routines and
> constraints for all object types so the buffer will be filled with
> some data.  Let's replace the bottom of getObjectTypeDescription(),
> which now returns NULL, with Assert(buffer.len > 0).  This code is new as
> of v14, so it is better to adjust that sooner than later.

And this has been simplified with b56b83a.
--
Michael




Re: Avoid stuck of pgbench due to skipped transactions

2021-06-13 Thread Fabien COELHO



I attached a patch for this fix.


The patch mostly works for me, and I agree that the bench should not be in
a loop on any parameters, even when "crazy" parameters are given…

However I'm not sure this is the right way to handle this issue.

The catch-up loop can be dropped and the automaton can loop over itself to
reschedule. Doing that as in the attached fixes this issue, makes
progress reporting work properly in more cases, and reduces the number of
lines of code. I did not add a test case because time-sensitive tests have
been removed (which is too bad, IMHO).


I agree with your way of fixing it. However, the progress reporting didn't
work because we cannot just break the loop; we have to return from
advanceConnectionState to threadRun.

+   /* otherwise loop over PREPARE_THROTTLE */
break;

I attached the fixed patch that uses return instead of break, and I confirmed
that this made the progress reporting work properly.


I'm hesitating to make such a structural change for a degenerate case linked
to "insane" parameters, as pg is unlikely to reach 100 million tps, ever.

It seems to me enough that the command is not blocked in such cases.

--
Fabien.

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-13 Thread Tom Lane
Amit Kapila  writes:
> Why do you think we don't need to check index AM functions?

Primarily because index AMs and opclasses can only be defined by
superusers, and the superuser is assumed to know what she's doing.

More generally, we've never made any provisions for the properties
of index AMs or opclasses to change on-the-fly.  I doubt that doing
so could possibly be justified on a cost-benefit basis.

regards, tom lane




Re: Decoding speculative insert with toast leaks memory

2021-06-13 Thread Dilip Kumar
On Mon, Jun 14, 2021 at 9:44 AM Dilip Kumar  wrote:
>
> On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila  wrote:
> >
> > I think the test in this patch is quite similar to what Noah has
> > pointed in the nearby thread [1] to be failing at some intervals. Can
> > you also please once verify the same and if we can expect similar
> > failures here then we might want to consider dropping the test in this
> > patch for now? We can always come back to it once we find a good
> > solution to make it pass consistently.
>
> test insert-conflict-do-nothing   ... ok  646 ms
> test insert-conflict-do-nothing-2 ... ok 1994 ms
> test insert-conflict-do-update... ok 1786 ms
> test insert-conflict-do-update-2  ... ok 2689 ms
> test insert-conflict-do-update-3  ... ok  851 ms
> test insert-conflict-specconflict ... FAILED 3695 ms
> test delete-abort-savept  ... ok 1238 ms
>
> Yeah, this is the same test that we used as the base for our test, so
> let's not push this test until it becomes stable.

Attached are the patches without the test case.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From e73d20545d7f1725dc424de3d9168269bc20ad33 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Fri, 11 Jun 2021 10:08:42 +0530
Subject: [PATCH v8] 96-Fix decoding of speculative aborts.

During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.

The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/caexhw5spkf-oovx_qze4p5om6dvof7_p+xgsnaviug15fm9...@mail.gmail.com
---
 src/backend/replication/logical/decode.c| 14 
 src/backend/replication/logical/reorderbuffer.c | 48 ++---
 src/include/replication/reorderbuffer.h | 11 +++---
 3 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 1300902..571a901 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -778,19 +778,17 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (target_node.dbNode != ctx->slot->data.database)
 		return;
 
-	/*
-	 * Super deletions are irrelevant for logical decoding, it's driven by the
-	 * confirmation records.
-	 */
-	if (xlrec->flags & XLH_DELETE_IS_SUPER)
-		return;
-
 	/* output plugin doesn't look for this origin, no need to queue */
 	if (FilterByOrigin(ctx, XLogRecGetOrigin(r)))
 		return;
 
 	change = ReorderBufferGetChange(ctx->reorder);
-	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	if (xlrec->flags & XLH_DELETE_IS_SUPER)
+		change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
+	else
+		change->action = REORDER_BUFFER_CHANGE_DELETE;
+
 	change->origin_id = XLogRecGetOrigin(r);
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f0de337..1cd0bbd 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -364,6 +364,9 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	/* Reset the toast hash */
+	ReorderBufferToastReset(rb, txn);
+
 	/* check whether to put into the slab cache */
 	if (rb->nr_cached_transactions < max_cached_transactions)
 	{
@@ -449,6 +452,7 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
 			break;
 			/* no data in addition to the struct itself */
 		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+		case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
 		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
 		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
 			break;
@@ -1674,8 +1678,8 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			change_done:
 
 	/*
-	 * Either speculative insertion was confirmed, or it was
-	 * unsuccessful and the record isn't needed anymore.
+	 * If speculative insertion was confirmed, the record isn't
+	 * needed anymore.
 	 */
 	if (specinsert != NULL)
 

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-13 Thread Amit Kapila
On Sat, Jun 12, 2021 at 1:56 AM Robert Haas  wrote:
>
> On Fri, Jun 11, 2021 at 12:13 AM Amit Kapila  wrote:
> > Do we invalidate relcache entry if someone changes say trigger or some
> > index AM function property via Alter Function (in our case from safe
> > to unsafe or vice-versa)? Tsunakawa-San has mentioned this as the
> > reason in his email [1] why we can't rely on caching this property in
> > relcache entry. I also don't see anything in AlterFunction which would
> > suggest that we invalidate the relation with which the function might
> > be associated via trigger.
>
> Hmm. I am not sure that index AM functions really need to be checked,
> but triggers certainly do.
>

Why do you think we don't need to check index AM functions? Say we
have an index expression that uses a function; if its parallel safety
is changed, that probably also impacts whether we can do an insert in
parallel. Otherwise, we will end up executing some parallel-unsafe
function in parallel mode during index insertion.

> I think you are correct that an ALTER FUNCTION wouldn't invalidate
> the relcache entry, which is I guess pretty much the same problem Tom
> was pointing out in the thread to which you linked.
>
> But ... thinking out of the box as Tom suggests, what if we came up
> with some new kind of invalidation message that is only sent when a
> function's parallel-safety marking is changed? And every backend in
> the same database then needs to re-evaluate the parallel-safety of
> every relation for which it has cached a value. Such recomputations
> might be expensive, but they would probably also occur very
> infrequently. And you might even be able to make it a bit more
> fine-grained if it's worth the effort to worry about that: say that in
> addition to caching the parallel-safety of the relation, we also cache
> a list of the pg_proc OIDs upon which that determination depends. Then
> when we hear that the flag's been changed for OID 123456, we only need
> to invalidate the cached value for relations that depended on that
> pg_proc entry.
>

Yeah, this could be one idea, but I think even if we use the pg_proc OID,
we still need to check all the relcache entries to find which ones
contain the invalidated OID, and that could be expensive. I wonder
whether we can't directly find the relation involved and register an
invalidation for it? We are able to find the relation with which a
trigger function is associated during DROP FUNCTION via
findDependentObjects by scanning pg_depend. Assuming we are able to find
the relation for a trigger function by scanning pg_depend, what kinds of
problems do we envision in registering the invalidation for it?

I think we probably need to worry about the additional cost of finding
dependent objects, and about any race conditions in doing so, as pointed
out by Tom in his email [1]. The concern about cost could be addressed
by your idea of registering such an invalidation only when the user
changes the parallel safety of the function, which we don't expect to be
a frequent operation. The race condition I can think of is that by the
time we change the parallel safety of a function (say, making it
unsafe), some other sessions might have already started processing an
insert on a relation with which that function is associated via a
trigger or check constraint, in which case there could be a problem. I
think to avoid that we need to acquire an Exclusive lock on the
relation, as we do in Rename Policy kinds of operations.


> There are ways that a relation could become
> parallel-unsafe without changing the parallel-safety marking of any
> function, but perhaps all of the other ways involve a relcache
> invalidation?
>

Probably, but I guess we can once investigate/test those cases as well
if we find/agree on the solution for the functions stuff.

[1] - https://www.postgresql.org/message-id/1030301.1616560249%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




Fix around conn_duration in pgbench

2021-06-13 Thread Yugo NAGATA
Hi,

TState has a field called "conn_duration" which is, as the comment says,
"cumulated connection and deconnection delays". This value is summed over
threads and reported as "average connection time" under -C/--connect.
If this option is not specified, the value is never used.

However, I found that conn_duration is calculated even when -C/--connect
is not specified, which is wasted work. So we can remove this code, as
done in the attached patch.

In addition, disconnection delays are not cumulated even under -C/--connect,
despite what the comment says. I also fixed this in the attached patch.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..1ec42a3ba2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3536,8 +3536,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 if (is_connect)
 {
+	pg_time_usec_t start = pg_time_now();
+
 	finishCon(st);
-	now = 0;
+	thread->conn_duration += pg_time_now() - start;
 }
 
 if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
@@ -6421,6 +6423,7 @@ main(int argc, char **argv)
 		thread->logfile = NULL; /* filled in later */
 		thread->latency_late = 0;
 		initStats(&thread->stats, 0);
+		thread->conn_duration = 0;
 
 		nclients_dealt += thread->nstate;
 	}
@@ -6584,14 +6587,6 @@ threadRun(void *arg)
 goto done;
 			}
 		}
-
-		/* compute connection delay */
-		thread->conn_duration = pg_time_now() - thread->started_time;
-	}
-	else
-	{
-		/* no connection delay to record */
-		thread->conn_duration = 0;
 	}
 
 	/* GO */


Re: An out-of-date comment in nodeIndexonlyscan.c

2021-06-13 Thread Thomas Munro
On Mon, Jun 14, 2021 at 2:29 PM David Rowley  wrote:
> Have you also thought about deferrable unique / primary key constraints?
>
> It's possible for the uniqueness to be temporarily violated during a
> transaction when the unique constraint is deferred,

Oh, yeah, very good question.  I think many scenarios involving
duplicates work correctly, but I think there is a danger like this:

create table t (i int primary key deferrable initially deferred, j int);
create unique index on t(j);
insert into t values (999, 999); -- avoid empty index
set enable_seqscan = off;

begin transaction isolation level serializable;
insert into t values (1, 2); -- create phantom row
select * from t where i = 1;
delete from t where j = 2; -- remove phantom row
SELECT locktype, relation::regclass, page, tuple FROM pg_locks WHERE
mode = 'SIReadLock';
commit;

master:

 locktype | relation | page | tuple
--+--+--+---
 page | t_pkey   |1 |
 page | t_j_idx  |1 |
(2 rows)

v3 patch:

 locktype | relation | page | tuple
--+--+--+---
(0 rows)

In fact the lock on t_pkey's page 1 was important: it represents the
search for a tuple with i = 1, other than our temporary phantom (only
allowed because constraint deferred).  If someone else inserts a row
matching i = 1, the SSI system will not know that we tried to look for
it, because our own temporary phantom confused us.

> I think you'd just need to add a check to ensure that indimmediate is
> true around where you're checking the indisunique flag.

Yeah, that fixes the problem in this case at least.  With v4:

 locktype | relation | page | tuple
--+--+--+---
 page | t_pkey   |1 |
(1 row)

(It's expected that t_j_idx is not locked: the delete query benefits
from the optimisation when accessing the index on t(j)).

That test case is a little confusing, because at no point does it ever
actually create a duplicate, but I suspect the problem is avoided
without the deferred constraint because you'd either get a
unique-constraint violation on insertion, or block anyone else from
inserting until after you commit (even after you delete the phantom),
and I think that may avoid the hazard.  I could be confused about that.
If I am wrong, then a possible general solution may be to apply the
optimisation only if we find a match that wasn't written by this
transaction, though I'm not sure how, given the current layering.
From e52860f760ade79813aae7beb7e9d81ab0d03f34 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 10 Jun 2021 23:35:16 +1200
Subject: [PATCH v4 1/2] Use tuple-level SIREAD locks for index-only scans.

When index-only scans manage to avoid fetching heap tuples,
previously we'd predicate-lock the whole heap page (since commit
cdf91edb).  Commits c01262a8 and 6f38d4da made it possible to lock the
tuple instead with only a TID, to match the behavior of regular index
scans and avoid some false conflicts.

Discussion: https://postgr.es/m/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
---
 src/backend/executor/nodeIndexonlyscan.c  | 12 --
 .../isolation/expected/index-only-scan-2.out  | 15 +++
 .../isolation/expected/index-only-scan-3.out  | 16 
 src/test/isolation/isolation_schedule |  2 +
 .../isolation/specs/index-only-scan-2.spec| 39 +++
 .../isolation/specs/index-only-scan-3.spec| 33 
 6 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 src/test/isolation/expected/index-only-scan-2.out
 create mode 100644 src/test/isolation/expected/index-only-scan-3.out
 create mode 100644 src/test/isolation/specs/index-only-scan-2.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-3.spec

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0754e28a9a..f7185b4519 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -243,12 +243,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 		/*
 		 * If we didn't access the heap, then we'll need to take a predicate
-		 * lock explicitly, as if we had.  For now we do that at page level.
+		 * lock explicitly, as if we had.  We don't have the inserter's xid,
+		 * but that's only used by PredicateLockTID to check if it matches the
+		 * caller's top level xid, and it can't possibly have been inserted by
+		 * us or the page wouldn't be all visible.
 		 */
 		if (!tuple_from_heap)
-			PredicateLockPage(scandesc->heapRelation,
-			  ItemPointerGetBlockNumber(tid),
-			  estate->es_snapshot);
+			PredicateLockTID(scandesc->heapRelation,
+			 tid,
+			 estate->es_snapshot,
+			 InvalidTransactionId);
 
 		return slot;
 	}
diff --git a/src/test/isolation/expected/index-only-scan-2.out b/src/test/isolation/expected/index-only-scan-2.out
new file mode 100644
index 00..fef5b8d398
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-2.out
@@ 

Re: Failure in subscription test 004_sync.pl

2021-06-13 Thread Masahiko Sawada
On Sat, Jun 12, 2021 at 9:57 PM Amit Kapila  wrote:
>
> On Sat, Jun 12, 2021 at 1:13 PM Michael Paquier  wrote:
> >
> > wrasse has just failed with what looks like a timing error with a
> > replication slot drop:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-06-12%2006%3A16%3A30
> >
> > Here is the error:
> > error running SQL: 'psql::1: ERROR:  could not drop replication
> > slot "tap_sub" on publisher: ERROR:  replication slot "tap_sub" is
> > active for PID 1641'
> >
> > It seems to me that this just lacks a poll_query_until() doing some
> > slot monitoring?
> >
>
> I think it is showing a race condition issue in the code. In
> DropSubscription, we first stop the worker that is receiving the WAL,
> and then in a separate connection with the publisher, it tries to drop
> the slot which leads to this error. The reason is that walsender is
> still active as we just wait for wal receiver (or apply worker) to
> stop. Normally, as soon as the apply worker is stopped the walsender
> detects it and exits but in this case, it took some time to exit, and
> in the meantime, we tried to drop the slot which is still in use by
> walsender.

That might be possible.

That's weird since DROP SUBSCRIPTION executes DROP_REPLICATION_SLOT
command with WAIT option. I found a bug that is possibly an oversight
of commit 1632ea4368. The commit changed the code around the error as
follows:

if (active_pid != MyProcPid)
{
-   if (behavior == SAB_Error)
+   if (!nowait)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
 errmsg("replication slot \"%s\" is active for PID %d",
NameStr(s->data.name), active_pid)));
-   else if (behavior == SAB_Inquire)
-   return active_pid;

/* Wait here until we get signaled, and then restart */
ConditionVariableSleep(&s->active_cv,

The condition should be the opposite; we should raise the error when
'nowait' is true. I think this is the cause of the test failure. Even
if DROP SUBSCRIPTION tries to drop the slot with the WAIT option, we
don't wait but raise the error.

Attached is a small patch that fixes it.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


fix_slot_drop.patch
Description: Binary data


RE: [PATCH] expand the units that pg_size_pretty supports on output

2021-06-13 Thread Shinya11.Kato
>From: David Christensen  
>Sent: Friday, June 4, 2021 4:18 AM
>To: PostgreSQL-development 
>Subject: Re: [PATCH] expand the units that pg_size_pretty supports on output
>
>New versions attached to address the initial CF feedback and rebase on HEAD as 
>of now.
>
>0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch 
>
>- expands the units that pg_size_pretty() can handle up to YB.
>
>0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch
>
>- expands the units that pg_size_bytes() can handle, up to YB.
>
I don't see the need to extend the units up to YB.
What use case do you have in mind?

Regards,
Shinya Kato


Relaxing the sub-transaction restriction in parallel query

2021-06-13 Thread Amit Khandekar
Hi,

Currently we don't allow a sub-transaction to be spawned from inside a
parallel worker (and also from a leader who is in parallel mode). This
imposes a restriction that pl/pgsql functions that use an exception block
can't be marked parallel safe, even when the exception block is there only
to catch trivial errors such as divide-by-zero. I tried to look at the
implications of removing this sub-transaction restriction, and came up with
the attached WIP patch after considering the below points. I may have
missed other points, or may have assumed something wrong. So comments are
welcome.

- TBLOCK_PARALLEL_INPROGRESS

Now that there can be an in-progress sub-transaction in a parallel worker,
the sub-transaction states need to be accounted for. Rather than having new
transaction states such as TBLOCK_PARALLEL_SUBINPROGRESS, I removed the
existing TBLOCK_PARALLEL_INPROGRESS from the code. At a couple of places
is_parallel_worker is set if state is TBLOCK_PARALLEL_INPROGRESS. Instead,
for now, I have used (ParallelCurrentXids != NULL) to identify if it's a
worker in a valid transaction state. Maybe we can improve on this.
In EndTransactionBlock(), there is a fatal error thrown if it's a
parallel worker in progress. This seems to be a can't-happen case, so I
have removed this check; I need to think further about how we can retain it.


- IsInParallelMode()

On HEAD, the parallel worker cannot have any sub-transactions, so
CurrentTransactionState always points to the TopTransactionStateData. And
when ParallelWorkerMain() calls EnterParallelMode(), from that point
onwards IsInParallelMode() always returns true. But with the patch,
CurrentTransactionState can point to some nest level down below, and in
that TransactionState, parallelModeLevel would be 0, so IsInParallelMode()
will return false in a sub-transaction, unless some other function happens
to explicitly call EnterParallelMode().

One option for making IsInParallelMode() always return true for worker is
to just check whether the worker is in a transaction (ParallelCurrentXids
!= NULL). Or else, check only the TopTransactionData->parallelModeLevel.
Still another option is for the new TransactionState to inherit
parallelModeLevel from its parent. I chose this option. This avoids
additional conditions in IsInParallelMode() specifically for worker.

Does this inherit-parent-parallelModeLevel option affect the leader code?
The functions calling EnterParallelMode() are: begin_parallel_vacuum,
_bt_begin_parallel, ParallelWorkerMain, CommitTransaction, ExecutePlan().
After entering Parallel mode, it does not look like a subtransaction will
be spawned at these places. If at all it does, on HEAD, the
IsInParallelMode() will return false, which does not sound right. For all
the child transactions, this function should return true. So w.r.t. this,
in fact inheriting parent's parallelModeLevel looks better.

Operations that are not allowed to run in worker would continue to be
disallowed in a worker sub-transaction as well. E.g. assigning new xid,
heap_insert, etc. These places already are using IsInParallelMode() which
takes care of guarding against such operations.

Just for archival ...
ExecutePlan() is called with use_parallel_mode=true when there was a gather
plan created, in which case it enters Parallel mode. From here, if the
backend happens to start a new subtransaction for some reason, it does
sound right for the Parallel mode to be true for this sub-transaction,
although I am not sure if there can be such a case.
In worker, as far as I understand, ExecutePlan() always gets called with
use_parallel_mode=false, because there is no gather plan in the worker. So
it does not enter Parallel mode. But because the worker is already in
parallel mode, it does not matter.


- List of ParallelContexts (pcxt_list) :
ParallelContext is created only in backends. So there are no implications
of the pcxt_list w.r.t. parallel workers spawning a subtransaction, because
pcxt_list will always be empty in workers.


- ParallelCurrentXids  :

A parallel worker always maintains a global flat sorted list of xids which
represent all the xids that are considered as current xids (i.e. the ones
that are returned by TransactionIdIsCurrentTransactionId() in a leader). So
this global list should continue to work no matter what is the
sub-transaction nest level, since there won't be new xids created in the
worker.

- Savepoints :

Haven't considered savepoints. The restriction is retained for savepoints.

Thanks
-Amit Khandekar
Huawei Technologies
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..c46e32ba37 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -160,7 +160,6 @@ typedef enum TBlockState
 	TBLOCK_BEGIN,/* starting transaction block */
 	TBLOCK_INPROGRESS,			/* live transaction */
 	TBLOCK_IMPLICIT_INPROGRESS, /* live transaction after implicit BEGIN */
-	TBLOCK_PARALLEL_INPROGRESS, /

Re: Decoding speculative insert with toast leaks memory

2021-06-13 Thread Dilip Kumar
On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila  wrote:
>
> I think the test in this patch is quite similar to what Noah has
> pointed in the nearby thread [1] to be failing at some intervals. Can
> you also please once verify the same and if we can expect similar
> failures here then we might want to consider dropping the test in this
> patch for now? We can always come back to it once we find a good
> solution to make it pass consistently.

test insert-conflict-do-nothing   ... ok  646 ms
test insert-conflict-do-nothing-2 ... ok 1994 ms
test insert-conflict-do-update... ok 1786 ms
test insert-conflict-do-update-2  ... ok 2689 ms
test insert-conflict-do-update-3  ... ok  851 ms
test insert-conflict-specconflict ... FAILED 3695 ms
test delete-abort-savept  ... ok 1238 ms

Yeah, this is the same test that we used as the base for our test, so
let's not push this test until it becomes stable.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: psql - factor out echo code

2021-06-13 Thread Shinya11.Kato
>-Original Message-
>From: Fabien COELHO 
>Sent: Sunday, May 30, 2021 6:10 PM
>To: PostgreSQL Developers 
>Subject: psql - factor out echo code
>
>
>While working on something in "psql/common.c" I noticed some triplicated code,
>including a long translatable string. This minor patch refactors this in one
>function.
>
>--
>Fabien.

Wouldn't it be better to comment it like any other function?

Best regards,
Shinya Kato




Re: Decoding speculative insert with toast leaks memory

2021-06-13 Thread Amit Kapila
On Fri, Jun 11, 2021 at 7:23 PM Amit Kapila  wrote:
>
> On Fri, Jun 11, 2021 at 11:37 AM Dilip Kumar  wrote:
> >
> > On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila  wrote:
> > >
> >
> > >
> > > Please find the patch for HEAD attached. Can you please prepare the
> > > patch for back-branches by doing all the changes I have done in the
> > > patch for HEAD?
> >
> > Done
> >
>
> Thanks, the patch looks good to me. I'll push these early next week
> (Tuesday) unless someone has any other comments or suggestions.
>

I think the test in this patch is quite similar to what Noah has
pointed in the nearby thread [1] to be failing at some intervals. Can
you also please once verify the same and if we can expect similar
failures here then we might want to consider dropping the test in this
patch for now? We can always come back to it once we find a good
solution to make it pass consistently.


[1] - 
https://www.postgresql.org/message-id/20210613073407.GA768908%40rfd.leadboat.com

-- 
With Regards,
Amit Kapila.




Re: Added missing tab completion for alter subscription set option

2021-06-13 Thread vignesh C
On Fri, Jun 11, 2021 at 12:27 PM Michael Paquier  wrote:
>
> On Sun, May 23, 2021 at 04:24:59PM +0530, vignesh C wrote:
> >   /* Complete "CREATE SUBSCRIPTION  ...  WITH ( " */
> >   else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", 
> > "("))
> > - COMPLETE_WITH("copy_data", "connect", "create_slot", 
> > "enabled",
> > -   "slot_name", "synchronous_commit");
> > + COMPLETE_WITH("binary", "copy_data", "connect", "create_slot",
> > +   "enabled", "slot_name", "streaming",
> > +   "synchronous_commit");
>
> "copy_data" and "connect" need to be reversed.  Applied.

Thanks for committing this.

Regards,
Vignesh




Re: An out-of-date comment in nodeIndexonlyscan.c

2021-06-13 Thread David Rowley
On Mon, 14 Jun 2021 at 10:03, Thomas Munro  wrote:
>
> On Mon, Jun 14, 2021 at 12:54 AM David Rowley  wrote:
> > I think a more optimal and nicer way of doing that would be setting
> > bits in a Bitmapset then checking bms_num_members is equal to
> > n_scan_keys.
>
> Shouldn't it be compared with indnkeyatts?  Yes, much nicer, thanks!

Oh yeah, I did mean that. Thanks for the correction.

Have you also thought about deferrable unique / primary key constraints?

It's possible for the uniqueness to be temporarily violated during a
transaction when the unique constraint is deferred,

For example:
create table t (id int primary key deferrable initially deferred);
begin;
insert into t values(1),(1);
select * from t;
 id

  1
  1
(2 rows)

I think you'd just need to add a check to ensure that indimmediate is
true around where you're checking the indisunique flag.
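A rough model of that guard (the field names mirror pg_index's indisunique
and indimmediate; everything else here is illustrative):

```python
from dataclasses import dataclass

@dataclass
class IndexInfo:
    indisunique: bool    # pg_index.indisunique
    indimmediate: bool   # pg_index.indimmediate; False when DEFERRABLE

def guarantees_at_most_one_row(index, all_key_columns_constrained):
    # A deferred unique constraint may be transiently violated inside a
    # transaction, so only an immediately-enforced unique index gives
    # the at-most-one-matching-row guarantee.
    return (index.indisunique and index.indimmediate
            and all_key_columns_constrained)

print(guarantees_at_most_one_row(IndexInfo(True, True), True))    # True
print(guarantees_at_most_one_row(IndexInfo(True, False), True))   # False
```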

David




Re: unnesting multirange data types

2021-06-13 Thread Justin Pryzby
On Sun, Jun 13, 2021 at 06:36:42PM -0700, Zhihong Yu wrote:
> On Sun, Jun 13, 2021 at 2:10 PM Alexander Korotkov  
> wrote:
> > On Sun, Jun 13, 2021 at 5:53 PM Zhihong Yu  wrote:
> > > +   ObjectAddress myself,
> > >
> > > nit: myself -> self
> >
> > Probably "self" is a better name than "myself" in this context.
> > However, you can see that the surrounding code already uses the name
> > "myself".  Therefore, I prefer to stay with "myself".
>
> Is it Okay if I submit a patch changing the 'myself's to 'self' ?

I think it's too nit-picky to be useful and too much like busy-work.

The patch wouldn't be applied to back branches, and the divergence complicates
future backpatches and creates opportunities to introduce errors.

I already looked for and reported typos introduced in v14, but I can almost
promise that if someone looks closely at the documentation changes there are
more errors to be found, even without testing that the code behaves as
advertised.

You can look for patches which changed docs in v14 like so:
git log -p --cherry-pick --stat origin/REL_13_STABLE...origin/master -- doc

But I recommend reading the changes to documentation in HTML/PDF, since it's
easy to miss errors while reading SGML.
https://www.postgresql.org/docs/devel/index.html

-- 
Justin




Re: Avoid stuck of pbgench due to skipped transactions

2021-06-13 Thread Yugo NAGATA
Hello Fabien,

On Sun, 13 Jun 2021 08:56:59 +0200 (CEST)
Fabien COELHO  wrote:

> > I attached a patch for this fix.
> 
> The patch mostly works for me, and I agree that the bench should not be in 
> a loop on any parameters, even when "crazy" parameters are given…
> 
> However I'm not sure this is the right way to handle this issue.
> 
> The catch-up loop can be dropped and the automaton can loop over itself to 
> reschedule. Doing that as the attached fixes this issue and also makes 
> progress reporting work proprely in more cases, and reduces the number of 
> lines of code. I did not add a test case because time sensitive tests have 
> been removed (which is too bad, IMHO).

I agree with your way of fixing this. However, the progress reporting didn't
work because just breaking the loop does not return from
advanceConnectionState to threadRun.

+   /* otherwise loop over PREPARE_THROTTLE */
break;

I attached the fixed patch that uses return instead of break, and I confirmed
that this made the progress reporting work properly.
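To illustrate the control-flow change, here is a toy model of one pass through
the state machine (poisson_rand stands in for pgbench's getPoissonRand; all
numbers are illustrative):

```python
import math
import random

def poisson_rand(rng, mean):
    # Exponential inter-arrival delay, like pgbench's getPoissonRand().
    return -math.log(1.0 - rng.random()) * mean

def advance(now, trigger, latency_limit, nxacts, cnt, rng, mean):
    # One pass of the state machine: count at most one skipped
    # transaction, reschedule, and loop back to PREPARE_THROTTLE,
    # instead of catching up inside an inner while loop.
    if trigger < now - latency_limit:
        cnt += 1                          # one skipped transaction
        if 0 < nxacts <= cnt:
            return trigger, cnt, "FINISHED"
        trigger += poisson_rand(rng, mean)
        return trigger, cnt, "PREPARE_THROTTLE"
    return trigger, cnt, "THROTTLE"

rng = random.Random(1)
_, _, state = advance(100.0, 0.0, 5.0, 1, 0, rng, 10.0)
print(state)   # FINISHED: the -t limit is reached while skipping
```

Because each pass counts at most one skip, control returns to the caller
between skips, which is what lets the progress reporting run.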

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index dc84b7b9b7..8f5d000938 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3223,32 +3223,31 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 /*
  * If --latency-limit is used, and this slot is already late
  * so that the transaction will miss the latency limit even if
- * it completed immediately, skip this time slot and schedule
- * to continue running on the next slot that isn't late yet.
- * But don't iterate beyond the -t limit, if one is given.
+ * it completed immediately, skip this time slot and loop to
+ * reschedule.
  */
 if (latency_limit)
 {
 	pg_time_now_lazy(&now);
 
-	while (thread->throttle_trigger < now - latency_limit &&
-		   (nxacts <= 0 || st->cnt < nxacts))
+	if (thread->throttle_trigger < now - latency_limit)
 	{
 		processXactStats(thread, st, &now, true, agg);
-		/* next rendez-vous */
-		thread->throttle_trigger +=
-			getPoissonRand(&thread->ts_throttle_rs, throttle_delay);
-		st->txn_scheduled = thread->throttle_trigger;
-	}
 
-	/*
-	 * stop client if -t was exceeded in the previous skip
-	 * loop
-	 */
-	if (nxacts > 0 && st->cnt >= nxacts)
-	{
-		st->state = CSTATE_FINISHED;
-		break;
+		/* stop client if -T/-t was exceeded. */
+		if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts))
+			/*
+			 * For very unrealistic rates under -T, some skipped
+			 * transactions are not counted because the catchup
+			 * loop is not fast enough just to do the scheduling
+			 * and counting at the expected speed.
+			 *
+			 * We do not bother with such a degenerate case.
+			 */
+			st->state = CSTATE_FINISHED;
+
+		/* otherwise loop over PREPARE_THROTTLE */
+		return;
 	}
 }
 


RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-13 Thread tsunakawa.ta...@fujitsu.com
From: Robert Haas 
> On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.ta...@fujitsu.com
>  wrote:
> > The question I have been asking is how.  With that said, we should only have
> two options; one is the return value of the FDW commit routine, and the other 
> is
> via ereport(ERROR).  I suggested the possibility of the former, because if the
> FDW does ereport(ERROR), Postgres core (transaction manager) may have
> difficulty in handling the rest of the participants.
> 
> I don't think that is going to work. It is very difficult to write
> code that doesn't ever ERROR in PostgreSQL. It is not impossible if
> the operation is trivial enough, but I think you're greatly
> underestimating the complexity of committing the remote transaction.
> If somebody had designed PostgreSQL so that every function returns a
> return code and every time you call some other function you check that
> return code and pass any error up to your own caller, then there would
> be no problem here. But in fact the design was that at the first sign
> of trouble you throw an ERROR. It's not easy to depart from that
> programming model in just one place.

> > I'm not completely sure about this.  I thought (and said) that the only 
> > thing
> the FDW does would be to send a commit request through an existing
> connection.  So, I think it's not a severe restriction to require FDWs to do
> ereport(ERROR) during commits (of the second phase of 2PC.)
> 
> To send a commit request through an existing connection, you have to
> send some bytes over the network using a send() or write() system
> call. That can fail. Then you have to read the response back over the
> network using recv() or read(). That can also fail. You also need to
> parse the result that you get from the remote side, which can also
> fail, because you could get back garbage for some reason. And
> depending on the details, you might first need to construct the
> message you're going to send, which might be able to fail too. Also,
> the data might be encrypted using SSL, so you might have to decrypt
> it, which can also fail, and you might need to encrypt data before
> sending it, which can fail. In fact, if you're using the OpenSSL,
> trying to call SSL_read() or SSL_write() can both read and write data
> from the socket, even multiple times, so you have extra opportunities
> to fail.

I know sending a commit request may get an error from various underlying 
functions, but we're talking about the client side, not the Postgres server 
side that could unexpectedly ereport(ERROR) somewhere.  So, the new FDW commit 
routine won't lose control and can return an error code as its return value.  
For instance, the FDW commit routine for DBMS-X would typically be:

int
DBMSXCommit(...)
{
int ret;

/* extract info from the argument to pass to xa_commit() */

ret = DBMSX_xa_commit(...);
/* This is the actual commit function which is exposed to the app 
server (e.g. Tuxedo) through the xa_commit() interface */

/* map xa_commit() return values to the corresponding return values of 
the FDW commit routine */
switch (ret)
{
case XA_RMERR:
ret = ...;
break;
...
}

return ret;
}


> I think that's a valid concern, but we also have to have a plan that
> is realistic. Some things are indeed not possible in PostgreSQL's
> design. Also, some of these problems are things everyone has to
> somehow confront. There's no database doing 2PC that can't have a
> situation where one of the machines disappears unexpectedly due to
> some natural disaster or administrator interference. It might be the
> case that our inability to do certain things safely during transaction
> commit puts us out of compliance with the spec, but it can't be the
> case that some other system has no possible failures during
> transaction commit. The problem of the network potentially being
> disconnected between one packet and the next exists in every system.

So, we need to design how commit behaves from the user's perspective.  That's 
the functional design.  We should figure out what's the desirable response of 
commit first, and then see if we can implement it or have to compromise in some 
way.  I think we can reference the X/Open TX standard and/or JTS (Java 
Transaction Service) specification (I haven't had a chance to read them yet, 
though.)  Just in case we can't find the requested commit behavior in the 
volcano case from those specifications, ... (I'm hesitant to say this because 
it may be hard,) it's desirable to follow representative products such as 
Tuxedo and GlassFish (the reference implementation of Java EE specs.)


> > I don't think the resolver-based approach would bring us far enough.  It's
> fundamentally a bottleneck.  Such a background process should only handle
> commits whose requests failed to be sent due to server down.
> 
> Why is it fundamentally a b

Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Noah Misch
On Sun, Jun 13, 2021 at 04:49:04PM -0700, Andres Freund wrote:
> On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
> > On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> > > We might be able to get rid of the stuff about concurrent step
> > > completion in isolationtester.c if we required the spec files
> > > to use annotations to force a deterministic step completion
> > > order in all such cases.
> > 
> > Yeah.  If we're willing to task spec authors with that, the test program 
> > can't
> > then guess wrong under unusual timing.
> 
> I think it'd make it *easier* for spec authors. Right now one needs to
> find some way to get a consistent ordering, which is often hard and
> complicates tests way more than specifying an explicit ordering
> would. And it's often unreliable, as evidenced here and in plenty other
> tests.

Fine with me.  Even if it weren't easier for spec authors, it shifts efforts
to spec authors and away from buildfarm observers, which is a good thing.




Re: Use extended statistics to estimate (Var op Var) clauses

2021-06-13 Thread Zhihong Yu
On Sun, Jun 13, 2021 at 1:29 PM Tomas Vondra 
wrote:

> Hi,
>
> Here is a slightly updated version of the patch - rebased to current
> master and fixing some minor issues to handle expressions (and not just
> the Var nodes as before).
>
> The changes needed to support (Expr op Expr) are mostly mechanical,
> though I'm sure the code needs some cleanup. The main issue I ran into
> is the special case clauselist_selectivity, which does
>
>  if (list_length(clauses) == 1)
>  return clause_selectivity_ext(...);
>
> which applies to cases like "WHERE a < b" which can now be handled by
> extended statistics, thanks to this patch. But clause_selectivity_ext
> only used to call restriction_selectivity for these clauses, which does
> not use extended statistics, of course.
>
> I considered either getting rid of the special case, passing everything
> through extended stats, including cases with a single clause. But that
> ends up affecting e.g. OR clauses, so I tweaked clause_selectivity_ext a
> bit, which seems like a better approach.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
Hi,

-   for (i = 0; i < mcvlist->nitems; i++)
+   if (cst)/* Expr op Const */

It seems the Const op Expr is also covered by this if branch. Hence the
comment should include this case.

Cheers


Re: unnesting multirange data types

2021-06-13 Thread Zhihong Yu
On Sun, Jun 13, 2021 at 2:10 PM Alexander Korotkov 
wrote:

> On Sun, Jun 13, 2021 at 5:53 PM Zhihong Yu  wrote:
> > +   ObjectAddress myself,
> >
> > nit: myself -> self
>
> Probably "self" is a better name than "myself" in this context.
> However, you can see that the surrounding code already uses the name
> "myself".  Therefore, I prefer to stay with "myself".
>
> --
> Regards,
> Alexander Korotkov
>

Hi,
Is it Okay if I submit a patch changing the 'myself's to 'self' ?

Cheers


Re: Different compression methods for FPI

2021-06-13 Thread Justin Pryzby
On Tue, Jun 01, 2021 at 11:06:53AM +0900, Michael Paquier wrote:
> - Speed and CPU usage.  We should worry about that for CPU-bounded
> environments.
> - Compression ratio, which is just monitoring the difference in WAL.
> - Effect of the level of compression perhaps?
> - Use a fixed amount of WAL generated, meaning a set of repeatable SQL
> queries, with one backend, no benchmarks like pgbench.
> - Avoid any I/O bottleneck, so run tests on a tmpfs or ramfs.
> - Avoid any extra WAL interference, like checkpoints, no autovacuum
> running in parallel.

I think it's more nuanced than just finding the algorithm with the least CPU
use.  The GUC is PGC_USERSET, and it's possible that a data-loading process
might want to use zlib for better compression ratio, but an interactive OLTP
process might want to use lz4 or no compression for better responsiveness.

Reducing WAL volume during loading can be important - at one site, their SAN
was too slow to keep up during their period of heaviest loading, the
checkpointer fell behind, WAL couldn't be recycled as normal, the (local)
WAL filesystem overflowed, and the oversized WAL then needed to be
replayed to the slow SAN.  A large fraction of their WAL is FPI, and
compression now made this a non-issue.  We'd happily incur 2x more CPU cost if
WAL were 25% smaller.

We're not proposing to enable it by default, so the threshold doesn't have to
be "no performance regression" relative to no compression.  The feature should
provide a faster alternative to PGLZ, and also a method with better compression
ratio to improve the case of heavy WAL writes, by reducing I/O from FPI.

In a CPU-bound environment, one would just disable WAL compression, or use LZ4
if it's cheap enough.  In the IO bound case, someone might enable zlib or zstd
compression.
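The ratio-versus-CPU tradeoff is easy to observe even with a stdlib codec
(zlib here is only a stand-in for the server-side implementations, and the
sample "page" is synthetic):

```python
import time
import zlib

# Toy stand-in for a full-page image: compressible but not trivial.
page = (b"heap tuple header" + b"\x00" * 48) * 126   # ~8 kB

for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = zlib.compress(page, level)
    elapsed_us = (time.perf_counter() - t0) * 1e6
    print(f"zlib level {level}: {len(page)} -> {len(out)} bytes, "
          f"{elapsed_us:.0f} us")
```

Higher levels buy a smaller image for more CPU time, which is exactly the
per-workload choice a PGC_USERSET GUC allows.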

I found this old thread about btree performance with wal compression (+Peter,
+Andres).
https://www.postgresql.org/message-id/flat/540584F2-A554-40C1-8F59-87AF8D623BB7%40yandex-team.ru#94c0dcaa34e3170992749f6fdc8db35c

And the differences are pretty dramatic, so I ran a single test on my PC:

CREATE TABLE t AS SELECT generate_series(1,99)a; VACUUM t;
SET wal_compression= off;
\set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT 
pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * 
FROM pg_stat_wal;
Time: 1639.375 ms (00:01.639)
wal_bytes| 20357193

pglz writes ~half as much, but takes twice as long as uncompressed:
|Time: 3362.912 ms (00:03.363)
|wal_bytes| 11644224

zlib writes ~4x less than uncompressed, and is still much faster than pglz
|Time: 2167.474 ms (00:02.167)
|wal_bytes| 5611653

lz4 is as fast as uncompressed, and writes a bit more than pglz:
|Time: 1612.874 ms (00:01.613)
|wal_bytes| 12397123

zstd(6) is slower than lz4, but compresses better than anything but zlib.
|Time: 1808.881 ms (00:01.809)
|wal_bytes| 6395993

In this patch series, I added compression information to the errcontext from
xlog_block_info(), and allow specifying compression levels like zlib-2.  I'll
rearrange that commit earlier if we decide that's desirable to include.
From d006044bdd6272fa4e37890b8e634b0b2a179dff Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Sat, 27 Feb 2021 09:03:50 +0500
Subject: [PATCH v9 1/9] Allow alternate compression methods for
 wal_compression

TODO: bump XLOG_PAGE_MAGIC
---
 doc/src/sgml/config.sgml  |  9 +-
 doc/src/sgml/installation.sgml|  4 +-
 src/backend/Makefile  |  2 +-
 src/backend/access/transam/xlog.c | 14 ++-
 src/backend/access/transam/xloginsert.c   | 65 ++--
 src/backend/access/transam/xlogreader.c   | 99 ---
 src/backend/utils/misc/guc.c  | 21 ++--
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/pg_waldump/pg_waldump.c   | 13 ++-
 src/include/access/xlog.h |  2 +-
 src/include/access/xlog_internal.h| 10 ++
 src/include/access/xlogrecord.h   | 15 ++-
 12 files changed, 207 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index aa3e178240..1df56d8034 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3127,23 +3127,26 @@ include_dir 'conf.d'
  
 
  
-  wal_compression (boolean)
+  wal_compression (enum)
   
wal_compression configuration parameter
   
   
   

-When this parameter is on, the PostgreSQL
+This parameter enables compression of WAL using the specified 
+compression method.
+When enabled, the PostgreSQL
 server compresses full page images written to WAL when
  is on or during a base backup.
 A compressed page image will be decompressed during WAL replay.
+The supported methods are pglz and zlib.
 The default value is off.
  

Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

2021-06-13 Thread Ranier Vilela
Em dom., 13 de jun. de 2021 às 09:43, Ranier Vilela 
escreveu:

> Hi Andres, thanks for taking a look.
>
> Em sáb., 12 de jun. de 2021 às 16:27, Andres Freund 
> escreveu:
>
>> Hi,
>>
>> On 2021-06-12 10:55:22 -0300, Ranier Vilela wrote:
>> > With the recent changes at procarray.c, I take a look in.
>> > msvc compiler, has some warnings about signed vs unsigned.
>>
>> > 1. Size_t is weird, because all types are int.
>>
>> Not sure why I ended up using size_t here. There are cases where using a
>> natively sized integer can lead to better code being generated, so I'd
>> want to see some evaluation of the code generation effects.
>>
>  Yes, sure.
>
I'm a little confused by the msvc compiler, but here's the difference in
code generation, apart from the noise caused by unnecessary changes to the
names.

Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29915 for x64

diff attached.

regards,
Ranier Vilela


procarray_asm.diff
Description: Binary data


Re: Error on pgbench logs

2021-06-13 Thread Tatsuo Ishii
>> + while ((next = agg->start_time + agg_interval * INT64CONST(100)) <= now)
>>
>> I can find the similar code to convert "seconds" to "us" using casting
>> like
>>
>> end_time = threads[0].create_time + (int64) 100 * duration;
>>
>> or
>>
>> next_report = last_report + (int64) 100 * progress;
>>
>> Is there a reason use INT64CONST instead of (int64)? Do these imply
>> the same effect?
> 
> I guess that the macro does 100LL or something similar to
> directly create an int64 constant. It is necessary if the constant
> would overflow a usual 32 bits C integer, whereas the cast is
> sufficient if there is no overflow. Maybe I c/should have used the
> previous approach.

I think using INT64CONST to create a 64-bit constant is the standard
practice in PostgreSQL.
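The overflow that the 64-bit constant avoids is easy to model: with 32-bit
int operands the multiplication wraps before any widening assignment happens
(the values below are illustrative):

```python
def mul_as_c_int(a, b):
    # 32-bit signed multiplication with wraparound, as a plain
    # `int * int` would behave in C before any widening assignment.
    r = (a * b) & 0xFFFFFFFF
    return r - (1 << 32) if r >= (1 << 31) else r

duration = 3600                            # e.g. a one-hour run, in seconds
wrong = mul_as_c_int(1_000_000, duration)  # 32-bit intermediate wraps
right = 1_000_000 * duration               # 64-bit constant: exact
print(wrong, right)   # -694967296 3600000000
```

Casting one operand to int64, or writing the constant with INT64CONST, forces
the whole multiplication to happen in 64 bits, which is the `right` case here.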

commit 9d6b160d7db76809f0c696d9073f6955dd5a973a
Author: Tom Lane 
Date:   Fri Sep 1 15:14:18 2017 -0400

Make [U]INT64CONST safe for use in #if conditions.

Instead of using a cast to force the constant to be the right width,
assume we can plaster on an L, UL, LL, or ULL suffix as appropriate.
The old approach to this is very hoary, dating from before we were
willing to require compilers to have working int64 types.

This fix makes the PG_INT64_MIN, PG_INT64_MAX, and PG_UINT64_MAX
constants safe to use in preprocessor conditions, where a cast
doesn't work.  Other symbolic constants that might be defined using
[U]INT64CONST are likewise safer than before.

Also fix the SIZE_MAX macro to be similarly safe, if we are forced
to provide a definition for that.  The test added in commit 2e70d6b5e
happens to do what we want even with the hack "(size_t) -1" definition,
but we could easily get burnt on other tests in future.

Back-patch to all supported branches, like the previous commits.

Discussion: https://postgr.es/m/15883.1504278...@sss.pgh.pa.us


Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




RE: pgbench bug candidate: negative "initial connection time"

2021-06-13 Thread kuroda.hay...@fujitsu.com
Dear Fabien,

Thank you for replying!

> Hmmm. Possibly. Another option could be not to report anything after some 
> errors. I'm not sure, because it would depend on the use case. I guess the 
> command returned an error status as well.

I do not know the use cases and decisions involved, but I vote to report
nothing when an error occurs.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: Move pg_attribute.attcompression to earlier in struct for reduced size?

2021-06-13 Thread Michael Paquier
On Thu, Jun 10, 2021 at 11:09:52AM +0900, Michael Paquier wrote:
> On Tue, Jun 08, 2021 at 11:32:21PM -0400, Tom Lane wrote:
>> In the meantime I'm +1 for dropping this logic from VACUUM FULL.
>> I don't even want to spend enough more time on it to confirm the
>> different overhead measurements that have been reported.
> 
> Agreed.  It looks like we are heading toward doing just that for this
> release.

Hearing nothing, done this way.
--
Michael


signature.asc
Description: PGP signature


Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

2021-06-13 Thread Thomas Munro
Just as an FYI: this entry currently fails with "Timed out!" on cfbot
because of an oversight in the master branch[1], AFAICS.  It should
pass again once that's fixed.

[1] 
https://www.postgresql.org/message-id/CA%2BhUKGLah2w1pWKHonZP_%2BEQw69%3Dq56AHYwCgEN8GDzsRG_Hgw%40mail.gmail.com




Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Gavin Flower

On 14/06/21 11:49 am, Andres Freund wrote:

Hi,

On 2021-06-13 15:22:12 -0700, Noah Misch wrote:

On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:

We might be able to get rid of the stuff about concurrent step
completion in isolationtester.c if we required the spec files
to use annotations to force a deterministic step completion
order in all such cases.

Yeah.  If we're willing to task spec authors with that, the test program can't
then guess wrong under unusual timing.

I think it'd make it *easier* for spec authors. Right now one needs to
find some way to get a consistent ordering, which is often hard and
complicates tests way more than specifying an explicit ordering
would. And it's often unreliable, as evidenced here and in plenty other
tests.

Greetings,

Andres Freund


How about adding a keyword like 'DETERMINISTIC' to the top level SELECT, 
the idea being that the output would be deterministic (given the same table 
values after filtering etc.), but not necessarily in any particular 
order?  So pg could decide the optimum way to achieve that, which may not 
necessarily need a sort.



Cheers,
Gavin





Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Andres Freund
Hi,

On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
> On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> > We might be able to get rid of the stuff about concurrent step
> > completion in isolationtester.c if we required the spec files
> > to use annotations to force a deterministic step completion
> > order in all such cases.
> 
> Yeah.  If we're willing to task spec authors with that, the test program can't
> then guess wrong under unusual timing.

I think it'd make it *easier* for spec authors. Right now one needs to
find some way to get a consistent ordering, which is often hard and
complicates tests way more than specifying an explicit ordering
would. And it's often unreliable, as evidenced here and in plenty other
tests.

Greetings,

Andres Freund




Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Noah Misch
On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
> >> * Adjust the test script's functions to emit a NOTICE *after* acquiring
> >> a lock, not before.
> 
> > I suspect that particular lock acquisition merely unblocks the processing 
> > that
> > reaches the final lock state expected by the test.  So, ...
> 
> Ah, you're probably right.
> 
> >> * Annotate permutations with something along the lines of "expect N
> >> NOTICE outputs before allowing this step to be considered complete",
> >> which we'd attach to the unlock steps.
> 
> > ... I don't expect this to solve $SUBJECT.  It could be a separately-useful
> > feature, though.
> 
> I think it would solve it.  In the examples at hand, where we have
> 
> @@ -377,8 +377,6 @@
>  pg_advisory_unlock
>  
>  t  
> -s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
> -s1: NOTICE:  acquiring advisory lock on 2
>  step s2_upsert: <... completed>
>  step controller_print_speculative_locks: 
>  SELECT pa.application_name, locktype, mode, granted

It would solve that one particular diff.  I meant that it wouldn't solve the
aforementioned pg_locks diffs:

 dory │ 2020-03-14 19:35:31 │ HEAD  │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
 walleye  │ 2021-03-25 05:00:44 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 walleye  │ 2021-05-05 17:00:41 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41

> We might be able to get rid of the stuff about concurrent step
> completion in isolationtester.c if we required the spec files
> to use annotations to force a deterministic step completion
> order in all such cases.

Yeah.  If we're willing to task spec authors with that, the test program can't
then guess wrong under unusual timing.




Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Tom Lane
Noah Misch  writes:
> On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
>> * Adjust the test script's functions to emit a NOTICE *after* acquiring
>> a lock, not before.

> I suspect that particular lock acquisition merely unblocks the processing that
> reaches the final lock state expected by the test.  So, ...

Ah, you're probably right.

>> * Annotate permutations with something along the lines of "expect N
>> NOTICE outputs before allowing this step to be considered complete",
>> which we'd attach to the unlock steps.

> ... I don't expect this to solve $SUBJECT.  It could be a separately-useful
> feature, though.

I think it would solve it.  In the examples at hand, where we have

@@ -377,8 +377,6 @@
 pg_advisory_unlock
 
 t  
-s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
-s1: NOTICE:  acquiring advisory lock on 2
 step s2_upsert: <... completed>
 step controller_print_speculative_locks: 
 SELECT pa.application_name, locktype, mode, granted

and then those notices show up sometime later, I'm hypothesizing
that the actions did happen timely, but the actual delivery of
those packets to the isolationtester client did not.  If we
annotated step s2_upsert with a marker to the effect of "wait
for 2 NOTICEs from session 1 before considering this step done",
we could resolve that race condition.  Admittedly, this is putting
a thumb on the scales a little bit, but it's hard to see how to
deal with inconsistent TCP delivery delays without that.

(BTW, I find that removing the pq_flush() call at the bottom of
send_message_to_frontend produces this failure and a bunch of
other similar ones.)

> Yeah, a special permutation list entry like PQgetResult(s8) could solve
> failures like
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017%3A13%3A44

Right.  I'm visualizing it more like annotating s7a8 as requiring
s8a1 to complete first -- or vice versa, either would stabilize
that test result I think.

We might be able to get rid of the stuff about concurrent step
completion in isolationtester.c if we required the spec files
to use annotations to force a deterministic step completion
order in all such cases.

regards, tom lane




Re: An out-of-date comment in nodeIndexonlyscan.c

2021-06-13 Thread Thomas Munro
On Mon, Jun 14, 2021 at 12:54 AM David Rowley  wrote:
> I think a more optimal and nicer way of doing that would be setting
> bits in a Bitmapset then checking bms_num_members is equal to
> n_scan_keys.

Shouldn't it be compared with indnkeyatts?  Yes, much nicer, thanks!
From 2325ae403196f73865f7c6b3224db925ca406969 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 10 Jun 2021 23:35:16 +1200
Subject: [PATCH v3 1/2] Use tuple-level SIREAD locks for index-only scans.

When index-only scans manage to avoid fetching heap tuples,
previously we'd predicate-lock the whole heap page (since commit
cdf91edb).  Commits c01262a8 and 6f38d4da made it possible to lock the
tuple instead with only a TID, to match the behavior of regular index
scans and avoid some false conflicts.

Discussion: https://postgr.es/m/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
---
 src/backend/executor/nodeIndexonlyscan.c  | 12 --
 .../isolation/expected/index-only-scan-2.out  | 15 +++
 .../isolation/expected/index-only-scan-3.out  | 16 
 src/test/isolation/isolation_schedule |  2 +
 .../isolation/specs/index-only-scan-2.spec| 39 +++
 .../isolation/specs/index-only-scan-3.spec| 33 
 6 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 src/test/isolation/expected/index-only-scan-2.out
 create mode 100644 src/test/isolation/expected/index-only-scan-3.out
 create mode 100644 src/test/isolation/specs/index-only-scan-2.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-3.spec

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0754e28a9a..f7185b4519 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -243,12 +243,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 		/*
 		 * If we didn't access the heap, then we'll need to take a predicate
-		 * lock explicitly, as if we had.  For now we do that at page level.
+		 * lock explicitly, as if we had.  We don't have the inserter's xid,
+		 * but that's only used by PredicateLockTID to check if it matches the
+		 * caller's top level xid, and it can't possibly have been inserted by
+		 * us or the page wouldn't be all visible.
 		 */
 		if (!tuple_from_heap)
-			PredicateLockPage(scandesc->heapRelation,
-			  ItemPointerGetBlockNumber(tid),
-			  estate->es_snapshot);
+			PredicateLockTID(scandesc->heapRelation,
+			 tid,
+			 estate->es_snapshot,
+			 InvalidTransactionId);
 
 		return slot;
 	}
diff --git a/src/test/isolation/expected/index-only-scan-2.out b/src/test/isolation/expected/index-only-scan-2.out
new file mode 100644
index 00..fef5b8d398
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-2.out
@@ -0,0 +1,15 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 1;
+id1
+
+1  
+step r2: SELECT id2 FROM account WHERE id2 = 2;
+id2
+
+2  
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/index-only-scan-3.out b/src/test/isolation/expected/index-only-scan-3.out
new file mode 100644
index 00..efef162779
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-3.out
@@ -0,0 +1,16 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 2;
+id1
+
+2  
+step r2: SELECT id2 FROM account WHERE id2 = 1;
+id2
+
+1  
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index f4c01006fc..570aedb900 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,8 @@ test: partial-index
 test: two-ids
 test: multiple-row-versions
 test: index-only-scan
+test: index-only-scan-2
+test: index-only-scan-3
 test: predicate-lock-hot-tuple
 test: update-conflict-out
 test: deadlock-simple
diff --git a/src/test/isolation/specs/index-only-scan-2.spec b/src/test/isolation/specs/index-only-scan-2.spec
new file mode 100644
index 00..cd3c599af8
--- /dev/null
+++ b/src/test/isolation/specs/index-only-scan-2.spec
@@ -0,0 +1,39 @@
+# Test the granularity of predicate locks on heap tuples, for index-only scans.
+
+# Access the accounts through two different indexes, so that index predicate
+# locks don't confuse matters; here we only want to test heap locking.  Since
+# s1 and s2 access different heap tuples there is no cycle, so both
+#

Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Noah Misch
On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
> >> I think what we have to do to salvage this test is to get rid of the
> >> use of NOTICE outputs, and instead have the test functions insert
> >> log records into some table, which we can inspect after the fact
> >> to verify that things happened as we expect.
> 
> > That sounds promising.  Are those messages important for observing server
> > bugs, or are they for debugging/modifying the test itself? If the latter, 
> > one
> > could just change the messages to LOG.
> 
> I think they are important, because they show that the things we expect
> to happen occur when we expect them to happen.
> 
> I experimented with replacing the RAISE NOTICEs with INSERTs, and ran
> into two problems:
> 
> 1. You can't do an INSERT in an IMMUTABLE function.  This is easily
> worked around by putting the INSERT in a separate, volatile function.
> That's cheating like mad of course, but so is the rest of the stuff
> this test does in "immutable" functions.
> 
> 2. The inserted events don't become visible from the outside until the
> respective session commits.  This seems like an absolute show-stopper.
> After the fact, we can see that the events happened in the expected
> relative order; but we don't have proof that they happened in the right
> order relative to the actions visible in the test output file.

One could send the inserts over dblink, I suppose.

> > ... Any of the above won't solve things
> > completely, because 3 of the 21 failures have diffs in the pg_locks output:

> * Adjust the test script's functions to emit a NOTICE *after* acquiring
> a lock, not before.

I suspect that particular lock acquisition merely unblocks the processing that
reaches the final lock state expected by the test.  So, ...

> * Annotate permutations with something along the lines of "expect N
> NOTICE outputs before allowing this step to be considered complete",
> which we'd attach to the unlock steps.

... I don't expect this to solve $SUBJECT.  It could be a separately-useful
feature, though.

> This idea is only half baked at present, but maybe there's something
> to work with there.  If it works, maybe we could improve the other
> test cases that are always pseudo-failing in a similar way.  For
> example, in the deadlock tests, annotate steps with "expect step
> Y to finish before step X".

Yeah, a special permutation list entry like PQgetResult(s8) could solve
failures like
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017%3A13%3A44

Incidentally, I have a different idle patch relevant to deadlock test failures
like that.  Let me see if it has anything useful.




Re: unnesting multirange data types

2021-06-13 Thread Alexander Korotkov
On Sun, Jun 13, 2021 at 9:46 PM Jonathan S. Katz  wrote:
>
> On 6/13/21 11:49 AM, Justin Pryzby wrote:
> > On Sun, Jun 13, 2021 at 11:25:05AM -0400, Jonathan S. Katz wrote:
> >> On 6/13/21 10:57 AM, Zhihong Yu wrote:
> >>> +/* Turn multirange into a set of ranges */
> >>>
> >>> set of ranges: sequence of ranges
> >>
> >> I believe "set of ranges" is accurate here, as the comparable return is
> >> a "SETOF rangetype". Sequences are objects unto themselves.
> >>
> >
> > I believe the point was that (in mathematics) a "set" is unordered, and a
> > sequence is ordered.  Also, a "setof" tuples in postgres can contain
> > duplicates.
>
> The comment in question is part of the header for the
> "multirange_unnest" function in the code and AFAICT it is accurate: it
> is returning a "set of" ranges as it's literally calling into the
> set-returning function framework.
>
> I would suggest leaving it as is.

+1

> > The docs say "The ranges are read out in storage order (ascending).", so I
> > think this is just a confusion between what "set" means in math vs in 
> > postgres.
>
> This is nearly identical to the language in the array unnest[1]
> function, which is what I believed Alexander borrowed from:

Yes, that's it! :)

> "Expands an array into a set of rows. The array's elements are read out
> in storage order."
>
> If we tweaked the multirange "unnest" function, we could change it to:
>
> +   
> +Expands a multirange into a set of rows.
> +The ranges are read out in storage order (ascending).
> +   
>
> to match what the array "unnest" function docs say, or
>
> +   
> +Expands a multirange into a set of rows that each
> +contain an individual range.
> +The ranges are read out in storage order (ascending).
> +   
>
> to be a bit more specific. However, I think this is also bordering on
> overengineering the text, given there has been a lack of feedback on the
> "unnest" array function description being confusing.

I think it's not necessary to mention rows here.  Our
documentation already has a number of examples where we
describe the set of returned values without speaking about rows, including:
json_array_elements, json_array_elements_text, json_object_keys,
pg_listening_channels, pg_tablespace_databases...

--
Regards,
Alexander Korotkov




Re: logical decoding and replication of sequences

2021-06-13 Thread Tomas Vondra

Hi,

Seems the cfbot was not entirely happy with the patch on some platforms, 
so here's a fixed version. There was a bogus call to the ensure_transaction 
function (which does not exist at all) and a silly bug in transaction 
management in apply_handle_sequence.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 8c9a36e8e9bd38ddb7e0c449ccdaf0db46ad28d5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 13 Jun 2021 23:10:18 +0200
Subject: [PATCH] Logical decoding / replication of sequences

---
 contrib/test_decoding/Makefile|   3 +-
 contrib/test_decoding/expected/sequence.out   | 327 +++
 contrib/test_decoding/sql/sequence.sql| 119 ++
 contrib/test_decoding/test_decoding.c |  40 ++
 doc/src/sgml/catalogs.sgml|  71 
 doc/src/sgml/ref/alter_publication.sgml   |  28 +-
 doc/src/sgml/ref/alter_subscription.sgml  |   6 +-
 src/backend/catalog/pg_publication.c  | 160 +++-
 src/backend/catalog/system_views.sql  |  10 +
 src/backend/commands/publicationcmds.c| 232 ++-
 src/backend/commands/sequence.c   | 132 +-
 src/backend/commands/subscriptioncmds.c   | 274 +
 src/backend/executor/execReplication.c|   4 +-
 src/backend/nodes/copyfuncs.c |   4 +-
 src/backend/nodes/equalfuncs.c|   4 +-
 src/backend/parser/gram.y |  44 +-
 src/backend/replication/logical/decode.c  | 131 +-
 src/backend/replication/logical/logical.c |  42 ++
 src/backend/replication/logical/proto.c   |  52 +++
 .../replication/logical/reorderbuffer.c   | 384 ++
 src/backend/replication/logical/tablesync.c   | 118 +-
 src/backend/replication/logical/worker.c  |  56 +++
 src/backend/replication/pgoutput/pgoutput.c   |  80 +++-
 src/backend/utils/cache/relcache.c|   4 +-
 src/bin/psql/tab-complete.c   |   2 +-
 src/include/catalog/pg_proc.dat   |   5 +
 src/include/catalog/pg_publication.h  |  14 +
 src/include/commands/sequence.h   |   2 +
 src/include/nodes/parsenodes.h|   4 +-
 src/include/replication/logicalproto.h|  20 +
 src/include/replication/output_plugin.h   |  14 +
 src/include/replication/pgoutput.h|   1 +
 src/include/replication/reorderbuffer.h   |  34 +-
 src/test/regress/expected/publication.out |   8 +-
 src/test/regress/expected/rules.out   |   8 +
 35 files changed, 2391 insertions(+), 46 deletions(-)
 create mode 100644 contrib/test_decoding/expected/sequence.out
 create mode 100644 contrib/test_decoding/sql/sequence.sql

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b879..56ddc3abae 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -5,7 +5,8 @@ PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
 
 REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	decoding_into_rel binary prepared replorigin time messages \
-	spill slot truncate stream stats twophase twophase_stream
+	spill slot truncate stream stats twophase twophase_stream \
+	sequence
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
 	twophase_snapshot
diff --git a/contrib/test_decoding/expected/sequence.out b/contrib/test_decoding/expected/sequence.out
new file mode 100644
index 00..07f3e3e92c
--- /dev/null
+++ b/contrib/test_decoding/expected/sequence.out
@@ -0,0 +1,327 @@
+-- predictability
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column? 
+--
+ init
+(1 row)
+
+CREATE SEQUENCE test_sequence;
+-- test the sequence changes by several nextval() calls
+SELECT nextval('test_sequence');
+ nextval 
+-
+   1
+(1 row)
+
+SELECT nextval('test_sequence');
+ nextval 
+-
+   2
+(1 row)
+
+SELECT nextval('test_sequence');
+ nextval 
+-
+   3
+(1 row)
+
+SELECT nextval('test_sequence');
+ nextval 
+-
+   4
+(1 row)
+
+-- test the sequence changes by several ALTER commands
+ALTER SEQUENCE test_sequence INCREMENT BY 10;
+SELECT nextval('test_sequence');
+ nextval 
+-
+  14
+(1 row)
+
+ALTER SEQUENCE test_sequence START WITH 3000;
+ALTER SEQUENCE test_sequence MAXVALUE 1;
+ALTER SEQUENCE test_sequence RESTART WITH 4000;
+SELECT nextval('test_sequence');
+ nextval 
+-
+4000
+(1 row)
+
+-- test the sequence changes by several setval() calls
+SELECT setval('test_sequence', 3500);
+ setval 
+
+   3500
+(1 row)
+
+SELECT nextval('test_sequence');
+ nextval 
+-
+3510
+(1 row)
+
+SELECT setval('test_sequence', 3500, true);
+ setval 
+
+   3500
+(1 row)
+
+SELECT nextval

Re: unnesting multirange data types

2021-06-13 Thread Alexander Korotkov
On Sun, Jun 13, 2021 at 5:53 PM Zhihong Yu  wrote:
> +   ObjectAddress myself,
>
> nit: myself -> self

Probably "self" is a better name than "myself" in this context.
However, you can see that the surrounding code already uses the name
"myself".  Therefore, I prefer to stay with "myself".

--
Regards,
Alexander Korotkov




Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Tom Lane
Noah Misch  writes:
> The test material added in commit 43e0841 continues to yield buildfarm
> failures.

Yeah.  It's only a relatively small fraction of the total volume of
isolation-test failures, so I'm not sure it's worth expending a
whole lot of effort on this issue by itself.

> On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
>> I think what we have to do to salvage this test is to get rid of the
>> use of NOTICE outputs, and instead have the test functions insert
>> log records into some table, which we can inspect after the fact
>> to verify that things happened as we expect.

> That sounds promising.  Are those messages important for observing server
> bugs, or are they for debugging/modifying the test itself? If the latter, one
> could just change the messages to LOG.

I think they are important, because they show that the things we expect
to happen occur when we expect them to happen.

I experimented with replacing the RAISE NOTICEs with INSERTs, and ran
into two problems:

1. You can't do an INSERT in an IMMUTABLE function.  This is easily
worked around by putting the INSERT in a separate, volatile function.
That's cheating like mad of course, but so is the rest of the stuff
this test does in "immutable" functions.

2. The inserted events don't become visible from the outside until the
respective session commits.  This seems like an absolute show-stopper.
After the fact, we can see that the events happened in the expected
relative order; but we don't have proof that they happened in the right
order relative to the actions visible in the test output file.

> ... Any of the above won't solve things
> completely, because 3 of the 21 failures have diffs in the pg_locks output:

Yeah, it looks like the issue there is that session 2 reports completion
of its step before session 1 has a chance to make progress after obtaining
the lock.  This seems to me to be closely related to the race conditions
I described upthread.

[ thinks for awhile ... ]

I wonder whether we could do better with something along these lines:

* Adjust the test script's functions to emit a NOTICE *after* acquiring
a lock, not before.

* Annotate permutations with something along the lines of "expect N
NOTICE outputs before allowing this step to be considered complete",
which we'd attach to the unlock steps.

This idea is only half baked at present, but maybe there's something
to work with there.  If it works, maybe we could improve the other
test cases that are always pseudo-failing in a similar way.  For
example, in the deadlock tests, annotate steps with "expect step
Y to finish before step X".
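For what it's worth, the "expect N NOTICE outputs" rule can be modeled in a few lines; the event tokens below are hypothetical stand-ins for what isolationtester would read from the connection, not actual isolationtester code:

```c
/*
 * Hedged sketch of the proposed annotation: a step counts as complete
 * only once the backend has returned *and* the annotated number of
 * NOTICE messages has been observed.  The event stream is a toy model
 * of what isolationtester reads from the session's connection.
 */
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	EV_NOTICE,					/* a NOTICE arrived from the session */
	EV_DONE						/* the step's query returned */
} StepEvent;

/*
 * Return true once EV_DONE has been seen and at least expected_notices
 * EV_NOTICE events have been counted, in whatever order they arrive.
 */
static bool
step_complete(const StepEvent *events, int nevents, int expected_notices)
{
	int		notices = 0;
	bool	done = false;

	for (int i = 0; i < nevents; i++)
	{
		if (events[i] == EV_NOTICE)
			notices++;
		else
			done = true;
		if (done && notices >= expected_notices)
			return true;
	}
	return false;
}
```

The property of interest is that a NOTICE arriving after the query result still counts toward completion, which is the kind of reordering that currently produces spurious output diffs.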

regards, tom lane




Re: Use extended statistics to estimate (Var op Var) clauses

2021-06-13 Thread Tomas Vondra

Hi,

Here is a slightly updated version of the patch - rebased to current 
master and fixing some minor issues to handle expressions (and not just 
the Var nodes as before).


The changes needed to support (Expr op Expr) are mostly mechanical, 
though I'm sure the code needs some cleanup. The main issue I ran into 
is the special case in clauselist_selectivity, which does


if (list_length(clauses) == 1)
return clause_selectivity_ext(...);

which applies to cases like "WHERE a < b" which can now be handled by 
extended statistics, thanks to this patch. But clause_selectivity_ext 
only used to call restriction_selectivity for these clauses, which does 
not use extended statistics, of course.


I considered getting rid of the special case entirely, passing everything 
through extended stats, including cases with a single clause. But that 
ends up affecting e.g. OR clauses, so I tweaked clause_selectivity_ext a 
bit, which seems like a better approach.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 6dcd4f2fc78be2fca8a4e934fc2e24bcd8340c4c Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 13 Jun 2021 21:59:16 +0200
Subject: [PATCH] Handling Expr op Expr clauses in extended stats

---
 src/backend/optimizer/path/clausesel.c|  46 -
 src/backend/statistics/extended_stats.c   |  81 +++--
 src/backend/statistics/mcv.c  | 169 +-
 .../statistics/extended_stats_internal.h  |   2 +-
 src/test/regress/expected/stats_ext.out   |  96 ++
 src/test/regress/sql/stats_ext.sql|  26 +++
 6 files changed, 350 insertions(+), 70 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..d732a9dc93 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,46 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-		 opclause->args,
-		 opclause->inputcollid,
-		 varRelid);
+			/*
+			 * It might be (Expr op Expr), which goes here thanks to the
+			 * optimization at the beginning of clauselist_selectivity.
+			 * So try applying extended stats first, then fall back to
+			 * restriction_selectivity.
+			 *
+			 * XXX Kinda does the same thing as clauselist_selectivity, but
+			 * for a single clause. Maybe we could call that, but need to
+			 * be careful not to cause infinite loop.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+Bitmapset *estimatedclauses = NULL;
+List *clauses = list_make1(src);
+RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+{
+	/*
+	 * Estimate as many clauses as possible using extended statistics.
+	 *
+	 * 'estimatedclauses' is populated with the 0-based list position
+	 * index of clauses estimated here, and that should be ignored below.
+	 */
+	s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+		jointype, sjinfo, rel,
+		&estimatedclauses, false);
+
+	estimated = (bms_num_members(estimatedclauses) == 1);
+}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+s1 = restriction_selectivity(root, opno,
+			 opclause->args,
+			 opclause->inputcollid,
+			 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index b05e818ba9..59a9a64b09 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1352,14 +1352,15 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		Node	   *clause_expr,
+   *clause_expr2;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_expr, &clause_expr2, NULL, NULL))
 			return false;
 
 		/*
@@ -1399,13 +1400,44 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
+		/* clause_expr is always valid */
+		Assert(clause_expr);
+
 		/* Check (Var op Con

Re: unnesting multirange data types

2021-06-13 Thread Jonathan S. Katz
On 6/13/21 11:49 AM, Justin Pryzby wrote:
> On Sun, Jun 13, 2021 at 11:25:05AM -0400, Jonathan S. Katz wrote:
>> On 6/13/21 10:57 AM, Zhihong Yu wrote:
>>> +/* Turn multirange into a set of ranges */
>>>
>>> set of ranges: sequence of ranges
>>
>> I believe "set of ranges" is accurate here, as the comparable return is
>> a "SETOF rangetype". Sequences are objects unto themselves.
>>
> 
> I believe the point was that (in mathematics) a "set" is unordered, and a
> sequence is ordered.  Also, a "setof" tuples in postgres can contain
> duplicates.

The comment in question is part of the header for the
"multirange_unnest" function in the code and AFAICT it is accurate: it
is returning a "set of" ranges as it's literally calling into the
set-returning function framework.

I would suggest leaving it as is.

> The docs say "The ranges are read out in storage order (ascending).", so I
> think this is just a confusion between what "set" means in math vs in 
> postgres.

This is nearly identical to the language in the array unnest[1]
function, which is what I believed Alexander borrowed from:

"Expands an array into a set of rows. The array's elements are read out
in storage order."

If we tweaked the multirange "unnest" function, we could change it to:

+   
+Expands a multirange into a set of rows.
+The ranges are read out in storage order (ascending).
+   

to match what the array "unnest" function docs say, or

+   
+Expands a multirange into a set of rows that each
+contain an individual range.
+The ranges are read out in storage order (ascending).
+   

to be a bit more specific. However, I think this is also bordering on
overengineering the text, given there has been a lack of feedback on the
"unnest" array function description being confusing.

Thanks,

Jonathan

[1] https://www.postgresql.org/docs/current/functions-array.html





Re: Slow standby snapshot

2021-06-13 Thread Michail Nikolaev
Hello.

> I recently ran into a problem in one of our production postgresql cluster.
> I had noticed lock contention on procarray lock on standby, which causes WAL
> replay lag growth.

Yes, I saw the same issue on my production cluster.

> 1) set max_connections to big number, like 10

I made the tests with a more realistic value - 5000. It is a valid value
for Amazon RDS, for example (the default is
LEAST({DBInstanceClassMemory/9531392}, 5000)).

The test looks like this:

pgbench -i -s 10 -U postgres -d postgres
pgbench -b select-only -p 6543 -j 1 -c 50 -n -P 1 -T 18000 -U postgres postgres
pgbench -b simple-update -j 1 -c 50 -n -P 1 -T 18000 -U postgres postgres
long transaction on primary - begin;select txid_current();
perf top -p 

So, on postgres 14 (master) non-patched version looks like this:

   5.13%  postgres[.] KnownAssignedXidsGetAndSetXmin
   4.61%  postgres[.] pg_checksum_block
   2.54%  postgres[.] AllocSetAlloc
   2.44%  postgres[.] base_yyparse

It is too much to spend 5-6% of CPU running through an array :) I think
it should be fixed for both the 13 and 14 versions.

The patched version looks like this (I was unable to see
KnownAssignedXidsGetAndSetXmin at all):

   3.08%  postgres[.] pg_checksum_block
   2.89%  postgres[.] AllocSetAlloc
   2.66%  postgres[.] base_yyparse
   2.00%  postgres[.] MemoryContextAllocZeroAligned

On postgres 13, the non-patched version looks even worse (it definitely needs
to be fixed in my opinion):

  26.44%  postgres   [.] KnownAssignedXidsGetAndSetXmin
   2.17%  postgres[.] base_yyparse
   2.01%  postgres[.] AllocSetAlloc
   1.55%  postgres[.] MemoryContextAllocZeroAligned

But your patch does not apply to REL_13_STABLE. Could you please
provide two versions?

Also, there are warnings while building with patch:

procarray.c:4595:9: warning: ISO C90 forbids mixed
declarations and code [-Wdeclaration-after-statement]
4595 | int prv = -1;
 | ^~~
procarray.c: In function ‘KnownAssignedXidsGetOldestXmin’:
procarray.c:5056:5: warning: variable ‘tail’ set but not used
[-Wunused-but-set-variable]
5056 | tail;
 | ^~~~
procarray.c:5067:38: warning: ‘i’ is used uninitialized in
this function [-Wuninitialized]
5067 | i = KnownAssignedXidsValidDLL[i].nxt;


Some of them are clear errors, so please recheck the code.

Also, maybe it is better to reduce the invasiveness by using a simpler
approach. For example, use the first bit to mark the xid as valid
and the last 7 bits (128 values) as an optimistic offset to the next
valid xid (jumping by up to 127 steps in the worst case).
What do you think?
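If it helps the discussion, the 7-bit offset idea can be modeled outside the server. The flat arrays and function names below are illustrative only, not the actual KnownAssignedXids representation:

```c
/*
 * Hedged sketch of the proposed encoding: for every slot, store a
 * 7-bit "optimistic" distance to the next valid slot (capped at 127).
 * A scan then hops from valid entry to valid entry instead of
 * visiting every slot.  Toy model only, not procarray.c code.
 */
#include <assert.h>

#define MAX_OFFSET 127			/* largest value a 7-bit offset can hold */

/* Fill offs[i] with min(distance to the next valid slot after i, cap). */
static void
build_skip(const int *valid, int *offs, int n)
{
	int		next = -1;			/* index of the next valid slot after i */

	for (int i = n - 1; i >= 0; i--)
	{
		if (next < 0)
			offs[i] = MAX_OFFSET;
		else
			offs[i] = (next - i < MAX_OFFSET) ? next - i : MAX_OFFSET;
		if (valid[i])
			next = i;
	}
}

/* Collect indexes of valid slots, hopping by the stored offsets. */
static int
collect_valid(const int *valid, const int *offs, int n, int *out)
{
	int		found = 0;

	for (int i = 0; i < n; i += offs[i])
		if (valid[i])
			out[found++] = i;

	return found;
}
```

When a gap is longer than 127 slots, the capped offset lands on an invalid entry whose own offset chains the scan onward, so correctness does not depend on the cap - only the hop count does.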

Also, it is a good idea to register the patch in the commitfest app
(https://commitfest.postgresql.org/).

Thanks,
Michail.




Re: Race condition in recovery?

2021-06-13 Thread Mikael Kjellström




On 2021-06-10 01:09, Tom Lane wrote:

Robert Haas  writes:

Got it. I have now committed the patch to all branches, after adapting
your changes just a little bit.
Thanks to you and Kyotaro-san for all the time spent on this. What a slog!


conchuela failed its first encounter with this test case:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25

That machine has a certain, er, history of flakiness; so this may
not mean anything.  Still, we'd better keep an eye out to see if
the test needs more stabilization.


Yes, the flakiness is caused by the very weird filesystem (HAMMERFS), 
whose garbage-collection handling sometimes fills up 
the disk and then never recovers automatically.


I have tried to put the cleanup utility for HAMMERFS in cron to run 
on a schedule, but it isn't 100% foolproof.


So I am going to upgrade to a newer version of DragonflyBSD in the near 
future.


/Mikael




Re: Fdw batch insert error out when set batch_size > 65535

2021-06-13 Thread Tomas Vondra



On 6/13/21 5:25 PM, Bharath Rupireddy wrote:
> On Sun, Jun 13, 2021 at 6:10 AM Alvaro Herrera  
> wrote:
>>
>> On 2021-Jun-12, Tomas Vondra wrote:
>>
>>> There's one caveat, though - for regular builds the slowdown is pretty
>>> much eliminated. But with valgrind it's still considerably slower. For
>>> postgres_fdw the "make check" used to take ~5 minutes for me, now it
>>> takes >1h. And yes, this is entirely due to the new test case which is
>>> generating / inserting 70k rows. So maybe the test case is not worth it
>>> after all, and we should get rid of it.
>>
>> Hmm, what if the table is made 1600 columns wide -- would inserting 41
>> rows be sufficient to trigger the problem case?  If it does, maybe it
>> would reduce the runtime for valgrind/cache-clobber animals enough that
>> it's no longer a concern.
> 
> Yeah, that's a good idea. PSA patch that creates the table of 1600
> columns and inserts 41 rows into the foreign table. If the batch_size
> adjustment fix isn't there, we will hit the error. On my dev system,
> postgres_fdw contrib regression tests execution time: with and without
> the attached patch 4.5 sec and 5.7 sec respectively.
> 

But we're discussing cases with valgrind and/or CLOBBER_CACHE_ALWAYS.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: unnesting multirange data types

2021-06-13 Thread Justin Pryzby
On Sun, Jun 13, 2021 at 11:25:05AM -0400, Jonathan S. Katz wrote:
> On 6/13/21 10:57 AM, Zhihong Yu wrote:
> > +/* Turn multirange into a set of ranges */
> > 
> > set of ranges: sequence of ranges
> 
> I believe "set of ranges" is accurate here, as the comparable return is
> a "SETOF rangetype". Sequences are objects unto themselves.
> 

I believe the point was that (in mathematics) a "set" is unordered, and a
sequence is ordered.  Also, a "setof" tuples in postgres can contain
duplicates.

The docs say "The ranges are read out in storage order (ascending).", so I
think this is just a confusion between what "set" means in math vs in postgres.

In postgres, "sequence" usually refers to the object that generarates a
sequence:
| CREATE SEQUENCE creates a new sequence number generator.

-- 
Justin




Re: Fdw batch insert error out when set batch_size > 65535

2021-06-13 Thread Bharath Rupireddy
On Sun, Jun 13, 2021 at 6:10 AM Alvaro Herrera  wrote:
>
> On 2021-Jun-12, Tomas Vondra wrote:
>
> > There's one caveat, though - for regular builds the slowdown is pretty
> > much eliminated. But with valgrind it's still considerably slower. For
> > postgres_fdw the "make check" used to take ~5 minutes for me, now it
> > takes >1h. And yes, this is entirely due to the new test case which is
> > generating / inserting 70k rows. So maybe the test case is not worth it
> > after all, and we should get rid of it.
>
> Hmm, what if the table is made 1600 columns wide -- would inserting 41
> rows be sufficient to trigger the problem case?  If it does, maybe it
> would reduce the runtime for valgrind/cache-clobber animals enough that
> it's no longer a concern.

Yeah, that's a good idea. PSA patch that creates the table of 1600
columns and inserts 41 rows into the foreign table. If the batch_size
adjustment fix isn't there, we will hit the error. On my dev system,
postgres_fdw contrib regression tests execution time: with and without
the attached patch 4.5 sec and 5.7 sec respectively.
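For reference, the arithmetic behind the 1600-column idea can be written down directly; the helper name below is hypothetical, not FDW code:

```c
/*
 * Sketch of why 1600 columns x 41 rows trips the limit: the
 * extended-query protocol's Bind message carries the parameter count
 * as a 16-bit integer, so one batched INSERT is limited to 65535
 * bind parameters.
 */
#include <assert.h>

#define MAX_BIND_PARAMS 65535	/* Int16 field in the Bind message */

/* Largest batch (in rows) that still fits under the protocol limit. */
static int
max_rows_per_batch(int columns)
{
	assert(columns > 0);
	return MAX_BIND_PARAMS / columns;
}
```

With 1600 columns, 41 rows (65600 parameters) is the smallest row count that overflows the limit, which is why the proposed test needs so few rows compared to the original 70k-row single-column version.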

On Sun, Jun 13, 2021 at 7:25 PM Tomas Vondra
 wrote:
> Good idea. I gave that a try, creating a table with 1500 columns and
> inserting 50 rows (so 75k parameters). See the attached patch.

Thanks for the patch. I also prepared a patch, just sharing. I'm okay
if it's ignored.

With Regards,
Bharath Rupireddy.
From c97cc8385a8e0ee5e5b0f3b4532a7107b1a183f6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sun, 13 Jun 2021 02:07:20 -0700
Subject: [PATCH v1] enhance batch insert test case

---
 .../postgres_fdw/expected/postgres_fdw.out| 23 ++-
 contrib/postgres_fdw/sql/postgres_fdw.sql | 21 +
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 1fb26639fc..cb8a8c8d5c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9683,16 +9683,27 @@ SELECT COUNT(*) FROM ftable;
 TRUNCATE batch_table;
 DROP FOREIGN TABLE ftable;
 -- try if large batches exceed max number of bind parameters
-CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
-INSERT INTO ftable SELECT * FROM generate_series(1, 7) i;
-SELECT COUNT(*) FROM ftable;
+DO LANGUAGE plpgsql
+$$
+DECLARE
+	l_tbl text :=  'CREATE TABLE local_tbl_b('|| string_agg('c' || i || ' int', ',') || ');'
+	FROM generate_series(1,1600) As i;
+f_tbl text :=  'CREATE FOREIGN TABLE foreign_tbl_b(' || string_agg('c' || i || ' int', ',') || E') SERVER loopback OPTIONS (table_name \'local_tbl_b\', batch_size \'10\');'
+FROM generate_series(1,1600) As i;
+BEGIN
+EXECUTE l_tbl;
+EXECUTE f_tbl;
+END;
+$$;
+INSERT INTO foreign_tbl_b SELECT * FROM generate_series(1, 41) i;
+SELECT COUNT(*) FROM foreign_tbl_b;
  count 
 ---
- 7
+41
 (1 row)
 
-TRUNCATE batch_table;
-DROP FOREIGN TABLE ftable;
+DROP FOREIGN TABLE foreign_tbl_b;
+DROP TABLE local_tbl_b;
 -- Disable batch insert
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
 EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable VALUES (1), (2);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8cb2148f1f..1be80e07b9 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3027,11 +3027,22 @@ TRUNCATE batch_table;
 DROP FOREIGN TABLE ftable;
 
 -- try if large batches exceed max number of bind parameters
-CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
-INSERT INTO ftable SELECT * FROM generate_series(1, 7) i;
-SELECT COUNT(*) FROM ftable;
-TRUNCATE batch_table;
-DROP FOREIGN TABLE ftable;
+DO LANGUAGE plpgsql
+$$
+DECLARE
+	l_tbl text :=  'CREATE TABLE local_tbl_b('|| string_agg('c' || i || ' int', ',') || ');'
+	FROM generate_series(1,1600) As i;
+f_tbl text :=  'CREATE FOREIGN TABLE foreign_tbl_b(' || string_agg('c' || i || ' int', ',') || E') SERVER loopback OPTIONS (table_name \'local_tbl_b\', batch_size \'10\');'
+FROM generate_series(1,1600) As i;
+BEGIN
+EXECUTE l_tbl;
+EXECUTE f_tbl;
+END;
+$$;
+INSERT INTO foreign_tbl_b SELECT * FROM generate_series(1, 41) i;
+SELECT COUNT(*) FROM foreign_tbl_b;
+DROP FOREIGN TABLE foreign_tbl_b;
+DROP TABLE local_tbl_b;
 
 -- Disable batch insert
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
-- 
2.25.1



Re: unnesting multirange data types

2021-06-13 Thread Jonathan S. Katz
On 6/13/21 10:57 AM, Zhihong Yu wrote:
> 
> 
> On Sat, Jun 12, 2021 at 4:58 PM Alexander Korotkov wrote:
> 
> On Sun, Jun 13, 2021 at 1:16 AM Jonathan S. Katz
> <jk...@postgresql.org> wrote:
> > On 6/12/21 5:57 PM, Alexander Korotkov wrote:
> > > On Sat, Jun 12, 2021 at 2:44 AM Alexander Korotkov
> > > <aekorot...@gmail.com> wrote:
> > >> On Sat, Jun 12, 2021 at 2:30 AM Justin Pryzby
> > >> <pry...@telsasoft.com> wrote:
> > >>> On Fri, Jun 11, 2021 at 11:37:58PM +0300, Alexander Korotkov
> > >>> wrote:
> >  On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby
> >  <pry...@telsasoft.com> wrote:
> > >
> > > +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
> > >
> > > typo: mutlirange
> > 
> >  Fixed, thanks.
> > 
> >  The patch with the implementation of both unnest() and cast
> to array
> >  is attached.  It contains both tests and docs.
> > >>>
> > >>> |+   The multirange could be explicitly cast to the array of
> corresponding
> > >>> should say: "can be cast to an array of corresponding.."
> > >>>
> > >>> |+ * Cast multirange to the array of ranges.
> > >>> I think should be: *an array of ranges
> > >>
> > >> Thank you for catching this.
> > >>
> > >>> Per sqlsmith, this is causing consistent crashes.
> > >>> I took one of its less appalling queries and simplified it to
> this:
> > >>>
> > >>> select
> > >>> pg_catalog.multirange_to_array(
> > >>>     cast(pg_catalog.int8multirange() as int8multirange)) as c2
> > >>> from (select 1)x;
> > >>
> > >> It seems that multirange_to_array() doesn't handle empty
> multiranges.
> > >> I'll post an updated version of the patch tomorrow.
> > >
> > > A revised patch is attached.  Now empty multiranges are handled
> > > properly (and it's covered by tests).  Typos are fixed as well.
> >
> > Tested both against my original cases using both SQL + PL/pgSQL. All
> > worked well. I also tested the empty multirange case as well.
> >
> > Overall the documentation seems to make sense, I'd suggest:
> >
> > +  
> > +   The multirange can be cast to an array of corresponding ranges.
> > +  
> >
> > becomes:
> >
> > +  
> > +   A multirange can be cast to an array of ranges of the same type.
> > +  
> 
> Thank you. This change is incorporated in the attached revision of
> the patch.
> 
> This thread gave me another lesson about English articles.  Hopefully,
> I would be able to make progress in future patches :)
> 
> > Again, I'll defer to others on the code, but this seems to solve
> the use
> > case I presented. Thanks for the quick turnaround!
> 
> Thank you for the feedback!
> 
> --
> Regards,
> Alexander Korotkov
> 
> 
> Hi,
> +   A multirange can be cast to an array of ranges of the same type.
> 
> I think 'same type' is not very accurate. It should be 'of the subtype'.

I think that's more technically correct, but it could be confusing to
the user. There is an example next to it that shows how this function
works, i.e. it returns the type of range that is represented by the
multirange.

> +   ObjectAddress myself,
> 
> nit: myself -> self
> 
> +/* Turn multirange into a set of ranges */
> 
> set of ranges: sequence of ranges

I believe "set of ranges" is accurate here, as the comparable return is
a "SETOF rangetype". Sequences are objects unto themselves.

Jonathan





Re: unnesting multirange data types

2021-06-13 Thread Zhihong Yu
On Sat, Jun 12, 2021 at 4:58 PM Alexander Korotkov 
wrote:

> On Sun, Jun 13, 2021 at 1:16 AM Jonathan S. Katz 
> wrote:
> > On 6/12/21 5:57 PM, Alexander Korotkov wrote:
> > > On Sat, Jun 12, 2021 at 2:44 AM Alexander Korotkov <
> aekorot...@gmail.com> wrote:
> > >> ()On Sat, Jun 12, 2021 at 2:30 AM Justin Pryzby 
> wrote:
> > >>> On Fri, Jun 11, 2021 at 11:37:58PM +0300, Alexander Korotkov wrote:
> >  On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby 
> wrote:
> > >
> > > +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
> > >
> > > typo: mutlirange
> > 
> >  Fixed, thanks.
> > 
> >  The patch with the implementation of both unnest() and cast to array
> >  is attached.  It contains both tests and docs.
> > >>>
> > >>> |+   The multirange could be explicitly cast to the array of
> corresponding
> > >>> should say: "can be cast to an array of corresponding.."
> > >>>
> > >>> |+ * Cast multirange to the array of ranges.
> > >>> I think should be: *an array of ranges
> > >>
> > >> Thank you for catching this.
> > >>
> > >>> Per sqlsmith, this is causing consistent crashes.
> > >>> I took one of its less appalling queries and simplified it to this:
> > >>>
> > >>> select
> > >>> pg_catalog.multirange_to_array(
> > >>> cast(pg_catalog.int8multirange() as int8multirange)) as c2
> > >>> from (select 1)x;
> > >>
> > >> It seems that multirange_to_array() doesn't handle empty multiranges.
> > >> I'll post an updated version of the patch tomorrow.
> > >
> > > A revised patch is attached.  Now empty multiranges are handled
> > > properly (and it's covered by tests).  Typos are fixed as well.
> >
> > Tested both against my original cases using both SQL + PL/pgSQL. All
> > worked well. I also tested the empty multirange case as well.
> >
> > Overall the documentation seems to make sense, I'd suggest:
> >
> > +  
> > +   The multirange can be cast to an array of corresponding ranges.
> > +  
> >
> > becomes:
> >
> > +  
> > +   A multirange can be cast to an array of ranges of the same type.
> > +  
>
> Thank you. This change is incorporated in the attached revision of the
> patch.
>
> This thread gave me another lesson about English articles.  Hopefully,
> I would be able to make progress in future patches :)
>
> > Again, I'll defer to others on the code, but this seems to solve the use
> > case I presented. Thanks for the quick turnaround!
>
> Thank you for the feedback!
>
> --
> Regards,
> Alexander Korotkov
>

Hi,
+   A multirange can be cast to an array of ranges of the same type.

I think 'same type' is not very accurate. It should be 'of the subtype'.

+   ObjectAddress myself,

nit: myself -> self

+/* Turn multirange into a set of ranges */

set of ranges: sequence of ranges

Cheers


Re: Fdw batch insert error out when set batch_size > 65535

2021-06-13 Thread Tomas Vondra


On 6/13/21 2:40 AM, Alvaro Herrera wrote:
> On 2021-Jun-12, Tomas Vondra wrote:
> 
>> There's one caveat, though - for regular builds the slowdown is pretty
>> much eliminated. But with valgrind it's still considerably slower. For
>> postgres_fdw the "make check" used to take ~5 minutes for me, now it
>> takes >1h. And yes, this is entirely due to the new test case which is
>> generating / inserting 70k rows. So maybe the test case is not worth it
>> after all, and we should get rid of it.
> 
> Hmm, what if the table is made 1600 columns wide -- would inserting 41
> rows be sufficient to trigger the problem case?  If it does, maybe it
> would reduce the runtime for valgrind/cache-clobber animals enough that
> it's no longer a concern.
> 

Good idea. I gave that a try, creating a table with 1500 columns and
inserting 50 rows (so 75k parameters). See the attached patch.

While this cuts the runtime about in half (to ~30 minutes on my laptop),
that's probably not enough - it's still about ~6x longer than it used to
take. All these timings are with valgrind.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8cb2148f1f..0c6bc240b7 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3026,13 +3026,6 @@ SELECT COUNT(*) FROM ftable;
 TRUNCATE batch_table;
 DROP FOREIGN TABLE ftable;
 
--- try if large batches exceed max number of bind parameters
-CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
-INSERT INTO ftable SELECT * FROM generate_series(1, 7) i;
-SELECT COUNT(*) FROM ftable;
-TRUNCATE batch_table;
-DROP FOREIGN TABLE ftable;
-
 -- Disable batch insert
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
 EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable VALUES (1), (2);
@@ -3041,6 +3034,46 @@ SELECT COUNT(*) FROM ftable;
 DROP FOREIGN TABLE ftable;
 DROP TABLE batch_table;
 
+
+CREATE OR REPLACE FUNCTION create_batch_tables(p_cnt int) RETURNS void AS $$
+DECLARE
+v_sql_local text := '';
+v_sql_foreign text := '';
+v_i int;
+BEGIN
+
+v_sql_local := 'CREATE TABLE batch_table (';
+v_sql_foreign := 'CREATE FOREIGN TABLE ftable (';
+
+FOR v_i IN 1 .. p_cnt LOOP
+
+IF v_i > 1 THEN
+v_sql_local := v_sql_local || ', col_' || v_i || ' int';
+v_sql_foreign := v_sql_foreign || ', col_' || v_i || ' int';
+ELSE
+v_sql_local := v_sql_local || 'col_' || v_i || ' int';
+v_sql_foreign := v_sql_foreign || 'col_' || v_i || ' int';
+END IF;
+
+END LOOP;
+
+v_sql_local := v_sql_local || ')';
+v_sql_foreign := v_sql_foreign || ') SERVER loopback OPTIONS (table_name ''batch_table'', batch_size ''10'' )';
+
+EXECUTE v_sql_local;
+EXECUTE v_sql_foreign;
+
+END;
+$$ LANGUAGE plpgsql;
+
+-- try if large batches exceed max number of bind parameters
+SELECT create_batch_tables(1500);
+INSERT INTO ftable SELECT * FROM generate_series(1, 50) i;
+SELECT COUNT(*) FROM ftable;
+DROP TABLE batch_table;
+DROP FOREIGN TABLE ftable;
+DROP FUNCTION create_batch_tables(int);
+
 -- Use partitioning
 CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
 


Re: a path towards replacing GEQO with something better

2021-06-13 Thread Tomas Vondra
Hi,

On 6/10/21 3:21 AM, John Naylor wrote:
> Hi,
> 
> On occasion it comes up that the genetic query optimizer (GEQO) doesn't
> produce particularly great plans, and is slow ([1] for example). The
> only alternative that has gotten as far as a prototype patch (as far as
> I know) is simulated annealing some years ago, which didn't seem to get far.
> 
> The join problem as it pertains to Postgres has been described within
> the community in
> [Gustaffson, 2017] and [Stroffek & Kovarik, 2007].
> 
> The fact that there is much more interest than code in this area
> indicates that this is a hard problem. I hadn't given it much thought
> myself until by chance I came across [Neumann, 2018], which describes a
> number of interesting ideas. The key takeaway is that you want a
> graceful transition between exhaustive search and heuristic search. In
> other words, if the space of possible join orderings is just slightly
> larger than the maximum allowed exhaustive search, then the search
> should be *almost* exhaustive.

Yeah, I think this is one of the places with the annoying "cliff edge"
behavior in our code base, so an alternative that would degrade a bit
more gracefully would be welcome.

I only quickly read the [Neumann, 2018] paper over the weekend, and
overall it seems like a very interesting/promising approach. Of course,
the question is how well it can be combined with the rest of our code,
and various other details from real-world queries (papers often ignore
some of those bits for simplicity).

> This not only increases the chances of finding a good plan, but also
> has three engineering advantages I can think of:
> 
> 1) It's natural to re-use data structures etc. already used by the
> existing dynamic programming (DP) algorithm. Framing the problem as
> extending DP greatly lowers the barrier to making realistic progress. If
> the problem is framed as "find an algorithm as a complete drop-in
> replacement for GEQO", it's a riskier project in my view.
> 

True.

> 2) We can still keep GEQO around (with some huge limit by default) for a
> few years as an escape hatch, while we refine the replacement. If there
> is some bug that prevents finding a plan, we can emit a WARNING and fall
> back to GEQO. Or if a user encounters a regression in a big query, they
> can lower the limit to restore the plan they had in an earlier version.
> 

Not sure. Keeping significant amounts of code may not be free - both for
maintenance and new features. It'd be a bit sad if someone proposed
improvements to join planning, but had to do 2x the work to support it
in both the DP and GEQO branches, or risk incompatibility.

OTOH maybe this concern is unfounded in practice - I don't think we've
done very many big changes to geqo in the last few years.

> 3) It actually improves the existing exhaustive search, because the
> complexity of the join order problem depends on the query shape: a
> "chain" shape (linear) vs. a "star" shape (as in star schema), for the
> most common examples. The size of the DP table grows like this (for n >= 4):
> 
> Chain: (n^3 - n) / 6   (including bushy plans)
> Star:  (n - 1) * 2^(n - 2)
> 
>  n  chain       star
> 
>  4     10         12
>  5     20         32
>  6     35         80
>  7     56        192
>  8     84        448
>  9    120       1024
> 10    165       2304
> 11    220       5120
> 12    286      11264
> 13    364      24576
> 14    455      53248
> 15    560     114688
> ...
> 64  43680     290536219160925437952
> 
> The math behind this is described in detail in [Ono & Lohman, 1990]. I
> verified this in Postgres by instrumenting the planner to count how many
> times it calls make_join_rel().
> 

So, did you verify it for a star query with 64 relations? ;-)
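For what it's worth, the two formulas are easy to sanity-check numerically; a quick sketch (Python, reproducing the table from the quoted mail):

```python
def chain(n):
    # DP entries enumerated for a linear ("chain") query shape,
    # including bushy plans, per [Ono & Lohman, 1990]
    return (n**3 - n) // 6

def star(n):
    # DP entries enumerated for a star query shape
    return (n - 1) * 2**(n - 2)

# spot-check against the table above
assert (chain(4), star(4)) == (10, 12)
assert (chain(10), star(10)) == (165, 2304)
assert (chain(15), star(15)) == (560, 114688)
assert chain(64) == 43680
assert star(64) == 290536219160925437952
```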

> Imagine having a "join enumeration budget" that, if exceeded, prevents
> advancing to the next join level. Given the above numbers and a query
> with some combination of chain and star shapes, a budget of 400 can
> guarantee an exhaustive search when there are up to 8 relations. For a
> pure chain join, we can do an exhaustive search on up to 13 relations,
> for a similar cost of time and space. Out of curiosity I tested HEAD
> with a chain query having 64 tables found in the SQLite tests [2] and
> found exhaustive search to take only twice as long as GEQO. If we have
> some 30-way join, and some (> 400) budget, it's actually possible that
> we will complete the exhaustive search and get the optimal plan. This is
> a bottom-up way of determining the complexity. Rather than configuring a
> number-of-relations threshold and possibly have exponential behavior
> blow up in their faces, users can configure something that somewhat
> resembles the runtime cost.
> 

Sounds reasonable in principle, I think.

This reminds me of the proposals to have a GUC that'd determine how much
effort the planner should invest into various optimizations. For OLTP it
might be quite low, for large OLAP queries it'd be economical to spend

Re: An out-of-date comment in nodeIndexonlyscan.c

2021-06-13 Thread David Rowley
On Mon, 14 Jun 2021 at 00:02, Thomas Munro  wrote:
> Here's a highly experimental patch I came up with that seems to
> produce the right results in simple cases, without (yet) involving the
> planner.

+ /* Find all equality quals. */
+ for (int i = 0; i < n_scan_keys; ++i)
+ {
+ if (scan_keys[i].sk_strategy == BTEqualStrategyNumber)
+ attnos[nattnos++] = scan_keys[i].sk_attno;
+ }
+
+ /* Are all attributes covered? */
+ /* XXX is this check enough or do we need to work harder? */
+ qsort(attnos, nattnos, sizeof(AttrNumber), compare_int16);
+ nattnos = qunique(attnos, nattnos, sizeof(AttrNumber), compare_int16);
+ if (nattnos == index->rd_index->indnkeyatts)

I think a more optimal and nicer way of doing that would be to set
bits in a Bitmapset and then check that bms_num_members is equal to
n_scan_keys.
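As a sketch of that idea (Python, with a set standing in for a Bitmapset; the names mirror the quoted patch but are illustrative only, and BTEqualStrategyNumber = 3 as in access/stratnum.h):

```python
BT_EQUAL_STRATEGY = 3  # BTEqualStrategyNumber

def all_key_columns_covered(scan_keys, n_index_key_atts):
    """scan_keys is a list of (strategy, attno) pairs; return True when
    equality quals cover every key column of the index."""
    covered = set()  # stand-in for a Bitmapset of attnos
    for strategy, attno in scan_keys:
        if strategy == BT_EQUAL_STRATEGY:
            covered.add(attno)
    # duplicates collapse automatically, so no qsort/qunique pass is needed
    return len(covered) == n_index_key_atts

# equality quals covering both key columns of a 2-column index
assert all_key_columns_covered([(3, 1), (3, 2)], 2)
# duplicate quals on the same column must not count twice
assert not all_key_columns_covered([(3, 1), (3, 1)], 2)
```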

David




Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

2021-06-13 Thread Ranier Vilela
Hi Andres, thanks for taking a look.

Em sáb., 12 de jun. de 2021 às 16:27, Andres Freund 
escreveu:

> Hi,
>
> On 2021-06-12 10:55:22 -0300, Ranier Vilela wrote:
> > With the recent changes at procarray.c, I take a look in.
> > msvc compiler, has some warnings about signed vs unsigned.
>
> > 1. Size_t is weird, because all types are int.
>
> Not sure why I ended up using size_t here. There are cases where using a
> natively sized integer can lead to better code being generated, so I'd
> want to see some evaluation of the code generation effects.
>
 Yes, sure.

>
>
> > 2. Wouldn't it be better to initialize static variables?
>
> No, explicit initialization needs additional space in the binary, and
> static variables are always zero initialized.
>
Yes, I missed this part.
But I was worried about this line:

/* hasn't been updated yet */
if (!TransactionIdIsValid(ComputeXidHorizonsResultLastXmin))

The first run with ComputeXidHorizonsResultLastXmin = 0, is ok?


>
> > 3. There are some shadowing parameters.
>
> Hm, yea, that's not great. Those are from
>
> commit 0e141c0fbb211bdd23783afa731e3eef95c9ad7a
> Author: Robert Haas 
> Date:   2015-08-06 11:52:51 -0400
>
> Reduce ProcArrayLock contention by removing backends in batches.
>
> Amit, Robert, I assume you don't mind changing this?
>


>
>
> > 4. Possible loop beyond numProcs?
>
> What are you referring to here?
>
My mistake.

best regards,
Ranier Vilela


Re: unnesting multirange data types

2021-06-13 Thread Jonathan S. Katz
On 6/13/21 8:26 AM, Jonathan S. Katz wrote:

> One question: if I were to make a custom multirange type (e.g. let's say
> I use "inet" to make "inetrange" and then a "inetmultirange") will this
> method still work? It seems so, but I wanted clarify.

I went ahead and answered this myself: "yes":

  CREATE TYPE inetrange AS RANGE (SUBTYPE = inet);

  SELECT unnest(inetmultirange(inetrange('192.168.1.1', '192.168.1.5'),
inetrange('192.168.1.7', '192.168.1.10')));
             unnest
  ----------------------------
   [192.168.1.1,192.168.1.5)
   [192.168.1.7,192.168.1.10)
  (2 rows)

Awesome stuff.

Jonathan





Re: unnesting multirange data types

2021-06-13 Thread Jonathan S. Katz
On 6/13/21 7:43 AM, Alexander Korotkov wrote:
> On Sun, Jun 13, 2021 at 2:58 AM Alexander Korotkov  
> wrote:
>> On Sun, Jun 13, 2021 at 1:16 AM Jonathan S. Katz  
>> wrote:

>>> Again, I'll defer to others on the code, but this seems to solve the use
>>> case I presented. Thanks for the quick turnaround!
>>
>> Thank you for the feedback!
> 
> I've added the commit message to the patch.  I'm going to push it if
> no objections.

I went ahead and tried testing a few more cases with the patch, and
everything seems to work as expected.

I did skim through the code -- I'm much less familiar with this part of
the system -- and I did not see anything that I would consider "obvious
to correct" from my perspective.

So I will continue to go with what I said above: no objections on the
use case perspective, but I defer to others on the code.

One question: if I were to make a custom multirange type (e.g. let's say
I use "inet" to make "inetrange" and then a "inetmultirange") will this
method still work? It seems so, but I wanted clarify.

Thanks,

Jonathan





Re: An out-of-date comment in nodeIndexonlyscan.c

2021-06-13 Thread Thomas Munro
On Sat, Jun 12, 2021 at 2:35 PM Thomas Munro  wrote:
> ... and here is the corresponding code change, with a test to
> demonstrate the change.
>
> I'm working on a proof-of-concept to skip the btree page lock
> sometimes too, which seems promising, but it requires some planner
> work which has temporarily pretzeled my brain.

Here's a highly experimental patch I came up with that seems to
produce the right results in simple cases, without (yet) involving the
planner.  The regression tests show single table queries, but it works
also for nest loop joins, which is where this optimisation should be
most interesting, I think.  There are a few weird things about this
patch though, and there could well be much better ways to do it, as
noted in the commit message and comments.  It's a start on the
problem...
From 2325ae403196f73865f7c6b3224db925ca406969 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 10 Jun 2021 23:35:16 +1200
Subject: [PATCH v2 1/2] Use tuple-level SIREAD locks for index-only scans.

When index-only scans manage to avoid fetching heap tuples,
previously we'd predicate-lock the whole heap page (since commit
cdf91edb).  Commits c01262a8 and 6f38d4da made it possible to lock the
tuple instead with only a TID, to match the behavior of regular index
scans and avoid some false conflicts.

Discussion: https://postgr.es/m/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
---
 src/backend/executor/nodeIndexonlyscan.c  | 12 --
 .../isolation/expected/index-only-scan-2.out  | 15 +++
 .../isolation/expected/index-only-scan-3.out  | 16 
 src/test/isolation/isolation_schedule |  2 +
 .../isolation/specs/index-only-scan-2.spec| 39 +++
 .../isolation/specs/index-only-scan-3.spec| 33 
 6 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 src/test/isolation/expected/index-only-scan-2.out
 create mode 100644 src/test/isolation/expected/index-only-scan-3.out
 create mode 100644 src/test/isolation/specs/index-only-scan-2.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-3.spec

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0754e28a9a..f7185b4519 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -243,12 +243,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 		/*
 		 * If we didn't access the heap, then we'll need to take a predicate
-		 * lock explicitly, as if we had.  For now we do that at page level.
+		 * lock explicitly, as if we had.  We don't have the inserter's xid,
+		 * but that's only used by PredicateLockTID to check if it matches the
+		 * caller's top level xid, and it can't possibly have been inserted by
+		 * us or the page wouldn't be all visible.
 		 */
 		if (!tuple_from_heap)
-			PredicateLockPage(scandesc->heapRelation,
-			  ItemPointerGetBlockNumber(tid),
-			  estate->es_snapshot);
+			PredicateLockTID(scandesc->heapRelation,
+			 tid,
+			 estate->es_snapshot,
+			 InvalidTransactionId);
 
 		return slot;
 	}
diff --git a/src/test/isolation/expected/index-only-scan-2.out b/src/test/isolation/expected/index-only-scan-2.out
new file mode 100644
index 00..fef5b8d398
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-2.out
@@ -0,0 +1,15 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 1;
+id1
+
+1  
+step r2: SELECT id2 FROM account WHERE id2 = 2;
+id2
+
+2  
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/index-only-scan-3.out b/src/test/isolation/expected/index-only-scan-3.out
new file mode 100644
index 00..efef162779
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-3.out
@@ -0,0 +1,16 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 2;
+id1
+
+2  
+step r2: SELECT id2 FROM account WHERE id2 = 1;
+id2
+
+1  
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index f4c01006fc..570aedb900 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,8 @@ test: partial-index
 test: two-ids
 test: multiple-row-versions
 test: index-only-scan
+test: index-only-scan-2
+test: index-only-scan-3
 test: predicate-lock-hot-tuple
 test: update-conflict-out
 test: deadlock-simple
diff --git a/src/

Re: doc issue missing type name "multirange" in chapter title

2021-06-13 Thread Alexander Korotkov
On Fri, Jun 11, 2021 at 11:16 PM Tom Lane  wrote:
>
> Alexander Korotkov  writes:
> > What about "Range/Multirange Functions and Operators"?
>
> Better than a comma, I guess.  Personally I didn't have a
> problem with the form with two "ands".

Thank you.  I propose to push the slash option because it both evades
double "and" and it's aligned with sibling section headers (we have
"Date/Time Functions and Operators").

Any objections?

--
Regards,
Alexander Korotkov
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c7c2a2d11f7..cae5c990fa1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18165,7 +18165,7 @@ SELECT NULLIF(value, '(none)') ...
   
 
  
-  Range Functions and Operators
+  Range/Multirange Functions and Operators
 
   
See  for an overview of range types.


Re: unnesting multirange data types

2021-06-13 Thread Alexander Korotkov
On Sun, Jun 13, 2021 at 2:58 AM Alexander Korotkov  wrote:
> On Sun, Jun 13, 2021 at 1:16 AM Jonathan S. Katz  wrote:
> > On 6/12/21 5:57 PM, Alexander Korotkov wrote:
> > > On Sat, Jun 12, 2021 at 2:44 AM Alexander Korotkov  
> > > wrote:
> > >> ()On Sat, Jun 12, 2021 at 2:30 AM Justin Pryzby  
> > >> wrote:
> > >>> On Fri, Jun 11, 2021 at 11:37:58PM +0300, Alexander Korotkov wrote:
> >  On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby  
> >  wrote:
> > >
> > > +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
> > >
> > > typo: mutlirange
> > 
> >  Fixed, thanks.
> > 
> >  The patch with the implementation of both unnest() and cast to array
> >  is attached.  It contains both tests and docs.
> > >>>
> > >>> |+   The multirange could be explicitly cast to the array of 
> > >>> corresponding
> > >>> should say: "can be cast to an array of corresponding.."
> > >>>
> > >>> |+ * Cast multirange to the array of ranges.
> > >>> I think should be: *an array of ranges
> > >>
> > >> Thank you for catching this.
> > >>
> > >>> Per sqlsmith, this is causing consistent crashes.
> > >>> I took one of its less appalling queries and simplified it to this:
> > >>>
> > >>> select
> > >>> pg_catalog.multirange_to_array(
> > >>> cast(pg_catalog.int8multirange() as int8multirange)) as c2
> > >>> from (select 1)x;
> > >>
> > >> It seems that multirange_to_array() doesn't handle empty multiranges.
> > >> I'll post an updated version of the patch tomorrow.
> > >
> > > A revised patch is attached.  Now empty multiranges are handled
> > > properly (and it's covered by tests).  Typos are fixed as well.
> >
> > Tested both against my original cases using both SQL + PL/pgSQL. All
> > worked well. I also tested the empty multirange case as well.
> >
> > Overall the documentation seems to make sense, I'd suggest:
> >
> > +  
> > +   The multirange can be cast to an array of corresponding ranges.
> > +  
> >
> > becomes:
> >
> > +  
> > +   A multirange can be cast to an array of ranges of the same type.
> > +  
>
> Thank you. This change is incorporated in the attached revision of the patch.
>
> This thread gave me another lesson about English articles.  Hopefully,
> I would be able to make progress in future patches :)
>
> > Again, I'll defer to others on the code, but this seems to solve the use
> > case I presented. Thanks for the quick turnaround!
>
> Thank you for the feedback!

I've added the commit message to the patch.  I'm going to push it if
no objections.

--
Regards,
Alexander Korotkov


multirange_unnest_cast_to_array-v4.patch
Description: Binary data


Re: "an SQL" vs. "a SQL"

2021-06-13 Thread Andrew Dunstan


On 6/13/21 7:13 AM, Michael Paquier wrote:
> On Sun, Jun 13, 2021 at 07:36:54AM +0100, Geoff Winkless wrote:
>> On Thu, 10 Jun 2021, 15:35 Alvaro Herrera,  wrote:
>>> src/backend/libpq/auth.c:847:* has.  If it's an MD5 hash, we must do
>>> MD5 authentication, and if it's a
>>> src/backend/libpq/auth.c:848:* SCRAM secret, we must do SCRAM
>>> authentication.
>> Not sure whether you were just listing examples and you weren't suggesting
>> this should be changed, but surely "SCRAM" is pronounced "scram" and is
>> thus "a SCRAM"?
> RFC 5802 uses "a SCRAM something" commonly, but "a SCRAM" alone does
> not make sense:
> https://datatracker.ietf.org/doc/html/rfc5802
>
> The sentences quoted above look fine to me.


I don't think anyone was suggesting SCRAM should be used as a noun
rather than as an adjective. But adjectives can be preceded by an
indefinite article just as nouns can. The discussion simply left out the
implied following noun.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_regress.c also sensitive to various PG* environment variables

2021-06-13 Thread Michael Paquier
On Sat, Jun 12, 2021 at 09:10:06AM +0900, Michael Paquier wrote:
> Good idea, thanks.  I'll add comments for each one that cannot be
> unsetted.

And done, finally.
--
Michael




Re: "an SQL" vs. "a SQL"

2021-06-13 Thread Michael Paquier
On Sun, Jun 13, 2021 at 07:36:54AM +0100, Geoff Winkless wrote:
> On Thu, 10 Jun 2021, 15:35 Alvaro Herrera,  wrote:
>> src/backend/libpq/auth.c:847:* has.  If it's an MD5 hash, we must do
>> MD5 authentication, and if it's a
>> src/backend/libpq/auth.c:848:* SCRAM secret, we must do SCRAM
>> authentication.
> 
> Not sure whether you were just listing examples and you weren't suggesting
> this should be changed, but surely "SCRAM" is pronounced "scram" and is
> thus "a SCRAM"?

RFC 5802 uses "a SCRAM something" commonly, but "a SCRAM" alone does
not make sense:
https://datatracker.ietf.org/doc/html/rfc5802

The sentences quoted above look fine to me.
--
Michael




Re: Continuing instability in insert-conflict-specconflict test

2021-06-13 Thread Noah Misch
The test material added in commit 43e0841 continues to yield buildfarm
failures.  Failures new since the rest of this thread:

 damselfly│ 2021-02-02 10:19:15 │ HEAD  │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2021-02-02%2010%3A19%3A15
 drongo   │ 2021-02-05 01:13:10 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-02-05%2001%3A13%3A10
 lorikeet │ 2021-03-05 21:30:13 │ HEAD  │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2021-03-05%2021%3A30%3A13
 lorikeet │ 2021-03-16 08:28:36 │ HEAD  │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2021-03-16%2008%3A28%3A36
 macaque  │ 2021-03-21 10:14:52 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=macaque&dt=2021-03-21%2010%3A14%3A52
 walleye  │ 2021-03-25 05:00:44 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 sungazer │ 2021-04-23 21:52:31 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2021-04-23%2021%3A52%3A31
 gharial  │ 2021-04-30 06:08:36 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-30%2006%3A08%3A36
 walleye  │ 2021-05-05 17:00:41 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41
 gharial  │ 2021-05-05 22:35:33 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-05%2022%3A35%3A33

On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
> I think what we have to do to salvage this test is to get rid of the
> use of NOTICE outputs, and instead have the test functions insert
> log records into some table, which we can inspect after the fact
> to verify that things happened as we expect.

That sounds promising.  Are those messages important for observing server
bugs, or are they for debugging/modifying the test itself?  If the latter, one
could just change the messages to LOG.  Any of the above won't solve things
completely, because 3 of the 21 failures have diffs in the pg_locks output:

 dory │ 2020-03-14 19:35:31 │ HEAD  │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
 walleye  │ 2021-03-25 05:00:44 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 walleye  │ 2021-05-05 17:00:41 │ REL_13_STABLE │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41

Perhaps the pg_locks query should poll until pg_locks has the expected rows.
Or else poll until all test sessions are waiting or idle.
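The control flow of such a poll could look something like this sketch (Python pseudocode of the retry loop, not tied to any particular test framework; the predicate would be "pg_locks has the expected rows" or "all test sessions are waiting or idle"):

```python
import time

def poll_until(predicate, timeout=180.0, interval=0.1):
    """Re-evaluate predicate until it holds or the timeout expires,
    instead of asserting the condition once and failing on a race."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```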