Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-29 Thread Andrey Borodin



> On 29 Sep 2019, at 21:27, Alexander Korotkov wrote:
> 
> Patch with fix is attached.  Idea is simple: ginScanToDelete() now
> keeps exclusive lock on left page eliminating the need to relock it.
> So, we preserve left-to-right locking order and can't deadlock with
> ginStepRight().

In this function, ginDeletePage(gvs, blkno,
BufferGetBlockNumber(me->leftBuffer), ...)
we are going to reread the buffer:
lBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, leftBlkno,
 RBM_NORMAL, gvs->strategy);
Is that OK?


> On 30 Sep 2019, at 0:52, Peter Geoghegan wrote:
> 
> Why is it okay
> that there is no nbtree-style distinction between page deletion and
> page recycling?
As far as I understand, a deleted page is stamped with
GinPageSetDeleteXid(page, ReadNewTransactionId());
and it will not be recycled until that XID is far enough behind.
BTW, we found a small (wraparound) bug in the similar GiST and B-tree
implementations. It's probably present in GIN too.

--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud





Re: SSL tests failing for channel_binding with OpenSSL <= 1.0.1

2019-09-29 Thread Michael Paquier
On Fri, Sep 27, 2019 at 11:44:57AM +0900, Michael Paquier wrote:
> We need to do something similar to c3d41cc for the test, as per the
> attached.  I have tested that with OpenSSL 1.0.1 and 1.0.2 to stress
> both scenarios.
> 
> Any objections to this fix?

Committed as a12c75a1.
--
Michael


signature.asc
Description: PGP signature


Re: Change atoi to strtol in same place

2019-09-29 Thread Joe Nelson
Alvaro Herrera wrote:
> ... can we have a new patch?

OK, I've attached v4. It works cleanly on 55282fa20f with
str2int-16.patch applied. My patch won't compile without the other one
applied too.

Changed:
[x] revert my changes in common/Makefile
[x] rename arg_utils.[ch] to option.[ch]
[x] update @pgfeutilsfiles in Mkvcbuild.pm
[x] pgindent everything
[x] get rid of atoi() in more utilities

One question about how the utilities parse port numbers.  I currently
have it check that the value can be parsed as an integer, and that its
range is within 1 .. (1<<16)-1. I wonder if the former restriction is
(un)desirable, because ultimately getaddrinfo() takes a "service name
description" for the port, which can be a name such as found in
'/etc/services' as well as the string representation of a number. If
desired, I *could* treat only range errors as a failure for ports, and
allow integer parse errors.

-- 
Joe Nelson  https://begriffs.com
diff --git a/contrib/pg_standby/Makefile b/contrib/pg_standby/Makefile
index 0bca2f8e9e..cb9292d0f4 100644
--- a/contrib/pg_standby/Makefile
+++ b/contrib/pg_standby/Makefile
@@ -6,6 +6,8 @@ PGAPPICON = win32
 PROGRAM = pg_standby
 OBJS	= pg_standby.o $(WIN32RES)
 
+PG_LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
index 031b1b5cd5..56ac7fd726 100644
--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -33,6 +33,7 @@
 #include "pg_getopt.h"
 
 #include "access/xlog_internal.h"
+#include "fe_utils/option.h"
 
 const char *progname;
 
@@ -678,6 +679,10 @@ main(int argc, char **argv)
 
 	while ((c = getopt(argc, argv, "cdk:lr:s:t:w:")) != -1)
 	{
+		pg_strtoint_status s;
+		int64		parsed;
+		char	   *parse_error;
+
 		switch (c)
 		{
 			case 'c':			/* Use copy */
@@ -687,12 +692,15 @@ main(int argc, char **argv)
 debug = true;
 break;
 			case 'k':			/* keepfiles */
-keepfiles = atoi(optarg);
-if (keepfiles < 0)
+s = pg_strtoint64_range(optarg, &parsed,
+		0, INT_MAX, &parse_error);
+if (s != PG_STRTOINT_OK)
 {
-	fprintf(stderr, "%s: -k keepfiles must be >= 0\n", progname);
+	fprintf(stderr, "%s: -k keepfiles %s\n",
+			progname, parse_error);
 	exit(2);
 }
+keepfiles = parsed;
 break;
 			case 'l':			/* Use link */
 
@@ -706,31 +714,39 @@ main(int argc, char **argv)
 #endif
 break;
 			case 'r':			/* Retries */
-maxretries = atoi(optarg);
-if (maxretries < 0)
+s = pg_strtoint64_range(optarg, &parsed,
+		0, INT_MAX, &parse_error);
+if (s != PG_STRTOINT_OK)
 {
-	fprintf(stderr, "%s: -r maxretries must be >= 0\n", progname);
+	fprintf(stderr, "%s: -r maxretries %s\n",
+			progname, parse_error);
 	exit(2);
 }
+maxretries = parsed;
 break;
 			case 's':			/* Sleep time */
-sleeptime = atoi(optarg);
-if (sleeptime <= 0 || sleeptime > 60)
+s = pg_strtoint64_range(optarg, &parsed, 1, 60, &parse_error);
+if (s != PG_STRTOINT_OK)
 {
-	fprintf(stderr, "%s: -s sleeptime incorrectly set\n", progname);
+	fprintf(stderr, "%s: -s sleeptime %s\n",
+			progname, parse_error);
 	exit(2);
 }
+sleeptime = parsed;
 break;
 			case 't':			/* Trigger file */
 triggerPath = pg_strdup(optarg);
 break;
 			case 'w':			/* Max wait time */
-maxwaittime = atoi(optarg);
-if (maxwaittime < 0)
+s = pg_strtoint64_range(optarg, &parsed,
+		0, INT_MAX, &parse_error);
+if (s != PG_STRTOINT_OK)
 {
-	fprintf(stderr, "%s: -w maxwaittime incorrectly set\n", progname);
+	fprintf(stderr, "%s: -w maxwaittime %s\n",
+			progname, parse_error);
 	exit(2);
 }
+maxwaittime = parsed;
 break;
 			default:
 fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 55ef13926d..7869c8cf9a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -32,6 +32,7 @@
 #include "common/logging.h"
 #include "common/string.h"
 #include "fe_utils/recovery_gen.h"
+#include "fe_utils/option.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
@@ -2073,6 +2074,10 @@ main(int argc, char **argv)
 	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
			long_options, &option_index)) != -1)
 	{
+		pg_strtoint_status s;
+		int64		parsed;
+		char	   *parse_error;
+
 		switch (c)
 		{
 			case 'C':
@@ -2157,12 +2162,13 @@ main(int argc, char **argv)
 #endif
 break;
 			case 'Z':
-compresslevel = atoi(optarg);
-if (compresslevel < 0 || compresslevel > 9)
+s = pg_strtoint64_range(optarg, &parsed, 0, 9, &parse_error);
+if (s != PG_STRTOINT_OK)
 {
-	pg_log_error("invalid compression level \"%s\"", optarg);
+	pg_log_error("invalid compression level: %s", 

Re: Batch insert in CTAS/MatView code

2019-09-29 Thread Paul Guo
>
>
> > > However, I can also see that there is no better alternative.  We need
> to
> > > compute the size of accumulated tuples so far, in order to decide
> whether
> > > to stop accumulating tuples.  There is no convenient way to obtain the
> > > length of the tuple, given a slot.  How about making that decision
> solely
> > > based on number of tuples, so that we can avoid ExecFetchSlotHeapTuple
> call
> > > altogether?
> >
> > ... maybe we should add a new operation to slots, that returns the
> > (approximate?) size of a tuple?
>
> Hm, I'm not convinced that it's worth adding that as a dedicated
> operation. It's not that clear what it'd exactly mean anyway - what
> would it measure? As referenced in the slot? As if it were stored on
> disk? etc?
>
> I wonder if the right answer wouldn't be to just measure the size of a
> memory context containing the batch slots, or something like that.
>
>
Probably a better way is to move that logic (append slot to slots, decide
when to flush, flush, clean up slots) into table_multi_insert()? Generally
the final implementation of table_multi_insert() should be able to know
the sizes easily. One concern is that currently only COPY in the repo uses
multi-insert, so I'm not sure whether other callers in the future will want
their own logic (or we could set up a flag to allow customization, but that
seems a bit over-designed?).


Re: Batch insert in CTAS/MatView code

2019-09-29 Thread Paul Guo
On Sat, Sep 28, 2019 at 5:49 AM Andres Freund  wrote:

> Hi,
>
> On 2019-09-09 18:31:54 +0800, Paul Guo wrote:
> > diff --git a/src/backend/access/heap/heapam.c
> b/src/backend/access/heap/heapam.c
> > index e9544822bf..8a844b3b5f 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -2106,7 +2106,6 @@ heap_multi_insert(Relation relation,
> TupleTableSlot **slots, int ntuples,
> > CommandId cid, int options,
> BulkInsertState bistate)
> >  {
> >   TransactionId xid = GetCurrentTransactionId();
> > - HeapTuple  *heaptuples;
> >   int i;
> >   int ndone;
> >   PGAlignedBlock scratch;
> > @@ -2115,6 +2114,10 @@ heap_multi_insert(Relation relation,
> TupleTableSlot **slots, int ntuples,
> >   SizesaveFreeSpace;
> >   boolneed_tuple_data =
> RelationIsLogicallyLogged(relation);
> >   boolneed_cids =
> RelationIsAccessibleInLogicalDecoding(relation);
> > + /* Declare it as static to let this memory be not on stack. */
> > + static HeapTupleheaptuples[MAX_MULTI_INSERT_TUPLES];
> > +
> > + Assert(ntuples <= MAX_MULTI_INSERT_TUPLES);
> >
> >   /* currently not needed (thus unsupported) for heap_multi_insert()
> */
> >   AssertArg(!(options & HEAP_INSERT_NO_LOGICAL));
> > @@ -2124,7 +2127,6 @@ heap_multi_insert(Relation relation,
> TupleTableSlot **slots, int ntuples,
> >
> HEAP_DEFAULT_FILLFACTOR);
> >
> >   /* Toast and set header data in all the slots */
> > - heaptuples = palloc(ntuples * sizeof(HeapTuple));
> >   for (i = 0; i < ntuples; i++)
> >   {
> >   HeapTuple   tuple;
>
> I don't think this is a good idea. We shouldn't unnecessarily allocate
> 8KB on the stack. Is there any actual evidence this is a performance
> benefit? To me this just seems like it'll reduce the flexibility of the
>

Previously, heaptuples was palloc'd in each batch, which in theory should
be slower than pre-allocating and reusing the memory.

> API, without any benefit.  I'll also note that you've apparently not
> updated tableam.h to document this new restriction.
>

Yes, it should be moved from heapam.h to that file, along with the 65535
definition.


Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

2019-09-29 Thread Masahiko Sawada
On Mon, Sep 30, 2019 at 10:10 AM Michael Paquier  wrote:
>
> On Fri, Sep 27, 2019 at 10:00:16PM +0900, Masahiko Sawada wrote:
> > I had once abandoned moving the removal code to between
> > writeTimeLineHistory() and the timeline switch because it expands the
> > window, but since unlink itself completes within a very short time
> > it would not be much of a problem.
> >
> > Attached is the updated patch that just moves the removal code.
>
> That's not quite it, as you forgot to move the declaration of
> recoveryPath so the patch fails to compile.

Oops, thanks.

>
> Adding some tests would be nice, so I updated your patch to include
> something.  One place where we recover files from archives is
> 002_archiving.pl, still the files get renamed to the segment names
> when recovered so that's difficult to make that part 100%
> deterministic yet.  Still as a reminder of the properties behind those
> files it does not sound bad to document it in the test either, that's
> cheap, and we get the future covered.

Thank you for updating the patch!

+1 for adding tests, but even the current postgres passes these tests,
for two reasons: one is that $node_standby tries to restore
0001.history but fails, and therefore RECOVERYHISTORY isn't
created. The other is that, to reproduce this issue, the new
timeline ID of the recovered database needs to be greater than 3.

+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYHISTORY",
+ "RECOVERYHISTORY removed after promotion");
+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYXLOG",
+ "RECOVERYXLOG removed after promotion");

I think that the above checks are always true because isnt() function
checks if the 1st argument and 2nd argument are not the same.

I've attached the updated version patch including the tests. Please review it.

Regards,

--
Masahiko Sawada


v5_remove_recovered_historyfile.patch
Description: Binary data


Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch

2019-09-29 Thread Michael Paquier
On Mon, Sep 30, 2019 at 12:49:03AM +0900, Fujii Masao wrote:
> Attached patch fixes this issue by making archive recovery always ignore
> recovery_min_apply_delay. This change is OK because
> recovery_min_apply_delay was introduced for standby mode, I think.
> 
> This issue is not new in v12. I observed that the issue was reproduced
> in v11. So the back-patch is necessary.

I have not directly tested, but from a lookup at the code I think
that you are right.  Perhaps we'd want more safeguards in
WaitForWALToBecomeAvailable(), like an assert within the
XLOG_FROM_STREAM part similar to the check you are adding?  My point
is that we should switch to XLOG_FROM_STREAM only if we are in standby
mode, and that's the only place where the startup process looks at
recoveryWakeupLatch.
--
Michael




Re: default partitions can be partitioned and have default partitions?

2019-09-29 Thread Amit Langote
On Sun, Sep 29, 2019 at 12:18 AM Justin Pryzby  wrote:
> postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i);
> CREATE TABLE
> postgres=# CREATE TABLE t0 PARTITION OF t DEFAULT PARTITION BY RANGE(i);
> CREATE TABLE
> postgres=# CREATE TABLE t00 PARTITION OF t0 DEFAULT; -- oh yes
> CREATE TABLE

Actually, you can go even further:

CREATE TABLE t00 PARTITION OF t0 DEFAULT PARTITION BY HASH (i);

> Not sure how it could be useful to partition default into subpartitions of
> lists, ranges, hashes.

Yeah, maybe the top-level partitioning should be designed such that
the default partition doesn't need sub-partitioning, but perhaps
Postgres shouldn't prevent users from trying it.  This was discussed
when the default partition feature went in; see [1].

Thanks,
Amit

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoYh-hitRRUfxVxDVAjioYPrjhBCehePGRUa6qNNUnKvuw%40mail.gmail.com




Re: Standby accepts recovery_target_timeline setting?

2019-09-29 Thread Fujii Masao
On Mon, Sep 30, 2019 at 6:59 AM Peter Eisentraut
 wrote:
>
> On 2019-09-29 18:36, Fujii Masao wrote:
> > Yes, but ArchiveRecoveryRequested should be checked instead of
> > InArchiveRecovery, I think. Otherwise recovery targets would take effect
> > when recovery.signal is missing but backup_label exists. In this case,
> > InArchiveRecovery is set to true though ArchiveRecoveryRequested is
> > false because recovery.signal is missing.
> >
> > With the attached patch, I checked that the steps that I described
> > upthread didn't reproduce the issue.
>
> Your patch looks correct to me.

Thanks! So I committed the patch.

Also we need to do the same thing for other recovery options like
restore_command. Attached is the patch which makes crash recovery
ignore restore_command and recovery_end_command.

Regards,

-- 
Fujii Masao


ignore-restore-command-in-crash-recovery.patch
Description: Binary data


Re: Hooks for session start and end, take two

2019-09-29 Thread Michael Paquier
On Sat, Sep 28, 2019 at 10:43:36AM -0300, Fabrízio de Royes Mello wrote:
> I poked a little with the patch and everything is ok. Your check for normal
> backend on test_session_hooks is much simpler than I did before:
> 
> +/* just consider normal backends */
> +if (MyBackendId == InvalidBackendId)
> +return;
> 
> But one thing came to my mind: why not hook just normal backends in this
> first version?

This code path can only be taken by normal backends, so that would
apply; still, I don't actually see why we should limit ourselves here on
the backend side.  If for one reason or another those two code paths
begin to be taken by a backend with InvalidBackendId, then users of the
session start/end hooks will need to think about how to handle it if
they didn't from the start, which sounds like a good thing to me.
--
Michael




documentation inconsistent re: alignment

2019-09-29 Thread Chapman Flack
Hi,

The documentation for CREATE TYPE has this to say about alignment:

"The alignment parameter specifies the storage alignment required for the
data type. The allowed values equate to alignment on 1, 2, 4, or 8 byte
boundaries."

... while the documentation for pg_type has:

"c = char alignment, i.e., no alignment needed.
s = short alignment (2 bytes on most machines).
i = int alignment (4 bytes on most machines).
d = double alignment (8 bytes on many machines, but by no means all)."

so, in 2019, are the alignments weaselly and variable, or are they 1,2,4,8?

Regards,
-Chap





Re: [DOC] Document concurrent index builds waiting on each other

2019-09-29 Thread Michael Paquier
On Sat, Sep 28, 2019 at 10:22:28PM -0300, Alvaro Herrera wrote:
> I always thought that create index concurrently was prevented from
> running concurrently in a table by the ShareUpdateExclusive lock that's
> held during the operation.

REINDEX CONCURRENTLY and CIC can deadlock while waiting for each other
to finish after their validation phase, see:
https://www.postgresql.org/message-id/20190507030756.gd1...@paquier.xyz
https://www.postgresql.org/message-id/20190507032543.gh1...@paquier.xyz
--
Michael




Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

2019-09-29 Thread Michael Paquier
On Fri, Sep 27, 2019 at 10:00:16PM +0900, Masahiko Sawada wrote:
> I had once abandoned moving the removal code to between
> writeTimeLineHistory() and the timeline switch because it expands the
> window, but since unlink itself completes within a very short time
> it would not be much of a problem.
> 
> Attached is the updated patch that just moves the removal code.

That's not quite it, as you forgot to move the declaration of
recoveryPath so the patch fails to compile.

Adding some tests would be nice, so I updated your patch to include
something.  One place where we recover files from archives is
002_archiving.pl, still the files get renamed to the segment names
when recovered so that's difficult to make that part 100%
deterministic yet.  Still as a reminder of the properties behind those
files it does not sound bad to document it in the test either, that's
cheap, and we get the future covered.
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6c69eb6dd7..1e5d1691ee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5461,7 +5461,6 @@ validateRecoveryParameters(void)
 static void
 exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 {
-	char		recoveryPath[MAXPGPATH];
 	char		xlogfname[MAXFNAMELEN];
 	XLogSegNo	endLogSegNo;
 	XLogSegNo	startLogSegNo;
@@ -5541,17 +5540,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 	XLogFileName(xlogfname, ThisTimeLineID, startLogSegNo, wal_segment_size);
 	XLogArchiveCleanup(xlogfname);
 
-	/*
-	 * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
-	 * of it.
-	 */
-	snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
-	unlink(recoveryPath);		/* ignore any error */
-
-	/* Get rid of any remaining recovered timeline-history file, too */
-	snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
-	unlink(recoveryPath);		/* ignore any error */
-
 	/*
 	 * Remove the signal files out of the way, so that we don't accidentally
 	 * re-enter archive recovery mode in a subsequent crash.
@@ -7419,6 +7407,7 @@ StartupXLOG(void)
 	if (ArchiveRecoveryRequested)
 	{
 		char		reason[200];
+		char		recoveryPath[MAXPGPATH];
 
 		Assert(InArchiveRecovery);
 
@@ -7475,6 +7464,17 @@ StartupXLOG(void)
 		 */
 		writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
 			 EndRecPtr, reason);
+
+		/*
+		 * Since there might be a partial WAL segment named RECOVERYXLOG, get
+		 * rid of it.
+		 */
+		snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+		unlink(recoveryPath);	/* ignore any error */
+
+		/* Get rid of any remaining recovered timeline-history file, too */
+		snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+		unlink(recoveryPath);	/* ignore any error */
 	}
 
 	/* Save the selected TimeLineID in shared memory, too */
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index e1bd3c95cc..e71cb8a25a 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 3;
 use File::Copy;
 
 # Initialize master node, doing archives
@@ -49,3 +49,14 @@ $node_standby->poll_query_until('postgres', $caughtup_query)
 my $result =
   $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
 is($result, qq(1000), 'check content from archives');
+
+# Promote the standby, and check that files specifically generated during
+# archive recovery are cleaned up.
+$node_standby->promote;
+my $node_standby_data = $node_standby->data_dir;
+isnt(
+	-f "$node_standby_data/pg_wal/RECOVERYHISTORY",
+	"RECOVERYHISTORY removed after promotion");
+isnt(
+	-f "$node_standby_data/pg_wal/RECOVERYXLOG",
+	"RECOVERYXLOG removed after promotion");




checking my understanding of TupleDesc

2019-09-29 Thread Chapman Flack
From looking around the code, I've made these tentative observations
about TupleDescs:

1. If the TupleDesc was obtained straight from the relcache for some
   relation, then all of its attributes should have nonzero attrelid
   identifying that relation, but in (every? nearly every?) other case,
   the attributes found in a TupleDesc will have a dummy attrelid of zero.

2. The attributes in a TupleDesc will (always?) have consecutive attnum
   corresponding to their positions in the TupleDesc (and therefore
   redundant). A query, say, that projects out a subset of columns
   from a relation will not have a result TupleDesc with attributes
   still bearing their original attrelid and attnum; they'll have
   attrelid zero and consecutive renumbered attnum.

   Something like SendRowDescriptionCols_3 that wants the original table
   and attnum has to reconstruct them from the targetlist, if available.

Have I mistaken any of that?

Thanks,
-Chap




Re: JIT performance bug/regression & JIT EXPLAIN

2019-09-29 Thread Andres Freund
Hi,

On 2019-09-27 00:20:53 -0700, Andres Freund wrote:
> Unfortunately I found a performance regression for JITed query
> compilation introduced in 12, compared to 11. Fixed in one of the
> attached patches 
> (v1-0009-Fix-determination-when-tuple-deforming-can-be-JIT.patch
> - which needs a better commit message).
> 
> The first question is when to push that fix. I'm inclined to just do so
> now - as we still do JITed tuple deforming in most cases, as well as
> doing so in 11 in the places this patch fixes, the risk of that seems
> low. But I can also see an arguments for waiting after 12.0.

Since nobody opined, I now have pushed that, and the other fix mentioned
later in that email.

I'd appreciate comments on the rest of the email, it's clear that we
need to improve the test infrastructure here. And also the explain
output for grouping sets...

Greetings,

Andres Freund




typo: postGER

2019-09-29 Thread Justin Pryzby
$ git grep Postger
src/backend/po/tr.po:"Bu durum, sistemin semaphore set (SEMMNI) veya semaphore 
(SEMMNS) sayı sınırlaması aşmasında meydana gelmektedir.  Belirtilen 
parametrelerin değerleri yükseltmelisiniz.  Başka seçeneğiniz ise PostgerSQL 
sisteminin semaphore tütekitimini max_connections parametresini şu an %d) 
düşürerek azaltabilirsiniz.\n"

commit 3c439a58df83ae51f650cfae9878df1f9b70c4b8
Author: Peter Eisentraut 
Date:   Mon May 20 16:00:53 2019 +0200

Translation updates

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: a20bf6b8a5b4e32450967055eb5b07cee4704edd




Re: Standby accepts recovery_target_timeline setting?

2019-09-29 Thread Peter Eisentraut
On 2019-09-29 18:36, Fujii Masao wrote:
> Yes, but ArchiveRecoveryRequested should be checked instead of
> InArchiveRecovery, I think. Otherwise recovery targets would take effect
> when recovery.signal is missing but backup_label exists. In this case,
> InArchiveRecovery is set to true though ArchiveRecoveryRequested is
> false because recovery.signal is missing.
> 
> With the attached patch, I checked that the steps that I described
> upthread didn't reproduce the issue.

Your patch looks correct to me.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: v12 relnotes: alter system tables

2019-09-29 Thread Peter Eisentraut
On 2019-09-27 20:30, Justin Pryzby wrote:
> https://www.postgresql.org/docs/12/release-12.html
> 
> |Allow modifications of system catalogs' options using ALTER TABLE (Peter 
> Eisentraut)
> |Modifications of catalogs' reloptions and autovacuum settings are now 
> supported.
> 
> I wonder if that should say: "... WHEN ALLOW_SYSTEM_TABLE_MODS IS ENABLED."

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Standby accepts recovery_target_timeline setting?

2019-09-29 Thread Peter Eisentraut
On 2019-09-27 17:14, David Steele wrote:
> On 9/27/19 10:36 AM, Peter Eisentraut wrote:
>> On 2019-09-26 23:02, David Steele wrote:
>>> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
>>>
>>>> I don't know if recovery_target_timeline is actually useful to change in
>>>> standby mode.
>> OK, I have committed your original documentation patch.
> Thanks, that's a good start.
> 
> As Fujii noticed and I have demonstrated upthread, just about any target
> setting can be used in a standby restore.  This matches the behavior of
> prior versions so it's not exactly a regression, but the old docs made
> no claim that standby_mode disabled targeted restore.

I have further fixed the documentation.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Windows v readline

2019-09-29 Thread Andrew Dunstan


The configure code currently has this:


# readline on MinGW has problems with backslashes in psql and other bugs.
# This is particularly a problem with non-US code pages.
# Therefore disable its use until we understand the cause. 2004-07-20
if test "$PORTNAME" = "win32"; then
  if test "$with_readline" = yes; then
    AC_MSG_WARN([*** Readline does not work on MinGW --- disabling])
    with_readline=no
  fi
fi


2004 is a very long time ago. Has anyone looked at this more recently?
It would certainly be nice to have readline-enabled psql on Windows if
possible.


cheers


andrew

-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





python detection v windows

2019-09-29 Thread Andrew Dunstan


I just tried building with Python on msys2. However, the setup of the
latest python doesn't fit our detection code. I see this:


# /c/Python37/python -c 'import distutils.sysconfig;
print(distutils.sysconfig.get_config_vars());'

{'LIBDEST': 'C:\\Python37\\Lib', 'BINLIBDEST': 'C:\\Python37\\Lib',
'INCLUDEPY': 'C:\\Python37\\include', 'EXT_SUFFIX':
'.cp37-win_amd64.pyd', 'EXE': '.exe', 'VERSION': '37', 'BINDIR':
'C:\\Python37', 'prefix': 'C:\\Python37', 'exec_prefix': 'C:\\Python37',
'SO': '.cp37-win_amd64.pyd', 'srcdir': 'C:\\Python37'}


The python3.dll and python37.dll files are in c:\\python37, i.e. the
BINDIR as one might expect on Windows.


It would be nice to get this working.


cheers


andrew


-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-29 Thread Peter Geoghegan
On Sun, Sep 29, 2019 at 7:38 AM Alexander Korotkov
 wrote:
> Starting from root seems OK for me, because vacuum blocks all
> concurrent inserts before doing this.  But this needs to be properly
> documented in readme.

I never got an adequate answer to this closely related question almost
two years ago:

https://www.postgresql.org/message-id/CAH2-Wz=gtnapzeezqyelov3h1fxpo5xhmrbp6amgeklkv95...@mail.gmail.com

In general, ginInsertCleanup() seems badly designed. Why is it okay
that there is no nbtree-style distinction between page deletion and
page recycling?

> Locking from right to left is clearly wrong.  It could deadlock with
> concurrent ginStepRight(), which locks from left to right.  I expect
> this happened in your case.  I'm going to reproduce this and fix.

I am sick and tired of seeing extremely basic errors like this within
GIN's locking protocols. Bugs happen, but these are not ordinary bugs.
They're more or less all a result of giving no thought to the high
level design. I'm not blaming you for this, or any one person. But
this is not okay.

Anything around index concurrency needs to be explained in
excruciating detail, while taking a top-down approach that applies
general rules (e.g. you can only do lock coupling left to right, or
bottom to top in nbtree). Anything less than that should be assumed to
be wrong on general principle.

-- 
Peter Geoghegan




Re: Unstable select_parallel regression output in 12rc1

2019-09-29 Thread Christoph Berg
Re: Tom Lane 2019-09-28 <24917.1569692...@sss.pgh.pa.us>
> > (There's some extension modules
> > where the testsuite fails at a much higher rate, getting all targets
> > to pass at the same time is next to impossible there :(. )
> 
> I feel your pain, believe me.  Used to fight the same kind of problems
> when I was at Red Hat.  Are any of those extension modules part of
> Postgres?

No, external ones. The main offenders at the moment are pglogical and
patroni (admittedly not an extension in the strict sense). Both have
extensive testsuites that exercise replication scenarios that are
prone to race conditions. (Maybe we should just run fewer tests for the
packaging.)

Christoph




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-09-29 Thread Tomas Vondra

On Sun, Sep 29, 2019 at 02:30:44PM -0300, Alvaro Herrera wrote:
> On 2019-Sep-29, Amit Kapila wrote:
> 
> > On Sun, Sep 29, 2019 at 12:39 AM Tomas Vondra wrote:
> 
> > > So that's what I did in the attached patch - I've renamed the GUC to
> > > logical_decoding_work_mem, detached it from m_w_m and set the default to
> > > 64MB (i.e. the same default as m_w_m).
> 
> > Fair enough, let's not argue more on this unless someone else wants to
> > share his opinion.
> 
> I just read this part of the conversation and I agree that having a
> separate GUC with its own value independent from other GUCs is a good
> solution.  Tying it to m_w_m seemed reasonable, but it's true that
> people frequently set m_w_m very high, and it would be undesirable to
> propagate that value to logical decoding memory usage.
> 
> I wonder what would constitute good advice on how to set this value, I
> mean what is the metric that the user needs to be thinking about.   Is
> it the total of memory required to keep all concurrent write transactions
> in memory?  (Quick example: if you do 2048 wTPS and each transaction
> lasts 1s, and each transaction does 1kB of logically-decoded changes,
> then ~2MB are sufficient for the average case.  Is that correct?

Yes, something like that. Essentially we'd like to keep all concurrent
transactions decoded in memory, to eliminate the need to spill to disk.
One of the subsequent patches adds some subscription-level stats, so
maybe we don't need to worry about this too much - the stats seem like a
better source of information for tuning.

> I *think* that full-page images do not count, correct?  With these
> things in mind users could go through pg_waldump output and figure out
> what to set the value to.)

Right, FPW do not matter here.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-09-29 Thread Alvaro Herrera
On 2019-Sep-29, Amit Kapila wrote:

> On Sun, Sep 29, 2019 at 12:39 AM Tomas Vondra  
> wrote:

> > So that's what I did in the attached patch - I've renamed the GUC to
> > logical_decoding_work_mem, detached it from m_w_m and set the default to
> > 64MB (i.e. the same default as m_w_m).
> 
> Fair enough, let's not argue more on this unless someone else wants to
> share his opinion.

I just read this part of the conversation and I agree that having a
separate GUC with its own value independent from other GUCs is a good
solution.  Tying it to m_w_m seemed reasonable, but it's true that
people frequently set m_w_m very high, and it would be undesirable to
propagate that value to logical decoding memory usage.


I wonder what would constitute good advice on how to set this value, I
mean what is the metric that the user needs to be thinking about.   Is
it the total of memory required to keep all concurrent write transactions 
in memory?  (Quick example: if you do 2048 wTPS and each transaction
lasts 1s, and each transaction does 1kB of logically-decoded changes,
then ~2MB are sufficient for the average case.  Is that correct?  I
*think* that full-page images do not count, correct?  With these things
in mind users could go through pg_waldump output and figure out what to
set the value to.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Standby accepts recovery_target_timeline setting?

2019-09-29 Thread Fujii Masao
On Sun, Sep 29, 2019 at 6:08 AM Peter Eisentraut
 wrote:
>
> On 2019-09-28 19:45, Tom Lane wrote:
> > Maybe I'm misunderstanding, but I think that rather than adding error
> > checks that were not there before, the right path to fixing this is
> > to cause these settings to be ignored if we're doing crash recovery.
>
> That makes sense to me.

+1

> Something like this (untested)?

Yes, but ArchiveRecoveryRequested should be checked instead of
InArchiveRecovery, I think. Otherwise recovery targets would take effect
when recovery.signal is missing but backup_label exists. In this case,
InArchiveRecovery is set to true though ArchiveRecoveryRequested is
false because recovery.signal is missing.
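The flag distinction above can be sketched as a tiny truth table. This is an illustrative simplification (it ignores standby.signal and mirrors only the recovery.signal / backup_label cases described in this message), not the actual xlog.c logic:

```python
def recovery_targets_apply(recovery_signal_exists, backup_label_exists):
    """Recovery targets should be honored only when the user explicitly
    requested archive recovery via recovery.signal (ArchiveRecoveryRequested).
    InArchiveRecovery alone is not enough: it also becomes true when only
    backup_label exists, i.e. during crash recovery from a base backup.
    """
    archive_recovery_requested = recovery_signal_exists
    in_archive_recovery = recovery_signal_exists or backup_label_exists
    # Checking in_archive_recovery here would wrongly enable targets in the
    # backup_label-only case (in_archive_recovery=True, requested=False);
    # check the *request* flag instead.
    return archive_recovery_requested

print(recovery_targets_apply(False, True))   # backup_label only -> False
print(recovery_targets_apply(True, False))   # recovery.signal   -> True
```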

With the attached patch, I checked that the steps that I described
upthread didn't reproduce the issue.

Regards,

-- 
Fujii Masao


ignore-recovery-targets-in-crash-recovery.patch
Description: Binary data


Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-29 Thread Alexander Korotkov
On Sun, Sep 29, 2019 at 6:12 PM Alexander Korotkov
 wrote:
> On Sun, Sep 29, 2019 at 5:38 PM Alexander Korotkov
>  wrote:
> > On Sun, Sep 29, 2019 at 11:17 AM chenhj  wrote:
> > > Is the locking order of the autovacuum process (root->right->left) correct? 
> > > The insert process locks gin buffers in bottom->top and 
> > > left->right order.
> > >
> > > 1. vacuum(root->right->left):
> >
> > Starting from root seems OK for me, because vacuum blocks all
> > concurrent inserts before doing this.  But this needs to be properly
> > documented in readme.
> >
> > Locking from right to left is clearly wrong.  It could deadlock with
> > concurrent ginStepRight(), which locks from left to right.  I expect
> > this happened in your case.  I'm going to reproduce this and fix.
>
> I just managed to reproduce this using two sessions on master branch.
>
> session 1
> session 2
>
> # create table test with (autovacuum_enabled = false) as (select
> array[1] ar from generate_series(1,2) i);
> # create index test_ar_idx on test using gin (ar);
> # vacuum analyze test;
> # delete from test;
>
> # set enable_seqscan = off;
> gdb> b ginbtree.c:150
> # select * from test where ar @> '{1}'::integer[];
> Step in gdb just before ReadBuffer() in ReleaseAndReadBuffer().
>
> gdb> b ginvacuum.c:155
> # vacuum test;
>
> gdb > continue
> gdb> continue

Patch with fix is attached.  Idea is simple: ginScanToDelete() now
keeps exclusive lock on left page eliminating the need to relock it.
So, we preserve left-to-right locking order and can't deadlock with
ginStepRight().
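The deadlock mechanics here are the classic inconsistent-lock-ordering problem. A minimal sketch with generic locks (not the actual GIN code; names are illustrative) shows why the fix works:

```python
import threading

left, right = threading.Lock(), threading.Lock()

def step_right():
    # ginStepRight(): lock a page, then its right sibling (left-to-right)
    with left:
        with right:
            pass

def scan_to_delete_buggy():
    # Pre-fix behavior: the deletable page was locked first, then its
    # left sibling was relocked (right-to-left), the reverse order --
    # concurrent step_right() can then deadlock against this
    with right:
        with left:
            pass

def scan_to_delete_fixed():
    # Post-fix: keep the left page's exclusive lock and acquire
    # left-to-right, matching step_right(), so no lock cycle is possible
    with left:
        with right:
            pass

# With the fix, concurrent scans and step-rights always finish:
threads = [threading.Thread(target=f) for f in (step_right, scan_to_delete_fixed)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("no deadlock")
```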

Also, we need to adjust Concurrency section in GIN README.  For me the
description looks vague and inconsistent even with current behavior.
I'm going to post this later.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


gin_ginDeletePage_ginStepRight_deadlock_fix-1.patch
Description: Binary data


recovery_min_apply_delay in archive recovery causes assertion failure in latch

2019-09-29 Thread Fujii Masao
Hi,

I got the following assertion failure when I enabled recovery_min_apply_delay
and started archive recovery (i.e., I put only recovery.signal not
standby.signal).

TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File:
"latch.c", Line: 522)

Here is the example to reproduce the issue:


initdb -D data
pg_ctl -D data start
psql -c "alter system set recovery_min_apply_delay to '60s'"
psql -c "alter system set archive_mode to on"
psql -c "alter system set archive_command to 'cp %p ../arch/%f'"
psql -c "alter system set restore_command to 'cp ../arch/%f %p'"
mkdir arch
pg_basebackup -D bkp -c fast
pgbench -i
pgbench -t 1000
pg_ctl -D data -m i stop
rm -rf bkp/pg_wal
mv data/pg_wal bkp
rm -rf data
mv bkp data
touch data/recovery.signal
pg_ctl -D data -W start


The latch that causes this assertion failure is recoveryWakeupLatch.
The ownership of this latch is taken only when standby mode is
requested. But this latch can be used when starting archive recovery
with recovery_min_apply_delay set even though it's unowned.
So the assertion failure happened.

Attached patch fixes this issue by making archive recovery always ignore
recovery_min_apply_delay. This change is OK because
recovery_min_apply_delay was introduced for standby mode, I think.
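The failed assertion reflects a latch-ownership invariant: only the process that called OwnLatch() may wait on an owned latch. A toy model (not the real latch.c API; names are illustrative) of why the recovery_min_apply_delay path trips it in archive recovery:

```python
import os

class Latch:
    """Toy model of PostgreSQL's owned latches: only the owning
    process may wait on it (latch->owner_pid == MyProcPid)."""
    def __init__(self):
        self.owner_pid = None  # OwnLatch() not called yet

    def own(self):
        self.owner_pid = os.getpid()

    def wait(self):
        # Mirrors Assert(latch->owner_pid == MyProcPid) in latch.c
        assert self.owner_pid == os.getpid(), "latch not owned by this process"

recovery_wakeup_latch = Latch()

standby_mode_requested = False  # archive recovery: recovery.signal only
if standby_mode_requested:
    recovery_wakeup_latch.own()  # ownership is taken only in standby mode

try:
    # The apply-delay code waits on the latch regardless of mode...
    recovery_wakeup_latch.wait()
except AssertionError as e:
    print("TRAP:", e)  # ...so archive recovery trips the assertion
```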

This issue is not new in v12. I observed that the issue was reproduced
in v11. So the back-patch is necessary.

Regards,

-- 
Fujii Masao


fix-assertion-failure-in-latch.patch
Description: Binary data


Re: [DOC] Document concurrent index builds waiting on each other

2019-09-29 Thread Alvaro Herrera
On 2019-Sep-28, James Coleman wrote:

> I believe caveats like this are worth calling out rather than
> expecting users to have to understand the implementation details an
> work out the implications on their own.

I agree.

> I read Alvaro as referring to the fact that the docs already call out
> the following:
> 
> > Regular index builds permit other regular index builds on the same
> > table to occur simultaneously, but only one concurrent index build
> > can occur on a table at a time.

Yeah, that's what I was understanding.

BTW I think there's an approach that could alleviate part of this
problem, at least some of the time: whenever CIC runs for an index
that's not on expression and not partial, we could set the
PROC_IN_VACUUM flag.  That would cause it to get ignored by other
processes for snapshot purposes (including CIC itself), as well as by
vacuum.  I need to take some time to research the safety of this, but
intuitively it seems safe.

Even further, I think we could also do it for regular CREATE INDEX
(under the same conditions) provided that it's not run in a transaction
block.  But that requires even more research/proof.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-29 Thread Alexander Korotkov
On Sun, Sep 29, 2019 at 5:38 PM Alexander Korotkov
 wrote:
> On Sun, Sep 29, 2019 at 11:17 AM chenhj  wrote:
> > Is the locking order of the autovacuum process (root->right->left) correct? 
> > The insert process locks gin buffers in bottom->top and 
> > left->right order.
> >
> > 1. vacuum(root->right->left):
>
> Starting from root seems OK for me, because vacuum blocks all
> concurrent inserts before doing this.  But this needs to be properly
> documented in readme.
>
> Locking from right to left is clearly wrong.  It could deadlock with
> concurrent ginStepRight(), which locks from left to right.  I expect
> this happened in your case.  I'm going to reproduce this and fix.

I just managed to reproduce this using two sessions on master branch.

session 1
session 2

# create table test with (autovacuum_enabled = false) as (select
array[1] ar from generate_series(1,2) i);
# create index test_ar_idx on test using gin (ar);
# vacuum analyze test;
# delete from test;

# set enable_seqscan = off;
gdb> b ginbtree.c:150
# select * from test where ar @> '{1}'::integer[];
Step in gdb just before ReadBuffer() in ReleaseAndReadBuffer().

gdb> b ginvacuum.c:155
# vacuum test;

gdb > continue
gdb> continue

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-29 Thread Alexander Korotkov
Hi!

Thank you for reporting.

On Sun, Sep 29, 2019 at 11:17 AM chenhj  wrote:
> Is the locking order of the autovacuum process (root->right->left) correct? 
> The insert process locks gin buffers in bottom->top and left->right order.
>
> 1. vacuum(root->right->left):

Starting from root seems OK for me, because vacuum blocks all
concurrent inserts before doing this.  But this needs to be properly
documented in readme.

Locking from right to left is clearly wrong.  It could deadlock with
concurrent ginStepRight(), which locks from left to right.  I expect
this happened in your case.  I'm going to reproduce this and fix.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pgsql: Implement jsonpath .datetime() method

2019-09-29 Thread Alexander Korotkov
On Fri, Sep 27, 2019 at 6:58 PM Nikita Glukhov  wrote:
> On Thu, Sep 26, 2019 at 2:57 AM Tom Lane  wrote:
> > * More generally, it's completely unclear why some error conditions
> > are thrown as errors and others just result in returning *have_error.
> > In particular, it seems weird that some unsupported datatype combinations
> > cause hard errors while others do not.  Maybe that's fine, but if so,
> > the function header comment is falling down on the job by not explaining
> > the reasoning.
>
> All cast errors are caught by jsonpath predicate.  Comparison of the
> uncomparable datetime types (time[tz] to dated types) also returns Unknown.
> And only if datatype conversion requires current timezone, which is not
> available in immutable family of jsonb_xxx() functions, hard error is thrown.
> This behavior is specific only for our jsonpath implementation.  But I'm
> really not sure if we should throw an error or return Unknown in this case.

I'd like to share my further thoughts about errors.  I think we should
suppress errors defined by standard and which user can expect.  So,
user can expect that wrong date format causes an error, division by
zero causes an error and so on.  And those errors are defined by
standard.

However, when the error is caused by a limitation of our implementation,
suppression doesn't look right to me.

For instance.

# select jsonb_path_query('"100-01-01"', '$.datetime() >
"2020-01-01 12:00:00".datetime()'::jsonpath);
 jsonb_path_query
--
 null
(1 row)

# select '100-01-01'::date > '2020-01-01 12:00:00'::timestamp;
ERROR:  date out of range for timestamp

So, jsonpath behaves as if 100 is not greater than 2020.  This
looks plainly false.  And a user can't expect that unless she is
familiar with our particular issues.  Now I'm of the opinion that such
errors shouldn't be suppressed.  We can't suppress *every* error.  If
trying to do this, we can come to an idea to suppress OOM error and
return garbage then, which is obviously ridiculous.  Opinions?
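The proposed distinction (suppress standard-defined data errors inside predicates, but let implementation-limit errors surface) can be sketched as follows. This is a conceptual model, not the actual jsonpath executor; the exception names are illustrative:

```python
class DataError(Exception):
    """Standard-defined data error, e.g. an invalid datetime format."""

class ImplLimitError(Exception):
    """Implementation limit, e.g. date out of range for timestamp."""

def jsonpath_predicate(compare):
    """Evaluate a jsonpath predicate: standard-defined cast/data errors
    collapse to SQL 'Unknown' (None); implementation limits propagate."""
    try:
        return compare()
    except DataError:
        return None  # Unknown, per SQL/JSON predicate semantics
    # ImplLimitError is deliberately not caught: surfacing it is honest,
    # while returning Unknown would read as a plain (wrong) 'false'

def bad_format():
    raise DataError("invalid datetime format")

def out_of_range():
    raise ImplLimitError("date out of range for timestamp")

print(jsonpath_predicate(bad_format))   # None
try:
    jsonpath_predicate(out_of_range)
except ImplLimitError as e:
    print("ERROR:", e)                  # ERROR: date out of range for timestamp
```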

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-29 Thread chenhj
Hi,all


In our PostgreSQL 10.7 (RHEL 6.3) database, the autovacuum process and many
insert processes have been blocked on a gin index's LWLock:buffer_content for a
long time.


In other words, the following gin index lwlock deadlock phenomenon has occurred
again. Since the bug below was already fixed in 10.7, this should be a new
bug.


https://www.postgresql.org/message-id/flat/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com


We have already obtained core dump files of the autovacuum process and one of
the insert processes.
Unfortunately the insert process (dumped with gcore) held no lwlock; it must be
another process (whose core file we did not fetch) that holds the lwlock needed
by the autovacuum process.


The stacks are as follows:


## Stack of one insert process: acquiring lock 0x7f6c517dbfa4, which was held 
by the vacuum process

(gdb) bt
#0  0x00369ea0da00 in sem_wait () from /lib64/libpthread.so.0
#1  0x006a7910 in PGSemaphoreLock (sema=0x7f6c4f76a7b8) at pg_sema.c:316
#2  0x00718225 in LWLockAcquire (lock=0x7f6c517dbfa4, mode=LW_SHARED) 
at lwlock.c:1233
#3  0x0048b622 in ginTraverseLock (buffer=224225, searchMode=0 '\000') 
at ginbtree.c:40
#4  0x0048ca13 in ginFindLeafPage (btree=0x7fffc71c4ea0, searchMode=0 
'\000', snapshot=0x0) at ginbtree.c:97
#5  0x004894db in ginInsertItemPointers (index=, 
rootBlkno=, items=, nitem=, buildStats=0x0)
at gindatapage.c:1909
#6  0x004863a7 in ginEntryInsert (ginstate=0x1c72158, attnum=1, 
key=20190913, category=0 '\000', items=0x1c81508, nitem=72, buildStats=0x0) at 
gininsert.c:214
#7  0x0049219a in ginInsertCleanup (ginstate=0x1c72158, full_clean=0 
'\000', fill_fsm=1 '\001', forceCleanup=, stats=) at ginfast.c:878
#8  0x0049308e in ginHeapTupleFastInsert (ginstate=0x1c72158, 
collector=) at ginfast.c:443
#9  0x00486749 in gininsert (index=, 
values=0x7fffc71c54f0, isnull=0x7fffc71c5600 "", ht_ctid=0x1c6d3a4, 
heapRel=, 
checkUnique=, indexInfo=0x1c61da8) at gininsert.c:522
#10 0x005f75f0 in ExecInsertIndexTuples (slot=0x1c62168, 
tupleid=0x1c6d3a4, estate=0x1c61768, noDupErr=0 '\000', specConflict=0x0, 
arbiterIndexes=0x0) at execIndexing.c:387
#11 0x00616497 in ExecInsert (pstate=0x1c61ab8) at nodeModifyTable.c:519
#12 ExecModifyTable (pstate=0x1c61ab8) at nodeModifyTable.c:1779
#13 0x005fb6bf in ExecProcNode (queryDesc=0x1c67760, direction=, count=0, execute_once=-72 '\270') at 
../../../src/include/executor/executor.h:250
#14 ExecutePlan (queryDesc=0x1c67760, direction=, count=0, 
execute_once=-72 '\270') at execMain.c:1723
#15 standard_ExecutorRun (queryDesc=0x1c67760, direction=, 
count=0, execute_once=-72 '\270') at execMain.c:364
#16 0x7f6e226aa6f8 in pgss_ExecutorRun (queryDesc=0x1c67760, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
pg_stat_statements.c:889
#17 0x7f6e224a474d in explain_ExecutorRun (queryDesc=0x1c67760, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
auto_explain.c:267
#18 0x0072a15b in ProcessQuery (plan=, 
sourceText=0x1c21458 "INSERT INTO bi_dm.tdm_wh_shopgds_fnsh_rt 
(STATIS_DATE,SITE_CD,LGORT,ZSIZE,ZVTWEG,VSBED,TOTAL_CNT,FNSH_CNT,UNFNSH_CNT,ETL_TIME,DEPT_CD,TMALL_FLG,BUSS_TP,ZCKYWLX)
 VALUES($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$"..., params=0x1c21580, queryEnv=0x0, 
dest=, completionTag=0x7fffc71c5de0 "") at pquery.c:161
#19 0x0072a395 in PortalRunMulti (portal=0x1c57f18, isTopLevel=1 
'\001', setHoldSnapshot=0 '\000', dest=0xc9b480, altdest=0xc9b480, 
completionTag=0x7fffc71c5de0 "") at pquery.c:1286
#20 0x0072aa98 in PortalRun (portal=0x1c57f18, count=1, isTopLevel=1 
'\001', run_once=1 '\001', dest=0x1c25768, altdest=0x1c25768, 
completionTag=0x7fffc71c5de0 "") at pquery.c:799
#21 0x00728c9a in exec_execute_message (argc=, 
argv=, dbname=0x1bbb800 "lbiwhdb", username=) at postgres.c:2007
#22 PostgresMain (argc=, argv=, 
dbname=0x1bbb800 "lbiwhdb", username=) at postgres.c:4180
#23 0x006bb43a in BackendRun (argc=, argv=) at postmaster.c:4405
---Type <return> to continue, or q <return> to quit---
#24 BackendStartup (argc=, argv=) at 
postmaster.c:4077
#25 ServerLoop (argc=, argv=) at 
postmaster.c:1755
#26 PostmasterMain (argc=, argv=) at 
postmaster.c:1363
#27 0x0063b4d0 in main (argc=3, argv=0x1b839e0) at main.c:228
(gdb) f 2
#2  0x00718225 in LWLockAcquire (lock=0x7f6c517dbfa4, mode=LW_SHARED) 
at lwlock.c:1233
1233lwlock.c: No such file or directory.
in lwlock.c
(gdb) p num_held_lwlocks
$1 = 0
(gdb) 




## Stack of autovacuum: acquiring lock 0x7f6c519ba5a4 while holding 
0x7f6c517dbfa4 and 0x7f6c51684f64
--
(gdb) bt
#0  0x00369ea0da00 in sem_wait () from /lib64/libpthread.so.0
#1  0x006a7910 in PGSemaphoreLock (sema=0x7f6c4f77fdb8) at pg_sema.c:316
#2  0x00718225 in LWLockAcquire (lock=0x7f6c519ba5a4, 
mode=LW_EXCLUSIVE) at 

Re: Re: could not access status of transaction

2019-09-29 Thread chenhj
Hi, all


Another of our systems has encountered the same failure, but this time it is 
PostgreSQL 10.7 (RHEL 6.3).


Details are as follows:


Phenomenon:


app_db=# select count(*) from loba_sp_cost_xcd_104561;
ERROR:  could not access status of transaction 35153545
DETAIL:  Could not open file "pg_xact/0021": No such file or directory.




Related Information:
The clog files in the pg_xact directory range from 00CE to 0179, and
pg_xact/0021 has already been removed.
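The xid-to-clog-segment mapping shows why xid 35153545 needs pg_xact/0021 (and matches the pageno=1072 visible in the backtrace below). A quick sketch of the arithmetic, using the constants from clog.c/slru.h (2 status bits per xact, 8 kB pages, 32 pages per SLRU segment):

```python
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT           # 4 xacts per byte
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE      # 32768 xacts per page
SLRU_PAGES_PER_SEGMENT = 32

xid = 35153545
pageno = xid // CLOG_XACTS_PER_PAGE           # page holding this xid's status
segment = pageno // SLRU_PAGES_PER_SEGMENT    # SLRU segment number
print(pageno, format(segment, "04X"))         # 1072 0021
```

So the error means the status of a still-visible-looking tuple's xmin lives in a clog segment that has already been truncated away.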


Through gdb analysis, the tuple that caused the error was in block 163363. 


(gdb) bt
#0  SlruReportIOError (ctl=0xca6b60, pageno=1072, xid=35153545) at slru.c:894
#1  0x004e2060 in SimpleLruReadPage (ctl=0xca6b60, pageno=1072, 
write_ok=1 '\001', xid=35153545) at slru.c:445
#2  0x004da431 in TransactionIdGetStatus (xid=35153545, 
lsn=0x7fffd07b1e88) at clog.c:411
#3  0x004e385a in TransactionLogFetch (transactionId=35153545) at 
transam.c:79
#4  0x004e39a5 in TransactionIdDidCommit (transactionId=) at transam.c:129
#5  0x0085e959 in HeapTupleSatisfiesMVCC (htup=, 
snapshot=0x2c21740, buffer=320) at tqual.c:1057
#6  0x004b49e7 in heapgetpage (scan=0x2c8a5d8, page=163363) at 
heapam.c:439
#7  0x004b5091 in heapgettup_pagemode (scan=0x2c8a5d8, direction=) at heapam.c:1034
#8  heap_getnext (scan=0x2c8a5d8, direction=) at 
heapam.c:1801
#9  0x00618b21 in SeqNext (node=0x2c63708) at nodeSeqscan.c:81
#10 0x005fe132 in ExecScanFetch (node=0x2c63708, accessMtd=0x618ae0 
, recheckMtd=0x618840 ) at execScan.c:97
#11 ExecScan (node=0x2c63708, accessMtd=0x618ae0 , recheckMtd=0x618840 
) at execScan.c:164
#12 0x006061ee in ExecProcNode (aggstate=0x2c630c0) at 
../../../src/include/executor/executor.h:250
#13 fetch_input_tuple (aggstate=0x2c630c0) at nodeAgg.c:695
#14 0x00607f89 in agg_retrieve_direct (pstate=0x2c630c0) at 
nodeAgg.c:2448
#15 ExecAgg (pstate=0x2c630c0) at nodeAgg.c:2158
#16 0x0060b240 in ExecProcNode (pstate=0x2c62da0) at 
../../../src/include/executor/executor.h:250
#17 gather_getnext (pstate=0x2c62da0) at nodeGather.c:286
#18 ExecGather (pstate=0x2c62da0) at nodeGather.c:215
#19 0x006061ee in ExecProcNode (aggstate=0x2c626f8) at 
../../../src/include/executor/executor.h:250
#20 fetch_input_tuple (aggstate=0x2c626f8) at nodeAgg.c:695
#21 0x006080ec in agg_retrieve_direct (pstate=0x2c626f8) at 
nodeAgg.c:2347
#22 ExecAgg (pstate=0x2c626f8) at nodeAgg.c:2158
#23 0x005fb6bf in ExecProcNode (queryDesc=0x2c81658, direction=, count=0, execute_once=-8 '\370') at 
../../../src/include/executor/executor.h:250
#24 ExecutePlan (queryDesc=0x2c81658, direction=, count=0, 
execute_once=-8 '\370') at execMain.c:1723
#25 standard_ExecutorRun (queryDesc=0x2c81658, direction=, 
count=0, execute_once=-8 '\370') at execMain.c:364
#26 0x7fd6f161c6f8 in pgss_ExecutorRun (queryDesc=0x2c81658, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
pg_stat_statements.c:889
#27 0x7fd6f141674d in explain_ExecutorRun (queryDesc=0x2c81658, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
auto_explain.c:267
#28 0x0072986b in PortalRunSelect (portal=0x2c85f98, forward=, count=0, dest=) at pquery.c:932
#29 0x0072aa01 in PortalRun (portal=0x2c85f98, 
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', 
dest=0x2c5fb58, altdest=0x2c5fb58, completionTag=0x7fffd07b26d0 "")
at pquery.c:773
#30 0x00727051 in exec_simple_query (query_string=0x2c26358 "select 
count(*) from loba_sp_cost_xcd_104561;") at postgres.c:1122
#31 0x00728039 in PostgresMain (argc=, argv=, dbname=0x2bcc2f0 "app_db", username=) at 
postgres.c:4117
#32 0x006bb43a in BackendRun (argc=, argv=) at postmaster.c:4405
#33 BackendStartup (argc=, argv=) at 
postmaster.c:4077
---Type <return> to continue, or q <return> to quit---
#34 ServerLoop (argc=, argv=) at 
postmaster.c:1755
#35 PostmasterMain (argc=, argv=) at 
postmaster.c:1363
#36 0x0063b4d0 in main (argc=1, argv=0x2b8e9f0) at main.c:228
(gdb) f 5
#5  0x0085e959 in HeapTupleSatisfiesMVCC (htup=, 
snapshot=0x2c21740, buffer=320) at tqual.c:1057
1057tqual.c: No such file or directory.
in tqual.c
(gdb) p *tuple
$1 = {t_choice = {t_heap = {t_xmin = 35153545, t_xmax = 0, t_field3 = {t_cid = 
0, t_xvac = 0}}, t_datum = {datum_len_ = 35153545, datum_typmod = 0, 
datum_typeid = 0}}, t_ctid = {ip_blkid = {
  bi_hi = 2, bi_lo = 32291}, ip_posid = 9}, t_infomask2 = 32788, t_infomask = 
10243, t_hoff = 32 ' ', t_bits = 0x7fd6d97f0440 "\211f\030\002"}
  
  
The problematic tuple data parsed using `pg_filedump -i` is as follows:




Block 32291 
 -
 Block Offset: 0x0fc46000         Offsets: Lower     236 (0x00ec)
 Block: Size 8192  Version    4            Upper    2112 (0x0840)
 LSN:  logid    254 recoff 0xa3a598c8      Special  8192 (0x2000)
 Items:   53                      Free Space: 1876
 Checksum: 0x8355