Re: explicit_bzero for sslpassword

2020-05-19 Thread Michael Paquier
On Tue, May 19, 2020 at 02:33:40PM +0200, Daniel Gustafsson wrote:
> Since commit 74a308cf5221f we use explicit_bzero on pgpass and connhost
> password in libpq, but not sslpassword which seems an oversight.  The attached
> performs an explicit_bzero before freeing like the pattern for other password
> variables.

Good catch, let's fix that.  I would like to apply your suggested fix,
but let's see first if others have any comments.
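
For reference, the pattern used for the other password variables amounts to
something like the following (a sketch, assuming sslpassword is handled the
same way the attached presumably does):

#include <string.h>
#include <stdlib.h>

/*
 * Sketch of the libpq password-cleanup pattern discussed above: zero the
 * buffer before freeing it so the cleartext does not linger in freed heap
 * memory.  (PostgreSQL supplies a fallback explicit_bzero() on platforms
 * that lack it; the field cleared by the actual patch is conn->sslpassword.)
 */
static void
free_password(char **password)
{
	if (*password)
	{
		explicit_bzero(*password, strlen(*password));
		free(*password);
		*password = NULL;
	}
}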
--
Michael




Re: SyncRepLock acquired exclusively in default configuration

2020-05-19 Thread Michael Paquier
On Tue, May 19, 2020 at 08:56:13AM -0700, Ashwin Agrawal wrote:
> Sure, add it to commit fest.
> Seems though it should be backpatched to relevant branches as well.

It does not seem to be listed yet.  Are you planning to add it under
the section for bug fixes?
--
Michael




Re: Problem with pg_atomic_compare_exchange_u64 at 32-bit platforms

2020-05-19 Thread Andres Freund
Hi, 

On May 19, 2020 8:05:00 PM PDT, Noah Misch  wrote:
>On Tue, May 19, 2020 at 04:07:29PM +0300, Konstantin Knizhnik wrote:
>> Definition of pg_atomic_compare_exchange_u64 requires alignment of the
>> expected pointer on an 8-byte boundary.
>> 
>> pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
>>                                uint64 *expected, uint64 newval)
>> {
>> #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
>>     AssertPointerAlignment(ptr, 8);
>>     AssertPointerAlignment(expected, 8);
>> #endif
>> 
>> 
>> I wonder if there are platforms where such a restriction is actually
>> needed.
>
>In general, sparc Linux does SIGBUS on unaligned access.  Other
>platforms function but suffer performance penalties.

Indeed. Cross-cacheline atomics are, e.g., really expensive on x86,
essentially requiring a full-blown bus lock, IIRC.


>> And if so, it looks like our ./src/test/regress/regress.c is working
>> only occasionally:
>> 
>> static void
>> test_atomic_uint64(void)
>> {
>>     pg_atomic_uint64 var;
>>     uint64        expected;
>>     ...
>>         if (!pg_atomic_compare_exchange_u64(&var, &expected, 1))
>> 
>> because there is no guarantee that the "expected" variable will be
>> aligned on the stack at an 8-byte boundary (at least on Win32).
>
>src/tools/msvc sets ALIGNOF_LONG_LONG_INT=8, so it believes that win32 does
>guarantee 8-byte alignment of both automatic variables.  Is it wrong?

Generally the definition of the atomics should ensure the required alignment,
e.g. by using alignment attributes on the struct.
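
(A generic C11 illustration of that approach; PostgreSQL's port/atomics
headers spell it differently per compiler, so this is a sketch, not the
actual definition:)

#include <stdint.h>
#include <stdalign.h>

/*
 * Force 8-byte alignment of the atomic type itself, so any variable of
 * this type -- including automatics on the stack -- satisfies an 8-byte
 * AssertPointerAlignment().  Illustrative only.
 */
typedef struct my_atomic_u64
{
	alignas(8) volatile uint64_t value;
} my_atomic_u64;

_Static_assert(alignof(my_atomic_u64) == 8,
			   "atomic u64 must be 8-byte aligned");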

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: MultiXact\SLRU buffers configuration

2020-05-19 Thread Kyotaro Horiguchi
At Fri, 15 May 2020 14:01:46 +0500, "Andrey M. Borodin"  
wrote in 
> 
> 
> > On 15 May 2020, at 05:03, Kyotaro Horiguchi
> > wrote:
> > 
> > At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" 
> >  wrote in 
> >>> GetMultiXactIdMembers believes that step 4 is successfully done if step 2
> >>> returned a valid offset, but actually that is not obvious.
> >>> 
> >>> If we add a single giant lock just to isolate, say,
> >>> GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
> >>> unnecessarily.  Perhaps we need a finer-grained locking key for the standby
> >>> that works similarly to buffer locks on the primary, one that doesn't cause
> >>> conflicts between irrelevant mxids.
> >>> 
> >> We can just replay members before offsets. If the offset is already there,
> >> the members are there too.
> >> But I'd be happy if we could mitigate those 1000us too - with a hint about
> >> the last mxid state in a shared MX state, for example.
> > 
> > Generally in such cases, condition variables would work.  In the
> > attached PoC, the reader side gets no penalty in the "likely" code
> > path.  The writer side always calls ConditionVariableBroadcast but the
> > waiter list is empty in almost all cases.  But I couldn't cause the
> > situation where the 1000us sleep is reached.
> Thanks! That really looks like a good solution without magic timeouts.
> Beautiful!
> I think I can create a temporary extension which calls the MultiXact API and
> tests edge cases like this 1000us wait.
> This extension will also be useful for me to assess the impact of bigger
> buffers, reduced read locking (as in my 2nd patch) and other tweaks.

Happy to hear that. It would need to use a timeout just in case, though.
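
(For reference, the canonical shape of that pattern with PostgreSQL's
ConditionVariable API, roughly what the PoC does; the mxid-side names
below are hypothetical stand-ins:)

/*
 * Canonical ConditionVariable pattern (storage/lmgr/condition_variable.c),
 * roughly what the PoC described above does.  offset_is_valid(), offset_cv
 * and the wait event name are hypothetical stand-ins.
 */

/* Reader: sleeps only in the unlikely case the offset is not yet there. */
while (!offset_is_valid(multi))
	ConditionVariableSleep(&MultiXactState->offset_cv,
						   WAIT_EVENT_MULTIXACT_OFFSET);
ConditionVariableCancelSleep();

/* Writer: after filling in the offset, wake any waiters.  This is cheap
 * when the waiter list is empty, which is the common case. */
ConditionVariableBroadcast(&MultiXactState->offset_cv);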

> >> Actually, if we read an empty mxid array instead of something that is
> >> being replayed just now, it's not an inconsistency problem, because a
> >> transaction in this mxid could not have committed before we started. ISTM.
> >> So instead of a fix, we can probably just add a comment, if this reasoning
> >> is correct.
> > 
> > Step 4 of the reader side reads the members of the target mxid. It
> > is already written if the offset of the *next* mxid is filled in.
> Most often, yes, but members are not guaranteed to be filled in order. Those
> who win MXMemberControlLock will write first.
> But nobody can read the members of an mxid before it is returned. And its
> members will be written before the mxid is returned.

Yeah, right.  Otherwise an assertion failure happens.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is it useful to record whether plans are generic or custom?

2020-05-19 Thread Kyotaro Horiguchi
At Tue, 19 May 2020 22:56:17 +0900, Atsushi Torikoshi  wrote 
in 
> On Sat, May 16, 2020 at 6:01 PM legrand legrand 
> wrote:
> 
> > > To track executed plan types, I think execution layer hooks
> > > are appropriate.
> > > These hooks, however, take QueryDesc as a param and it does
> > > not include cached plan information.
> >
> > It seems that the same QueryDesc entry is reused when executing
> > a generic plan.
> > For example, a pseudo tag written into queryDesc->plannedstmt->queryId
> > (outside pg_stat_statements) during ExecutorStart reappears in later
> > executions with generic plans ...
> >
> > Is this QueryDesc reusable by a custom plan?  If not, maybe a solution
> > could be to add a flag in queryDesc->plannedstmt?
> >
> 
> Thanks for your proposal!
> 
> I first thought it was a good idea and tried to add a flag to QueryDesc,
> but the comments on QueryDesc say it encapsulates everything that
> the executor needs to execute a query.
> 
> Whether a plan is generic or custom is not something the executor needs to
> know to run queries, so now I hesitate to do so.
> 
> Instead, I'm now considering using a static hash for prepared queries
> (static HTAB *prepared_queries).
> 
> 
> BTW, I'd also appreciate other opinions about recording the number
> of generic and custom plans in pg_stat_statements.

If you/we just want to know how a prepared statement is executed,
couldn't we show that information in the pg_prepared_statements view?

=# select * from pg_prepared_statements;
-[ RECORD 1 ]---+-----------------------------------------------
name            | stmt1
statement       | prepare stmt1 as select * from t where b = $1;
prepare_time    | 2020-05-20 12:01:55.733469+09
parameter_types | {text}
from_sql        | t
exec_custom     | 5    <- existing num_custom_plans
exec_total      | 40   <- new member of CachedPlanSource
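
A minimal sketch of the bookkeeping that would back those two columns
(num_custom_plans already exists in CachedPlanSource; the total counter
and all names below are illustrative):

#include <stdint.h>
#include <stdio.h>

/*
 * Minimal sketch of the counters behind the two proposed columns.  In
 * PostgreSQL, num_custom_plans already lives in CachedPlanSource; the
 * total-execution counter is the new member proposed above.  The struct
 * and function names here are illustrative only.
 */
typedef struct PlanCounters
{
	int64_t		num_custom_plans;	/* bumped when a custom plan is chosen */
	int64_t		num_executions;		/* proposed: bumped on every execution */
} PlanCounters;

static void
record_execution(PlanCounters *c, int used_custom_plan)
{
	c->num_executions++;
	if (used_custom_plan)
		c->num_custom_plans++;
}

int
main(void)
{
	PlanCounters c = {0, 0};
	int			i;

	for (i = 0; i < 5; i++)
		record_execution(&c, 1);	/* 5 custom executions */
	for (i = 0; i < 35; i++)
		record_execution(&c, 0);	/* 35 generic executions */
	printf("exec_custom=%lld exec_total=%lld\n",
		   (long long) c.num_custom_plans, (long long) c.num_executions);
	return 0;
}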

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Trouble with hashagg spill I/O pattern and costing

2020-05-19 Thread Jeff Davis
On Tue, 2020-05-19 at 19:53 +0200, Tomas Vondra wrote:
> 
> And is there a way to pre-allocate larger chunks? Presumably we could
> assign the blocks to a tape in larger chunks (e.g. 128kB, i.e. 16 x 8kB)
> instead of just a single block. I haven't seen anything like that in
> tape.c, though ...

It turned out to be simple (at least a POC) so I threw together a
patch. I just added a 32-element array of block numbers to each tape.
When we need a new block, we retrieve a block number from that array;
or if it's empty, we fill it by calling ltsGetFreeBlock() 32 times.

I reproduced the problem on a smaller scale (330M groups, ~30GB of
memory on a 16GB box). Work_mem=64MB. The query is a simple distinct.

Unpatched master:
   Sort: 250s
   HashAgg: 310s
Patched master:
   Sort: 245s
   HashAgg: 262s

That's a nice improvement for such a simple patch. We can tweak the
number of blocks to preallocate, or do other things like double from a
small number up to a maximum. Also, a proper patch would probably
release the blocks back as free when the tape was rewound.

As long as the number of block numbers to preallocate is not too large,
I don't think we need to change the API. It seems fine for sort to do
the same thing, even though there's not any benefit.

Regards,
Jeff Davis


diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index bc8d56807e2..f44594191e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -110,6 +110,7 @@ typedef struct TapeBlockTrailer
 #define TapeBlockSetNBytes(buf, nbytes) \
 	(TapeBlockGetTrailer(buf)->next = -(nbytes))
 
+#define TAPE_WRITE_N_PREALLOC 32
 
 /*
  * This data structure represents a single "logical tape" within the set
@@ -151,6 +152,15 @@ typedef struct LogicalTape
 	int			max_size;		/* highest useful, safe buffer_size */
 	int			pos;			/* next read/write position in buffer */
 	int			nbytes;			/* total # of valid bytes in buffer */
+
+	/*
+	 * When multiple tapes are being written to concurrently (as in HashAgg),
+	 * it's important to preallocate some block numbers to make writes more
+	 * sequential. This doesn't preallocate the blocks themselves; only the
+	 * block numbers.
+	 */
+	int			nprealloc;
+	long		prealloc[TAPE_WRITE_N_PREALLOC];
 } LogicalTape;
 
 /*
@@ -557,6 +567,7 @@ ltsInitTape(LogicalTape *lt)
 	lt->max_size = MaxAllocSize;
 	lt->pos = 0;
 	lt->nbytes = 0;
+	lt->nprealloc = 0;
 }
 
 /*
@@ -733,7 +744,13 @@ LogicalTapeWrite(LogicalTapeSet *lts, int tapenum,
 			 * First allocate the next block, so that we can store it in the
 			 * 'next' pointer of this block.
 			 */
-			nextBlockNumber = ltsGetFreeBlock(lts);
+			if (lt->nprealloc == 0)
+			{
+				lt->nprealloc = TAPE_WRITE_N_PREALLOC;
+				for (int i = lt->nprealloc; i > 0; i--)
+					lt->prealloc[i - 1] = ltsGetFreeBlock(lts);
+			}
+			nextBlockNumber = lt->prealloc[--lt->nprealloc];
 
 			/* set the next-pointer and dump the current block. */
 			TapeBlockGetTrailer(lt->buffer)->next = nextBlockNumber;


Re: Warn when parallel restoring a custom dump without data offsets

2020-05-19 Thread Justin Pryzby
I started fooling with this at home while our ISP is broken (pardon my brevity).

Maybe you also saw commit b779ea8a9a2dc3a089b3ac152b1ec4568bfeb26f
"Fix pg_restore so parallel restore doesn't fail when the input file
doesn't contain data offsets (which it won't, if pg_dump thought its
output wasn't seekable)..."

...which I guess should actually say "doesn't NECESSARILY fail", since
it also adds this comment:
"This could fail if we are asked to restore items out-of-order."

So this is a known issue and not a regression.  I think the PG11
commit you mentioned (548e5097) happens to make some databases fail in
parallel restore that previously worked (I didn't check).  Possibly
some databases (or some pre-existing dumps) which used to fail
might now succeed.

Your patch adds a warning if unseekable output might fail during
parallel restore.  I'm not opposed to that, but can we just make
pg_restore work in that case?  If the input is unseekable, then we can
never do a parallel restore at all.  If it *is* seekable, could we
make _PrintTocData rewind if it gets to EOF using fseeko(fh, 0, SEEK_SET)
and re-scan again from the beginning?  Would you want to try that?
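
(If you want to experiment, a standalone sketch of the rewind step using
plain stdio calls; pg_restore's real code would operate on its
ArchiveHandle rather than a bare FILE *:)

#define _XOPEN_SOURCE 500
#include <stdio.h>

/*
 * Sketch of the rewind-and-rescan idea: if the (seekable) input hits EOF
 * before the wanted TOC entry is found, seek back to the start and scan
 * again.  Purely illustrative.
 */
static int
rewind_for_rescan(FILE *fh)
{
	if (!feof(fh))
		return 0;						/* nothing to do yet */
	if (fseeko(fh, 0, SEEK_SET) != 0)	/* also clears the EOF indicator */
	{
		perror("could not rewind input file");
		return -1;
	}
	return 1;							/* caller rescans from the beginning */
}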




Re: Parallel Seq Scan vs kernel read ahead

2020-05-19 Thread Thomas Munro
On Wed, May 20, 2020 at 2:23 PM Amit Kapila  wrote:
> Good experiment.  IIRC, we have discussed a similar idea during the
> development of this feature but we haven't seen any better results by
> allocating in ranges on the systems we have tried.  So, we went with
> the current approach, which is more granular and seems to allow better
> parallelism.  I feel we need to ensure that we don't regress
> parallelism in existing cases; otherwise, the idea sounds promising to
> me.

Yeah, Linux seems to do pretty well at least with smallish numbers of
workers, and when you use large numbers you can probably tune your way
out of the problem.  ZFS seems to do fine.  I wonder how well the
other OSes cope.
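
For anyone who wants to play with the idea outside the patch, the core of
the ramp-up can be sketched as below (a single-process toy; the real thing
hands out ranges from shared memory with an atomic fetch-add, and 64 is
just the cap used in the experiment):

#include <stdint.h>
#include <stdio.h>

/*
 * Toy sketch of the ramp-up idea: hand out ranges of blocks whose size
 * doubles up to a cap, so each worker sees longer sequential runs.  The
 * ramp-down near the end of the scan is omitted for brevity.
 */
typedef struct ChunkState
{
	uint32_t	next_block;		/* next block to hand out */
	uint32_t	nblocks;		/* total blocks in relation */
	uint32_t	chunk_size;		/* current range size, ramps 1 -> 64 */
} ChunkState;

/* Grant the next range; returns its length, sets *first to its start. */
static uint32_t
next_chunk(ChunkState *cs, uint32_t *first)
{
	uint32_t	n;

	if (cs->next_block >= cs->nblocks)
		return 0;				/* scan finished */
	n = cs->chunk_size;
	if (n > cs->nblocks - cs->next_block)
		n = cs->nblocks - cs->next_block;	/* clamp at relation end */
	*first = cs->next_block;
	cs->next_block += n;
	if (cs->chunk_size < 64)
		cs->chunk_size *= 2;	/* ramp up */
	return n;
}

int
main(void)
{
	ChunkState	cs = {0, 1000, 1};
	uint32_t	first, n;

	while ((n = next_chunk(&cs, &first)) > 0)
		printf("blocks %u..%u\n", first, first + n - 1);
	return 0;
}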




Re: Problem with pg_atomic_compare_exchange_u64 at 32-bit platforms

2020-05-19 Thread Noah Misch
On Tue, May 19, 2020 at 04:07:29PM +0300, Konstantin Knizhnik wrote:
> Definition of pg_atomic_compare_exchange_u64 requires alignment of the
> expected pointer on an 8-byte boundary.
> 
> pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
>                                uint64 *expected, uint64 newval)
> {
> #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
>     AssertPointerAlignment(ptr, 8);
>     AssertPointerAlignment(expected, 8);
> #endif
> 
> 
> I wonder if there are platforms where such a restriction is actually needed.

In general, sparc Linux does SIGBUS on unaligned access.  Other platforms
function but suffer performance penalties.

> And if so, it looks like our ./src/test/regress/regress.c is working only
> occasionally:
> 
> static void
> test_atomic_uint64(void)
> {
>     pg_atomic_uint64 var;
>     uint64        expected;
>     ...
>         if (!pg_atomic_compare_exchange_u64(&var, &expected, 1))
> 
> because there is no guarantee that the "expected" variable will be aligned on
> the stack at an 8-byte boundary (at least on Win32).

src/tools/msvc sets ALIGNOF_LONG_LONG_INT=8, so it believes that win32 does
guarantee 8-byte alignment of both automatic variables.  Is it wrong?




Re: Parallel Seq Scan vs kernel read ahead

2020-05-19 Thread Amit Kapila
On Wed, May 20, 2020 at 7:24 AM Thomas Munro  wrote:
>
> Hello hackers,
>
> Parallel sequential scan relies on the kernel detecting sequential
> access, but we don't make the job easy.  The resulting striding
> pattern works terribly on strict next-block systems like FreeBSD UFS,
> and degrades rapidly when you add too many workers on sliding window
> systems like Linux.
>
> Demonstration using FreeBSD on UFS on a virtual machine, taking ball
> park figures from iostat:
>
>   create table t as select generate_series(1, 2)::int i;
>
>   set max_parallel_workers_per_gather = 0;
>   select count(*) from t;
>   -> execution time 13.3s, average read size = ~128kB, ~500MB/s
>
>   set max_parallel_workers_per_gather = 1;
>   select count(*) from t;
>   -> execution time 24.9s, average read size = ~32kB, ~250MB/s
>
> Note the small read size, which means that there was no read
> clustering happening at all: that's the logical block size of this
> filesystem.
>
> That explains some complaints I've heard about PostgreSQL performance
> on that filesystem: parallel query destroys I/O performance.
>
> As a quick experiment, I tried teaching the block allocator to
> allocate ranges of up to 64 blocks at a time, ramping up incrementally,
> and ramping down at the end, and I got:
>

Good experiment.  IIRC, we have discussed a similar idea during the
development of this feature but we haven't seen any better results by
allocating in ranges on the systems we have tried.  So, we went with
the current approach, which is more granular and seems to allow better
parallelism.  I feel we need to ensure that we don't regress
parallelism in existing cases; otherwise, the idea sounds promising to
me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Parallel Seq Scan vs kernel read ahead

2020-05-19 Thread Thomas Munro
Hello hackers,

Parallel sequential scan relies on the kernel detecting sequential
access, but we don't make the job easy.  The resulting striding
pattern works terribly on strict next-block systems like FreeBSD UFS,
and degrades rapidly when you add too many workers on sliding window
systems like Linux.

Demonstration using FreeBSD on UFS on a virtual machine, taking ball
park figures from iostat:

  create table t as select generate_series(1, 2)::int i;

  set max_parallel_workers_per_gather = 0;
  select count(*) from t;
  -> execution time 13.3s, average read size = ~128kB, ~500MB/s

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 24.9s, average read size = ~32kB, ~250MB/s

Note the small read size, which means that there was no read
clustering happening at all: that's the logical block size of this
filesystem.

That explains some complaints I've heard about PostgreSQL performance
on that filesystem: parallel query destroys I/O performance.

As a quick experiment, I tried teaching the block allocator to
allocate ranges of up to 64 blocks at a time, ramping up incrementally,
and ramping down at the end, and I got:

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 7.5s, average read size = ~128kB, ~920MB/s

  set max_parallel_workers_per_gather = 3;
  select count(*) from t;
  -> execution time 5.2s, average read size = ~128kB, ~1.2GB/s

I've attached the quick and dirty patch I used for that.
From 65726902f42a19c319623eb7194b0040517008a6 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 19 May 2020 09:16:07 +1200
Subject: [PATCH] Use larger step sizes for Parallel Seq Scan.

Instead of handing out a single block at a time, which
confuses the read ahead heuristics of many systems, use
an increasing step size.

XXX Proof of concept grade code only, needs better ramp
down algorithm and contains a magic number 16
---
 src/backend/access/heap/heapam.c   | 22 +--
 src/backend/access/table/tableam.c | 34 +++---
 src/include/access/relscan.h   | 13 +++-
 src/include/access/tableam.h   |  2 ++
 4 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0d4ed602d7..8982d59ff0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -520,12 +520,14 @@ heapgettup(HeapScanDesc scan,
 			{
 				ParallelBlockTableScanDesc pbscan =
 				(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+				ParallelBlockTableScanWork pbscanwork =
+				(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 				table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
-														 pbscan);
+														 pbscanwork, pbscan);
 
 				page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-														 pbscan);
+														 pbscanwork, pbscan);
 
 				/* Other processes might have already finished the scan. */
 				if (page == InvalidBlockNumber)
@@ -720,9 +722,11 @@ heapgettup(HeapScanDesc scan,
 		{
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+			ParallelBlockTableScanWork pbscanwork =
+			(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 			finished = (page == InvalidBlockNumber);
 		}
 		else
@@ -834,12 +838,14 @@ heapgettup_pagemode(HeapScanDesc scan,
 			{
 				ParallelBlockTableScanDesc pbscan =
 				(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+				ParallelBlockTableScanWork pbscanwork =
+				(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 				table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
-														 pbscan);
+														 pbscanwork, pbscan);
 
 				page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-														 pbscan);
+														 pbscanwork, pbscan);
 
 				/* Other processes might have already finished the scan. */
 				if (page == InvalidBlockNumber)
@@ -1019,9 +1025,11 @@ heapgettup_pagemode(HeapScanDesc scan,
 		{
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+			ParallelBlockTableScanWork pbscanwork =
+			(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 			finished = (page == InvalidBlockNumber);
 		}
 		else
@@ -1155,6 +1163,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	scan->rs_base.rs_nkeys = nkeys;
 	scan->rs_base.rs_flags = flags;
 	scan->rs_base.rs_parallel = parallel_scan;
+	scan->rs_base.rs_parallel_work =
+		palloc0(sizeof(ParallelBlockTableScanWorkData));
 	scan->rs_strategy = NULL;	/* set in initscan */
 
 	/*
diff --git a/src/backend/access/table/tableam.c 

Re: pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-19 Thread Fujii Masao




On 2020/05/20 8:31, Michael Paquier wrote:
> On Tue, May 19, 2020 at 11:38:52PM +0900, Fujii Masao wrote:
>> I found that "received_lsn" is still used in high-availability.sgml.
>> We should apply the following change in high-availability?
>>
>> - view's received_lsn indicates that WAL is being
>> + view's flushed_lsn indicates that WAL is being
> 
> Oops, thanks.  Will fix.

Thanks for the fix!

>> BTW, we have pg_last_wal_receive_lsn() that returns the same lsn as
>> pg_stat_wal_receiver.flushed_lsn. Previously both used the term "receive"
>> in their names, but currently not. IMO it's better to use the same term in
>> those names for consistency, but it's not a good idea to rename
>> pg_last_wal_receive_lsn() to something like pg_last_wal_receive_lsn().
>> I have no better idea for now. So I'm ok with the current names.
> 
> I think you mean renaming pg_last_wal_receive_lsn() to something like
> pg_last_wal_flushed_lsn(), no?

No, that's not a good idea, as I told upthread.

> This name may become confusing because
> we lose the "receive" idea in the function, that we have with the
> "receiver" part of pg_stat_wal_receiver.  Maybe something like that,
> though that's long:
> - pg_last_wal_receive_flushed_lsn()
> - pg_last_wal_receive_written_lsn()

Yes, that's long.

> Anyway, a rename of this function does not strike me as strongly
> necessary, as that's less tied to the shared memory structure, and
> we document that pg_last_wal_receive_lsn() tracks the current LSN
> received and flushed.  I am actually wondering if in the future it may
> not be better to remove this function, but it has no maintenance
> cost either, so I would just leave it as-is.

Yeah, agreed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Tom Lane
I wrote:
> However, we do have to have a benefit to show those people whose
> queries we break.  Hence my insistence on having a working AS fix
> (or some other benefit) before not after.

I experimented with this a bit more, and came up with the attached.
It's not a working patch, just a set of grammar changes that Bison
is happy with.  (Getting to a working patch would require fixing the
various build infrastructure that knows about the keyword classification,
which seems straightforward but tedious.)

As Robert theorized, it works to move a fairly-small number of unreserved
keywords into a new slightly-reserved category.  However, as the patch
stands, only the remaining fully-unreserved keywords can be used as bare
column labels.  I'd hoped to be able to also use col_name keywords in that
way (which'd make the set of legal bare column labels mostly the same as
ColId).  The col_name keywords that cause problems are, it appears,
only PRECISION, CHARACTER, and CHAR_P.  So in principle we could move
those three into yet another keyword category and then let the remaining
col_name keywords be included in BareColLabel.  I kind of think that
that's more complication than it's worth, though.

regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a24b30f..0b034b6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -542,13 +542,14 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <str>		Sconst comment_text notify_payload
%type <str>		RoleId opt_boolean_or_string
%type <list>	var_list
-%type <str>		ColId ColLabel var_name type_function_name param_name
+%type <str>		ColId ColLabel BareColLabel
%type <str>		NonReservedWord NonReservedWord_or_Sconst
+%type <str>		var_name type_function_name param_name
%type <str>		createdb_opt_name
%type <node>	var_value zone_value
%type <rolespec> auth_ident RoleSpec opt_granted_by

-%type <keyword> unreserved_keyword type_func_name_keyword
+%type <keyword> unreserved_keyword non_label_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword

%type <node>	TableConstraint TableLikeClause
@@ -744,7 +745,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %nonassoc	'<' '>' '=' LESS_EQUALS GREATER_EQUALS NOT_EQUALS
 %nonassoc	BETWEEN IN_P LIKE ILIKE SIMILAR NOT_LA
 %nonassoc	ESCAPE			/* ESCAPE must be just above LIKE/ILIKE/SIMILAR */
-%left		POSTFIXOP		/* dummy for postfix Op rules */
 /*
  * To support target_el without AS, we must give IDENT an explicit priority
  * between POSTFIXOP and Op.  We can safely assign the same priority to
@@ -3908,6 +3908,7 @@ PartitionSpec: PARTITION BY part_strategy '(' part_params ')'
 
 part_strategy:	IDENT	{ $$ = $1; }
 | unreserved_keyword	{ $$ = pstrdup($1); }
+| non_label_keyword		{ $$ = pstrdup($1); }
 		;
 
 part_params:	part_elem		{ $$ = list_make1($1); }
@@ -13230,8 +13231,6 @@ a_expr:		c_expr	{ $$ = $1; }
 { $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
 			| qual_Op a_expr	%prec Op
 { $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
-			| a_expr qual_Op	%prec POSTFIXOP
-{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
 
 			| a_expr AND a_expr
 { $$ = makeAndExpr($1, $3, @2); }
@@ -13645,8 +13644,6 @@ b_expr:		c_expr
 { $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
 			| qual_Op b_expr	%prec Op
 { $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
-			| b_expr qual_Op	%prec POSTFIXOP
-{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
 			| b_expr IS DISTINCT FROM b_expr		%prec IS
 {
 	$$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);
@@ -14910,7 +14907,7 @@ target_el:	a_expr AS ColLabel
 			 * as an infix expression, which we accomplish by assigning
 			 * IDENT a precedence higher than POSTFIXOP.
 			 */
-			| a_expr IDENT
+			| a_expr BareColLabel
 {
 	$$ = makeNode(ResTarget);
 	$$->name = $2;
@@ -15228,6 +15225,7 @@ role_list:	RoleSpec
  */
 ColId:		IDENT	{ $$ = $1; }
 			| unreserved_keyword	{ $$ = pstrdup($1); }
+			| non_label_keyword		{ $$ = pstrdup($1); }
 			| col_name_keyword		{ $$ = pstrdup($1); }
 		;
 
@@ -15235,6 +15233,7 @@ ColId:		IDENT	{ $$ = $1; }
  */
 type_function_name:	IDENT			{ $$ = $1; }
 			| unreserved_keyword	{ $$ = pstrdup($1); }
+			| non_label_keyword		{ $$ = pstrdup($1); }
 			| type_func_name_keyword{ $$ = pstrdup($1); }
 		;
 
@@ -15242,15 +15241,23 @@ type_function_name:	IDENT			{ $$ = $1; }
  */
 NonReservedWord:	IDENT			{ $$ = $1; }
 			| unreserved_keyword	{ $$ = pstrdup($1); }
+			| non_label_keyword		{ $$ = pstrdup($1); }
 			| col_name_keyword		{ $$ = pstrdup($1); }
 			| type_func_name_keyword{ $$ = pstrdup($1); }
 		;
 
+/* Bare column label --- names that can be column labels without writing "AS".
+ */
+BareColLabel:	IDENT	{ $$ = $1; }
+			| unreserved_keyword	{ $$ = 

Re: pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-19 Thread Michael Paquier
On Tue, May 19, 2020 at 11:38:52PM +0900, Fujii Masao wrote:
> I found that "received_lsn" is still used in high-availability.sgml.
> We should apply the following change in high-availability?
> 
> - view's received_lsn indicates that WAL is being
> + view's flushed_lsn indicates that WAL is being

Oops, thanks.  Will fix.

> BTW, we have pg_last_wal_receive_lsn() that returns the same lsn as
> pg_stat_wal_receiver.flushed_lsn. Previously both used the term "receive"
> in their names, but currently not. IMO it's better to use the same term in
> those names for consistency, but it's not a good idea to rename
> pg_last_wal_receive_lsn() to something like pg_last_wal_receive_lsn().
> I have no better idea for now. So I'm ok with the current names.

I think you mean renaming pg_last_wal_receive_lsn() to something like
pg_last_wal_flushed_lsn(), no?  This name may become confusing because
we lose the "receive" idea in the function, that we have with the
"receiver" part of pg_stat_wal_receiver.  Maybe something like that,
though that's long:
- pg_last_wal_receive_flushed_lsn()
- pg_last_wal_receive_written_lsn() 

Anyway, a rename of this function does not strike me as strongly
necessary, as that's less tied to the shared memory structure, and
we document that pg_last_wal_receive_lsn() tracks the current LSN
received and flushed.  I am actually wondering if in the future it may
not be better to remove this function, but it has no maintenance
cost either, so I would just leave it as-is.
--
Michael




Re: Extension ownership and misuse of SET ROLE/SET SESSION AUTHORIZATION

2020-05-19 Thread Daniel Gustafsson
> On 19 May 2020, at 17:34, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 13 Feb 2020, at 23:55, Tom Lane  wrote:
>>> Given the current behavior of SET ROLE and SET SESSION AUTHORIZATION,
>>> I don't actually see any way that we could get these features to
>>> play together.
> 
>> Is this being worked on for the 13 cycle such that it should be an open item?
> 
> I didn't have it on my list, but yeah maybe we should add it to the
> "pre-existing issues" list.
> 
>>> The quick-and-dirty answer is to disallow these switches from being
>>> used together in pg_restore, and I'm inclined to think maybe we should
>>> do that in the back branches.
> 
>> ..or should we do this for v13 and back-branches and leave fixing it for 14?
>> Considering the potential invasiveness of the fix I think the latter sounds
>> rather appealing at this point in the cycle.  Something like the attached
>> should be enough IIUC.
> 
> pg_dump and pg_dumpall also have that switch no?

They do, but there the switches actually work as intended and the combination
should be allowed AFAICT.  Since SET ROLE is sent to the source server and not
the output, we can use it to start the dump without being a superuser.

cheers ./daniel



Re: Two fsync related performance issues?

2020-05-19 Thread Thomas Munro
On Wed, May 20, 2020 at 12:51 AM Robert Haas  wrote:
> On Mon, May 11, 2020 at 8:43 PM Paul Guo  wrote:
> > I have this concern since I saw an issue in a real production environment where
> > the startup process needed 10+ seconds to start WAL replay after relaunch
> > due to elog(PANIC) (it was seen on the Postgres-based product Greenplum, but it
> > is a common issue in Postgres also). I highly suspect the delay was mostly
> > due to this. Also, it has been noticed that on public clouds fsync is much slower
> > than on local storage, so the slowness should be more severe in the cloud.
> > If we at least disable fsync on the table directories, we could skip a lot
> > of file fsyncs - this may save a lot of seconds during crash recovery.
>
> I've seen this problem be way worse than that. Running fsync() on all
> the files and performing the unlogged table cleanup steps can together
> take minutes or, I think, even tens of minutes. What I think sucks
> most in this area is that we don't even emit any log messages if the
> process takes a long time, so the user has no idea why things are
> apparently hanging. I think we really ought to try to figure out some
> way to give the user a periodic progress indication when this kind of
> thing is underway, so that they at least have some idea what's
> happening.
>
> As Tom says, I don't think there's any realistic way that we can
> disable it altogether, but maybe there's some way we could make it
> quicker, like some kind of parallelism, or by overlapping it with
> other things. It seems to me that we have to complete the fsync pass
> before we can safely checkpoint, but I don't know that it needs to be
> done any sooner than that... not sure though.

I suppose you could do with the whole directory tree what
register_dirty_segment() does, for the pathnames that you recognise as
regular md.c segment names.  Then it'd be done as part of the next
checkpoint, though you might want to bring the pre_sync_fname() stuff
back into it somehow to get more I/O parallelism on Linux (elsewhere
it does nothing).  Of course that syscall could block, and the
checkpointer queue can fill up and then you have to do it
synchronously anyway, so you'd have to look into whether that's
enough.
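
(A standalone sketch of that idea; queue_fsync_request() is a stand-in for
the real machinery that registers requests with the checkpointer, and only
main-fork segment names are matched here:)

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <ctype.h>

/*
 * Walk a directory tree and, for paths whose names look like md.c segment
 * files ("<relfilenode>" or "<relfilenode>.<segno>"), register a deferred
 * fsync instead of syncing synchronously.  Illustrative only.
 */
static int
looks_like_md_segment(const char *name)
{
	const char *p = name;

	if (!isdigit((unsigned char) *p))
		return 0;
	while (isdigit((unsigned char) *p))
		p++;
	if (*p == '.')				/* optional ".<segno>" suffix */
	{
		p++;
		if (!isdigit((unsigned char) *p))
			return 0;
		while (isdigit((unsigned char) *p))
			p++;
	}
	return *p == '\0';
}

static void
queue_fsync_request(const char *path)	/* stand-in */
{
	printf("would queue fsync for %s\n", path);
}

static int
visit(const char *path, const struct stat *sb, int type, struct FTW *ftwbuf)
{
	(void) sb;
	if (type == FTW_F && looks_like_md_segment(path + ftwbuf->base))
		queue_fsync_request(path);
	return 0;
}

int
main(int argc, char **argv)
{
	return nftw(argc > 1 ? argv[1] : ".", visit, 16, FTW_PHYS);
}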

The whole concept of SyncDataDirectory() bothers me a bit though,
because although it's apparently trying to be safe by being
conservative, it assumes a model of write back error handling that we
now know to be false on Linux.  And then it thrashes the inode cache
to make it more likely that error state is forgotten, just for good
measure.

What would a precise version of this look like?  Maybe we really only
need to fsync relation files that recovery modifies (as we already
do), plus those that it would have touched but didn't because of the
page LSN (a new behaviour to catch segment files that were written by
the last run but not yet flushed, which I guess in practice would only
happen with full_page_writes=off)?  (If you were paranoid enough to
believe that the buffer cache were actively trying to trick you and
marked dirty pages clean and lost the error state so you'll never hear
about it, you might even want to rewrite such pages once.)




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 19, 2020 at 2:30 PM Tom Lane  wrote:
>> Anyway, the bottom-line conclusion remains the same: let's make sure
>> we know what we'd do after getting rid of postfix ops, before we do
>> that.

> Well, I don't think we really need to get too conservative here.
> ... It seems to me that
> the first thing that we need to do here is get a deprecation notice
> out, so that people know that we're planning to break this.

No, I disagree with that, because from what I've seen so far it's
not really clear to me that we have a full solution to the AS
problem excepting only postfix ops.  I don't want to deprecate
postfix ops before it's completely clear that we can get something 
out of it.  Otherwise, we'll either end up un-deprecating them,
which makes us look silly, or removing a feature for zero benefit.

Stephen's nearby proposal to deprecate only after a patch has been
committed doesn't seem all that unreasonable, if you're only intending
to allow one cycle's worth of notice.  In particular, I could easily
see us committing a fix sometime this summer and then sticking
deprecation notices into the back branches before v13 goes gold.
But let's have the complete fix in hand first.

> I'm still interested in hearing what people think about hard-coding !
> as a postfix operator vs. removing postfix operators altogether. I
> think Vik and Tom are against keeping just !, Kenneth Marshall is for
> it, and I'm not sure I understand Pavel's position.

Yes, I'm VERY strongly against keeping just !.  I think it'd be a
ridiculous, and probably very messy, backwards-compatibility hack; and the
fact that it will break valid use-cases that we don't need to break seems
to me to well outweigh the possibility that someone would rather not
change their queries to use factorial() or !!.

However, we do have to have a benefit to show those people whose
queries we break.  Hence my insistence on having a working AS fix
(or some other benefit) before not after.

regards, tom lane




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Robert Haas
On Tue, May 19, 2020 at 2:30 PM Tom Lane  wrote:
> Might work.  My main concern would be if we have to forbid those keywords
> as column names --- for words like "year", in particular, that'd be a
> disaster.  If the net effect is only that they can't be AS-less col labels,
> it won't break any cases that worked before.

ISTM that all we have to do to avoid that is switch from a four-way
classification to a five-way classification: just split
unreserved_keyword into totally_unreserved_keyword and
very_slightly_reserved_keyword.

> Our existing four-way keyword classification is not something that was
> handed down on stone tablets.  I wonder whether postfix-ectomy changes
> the situation enough that a complete rethinking would be helpful.

I don't see that they do, but I might be missing something. I think
there's an excellent argument for adding one new category, but it's
not clear to me why it should reshape the landscape any more than
that.

> I also continue to think that more lookahead and token-merging would
> be interesting to pursue.  It'd hardly surprise anybody if the
> token pair "character varying" were always treated as a type name,
> for instance.

I think that line of attack will not buy very much. The ability to
avoid unexpected consequences is entirely contingent on the
unlikeliness of the keywords appearing adjacent to each other in some
other context, and the only argument for that here is that neither of
those words is a terribly likely column name. I think that when you
try to solve interesting problems with this, though, you very quickly
run into problems where that's not the case, and you'll need a
technique that has some knowledge of the parser state to actually do
something that works well. I read a paper some years ago that proposed
a solution to this problem: if the parser generator sees a
shift/reduce conflict, it checks whether the conflict can be resolved
by looking ahead one or more additional tokens. If so, it can build a
little DFA that gets run when you enter that state, with edges labeled
with lookahead tokens, and it runs that DFA whenever you reach the
problematic state. Since, hopefully, such states are relatively rarely
encountered, the overhead is low, yet it still gives you a way out of
conflicts in many practical cases. Unfortunately, the chances of bison
implementing such a thing do not seem very good.

> Anyway, the bottom-line conclusion remains the same: let's make sure
> we know what we'd do after getting rid of postfix ops, before we do
> that.

Well, I don't think we really need to get too conservative here. I've
studied this issue enough over the years to be pretty darn sure that
this is a necessary prerequisite to doing something about the "AS
unreserved_keyword" issue, and that it is by far the most significant
issue in doing something about that problem. Sure, there are other
issues, but I think they are basically matters of politics or policy.
For example, if some key people DID think that the four-way keyword
classification was handed down on stone tablets, that could be quite a
problem, but if we're willing to take the view that solving the "AS
unreserved_keyword" problem is pretty important and we need to find a
way to get it done, then I think we can do that. It seems to me that
the first thing that we need to do here is get a deprecation notice
out, so that people know that we're planning to break this. I think we
should go ahead and make that happen now, or at least pretty soon.

I'm still interested in hearing what people think about hard-coding !
as a postfix operator vs. removing postfix operators altogether. I
think Vik and Tom are against keeping just !, Kenneth Marshall is for
it, and I'm not sure I understand Pavel's position. I'm about +0.3 for
keeping just ! myself. Maybe we'll get some other votes. If you're
willing to be persuaded that keeping only ! is a sensible thing to
consider, I could probably draft a very rough patch showing that it
would still be sufficient to get us out from under the "AS
unreserved_keyword" problem, but if you and/or enough other people
hate that direction with a fiery passion, I won't bother. I'm pretty
sure it's technically possible, but the issue is more about what
people actually want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 19, 2020 at 11:32 AM Tom Lane  wrote:
>> Before we go much further on this, we should have some proof
>> that there's actually material benefit to be gained.  I spent some
>> time just now trying to relax the AS restriction by ripping out
>> postfix ops, and the results were not too promising.

> I came to similar conclusions a couple of years ago:
> https://www.postgresql.org/message-id/ca+tgmoyzpvt7uihjwgktytivhhlncp0ylavcoipe-lyg3w2...@mail.gmail.com

Ah, right.

> What I proposed at the time was creating a new category of keywords.

Might work.  My main concern would be if we have to forbid those keywords
as column names --- for words like "year", in particular, that'd be a
disaster.  If the net effect is only that they can't be AS-less col labels,
it won't break any cases that worked before.

Our existing four-way keyword classification is not something that was
handed down on stone tablets.  I wonder whether postfix-ectomy changes
the situation enough that a complete rethinking would be helpful.

I also continue to think that more lookahead and token-merging would
be interesting to pursue.  It'd hardly surprise anybody if the
token pair "character varying" were always treated as a type name,
for instance.
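
(A toy illustration of that token-merging idea -- PostgreSQL already merges
a few token pairs in parser.c's base_yylex; everything below is a
self-contained toy, not the real lexer:)

#include <string.h>
#include <stdio.h>

/*
 * Fold the adjacent keywords "character" and "varying" into one token
 * before the grammar sees them, using one token of lookahead.
 */
enum tok { T_CHARACTER, T_VARYING, T_CHARACTER_VARYING, T_IDENT, T_EOF };

static const char **stream;		/* toy token source: a word list */

static enum tok
raw_lex(void)
{
	const char *w = *stream;

	if (w == NULL)
		return T_EOF;
	stream++;
	if (strcmp(w, "character") == 0)
		return T_CHARACTER;
	if (strcmp(w, "varying") == 0)
		return T_VARYING;
	return T_IDENT;
}

static enum tok
filtered_lex(void)
{
	enum tok	t = raw_lex();

	if (t == T_CHARACTER)
	{
		const char **save = stream;		/* one token of lookahead */

		if (raw_lex() == T_VARYING)
			return T_CHARACTER_VARYING;	/* merged: always a type name */
		stream = save;					/* push the token back */
	}
	return t;
}

int
main(void)
{
	const char *words[] = {"character", "varying", "foo", NULL};

	stream = words;
	for (enum tok t; (t = filtered_lex()) != T_EOF;)
		printf("%d\n", (int) t);
	return 0;
}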

Anyway, the bottom-line conclusion remains the same: let's make sure
we know what we'd do after getting rid of postfix ops, before we do
that.

regards, tom lane




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Robert Haas
On Tue, May 19, 2020 at 11:32 AM Tom Lane  wrote:
> Before we go much further on this, we should have some proof
> that there's actually material benefit to be gained.  I spent some
> time just now trying to relax the AS restriction by ripping out
> postfix ops, and the results were not too promising.  Indeed the
> postfix-ops problem goes away, but then you find out that SQL's
> random syntax choices for type names become the stumbling block.
> An example here is that given
>
> SELECT 'foo'::character varying
>
> it's not clear if "varying" is supposed to be part of the type name or a
> column label.  It looks to me like we'd have to increase the reserved-ness
> of VARYING, PRECISION, and about half a dozen currently-unreserved
> keywords involved in INTERVAL syntax, including such popular column names
> as "month", "day", and "year".
>
> Plus I got conflicts on WITHIN, GROUP, and FILTER from ordered-set
> aggregate syntax; those are currently unreserved keywords, but they
> can't be allowed as AS-less column labels.

I came to similar conclusions a couple of years ago:

https://www.postgresql.org/message-id/ca+tgmoyzpvt7uihjwgktytivhhlncp0ylavcoipe-lyg3w2...@mail.gmail.com

What I proposed at the time was creating a new category of keywords.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Trouble with hashagg spill I/O pattern and costing

2020-05-19 Thread Tomas Vondra

On Tue, May 19, 2020 at 09:27:34AM -0700, Jeff Davis wrote:
>On Tue, 2020-05-19 at 17:12 +0200, Tomas Vondra wrote:
>> I think there are two related problems - with costing and with
>> excessive I/O due to using logical tapes.
>
>Thank you for the detailed analysis. I am still digesting this
>information.
>
>> This kinda makes me question whether logical tapes are the right tool
>> for hashagg. I've read the explanation in logtape.c why it's about the
>> same amount of I/O as using separate files, but IMO that only really
>> works for I/O patterns similar to merge sort - the more I think about
>> this, the more I'm convinced we should just do what hashjoin is
>> doing.
>
>Fundamentally, sort writes sequentially and reads randomly; while
>HashAgg writes randomly and reads sequentially.

Not sure. I think the charts and stats of iosnoop data show that an
awful lot of the reads during sort are actually pretty sequential.
Moreover, sort manages to read the data in much larger blocks - 128kB
instead of just 8kB (which is what hashagg seems to be doing).

I wonder why that is, and if we could achieve that for hashagg too ...

>If the random writes of HashAgg end up fragmented too much on disk,
>then clearly the sequential reads are not so sequential anyway. The
>only way to avoid fragmentation on disk is to preallocate for the
>tape/file.

And is there a way to pre-allocate larger chunks? Presumably we could
assign the blocks to a tape in larger chunks (e.g. 128kB, i.e. 16 x 8kB)
instead of just a single block. I haven't seen anything like that in
tape.c, though ...

>BufFile (relying more on the OS) would probably do a better job of
>preallocating the disk space in a useful way; whereas logtape.c makes
>it easier to manage buffers and the overall number of files created
>(thereby allowing higher fanout of partitions).
>
>We have a number of possibilities here:
>
>1. Improve costing to reflect that HashAgg is creating more random IOs
>than Sort.

I think we'll need to do something about this, but I think we should try
improving the behavior first and then model the costing based on that.

>2. Reduce the partition fanout in the hopes that the OS does a better
>job with readahead.

I doubt this will make a significant difference. I think the problem is
the partitions end up interleaved way too much in the temp file, and I
don't see how a lower fanout would fix that.

BTW what do you mean when you say "fanout"? Do you mean how fast we
increase the number of partitions, or some parameter in particular?

>3. Switch back to BufFile, in which case we probably need to reduce the
>fanout for other reasons.

Maybe, although that seems pretty invasive post beta1.

>4. Change logtape.c to allow preallocation or to write in larger
>blocks.

I think this is what I suggested above (allocating 16 blocks at a time,
or something). I wonder how wasteful this would be, but I think not very
much. Essentially, with 1024 partitions and pre-allocating space in
128kB chunks, that means 128MB may end up unused, which seems ok-ish,
and I guess we could further restrict that by starting with a lower value
and gradually increasing the number. Or something like that ...

>5. Change BufFile to allow more control over buffer usage, and switch
>to that.

Maybe. I don't recall what exactly is the issue with buffer usage, but I
think it has the same invasiveness issue as (3). OTOH it's what hashjoin
does, and we've lived with it for ages ...

>#1 or #2 are the least invasive, and I think we can get a satisfactory
>solution by combining those.

OK. I think tweaking the costing (and essentially reverting to what 12
does for those queries) is perfectly reasonable. But if we can actually
get some speedup thanks to hashagg, even better.

>I saw good results with the high fanout and low work_mem when there is
>still a lot of system memory. That's a nice benefit, but perhaps it's
>safer to use a lower fanout (which will lead to recursion) until we get
>a better handle on the IO patterns.

I don't know how much we can rely on that - once we push some of the
data from the page cache, it has the issues I described. The trouble is
people may not have enough memory to keep everything in cache, otherwise
they might just as well bump up work_mem and not spill at all.

>Perhaps you can try recompiling with a lower max partitions and rerun
>the query? How much would we have to lower it for either the cost to
>approach reality or the OS readahead to become effective?

I can try that, of course. Which parameters should I tweak / how?

I can also try running it with BufFile, in case you prepare a WIP patch.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL 13 Beta 1 Release Announcement Draft

2020-05-19 Thread Lukas Fittl
On Mon, May 18, 2020 at 7:29 PM Jonathan S. Katz 
wrote:

> Attached is a draft of the release announcement for the PostgreSQL 13
> Beta 1 release this week.
>

We could call out the additional commits that Tom has done for wait event
renaming, re: compatibility - next to "Rename some recovery-related wait
events".

These are the relevant commits, I think:

Rename SLRU structures and associated LWLocks.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5da14938f7bfb96b648ee3c47e7ea2afca5bcc4a

Rename assorted LWLock tranches.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=36ac359d3621578cefc2156a3917024cdd3b1829

Drop the redundant "Lock" suffix from LWLock wait event names.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=14a91010912632cae322b06fce0425faedcf7353

Mop-up for wait event naming issues.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3048898e73c75f54bb259323382e0e7f6368cb6f

Thanks,
Lukas

-- 
Lukas Fittl


Re: Trouble with hashagg spill I/O pattern and costing

2020-05-19 Thread Jeff Davis
On Tue, 2020-05-19 at 17:12 +0200, Tomas Vondra wrote:
> I think there are two related problems - with costing and with
> excessive I/O due to using logical tapes.

Thank you for the detailed analysis. I am still digesting this
information.

> This kinda makes me question whether logical tapes are the right tool
> for hashagg. I've read the explanation in logtape.c why it's about
> the same amount of I/O as using separate files, but IMO that only really
> works for I/O patterns similar to merge sort - the more I think about
> this, the more I'm convinced we should just do what hashjoin is
> doing.

Fundamentally, sort writes sequentially and reads randomly; while
HashAgg writes randomly and reads sequentially. 

If the random writes of HashAgg end up fragmented too much on disk,
then clearly the sequential reads are not so sequential anyway. The
only way to avoid fragmentation on disk is to preallocate for the
tape/file.

BufFile (relying more on the OS) would probably do a better job of
preallocating the disk space in a useful way; whereas logtape.c makes
it easier to manage buffers and the overall number of files created
(thereby allowing higher fanout of partitions).

We have a number of possibilities here:

1. Improve costing to reflect that HashAgg is creating more random IOs
than Sort.
2. Reduce the partition fanout in the hopes that the OS does a better
job with readahead.
3. Switch back to BufFile, in which case we probably need to reduce the
fanout for other reasons.
4. Change logtape.c to allow preallocation or to write in larger
blocks.
5. Change BufFile to allow more control over buffer usage, and switch
to that.

#1 or #2 are the least invasive, and I think we can get a satisfactory
solution by combining those.

I saw good results with the high fanout and low work_mem when there is
still a lot of system memory. That's a nice benefit, but perhaps it's
safer to use a lower fanout (which will lead to recursion) until we get
a better handle on the IO patterns.

Perhaps you can try recompiling with a lower max partitions and rerun
the query? How much would we have to lower it for either the cost to
approach reality or the OS readahead to become effective?
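
(For the record, the compile-time knob in question is, if I recall
correctly, this pair in src/backend/executor/nodeAgg.c; verify against the
tree before experimenting:)

/* From nodeAgg.c, from memory; the cap is what "max partitions" refers to. */
#define HASHAGG_MIN_PARTITIONS 4
#define HASHAGG_MAX_PARTITIONS 1024		/* lower this to reduce the fanout */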

Regards,
Jeff Davis






Re: SyncRepLock acquired exclusively in default configuration

2020-05-19 Thread Ashwin Agrawal
On Mon, May 18, 2020 at 7:41 PM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> This item is for PG14, right? If so I'd like to add this item to the
> next commit fest.
>

Sure, add it to commit fest.
Seems though it should be backpatched to relevant branches as well.


Re: Missing grammar production for WITH TIES

2020-05-19 Thread Alvaro Herrera
On 2020-May-19, Tom Lane wrote:

> Yeah, that would have been better per project protocol: if a tarball
> re-wrap becomes necessary then it would be messy not to include this
> change along with fixing whatever urgent bug there might be.
> 
> However, I thought the case for delaying this fix till post-wrap was kind
> of thin anyway, so if that does happen I won't be too fussed about it.
> Otherwise I would've said something earlier on this thread.

In the end, it's a judgement call.  In this case, my assessment was that
the risk was small enough that I could push after I saw the tarballs
announced.  In other cases I've judged differently and waited for
longer.  If the fix had been even simpler, I would have pushed it right
away, but my confidence with grammar changes is not as high as I would
like.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Extension ownership and misuse of SET ROLE/SET SESSION AUTHORIZATION

2020-05-19 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 13 Feb 2020, at 23:55, Tom Lane  wrote:
>> Given the current behavior of SET ROLE and SET SESSION AUTHORIZATION,
>> I don't actually see any way that we could get these features to
>> play together.

> Is this being worked on for the 13 cycle such that it should be an open item?

I didn't have it on my list, but yeah maybe we should add it to the
"pre-existing issues" list.

>> The quick-and-dirty answer is to disallow these switches from being
>> used together in pg_restore, and I'm inclined to think maybe we should
>> do that in the back branches.

> ..or should we do this for v13 and back-branches and leave fixing it for 14?
> Considering the potential invasiveness of the fix I think the latter sounds
> rather appealing at this point in the cycle.  Something like the attached
> should be enough IIUC.

pg_dump and pg_dumpall also have that switch, no?

regards, tom lane




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Tom Lane
Vik Fearing  writes:
> I'm -1 on keeping ! around as a hard-coded postfix operator.

Before we go much further on this, we should have some proof
that there's actually material benefit to be gained.  I spent some
time just now trying to relax the AS restriction by ripping out
postfix ops, and the results were not too promising.  Indeed the
postfix-ops problem goes away, but then you find out that SQL's
random syntax choices for type names become the stumbling block.
An example here is that given

SELECT 'foo'::character varying

it's not clear if "varying" is supposed to be part of the type name or a
column label.  It looks to me like we'd have to increase the reserved-ness
of VARYING, PRECISION, and about half a dozen currently-unreserved
keywords involved in INTERVAL syntax, including such popular column names
as "month", "day", and "year".

Plus I got conflicts on WITHIN, GROUP, and FILTER from ordered-set
aggregate syntax; those are currently unreserved keywords, but they
can't be allowed as AS-less column labels.

We could possibly minimize the damage by inventing another keyword
classification besides the four we have now.  Or maybe we should
think harder about using more lookahead between the lexer and grammar.
But this is going to be a lot more ticklish than I would've hoped,
and possibly not cost-free, so we might well end up never pulling
the trigger on such a change.

So right at the moment I'm agreeing with Stephen's nearby opinion:
let's not deprecate these until we've got a patch that gets some
concrete benefit from removing them.

regards, tom lane




Re: Extension ownership and misuse of SET ROLE/SET SESSION AUTHORIZATION

2020-05-19 Thread Daniel Gustafsson
> On 13 Feb 2020, at 23:55, Tom Lane  wrote:

Is this being worked on for the 13 cycle such that it should be an open item?

> Given the current behavior of SET ROLE and SET SESSION AUTHORIZATION,
> I don't actually see any way that we could get these features to
> play together.  SET SESSION AUTHORIZATION insists on the originally
> authenticated user being a superuser, so that the documented point of
> --role (to allow you to start the restore from a not-superuser role)
> isn't going to work.  I thought about starting to use SET ROLE for
> both purposes, but it checks whether you have role privilege based
> on the session userid, so that a previous SET ROLE doesn't get you
> past that check even if it was a successful SET ROLE to a superuser.
> 
> The quick-and-dirty answer is to disallow these switches from being
> used together in pg_restore, and I'm inclined to think maybe we should
> do that in the back branches.

..or should we do this for v13 and back-branches and leave fixing it for 14?
Considering the potential invasiveness of the fix I think the latter sounds
rather appealing at this point in the cycle.  Something like the attached
should be enough IIUC.
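
For concreteness, the guard presumably looks something like this (a sketch;
the message and placement are illustrative, though RestoreOptions does have
use_role and use_setsessauth fields):

/*
 * Sketch of the proposed quick-and-dirty guard, assuming it lands next to
 * pg_restore's existing option checks.
 */
if (opts->use_role && opts->use_setsessauth)
{
	pg_log_error("options --role and --use-set-session-authorization cannot be used together");
	exit_nicely(1);
}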

cheers ./daniel



pg_restore_role.patch
Description: Binary data


Re: factorial function/phase out postfix operators?

2020-05-19 Thread Stephen Frost
Greetings,

* Vik Fearing (v...@postgresfriends.org) wrote:
> On 5/19/20 4:03 AM, Tom Lane wrote:
> > Peter Eisentraut  writes:
> >> What are the thoughts about then marking the postfix operator deprecated 
> >> and eventually removing it?
> > 
> > If we do this it'd require a plan.  We'd have to also warn about the
> > feature deprecation in (at least) the CREATE OPERATOR man page, and
> > we'd have to decide how many release cycles the deprecation notices
> > need to stand for.
> 
> I have never come across any custom postfix operators in the wild, and
> I've never even seen ! used in practice.
> 
> So I would suggest a very short deprecation period.  Deprecate now in
> 13, let 14 go by, and rip it all out for 15.  That should give us enough
> time to extend the deprecation period if we need to, or go back on it
> entirely (like I seem to remember we did with VACUUM FREEZE).
> 
> > If that's the intention, though, it'd be good to get those deprecation
> > notices published in v13 not v14.
> 
> +1

I agree with putting notices into v13 saying they're deprecated, but
then actually removing them in v14.  For that matter, I'd vote that we
generally accept a system whereby when we commit something that removes
a feature in the next major version, we put out some kind of notice that
it's been deprecated and won't be in v14.  We don't want to run the risk
of saying XYZ has been deprecated and then it staying around for a few
years, nor trying to say "it'll be removed in v14" before we actually
know that it's been committed for v14.

In other words, wait to deprecate until the commit has happened for v14
(and maybe wait a couple days in case someone wasn't watching and argues
to revert, but not longer than any normal commit), and then go back and
mark it as "deprecated and removed in v14" for all back-branches.  Users
will continue to have 5 years (by upgrading to v13, or whatever the last
release was before their favorite feature was removed, if they really
need to) to update their systems to deal with the change.

We do not do ourselves or our users a real service by carrying
deprecated code/interfaces/views/etc. forward across major versions;
instead they tend to live on in infamy, with some users actually
updating and some never doing so, then complaining when we suggest
actually removing it (we have lots of good examples of that too).
Then we have to have the debate about removing it all over again and,
in some cases, we end up un-deprecating it, which is confusing for
users and a bit ridiculous.

Let's not do that.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: factorial function/phase out postfix operators?

2020-05-19 Thread Tom Lane
Robert Haas  writes:
> The ambiguity doesn't come from the mere existence of postfix
> operators. It comes from the fact that, when we lex the input, we
> can't tell whether a particular operator that we happen to encounter
> is prefix, infix, or postfix. So hard-coding, for example, a rule that
> '!' is always a postfix operator and anything else is never a postfix
> operator is sufficient to solve the key problems.

If we were willing to say that '!' could *only* be a postfix operator,
then maybe the ambiguity would go away.  Or maybe it wouldn't; if
you're seriously proposing this, I think it'd be incumbent on you
to demonstrate that we could still simplify the grammar to the same
extent.  But that will incur its own set of compatibility problems,
because there's no reason to assume that nobody has made prefix or
infix '!' operators.

In any case, it's hard to decide that that's a less klugy solution
than getting rid of postfix ops altogether.  There's a reason why
few programming languages have those.

In general, I put this on about the same level as when we decided
to remove ';' and ':' as operators (cf 259489bab, 766fb7f70).
Somebody thought it was cute that it was possible to have that,
which maybe it was, but it wasn't really sane in the big picture.
And as I recall, the amount of pushback we got was nil.

regards, tom lane




Re: Performance penalty when requesting text values in binary format

2020-05-19 Thread Jack Christensen
On Mon, May 18, 2020 at 7:07 AM Laurenz Albe 
wrote:

> Did you profile your benchmark?
> It would be interesting to know where the time is spent.
>

Unfortunately, I have not. Fortunately, it appears that Tom Lane recognized
this as a part of another issue and has prepared a patch.

https://www.postgresql.org/message-id/6648.1589819885%40sss.pgh.pa.us

Thanks,
Jack


Re: pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-19 Thread Fujii Masao




On 2020/05/17 10:08, Michael Paquier wrote:

On Sat, May 16, 2020 at 10:15:47AM +0900, Michael Paquier wrote:

Thanks.  If there are no objections, I'll revisit that tomorrow and
apply it with those changes, just in time for beta1.


Okay, done this part then.


I found that "received_lsn" is still used in high-availability.sgml.
Should we apply the following change there?

- view's received_lsn indicates that WAL is being
+ view's flushed_lsn indicates that WAL is being

BTW, we have pg_last_wal_receive_lsn(), which returns the same LSN as
pg_stat_wal_receiver.flushed_lsn. Previously both used the term "receive"
in their names, but currently they don't. IMO it's better to use the same
term in both names for consistency, but it's not a good idea to rename
pg_last_wal_receive_lsn() to something like pg_last_wal_flush_lsn().
I have no better idea for now, so I'm OK with the current names.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Vik Fearing
On 5/19/20 4:22 PM, Robert Haas wrote:
> On Tue, May 19, 2020 at 9:51 AM Tom Lane  wrote:
>> Uh ... what exactly would be the point of that?  The real reason to do
>> this at all is not that we have it in for '!', but that we want to
>> drop the possibility of postfix operators from the grammar altogether,
>> which will remove a boatload of ambiguity.
> 
> The ambiguity doesn't come from the mere existence of postfix
> operators. It comes from the fact that, when we lex the input, we
> can't tell whether a particular operator that we happen to encounter
> is prefix, infix, or postfix. So hard-coding, for example, a rule that
> '!' is always a postfix operator and anything else is never a postfix
> operator is sufficient to solve the key problems. Then "SELECT a ! b"
> can only be a postfix operator application followed by a column
> labeling, and "SELECT a + b" can only be the application of an infix
> operator.

So if I make a complex UDT where a NOT operator makes a lot of sense[*],
why wouldn't I be allowed to make a prefix operator ! for it?  All for
what?  That one person in the corner over there who doesn't want to
rewrite their query to use factorial() instead?

I'm -1 on keeping ! around as a hard-coded postfix operator.


[*] I don't have a concrete example in mind, just this abstract one.
-- 
Vik Fearing




Re: ldap tls test fails in some environments

2020-05-19 Thread Christoph Berg
Re: Thomas Munro
> In your transcript for test 20, it looks like the client (PostgreSQL)
> is hanging up without even sending a TLS ClientHello:

Maybe tests 19 and 20 are failing because 18 was already bad. (But
probably not.)

> I wonder how to figure out why... does this tell you anything?
> 
>  $ENV{'LDAPCONF'}   = $ldap_conf;
> +$ENV{"GNUTLS_DEBUG_LEVEL"} = '99';

Sorry, I had not seen your message until now. Logs attached.

Even ldapsearch ... -ZZ is failing with that slapd config, so it's not
a PG problem.

$ GNUTLS_DEBUG_LEVEL=99 ldapsearch -h localhost -p 56118 -s base -b 
dc=example,dc=net -D cn=Manager,dc=example,dc=net -y 
/home/myon/postgresql-13-13~~devel~20200515.0434/build/src/test/ldap/tmp_check/ldappassword
 -n 'objectclass=*' -ZZ
gnutls[2]: Enabled GnuTLS 3.6.13 logging...
gnutls[2]: getrandom random generator was detected
gnutls[2]: Intel SSSE3 was detected
gnutls[2]: Intel AES accelerator was detected
gnutls[2]: Intel GCM accelerator was detected
gnutls[2]: cfg: unable to access: /etc/gnutls/config: 2
5ec3ec38 daemon: activity on 1 descriptor
5ec3ec38 daemon: activity on:
5ec3ec38 slap_listener_activate(6):
5ec3ec38 daemon: epoll: listen=6 busy
5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ec3ec38 >>> slap_listener(ldap://localhost:56118)
5ec3ec38 daemon: accept() = 10
5ec3ec38 daemon: listen=6, new connection on 10
5ec3ec38 daemon: activity on 1 descriptor
5ec3ec38 daemon: activity on:
5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ec3ec38 daemon: added 10r (active) listener=(nil)
5ec3ec38 daemon: activity on 1 descriptor
5ec3ec38 daemon: activity on: 10r
5ec3ec38 daemon: read active on 10
5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ec3ec38 connection_get(10)
5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ec3ec38 connection_get(10): got connid=1000
5ec3ec38 connection_read(10): checking for input on id=1000
5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ec3ec38 daemon: activity on 1 descriptor
ber_get_next
5ec3ec38 daemon: activity on:
ldap_read: want=8, got=8
  :  30 1d 02 01 01 77 18 800w..
5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL
ldap_read: want=23, got=23
  :  16 31 2e 33 2e 36 2e 31  2e 34 2e 31 2e 31 34 36   .1.3.6.1.4.1.146
  0010:  36 2e 32 30 30 33 37   6.20037
ber_get_next: tag 0x30 len 29 contents:
ber_dump: buf=0x7f0c5bc0 ptr=0x7f0c5bc0 end=0x7f0c5bdd len=29
  :  02 01 01 77 18 80 16 31  2e 33 2e 36 2e 31 2e 34   ...w...1.3.6.1.4
  0010:  2e 31 2e 31 34 36 36 2e  32 30 30 33 37.1.1466.20037
5ec3ec38 op tag 0x77, time 1589898296
ber_get_next
ldap_read: want=8 error=Resource temporarily unavailable
5ec3ec38 conn=1000 op=0 do_extended
5ec3ec38 daemon: activity on 1 descriptor
5ec3ec38 daemon: activity on:
ber_scanf fmt ({m) ber:
ber_dump: buf=0x7f0c5bc0 ptr=0x7f0c5bc3 end=0x7f0c5bdd len=26
5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL
  :  77 18 80 16 31 2e 33 2e  36 2e 31 2e 34 2e 31 2e   w...1.3.6.1.4.1.
  0010:  31 34 36 36 2e 32 30 30  33 37 1466.20037
5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ec3ec38 do_extended: oid=1.3.6.1.4.1.1466.20037
5ec3ec38 send_ldap_extended: err=0 oid= len=0
5ec3ec38 send_ldap_response: msgid=1 tag=120 err=0
ber_flush2: 14 bytes to sd 10
  :  30 0c 02 01 01 78 07 0a  01 00 04 00 04 00 0x
ldap_write: want=14, written=14
  :  30 0c 02 01 01 78 07 0a  01 00 04 00 04 00 0x
gnutls[2]: added 6 protocols, 29 ciphersuites, 19 sig algos and 10 groups into 
priority list
gnutls[3]: ASSERT: 
../../../lib/x509/verify-high2.c[gnutls_x509_trust_list_add_trust_file]:361
ldap_start_tls: Connect error (-11)
5ec3ec38 daemon: activity on 1 descriptor
5ec3ec38 daemon: activity on: 10r
5ec3ec38 daemon: read active on 10
5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ec3ec38 connection_get(10)
5ec3ec38 connection_get(10): got connid=1000
5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ec3ec38 connection_read(10): checking for input on id=1000
tls_read: want=5, got=5
  :  30 05 02 01 02

Re: factorial function/phase out postfix operators?

2020-05-19 Thread Robert Haas
On Tue, May 19, 2020 at 9:51 AM Tom Lane  wrote:
> Uh ... what exactly would be the point of that?  The real reason to do
> this at all is not that we have it in for '!', but that we want to
> drop the possibility of postfix operators from the grammar altogether,
> which will remove a boatload of ambiguity.

The ambiguity doesn't come from the mere existence of postfix
operators. It comes from the fact that, when we lex the input, we
can't tell whether a particular operator that we happen to encounter
is prefix, infix, or postfix. So hard-coding, for example, a rule that
'!' is always a postfix operator and anything else is never a postfix
operator is sufficient to solve the key problems. Then "SELECT a ! b"
can only be a postfix operator application followed by a column
labeling, and "SELECT a + b" can only be the application of an infix
operator.

The parser ambiguities could also be removed if the source of the
information were a GUC or a catalog lookup; there are good reasons
not to go that way, but my point is that the problem is not that
postfix operators are per se evil, but that the information we need is
not available at the right phase of the process. We can only make use
of the information in pg_operator after we start assigning type
information, which has to happen after we parse, but to avoid the
ambiguity here, we need the information before we parse - i.e. at the
lexing stage.
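
To make that concrete, here is a sketch in C (illustrative only, not
actual PostgreSQL lexer code; the function name is invented).  With a
fixed set, the classification needs nothing but the operator's spelling,
which is exactly what the lexer has in hand:

    #include <stdbool.h>
    #include <string.h>

    /*
     * Illustrative only: decide an operator's arity class from its
     * spelling alone.  A hard-coded set such as { "!" } can be consulted
     * while lexing; a catalog lookup cannot, because type resolution
     * only happens after parsing.
     */
    static bool
    op_is_postfix(const char *op)
    {
        return strcmp(op, "!") == 0;
    }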

> In my non-caffeinated state, I don't recall exactly which things are
> blocked by the existence of postfix ops; but I think for instance it might
> become possible to remove the restriction of requiring AS before column
> aliases that happen to be unreserved keywords.

Right - which would be a huge win.

> I would also argue that having a feature that is available to
> built-in operators but not user-defined ones is pretty antithetical
> to Postgres philosophy.

That I think is the policy question before us. I believe that any rule
that tells us which operators are postfix and which are not at the
lexing stage is good enough. I think here you are arguing for the
empty set, which will work, but I believe any other fixed set also
works, such as { '!' }. I don't think we're going to break a ton of
user code no matter which one we pick, but I do think that it's
possible to pick either one and still achieve our goals here, so
that's the issue that I wanted to raise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Is it useful to record whether plans are generic or custom?

2020-05-19 Thread Atsushi Torikoshi
On Sat, May 16, 2020 at 6:01 PM legrand legrand 
wrote:

> > To track executed plan types, I think execution layer hooks
> > are appropriate.
> > These hooks, however, take QueryDesc as a param and it does
> > not include cached plan information.
>
> It seems that the same QueryDesc entry is reused when executing
> a generic plan.
> For example, marking queryDesc->plannedstmt->queryId (outside
> pg_stat_statements) with a pseudo tag during ExecutorStart makes the
> tag reappear in later executions with generic plans ...
>
> Is this QueryDesc reusable by a custom plan ? If not maybe a solution
> could be to add a flag in queryDesc->plannedstmt ?
>

Thanks for your proposal!

I first thought it was a good idea and tried to add a flag to QueryDesc,
but the comments on QueryDesc say it encapsulates everything that
the executor needs to execute a query.

Whether a plan is generic or custom is not something the executor needs
to know to run queries, so now I hesitate to do so.

Instead, I'm now considering using a static hash for prepared queries
(static HTAB *prepared_queries).
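
A rough sketch of that idea (names like PreparedStatementStats are
invented here, and it assumes dynahash's convention that string keys
are the default when only HASH_ELEM is passed):

    #include "postgres.h"
    #include "utils/hsearch.h"

    /* Hypothetical per-statement counters, keyed by prepared statement name. */
    typedef struct PreparedStatementStats
    {
        char        stmt_name[NAMEDATALEN];     /* hash key */
        int64       generic_calls;
        int64       custom_calls;
    } PreparedStatementStats;

    static HTAB *prepared_queries = NULL;

    static void
    prepared_queries_init(void)
    {
        HASHCTL     ctl;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = NAMEDATALEN;
        ctl.entrysize = sizeof(PreparedStatementStats);

        prepared_queries = hash_create("prepared queries", 32, &ctl,
                                       HASH_ELEM);
    }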


BTW, I'd also appreciate other opinions about recording the number
of generic and custom plans in pg_stat_statements.


Regards,

--
Atsushi Torikoshi


Re: factorial function/phase out postfix operators?

2020-05-19 Thread Tom Lane
Robert Haas  writes:
> I think it's generally a good idea, though perhaps we should consider
> continuing to allow '!' as a postfix operator and just removing
> support for any other.

Uh ... what exactly would be the point of that?  The real reason to do
this at all is not that we have it in for '!', but that we want to
drop the possibility of postfix operators from the grammar altogether,
which will remove a boatload of ambiguity.

> I won't lose a lot of sleep if we decide to rip out '!' as well, but I
> don't think that continuing to support it would cost us much.

AFAICS, it would cost us the entire point of this change.

In my non-caffeinated state, I don't recall exactly which things are
blocked by the existence of postfix ops; but I think for instance it might
become possible to remove the restriction of requiring AS before column
aliases that happen to be unreserved keywords.

If we lobotomize CREATE OPERATOR but don't remove built-in postfix
ops, then none of those improvements will be available.  That seems
like the worst possible choice.

I would also argue that having a feature that is available to
built-in operators but not user-defined ones is pretty antithetical
to Postgres philosophy.

regards, tom lane




Re: Expand the use of check_canonical_path() for more GUCs

2020-05-19 Thread Tom Lane
Michael Paquier  writes:
> While digging into my backlog, I have found this message from Peter E
> mentioning about $subject:
> https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9a...@2ndquadrant.com

> It seems to me that it would be a good idea to make those checks more
> consistent, and attached is a patch.

Hm, I'm pretty certain that data_directory does not need this because
canonicalization is done elsewhere; the most that you could accomplish
there is to cause problems.  Dunno about the rest.

regards, tom lane




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Kenneth Marshall
> 
> I won't lose a lot of sleep if we decide to rip out '!' as well, but I
> don't think that continuing to support it would cost us much.
> 
+1 for keeping ! and nuking the rest, if possible.

Regards,
Ken




Re: Missing grammar production for WITH TIES

2020-05-19 Thread Tom Lane
Michael Paquier  writes:
> On Tue, May 19, 2020 at 12:41:39AM -0400, Tom Lane wrote:
>> Michael Paquier  writes:
>>> This has been committed just after beta1 has been stamped.  So it
>>> means that it won't be included in it, right?

>> Right.

> Still, wouldn't it be better to wait until the version is tagged?

Yeah, that would have been better per project protocol: if a tarball
re-wrap becomes necessary then it would be messy not to include this
change along with fixing whatever urgent bug there might be.

However, I thought the case for delaying this fix till post-wrap was kind
of thin anyway, so if that does happen I won't be too fussed about it.
Otherwise I would've said something earlier on this thread.

regards, tom lane




Problem with pg_atomic_compare_exchange_u64 at 32-bit platformwd

2020-05-19 Thread Konstantin Knizhnik
The definition of pg_atomic_compare_exchange_u64 requires the "expected"
pointer to be aligned on an 8-byte boundary.


pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
                               uint64 *expected, uint64 newval)
{
#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
    AssertPointerAlignment(ptr, 8);
    AssertPointerAlignment(expected, 8);
#endif


I wonder if there are platforms where such a restriction is actually needed.
And if so, it looks like our ./src/test/regress/regress.c works only
occasionally:


static void
test_atomic_uint64(void)
{
    pg_atomic_uint64 var;
    uint64        expected;
    ...
        if (!pg_atomic_compare_exchange_u64(&var, &expected, 1))

because there is no guarantee that the "expected" variable will be aligned
on the stack at an 8-byte boundary (at least on Win32).
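
If the diagnosis is right, one fix on the test side would be to force
the local variable's alignment explicitly.  A sketch, assuming
pg_attribute_aligned() is usable here (c.h defines it only for
compilers with an alignment attribute, and the placement shown is the
GCC form):

    static void
    test_atomic_uint64_aligned(void)
    {
        pg_atomic_uint64 var;

        /* Pin the local to an 8-byte boundary so the CAS alignment
         * assertions hold even on 32-bit ABIs where a plain uint64
         * may end up only 4-byte aligned on the stack. */
        uint64      expected pg_attribute_aligned(8) = 0;

        pg_atomic_init_u64(&var, 0);
        if (!pg_atomic_compare_exchange_u64(&var, &expected, 1))
            elog(ERROR, "pg_atomic_compare_exchange_u64() failed");
    }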





Re: Warn when parallel restoring a custom dump without data offsets

2020-05-19 Thread Justin Pryzby
On Sat, May 16, 2020 at 04:57:46PM -0400, David Gilman wrote:
> If pg_dump can't seek on its output stream when writing a dump in the
> custom archive format (possibly because you piped its stdout to a file)
> it can't update that file with data offsets. These files will often
> break parallel restoration. Warn when the user is doing pg_restore on
> such a file to give them a hint as to why their restore is about to
> fail.

You didn't say so, but I gather this is related to this other thread (which
seems to represent two separate issues).
https://www.postgresql.org/message-id/flat/1582010626326-0.post%40n3.nabble.com#0891d77011cdb6ca3ad8ab7904a2ed63

> Tom, if you or anyone else with PostgreSQL would appreciate the
> pg_dump file I can send it to you out of band, it's only a few
> megabytes. I have pg_restore with debug symbols too if you want me to
> try anything.

Would you send it to me, or post a link to a file-sharing site, and I'll
try to reproduce it?  So far no luck.

You should include your diagnosis from that thread here, or add it to a
commit message, and mention the suspect commit (548e50976).  Then add the
patch to the next commitfest.  https://commitfest.postgresql.org/
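
For what it's worth, a sketch of what the warning itself could look
like (the TOC-walk context and the per-entry state access are
paraphrased from the custom-format reader, so treat the exact field
names as assumptions rather than a finished patch):

    /* Sketch: while scanning the TOC at pg_restore startup, warn once
     * if any data entry was written without a seekable offset
     * (K_OFFSET_POS_NOT_SET), since parallel restore may then be
     * inefficient or fail outright. */
    if (te->hadDumper && tctx->dataState == K_OFFSET_POS_NOT_SET && !warned)
    {
        pg_log_warning("archive was created without data offsets; "
                       "parallel restore may be inefficient or fail");
        warned = true;
    }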

I guess you're also involved in this conversation:
https://dba.stackexchange.com/questions/257398/pg-restore-with-jobs-flag-results-in-pg-restore-error-a-worker-process-di

-- 
Justin




Re: some grammar refactoring

2020-05-19 Thread Robert Haas
On Tue, May 19, 2020 at 2:43 AM Peter Eisentraut
 wrote:
> Here is a series of patches to do some refactoring in the grammar around
> the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
> ADD/DROP.  In the grammar, these commands (with some exceptions)
> basically just take a reference to an object and later look it up in C
> code.  Some of that was already generalized individually for each
> command (drop_type_any_name, drop_type_name, etc.).  This patch combines
> it into common lists for all these commands.

I haven't reviewed the code, but +1 for the idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Two fsync related performance issues?

2020-05-19 Thread Robert Haas
On Mon, May 11, 2020 at 8:43 PM Paul Guo  wrote:
> I have this concern since I saw an issue in a real production environment
> where the startup process needed 10+ seconds to start WAL replay after
> relaunch due to an elog(PANIC) (it was seen on the Postgres-based product
> Greenplum, but it is a common issue in Postgres also). I highly suspect the
> delay was mostly due to this. It is also noticeable that on public clouds
> fsync is much slower than on local storage, so the slowness should be more
> severe in the cloud. If we at least disabled fsync on the table directories
> we could skip a lot of file fsyncs - this may save many seconds during
> crash recovery.

I've seen this problem be way worse than that. Running fsync() on all
the files and performing the unlogged table cleanup steps can together
take minutes or, I think, even tens of minutes. What I think sucks
most in this area is that we don't even emit any log messages if the
process takes a long time, so the user has no idea why things are
apparently hanging. I think we really ought to try to figure out some
way to give the user a periodic progress indication when this kind of
thing is underway, so that they at least have some idea what's
happening.
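
Even a crude time-based message would go a long way there.  A rough
sketch in plain POSIX C (illustrative only, not the actual
data-directory sync code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Sketch: fsync a set of files, reporting progress every 10 seconds
     * so a long pre-recovery sync pass is no longer silent. */
    static void
    fsync_files_with_progress(const char *const *paths, int npaths)
    {
        time_t      last_report = time(NULL);
        int         i;

        for (i = 0; i < npaths; i++)
        {
            int         fd = open(paths[i], O_RDONLY);

            if (fd >= 0)
            {
                (void) fsync(fd);
                close(fd);
            }

            if (time(NULL) - last_report >= 10)
            {
                fprintf(stderr, "syncing data directory: %d of %d files done\n",
                        i + 1, npaths);
                last_report = time(NULL);
            }
        }
    }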

As Tom says, I don't think there's any realistic way that we can
disable it altogether, but maybe there's some way we could make it
quicker, like some kind of parallelism, or by overlapping it with
other things. It seems to me that we have to complete the fsync pass
before we can safely checkpoint, but I don't know that it needs to be
done any sooner than that... not sure though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: factorial function/phase out postfix operators?

2020-05-19 Thread Pavel Stehule
On Tue, May 19, 2020 at 2:27 PM Robert Haas 
wrote:

> On Mon, May 18, 2020 at 10:42 AM Peter Eisentraut
>  wrote:
> > What are the thoughts about then marking the postfix operator deprecated
> > and eventually removing it?
>
> I wrote a little bit about this last year:
>
>
> http://postgr.es/m/CA+TgmoarLfSQcLCh7jx0737SZ28qwbuy+rUWT6rSHAO=b-6...@mail.gmail.com
>
> I think it's generally a good idea, though perhaps we should consider
> continuing to allow '!' as a postfix operator and just removing
> support for any other. That would probably allow us to have a very
> short deprecation period, since real-world use of user-defined postfix
> operators seems to be nil -- and it would also make this into a change
> that only affects the lexer and parser, which might make it simpler.
>
> I won't lose a lot of sleep if we decide to rip out '!' as well, but I
> don't think that continuing to support it would cost us much.
>

This is a bit of an obscure feature. It can be removed, and relatively
quickly. Maybe a warning if somebody uses it would be good (for Postgres 13).

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>


explicit_bzero for sslpassword

2020-05-19 Thread Daniel Gustafsson
Since commit 74a308cf5221f we use explicit_bzero on pgpass and connhost
password in libpq, but not sslpassword which seems an oversight.  The attached
performs an explicit_bzero before freeing like the pattern for other password
variables.
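
For reference, a minimal sketch of the pattern, written standalone here
with the password passed by reference (in libpq itself the field is
conn->sslpassword, and explicit_bzero falls back to the src/port
implementation on platforms that lack it):

    #include <stdlib.h>
    #include <string.h>

    /* Zero the cleartext before freeing, mirroring how the pgpass and
     * connhost passwords are already handled. */
    static void
    wipe_and_free_password(char **password)
    {
        if (*password)
        {
            explicit_bzero(*password, strlen(*password));
            free(*password);
            *password = NULL;
        }
    }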

cheers ./daniel



sslpassword_bzero.patch
Description: Binary data


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-19 Thread Amit Kapila
On Fri, May 15, 2020 at 2:48 PM Dilip Kumar  wrote:
>
> On Wed, May 13, 2020 at 4:50 PM Amit Kapila  wrote:
> >
>
> > 3.
> > And, during catalog scan we can check the status of the xid and
> > + * if it is aborted we will report a specific error that we can ignore.  We
> > + * might have already streamed some of the changes for the aborted
> > + * (sub)transaction, but that is fine because when we decode the abort we 
> > will
> > + * stream abort message to truncate the changes in the subscriber.
> > + */
> > +static inline void
> > +SetupCheckXidLive(TransactionId xid)
> >
> > In the above comment, I don't think it is right to say that we ignore
> > the error raised due to the aborted transaction.  We need to say that
> > we discard the already streamed changes on such an error.
>
> Done.
>

In the same comment, there is a typo (/messageto/message to).

> > 4.
> > +static inline void
> > +SetupCheckXidLive(TransactionId xid)
> > +{
> >   /*
> > - * If this transaction has no snapshot, it didn't make any changes to the
> > - * database, so there's nothing to decode.  Note that
> > - * ReorderBufferCommitChild will have transferred any snapshots from
> > - * subtransactions if there were any.
> > + * setup CheckXidAlive if it's not committed yet. We don't check if the xid
> > + * aborted. That will happen during catalog access.  Also reset the
> > + * sysbegin_called flag.
> >   */
> > - if (txn->base_snapshot == NULL)
> > + if (!TransactionIdDidCommit(xid))
> >   {
> > - Assert(txn->ninvalidations == 0);
> > - ReorderBufferCleanupTXN(rb, txn);
> > - return;
> > + CheckXidAlive = xid;
> > + bsysscan = false;
> >   }
> >
> > I think this function is inline as it needs to be called for each
> > change. If that is the case and otherwise also, isn't it better that
> > we check if passed xid is the same as CheckXidAlive before checking
> > TransactionIdDidCommit as TransactionIdDidCommit can be costly and
> > calling it for each change might not be a good idea?
>
> Done.  Also I think it is good to check TransactionIdIsInProgress
> instead of !TransactionIdDidCommit.  I have changed that as well.
>

What if it is aborted just before this check?  I think the decode API
won't be able to detect that, and the sys* APIs won't check because
CheckXidAlive won't be set in that case.

> > 5.
> > setup CheckXidAlive if it's not committed yet. We don't check if the xid
> > + * aborted. That will happen during catalog access.  Also reset the
> > + * sysbegin_called flag.
> >
> > /if the xid aborted/if the xid is aborted.  missing comma after Also.
>
> Done
>

You forgot to make the change for the second part of the comment (the
missing comma after Also).


>
> > 8.
> > @@ -1588,8 +1766,6 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId 
> > xid,
> >   * use as a normal record. It'll be cleaned up at the end
> >   * of INSERT processing.
> >   */
> > - if (specinsert == NULL)
> > - elog(ERROR, "invalid ordering of speculative insertion changes");
> >
> > You have removed this check but all other handling of specinsert is
> > same as far as this patch is concerned.  Why so?
>
> Seems like a merge issue, or the leftover from the old design of the
> toast handling where we were streaming with the partial tuple.
> fixed now.
>
> > 9.
> > @@ -1676,8 +1860,6 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId 
> > xid,
> >   * freed/reused while restoring spooled data from
> >   * disk.
> >   */
> > - Assert(change->data.tp.newtuple != NULL);
> > -
> >   dlist_delete(&change->node);
> >
> > Why is this Assert removed?
>
> Same cause as above so fixed.
>
> > 10.
> > @@ -1753,7 +1935,15 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId 
> > xid,
> >   relations[nrelations++] = relation;
> >   }
> >
> > - rb->apply_truncate(rb, txn, nrelations, relations, change);
> > + if (streaming)
> > + {
> > + rb->stream_truncate(rb, txn, nrelations, relations, change);
> > +
> > + /* Remember that we have sent some data. */
> > + change->txn->any_data_sent = true;
> > + }
> > + else
> > + rb->apply_truncate(rb, txn, nrelations, relations, change);
> >
> > Can we encapsulate this in a separate function like
> > ReorderBufferApplyTruncate or something like that?  Basically, rather
> > than having streaming check in this function, lets do it in some other
> > internal function.  And we can likewise do it for all the streaming
> > checks in this function or at least whereever it is feasible.  That
> > will make this function look clean.
>
> Done for truncate and change.  I think we can create a few more such
> functions for
> start/stop and cleanup handling on error.  I will work on that.
>

Yeah, I think that would be better.

One minor comment change suggestion:
/*
+ * start stream or begin the transaction.  If this is the first
+ * change in the current stream.
+ */

We can write the above comment as "Start the stream or begin the
transaction for the first change in the current stream."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: 

Re: factorial function/phase out postfix operators?

2020-05-19 Thread Robert Haas
On Mon, May 18, 2020 at 10:42 AM Peter Eisentraut
 wrote:
> What are the thoughts about then marking the postfix operator deprecated
> and eventually removing it?

I wrote a little bit about this last year:

http://postgr.es/m/CA+TgmoarLfSQcLCh7jx0737SZ28qwbuy+rUWT6rSHAO=b-6...@mail.gmail.com

I think it's generally a good idea, though perhaps we should consider
continuing to allow '!' as a postfix operator and just removing
support for any other. That would probably allow us to have a very
short deprecation period, since real-world use of user-defined postfix
operators seems to be nil -- and it would also make this into a change
that only affects the lexer and parser, which might make it simpler.

I won't lose a lot of sleep if we decide to rip out '!' as well, but I
don't think that continuing to support it would cost us much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-19 Thread Amit Kapila
On Fri, May 15, 2020 at 2:47 PM Dilip Kumar  wrote:
>
> On Tue, May 12, 2020 at 4:39 PM Amit Kapila  wrote:
> >
>
> > 4.
> > +static void
> > +stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> > +{
> > + LogicalDecodingContext *ctx = cache->private_data;
> > + LogicalErrorCallbackState state;
> > + ErrorContextCallback errcallback;
> > +
> > + Assert(!ctx->fast_forward);
> > +
> > + /* We're only supposed to call this when streaming is supported. */
> > + Assert(ctx->streaming);
> > +
> > + /* Push callback + info on the error context stack */
> > + state.ctx = ctx;
> > + state.callback_name = "stream_start";
> > + /* state.report_location = apply_lsn; */
> >
> > Why can't we supply the report_location here?  I think here we need to
> > report txn->first_lsn if this is the very first stream and
> > txn->final_lsn if it is any consecutive one.
>
> Done
>

Now, after your change in stream_start_cb_wrapper, we assign
report_location from the first_lsn passed as input to the function, but
write_location is still txn->first_lsn.  Shouldn't we assign the
passed-in first_lsn to write_location?  It seems assigning txn->first_lsn
won't be correct for streams other than the first one.

> > 5.
> > +static void
> > +stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> > +{
> > + LogicalDecodingContext *ctx = cache->private_data;
> > + LogicalErrorCallbackState state;
> > + ErrorContextCallback errcallback;
> > +
> > + Assert(!ctx->fast_forward);
> > +
> > + /* We're only supposed to call this when streaming is supported. */
> > + Assert(ctx->streaming);
> > +
> > + /* Push callback + info on the error context stack */
> > + state.ctx = ctx;
> > + state.callback_name = "stream_stop";
> > + /* state.report_location = apply_lsn; */
> >
> > Can't we report txn->final_lsn here
>
> We are already setting this to txn->final_lsn in the 0006 patch, but I
> have moved it into this patch now.
>

Similar to the previous point, here too I think we need to assign the
report and write locations from the last_lsn passed to this API.

>
>
> > v20-0005-Implement-streaming-mode-in-ReorderBuffer
> > -
> > 10.
> > Theoretically, we could get rid of the k-way merge, and append the
> > changes to the toplevel xact directly (and remember the position
> > in the list in case the subxact gets aborted later).
> >
> > I don't think this part of the commit message is correct as we
> > sometimes need to spill even during streaming.  Please check the
> > entire commit message and update according to the latest
> > implementation.
>
> Done
>

You seem to have forgotten to remove the other part of the message ("This
adds a second iterator for the streaming case"), which is not
relevant now.

> > 11.
> > - * HeapTupleSatisfiesHistoricMVCC.
> > + * tqual.c's HeapTupleSatisfiesHistoricMVCC.
> > + *
> > + * We do build the hash table even if there are no CIDs. That's
> > + * because when streaming in-progress transactions we may run into
> > + * tuples with the CID before actually decoding them. Think e.g. about
> > + * INSERT followed by TRUNCATE, where the TRUNCATE may not be decoded
> > + * yet when applying the INSERT. So we build a hash table so that
> > + * ResolveCminCmaxDuringDecoding does not segfault in this case.
> > + *
> > + * XXX We might limit this behavior to streaming mode, and just bail
> > + * out when decoding transaction at commit time (at which point it's
> > + * guaranteed to see all CIDs).
> >   */
> >  static void
> >  ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > @@ -1350,9 +1498,6 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer
> > *rb, ReorderBufferTXN *txn)
> >   dlist_iter iter;
> >   HASHCTL hash_ctl;
> >
> > - if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
> > - return;
> > -
> >
> > I don't understand this change.  Why would "INSERT followed by
> > TRUNCATE" could lead to a tuple which can come for decode before its
> > CID?  The patch has made changes based on this assumption in
> > HeapTupleSatisfiesHistoricMVCC which appears to be very risky as the
> > behavior could be dependent on whether we are streaming the changes
> > for in-progress xact or at the commit of a transaction.  We might want
> > to generate a test to once validate this behavior.
> >
> > Also, the comment refers to tqual.c which is wrong as this API is now
> > in heapam_visibility.c.
>
> Done.
>

+ * INSERT.  So in such cases we assume the CIDs is from the future command
+ * and return as unresolve.
+ */
+ if (tuplecid_data == NULL)
+ return false;
+

Here let's reword the last line of the comment as ".  So in such cases we
assume the CID is from the future command."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Add A Glossary

2020-05-19 Thread Peter Eisentraut

On 2020-05-19 08:17, Laurenz Albe wrote:

The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.


I don't see what would stop us from renaming some things, with some care.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-19 Thread Amit Kapila
On Tue, May 19, 2020 at 3:31 PM Dilip Kumar  wrote:
>
> On Tue, May 19, 2020 at 2:34 PM Amit Kapila  wrote:
> >
> > On Mon, May 18, 2020 at 5:57 PM Amit Kapila  wrote:
> > >
> > >
> > > 3.
> > > + /*
> > > + * If streaming is enable and we have serialized this transaction because
> > > + * it had incomplete tuple.  So if now we have got the complete tuple we
> > > + * can stream it.
> > > + */
> > > + if (ReorderBufferCanStream(rb) && can_stream && 
> > > rbtxn_is_serialized(toptxn)
> > > + && !(rbtxn_has_toast_insert(txn)) && !(rbtxn_has_spec_insert(txn)))
> > > + {
> > >
> > > This comment is just saying what you are doing in the if-check.  I
> > > think you need to explain the rationale behind it. I don't like the
> > > variable name 'can_stream' because it matches ReorderBufferCanStream
> > > whereas it is for a different purpose, how about naming it as
> > > 'change_complete' or something like that.  The check has many
> > > conditions, can we move it to a separate function to make the code
> > > here look clean?
> > >
> >
> > Do we really need this?  Immediately after this check, we are calling
> > ReorderBufferCheckMemoryLimit which will anyway stream the changes if
> > required.
>
> Actually, ReorderBufferCheckMemoryLimit is only meant to check
> whether we need to stream the changes due to the memory limit.  But
> suppose the memory limit was exceeded at a time when we could not
> stream the transaction because it had only an incomplete toast
> insert, so we serialized it.  Now, when we get the tuple that makes
> the changes complete, we are no longer crossing the memory limit
> because the changes were already serialized.  So I am not sure whether
> it is a good idea to stream the transaction as soon as we get the
> complete changes, or whether we should wait until the next time the
> memory limit is exceeded and select a suitable candidate then.
>

I think it is better to wait till the next time we exceed the memory threshold.

>  Ideally, if we are in streaming mode and the transaction was
> serialized, it was already a candidate for streaming but could not be
> streamed due to the incomplete changes, so shouldn't we stream it
> immediately as soon as its changes are complete, even if we are
> currently within the memory limit?
>

The only time we need to stream or spill is when we exceed the memory
threshold.  In the above case, it is possible that next time there is
some other candidate transaction that we can stream.

> >
> > Another comments on v20-0010-Bugfix-handling-of-incomplete-toast-tuple:
> >
> > + else if (rbtxn_has_toast_insert(txn) &&
> > + ChangeIsInsertOrUpdate(change->action))
> > + {
> > + toptxn->txn_flags &= ~RBTXN_HAS_TOAST_INSERT;
> > + can_stream = true;
> > + }
> > ..
> > +#define ChangeIsInsertOrUpdate(action) \
> > + (((action) == REORDER_BUFFER_CHANGE_INSERT) || \
> > + ((action) == REORDER_BUFFER_CHANGE_UPDATE) || \
> > + ((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT))
> >
> > How can we clear the RBTXN_HAS_TOAST_INSERT flag on
> > REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT action?
>
> A partial toast insert means we have inserted into the toast table but
> not into the main table.  So even if it is a spec insert we can form the
> complete tuple; however, we still cannot stream it because we haven't
> got the spec_confirm, but for that we are marking another flag.  So if
> the insert is a spec insert, the toast insert will also be a spec
> insert, and as part of those toast spec inserts we mark the tuple as
> partial, so clearing that flag should happen when the spec insert is
> done for the main table, right?
>

Sounds reasonable.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Expand the use of check_canonical_path() for more GUCs

2020-05-19 Thread Peter Eisentraut

On 2020-05-19 09:09, Michael Paquier wrote:

While digging into my backlog, I have found this message from Peter E
mentioning about $subject:
https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9a...@2ndquadrant.com

It seems to me that it would be a good idea to make those checks more
consistent, and attached is a patch.


That thread didn't resolve why check_canonical_path() is necessary 
there.  Maybe the existing uses could be removed?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Optimizer docs typos

2020-05-19 Thread Etsuro Fujita
On Mon, May 18, 2020 at 7:45 PM Richard Guo  wrote:
> In this same README doc, another typo that looks suspicious to me, in
> section "Optimizer Functions", is in the prefix to query_planner():
> we should have three dashes, rather than two, since query_planner() is
> called within grouping_planner().
>
> diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
> index 7dcab9a..bace081 100644
> --- a/src/backend/optimizer/README
> +++ b/src/backend/optimizer/README
> @@ -315,7 +315,7 @@ set up for recursive handling of subqueries
>preprocess target list for non-SELECT queries
>handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
> ORDER BY, DISTINCT, LIMIT
> ---query_planner()
> +---query_planner()
> make list of base relations used in query
> split up the qual into restrictions (a=1) and joins (b=c)
> find qual clauses that enable merge and hash joins

Yeah, you are right.  Another one would be in the prefix to
standard_join_search(); I think it might be better to have six dashes,
rather than five, because standard_join_search() is called within
make_rel_from_joinlist().

Best regards,
Etsuro Fujita




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-19 Thread Dilip Kumar
On Tue, May 19, 2020 at 2:34 PM Amit Kapila  wrote:
>
> On Mon, May 18, 2020 at 5:57 PM Amit Kapila  wrote:
> >
> >
> > 3.
> > + /*
> > + * If streaming is enable and we have serialized this transaction because
> > + * it had incomplete tuple.  So if now we have got the complete tuple we
> > + * can stream it.
> > + */
> > + if (ReorderBufferCanStream(rb) && can_stream && 
> > rbtxn_is_serialized(toptxn)
> > + && !(rbtxn_has_toast_insert(txn)) && !(rbtxn_has_spec_insert(txn)))
> > + {
> >
> > This comment is just saying what you are doing in the if-check.  I
> > think you need to explain the rationale behind it. I don't like the
> > variable name 'can_stream' because it matches ReorderBufferCanStream
> > whereas it is for a different purpose, how about naming it as
> > 'change_complete' or something like that.  The check has many
> > conditions, can we move it to a separate function to make the code
> > here look clean?
> >
>
> Do we really need this?  Immediately after this check, we are calling
> ReorderBufferCheckMemoryLimit which will anyway stream the changes if
> required.

Actually, ReorderBufferCheckMemoryLimit is only meant to check
whether we need to stream the changes due to the memory limit.  But
suppose the memory limit was exceeded at a time when we could not
stream the transaction because it had only an incomplete toast insert,
so we serialized it.  Now, when we get the tuple that makes the changes
complete, we are no longer crossing the memory limit because the
changes were already serialized.  So I am not sure whether it is a good
idea to stream the transaction as soon as we get the complete changes,
or whether we should wait until the next time the memory limit is
exceeded and select a suitable candidate then.  Ideally, if we are in
streaming mode and the transaction was serialized, it was already a
candidate for streaming but could not be streamed due to the incomplete
changes, so shouldn't we stream it immediately as soon as its changes
are complete, even if we are currently within the memory limit?
Because our target is to stream, not spill, we should try to stream
the spilled changes at the first opportunity.

> Can we move the changes related to the detection of
> incomplete data to a separate function?

Ok.


>
> Another comments on v20-0010-Bugfix-handling-of-incomplete-toast-tuple:
>
> + else if (rbtxn_has_toast_insert(txn) &&
> + ChangeIsInsertOrUpdate(change->action))
> + {
> + toptxn->txn_flags &= ~RBTXN_HAS_TOAST_INSERT;
> + can_stream = true;
> + }
> ..
> +#define ChangeIsInsertOrUpdate(action) \
> + (((action) == REORDER_BUFFER_CHANGE_INSERT) || \
> + ((action) == REORDER_BUFFER_CHANGE_UPDATE) || \
> + ((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT))
>
> How can we clear the RBTXN_HAS_TOAST_INSERT flag on
> REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT action?

A partial toast insert means we have inserted into the toast table but
not into the main table.  So even if it is a spec insert we can form the
complete tuple; however, we still cannot stream it because we haven't
got the spec_confirm, but for that we are marking another flag.  So if
the insert is a spec insert, the toast insert will also be a spec
insert, and as part of those toast spec inserts we mark the tuple as
partial, so clearing that flag should happen when the spec insert is
done for the main table, right?


> IIUC, the basic idea used to handle incomplete changes (which is
> possible in case of toast tuples and speculative inserts) is to mark
> such TXNs as containing incomplete changes and then while finding the
> largest top-level TXN for streaming, we ignore such TXN's and move to
> next largest TXN.  If none of the TXNs have complete changes then we
> choose the largest (sub)transaction and spill the same to make the
> in-memory changes below logical_decoding_work_mem threshold.  This
> idea can work but the strategy to choose the transaction is suboptimal
> for cases where TXNs have some changes which are complete followed by
> an incomplete toast or speculative tuple.  I was having an offlist
> discussion with Robert on this problem and he suggested that it would
> be better if we track the complete part of changes separately and then
> we can avoid the drawback mentioned above.  I have thought about this
> and I think it can work if we track the size and LSN of completed
> changes.  I think we need to ensure that if there is concurrent abort
> then we discard all changes for current (sub)transaction not only up
> to completed changes LSN whereas if the streaming is successful then
> we can truncate the changes only up to completed changes LSN. What do
> you think?
>
> I wonder why you have done this as 0010 in the patch series, it should
> be as 0006 after the
> 0005-Implement-streaming-mode-in-ReorderBuffer.patch.  If we can do
> that way then it would be easier for me to review.  Is there a reason
> for not doing so?

No reason, I can do that.  Actually, later we can merge the changes into
0005; I kept them separate for review.  Anyway, in the next 

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-19 Thread Amit Kapila
On Mon, May 18, 2020 at 5:57 PM Amit Kapila  wrote:
>
>
> 3.
> + /*
> + * If streaming is enable and we have serialized this transaction because
> + * it had incomplete tuple.  So if now we have got the complete tuple we
> + * can stream it.
> + */
> + if (ReorderBufferCanStream(rb) && can_stream && rbtxn_is_serialized(toptxn)
> + && !(rbtxn_has_toast_insert(txn)) && !(rbtxn_has_spec_insert(txn)))
> + {
>
> This comment is just saying what you are doing in the if-check.  I
> think you need to explain the rationale behind it. I don't like the
> variable name 'can_stream' because it matches ReorderBufferCanStream
> whereas it is for a different purpose, how about naming it as
> 'change_complete' or something like that.  The check has many
> conditions, can we move it to a separate function to make the code
> here look clean?
>

Do we really need this?  Immediately after this check, we are calling
ReorderBufferCheckMemoryLimit which will anyway stream the changes if
required.  Can we move the changes related to the detection of
incomplete data to a separate function?

Another comments on v20-0010-Bugfix-handling-of-incomplete-toast-tuple:

+ else if (rbtxn_has_toast_insert(txn) &&
+ ChangeIsInsertOrUpdate(change->action))
+ {
+ toptxn->txn_flags &= ~RBTXN_HAS_TOAST_INSERT;
+ can_stream = true;
+ }
..
+#define ChangeIsInsertOrUpdate(action) \
+ (((action) == REORDER_BUFFER_CHANGE_INSERT) || \
+ ((action) == REORDER_BUFFER_CHANGE_UPDATE) || \
+ ((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT))

How can we clear the RBTXN_HAS_TOAST_INSERT flag on
REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT action?

IIUC, the basic idea used to handle incomplete changes (which is
possible in case of toast tuples and speculative inserts) is to mark
such TXNs as containing incomplete changes and then while finding the
largest top-level TXN for streaming, we ignore such TXNs and move to
the next largest TXN.  If none of the TXNs have complete changes then we
choose the largest (sub)transaction and spill the same to make the
in-memory changes below logical_decoding_work_mem threshold.  This
idea can work but the strategy to choose the transaction is suboptimal
for cases where TXNs have some changes which are complete followed by
an incomplete toast or speculative tuple.  I was having an offlist
discussion with Robert on this problem and he suggested that it would
be better if we track the complete part of changes separately and then
we can avoid the drawback mentioned above.  I have thought about this
and I think it can work if we track the size and LSN of completed
changes.  I think we need to ensure that if there is concurrent abort
then we discard all changes for current (sub)transaction not only up
to completed changes LSN whereas if the streaming is successful then
we can truncate the changes only up to completed changes LSN. What do
you think?
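
A sketch of the bookkeeping this would need (field names invented; in a
real patch they would be additional fields of ReorderBufferTXN rather
than a standalone struct):

    #include "postgres.h"
    #include "access/xlogdefs.h"

    /* Hypothetical tracking of the "complete" prefix of a transaction's
     * changes, i.e. everything not waiting on a toast chunk or a
     * speculative-insert confirmation. */
    typedef struct TxnCompleteProgress
    {
        Size        complete_size;  /* size of changes up to the last
                                     * complete change; used when picking
                                     * streaming candidates */
        XLogRecPtr  complete_lsn;   /* LSN of that change: truncate up to
                                     * here after a successful stream, but
                                     * discard everything for the
                                     * (sub)transaction on concurrent abort */
    } TxnCompleteProgress;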

I wonder why you have done this as 0010 in the patch series, it should
be as 0006 after the
0005-Implement-streaming-mode-in-ReorderBuffer.patch.  If we can do
that way then it would be easier for me to review.  Is there a reason
for not doing so?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




RE: PostgresSQL project

2020-05-19 Thread Luke Porter
Hi Peter

Thanks for the prompt response.

Yes, this is an open source project which will be shared with the community. We 
will also consider hiring appropriate consultants.

At a summary level, we have a proven approach for how a relational database can
provide comprehensive logical insert, update and delete functionality through
an append-only paradigm, which effectively delivers a perfect audit trail, i.e. a
means to access the database at any point in the audit (time) with all data in
a relationally correct state.

In so doing, we are removing the need for the update and delete code (as
presently used) through enhancements to the insert code module.

We presently achieve this effect by use of a generator which creates an
application-specific database environment for append-only operation (with a
normal current-view schema being the input for the generator).

The specification of the requirements for the insert code module is very
detailed. Our challenge is to identify appropriate PostgreSQL
architects/programmers who have experience with the PostgreSQL database
kernel, and more specifically, to outline the general approach and work
packages to go about this.

Regards

Luke



-Original Message-
From: Peter Eisentraut  
Sent: 18 May 2020 22:40
To: Luke Porter ; pgsql-hackers@lists.postgresql.org
Subject: Re: PostgresSQL project

On 2020-05-18 18:21, Luke Porter wrote:
> I am a member of a small UK based team with extensive database 
> experience. We are considering a project using PostgresSQL source code 
> which only uses the insert data capabilities.
> 
> Is there a contact who we could speak with and discuss our project 
> aims in principal.

If yours is an open-source project that you eventually want to share with the 
community, then you can discuss it on this mailing list.

If it is a closed-source, proprietary, or in-house project, then the community 
isn't the right place to discuss it, but you could hire professional 
consultants to help you, depending on your needs.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Expand the use of check_canonical_path() for more GUCs

2020-05-19 Thread Michael Paquier
Hi all,

While digging into my backlog, I have found this message from Peter E
mentioning about $subject:
https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9a...@2ndquadrant.com

It seems to me that it would be a good idea to make those checks more
consistent, and attached is a patch.

Thoughts?
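
For reference, a sketch of what such a check hook looks like (assuming
the standard GUC string check-hook signature; since canonicalize_path()
never lengthens a path, modifying the value in place is safe):

    bool
    check_canonical_path(char **newval, void **extra, GucSource source)
    {
        /*
         * canonicalize_path() never enlarges the string, so we can
         * just modify newval in place.
         */
        if (*newval)
            canonicalize_path(*newval);
        return true;
    }
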
--
Michael
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2f3e0a70e0..2be59caf60 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3781,7 +3781,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&PromoteTriggerFile,
 		"",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -3903,7 +3903,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&pg_krb_server_keyfile,
 		PG_KRB_SRVTAB,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4197,7 +4197,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&data_directory,
 		NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4208,7 +4208,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&ConfigFileName,
 		NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4219,7 +4219,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&HbaFileName,
 		NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4230,7 +4230,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&IdentFileName,
 		NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4266,7 +4266,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&ssl_cert_file,
 		"server.crt",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4276,7 +4276,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&ssl_key_file,
 		"server.key",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4286,7 +4286,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&ssl_ca_file,
 		"",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4296,7 +4296,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&ssl_crl_file,
 		"",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{
@@ -4369,7 +4369,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&ssl_dh_params_file,
 		"",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 
 	{


signature.asc
Description: PGP signature


Re: Add A Glossary

2020-05-19 Thread Andrew Grillet
I think there needs to be a careful analysis of the language and a formal
effort to stabilise it for the future.

In the context of, say, an Oracle T series, which is partitioned into
multiple domains (virtual machines), each of these has multiple CPUs
and can run an instance of the OS, which in turn hosts multiple virtual
instances of the same or different OSes. Some domains might do this
while others do not!

A host could be a domain, one of many virtual machines, or it could be
one of many hosts on that VM; but even these hosts could be virtual
machines, each of which runs several virtual servers!

Of course, PostgreSQL can run on any tier of this regime, but the
documentation at least needs to be consistent
about language.

A "machine" should probably refer to hardware, although I would accept that
a domain might count as "virtual
hardware" while a host should probably refer to a single instance of OS.

Of course it is possible for a single instance of an OS to run multiple
instances of PostgreSQL, and people do this. (I have in the past.)

Slightly more confusingly, it would appear possible for a single instance
of an OS to have multiple IP addresses, and if there are multiple
instances of PostgreSQL, they may each serve different IP addresses
uniquely, or share them. I think this case suggests that a host probably
best describes an OS instance. I might be wrong.

The word "server" might be an instance of any of the above, or a waiter
with a bowl of soup. It is best
reserved for situations where clarity is not required.

If you are new to all this, I am sure it is very confusing, and
inconsistent language is not going to help.

Andrew








On Tue, 19 May 2020 at 07:17, Laurenz Albe  wrote:

> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> > cluster/instance: PG (mainly) consists of a group of processes that
> > commonly act on shared buffers. The processes are very closely related
> > to each other and to the buffers. They exist altogether or not at all.
> > They use a common initialization file and are incarnated by one command.
> > Everything exists solely in RAM and therefore has a fluctuating nature.
> > In summary: they build a unit and this unit needs to have a name of its
> > own. In some pages we used to use the term *instance* - sometimes in
> > extended forms: *database instance*, *PG instance*, *standby instance*,
> > *standby server instance*, *server instance*, or *remote instance*.
> > For me, the term *instance* makes sense, the extensions *standby
> > instance* and *remote instance* in their context too.
>
> FWIW, I feel somewhat like Alvaro on that point; I use those terms
> synonymously, perhaps distinguishing between a "started cluster" and a
> "stopped cluster".  After all, "cluster" refers to "a cluster of
> databases", which are there regardless of whether you start the server
> or not.
>
> The term "cluster" is unfortunate, because to most people it suggests a
> group of machines, so the term "instance" is better, but that ship has
> sailed long ago.
>
> The static part of a cluster to me is the "data directory".
>
> > server/host: We need a term to describe the underlying hardware
> > respectively the virtual machine or container, where PG is running.
> > I suggest to use both *server* and *host*. In computer science, both
> > have their eligibility and are widely used. Everybody understands
> > *client/server architecture* or *host* in TCP/IP configuration. We
> > cannot change such matter of course. I suggest to use both depending
> > on the context, but with the same meaning: "real hardware, a container,
> > or a virtual machine".
>
> On this I have a strong opinion because of my Unix mindset.
> "machine" and "host" are synonyms, and it doesn't matter to the database
> if they are virtualized or not.  You can always disambiguate by adding
> "virtual" or "physical".
>
> A "server" is a piece of software that responds to client requests,
> never a machine.  In my book, this is purely Windows jargon.  The term
> "client-server architecture" that you quote emphasizes that.
>
> Perhaps "machine" would be the preferable term, because "host" is more
> prone to misunderstandings (except in a networking context).
>
> Yours,
> Laurenz Albe


some grammar refactoring

2020-05-19 Thread Peter Eisentraut
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP.  In the grammar, these commands (with some exceptions)
basically just take a reference to an object and look it up later in C
code.  Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.).  This patch series
combines it into common lists shared by all these commands; a rough
sketch of the resulting shape follows the advantages listed below.


Advantages:

- Avoids having to list each object type at least four times.

- Object types not supported by security labels or extensions are now
explicitly listed and give a proper error message.  Previously, this was
encoded only in the grammar itself, and specifying an unsupported object
type would just give a parse error.


- Reduces lines of code in gram.y.

- Removes some old cruft.
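
As a rough illustration of the intended shape (the nonterminal name and
the member list here are hypothetical placeholders, not necessarily what
the patches use), a shared object-type production might look like:

/* Hypothetical shared list of object types addressed by any_name */
object_type_any_name:
			TABLE						{ $$ = OBJECT_TABLE; }
			| SEQUENCE					{ $$ = OBJECT_SEQUENCE; }
			| VIEW						{ $$ = OBJECT_VIEW; }
			| INDEX						{ $$ = OBJECT_INDEX; }
		;

/* COMMENT, DROP, and SECURITY LABEL can then all reference it */
CommentStmt:
			COMMENT ON object_type_any_name any_name IS comment_text
				{
					CommentStmt *n = makeNode(CommentStmt);
					n->objtype = $3;
					n->object = (Node *) $4;
					n->comment = $6;
					$$ = (Node *) n;
				}
		;

Object types that a command does not support can then be listed
explicitly in a separate production that raises a proper error instead
of letting the parser fail.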

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From bdd2440c66292eb2464a49fdc57ff5fa86838f0a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 10 Apr 2020 11:22:19 +0200
Subject: [PATCH v1 1/6] Remove redundant grammar symbols

access_method, database_name, and index_name are all just name, and
they are not used consistently for their alleged purpose, so remove
them.  They have been around since ancient times but have no current
reason for existing.  Removing them can simplify future grammar
refactoring.
---
 src/backend/parser/gram.y                | 86 ++++++++++++++++--------------
 src/interfaces/ecpg/preproc/ecpg.trailer |  4 +-
 2 files changed, 41 insertions(+), 49 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3c78f2d1b5..378390af5d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -352,9 +352,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <str>		enable_trigger
 
 %type <str>		copy_file_name
-				database_name access_method_clause access_method attr_name
+				access_method_clause attr_name
 				table_access_method_clause name cursor_name file_name
-				index_name opt_index_name cluster_index_specification
+				opt_index_name cluster_index_specification
 
 %type <list>	func_name handler_name qual_Op qual_all_Op subquery_Op
 				opt_class opt_inline_handler opt_validator validator_clause
@@ -1184,7 +1184,7 @@ AlterRoleStmt:
 
 opt_in_database:
 			   /* EMPTY */					{ $$ = NULL; }
-			| IN_P DATABASE database_name	{ $$ = $3; }
+			| IN_P DATABASE name			{ $$ = $3; }
 		;
 
 AlterRoleSetStmt:
@@ -3950,7 +3950,7 @@ part_elem: ColId opt_collate opt_class
 		;
 
 table_access_method_clause:
-			USING access_method					{ $$ = $2; }
+			USING name							{ $$ = $2; }
 			| /*EMPTY*/							{ $$ = NULL; }
 		;
 
@@ -3975,7 +3975,7 @@ OptConsTableSpace:   USING INDEX TABLESPACE name	{ $$ = $4; }
 			| /*EMPTY*/							{ $$ = NULL; }
 		;
 
-ExistingIndex:   USING INDEX index_name			{ $$ = $3; }
+ExistingIndex:   USING INDEX name				{ $$ = $3; }
 		;
 
 /*
@@ -4640,7 +4640,7 @@ AlterExtensionContentsStmt:
 					n->object = (Node *) $6;
 					$$ = (Node *)n;
 				}
-			| ALTER EXTENSION name add_drop OPERATOR CLASS any_name USING access_method
+			| ALTER EXTENSION name add_drop OPERATOR CLASS any_name USING name
 				{
 					AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt);
 					n->extname = $3;
@@ -4649,7 +4649,7 @@ AlterExtensionContentsStmt:
 					n->object = (Node *) lcons(makeString($9), $7);
 					$$ = (Node *)n;
 				}
-			| ALTER EXTENSION name add_drop OPERATOR FAMILY any_name USING access_method
+			| ALTER EXTENSION name add_drop OPERATOR FAMILY any_name USING name
 				{
 					AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt);

pg_dump dumps row level policies on extension tables

2020-05-19 Thread Pavan Deolasee
Hi,

I noticed that if a row level policy is defined on an extension
object, even in the extension creation script, pg_dump dumps a
separate CREATE POLICY statement for such policies. That makes the
dump unrestorable because the CREATE EXTENSION and CREATE POLICY
statements then conflict.

Here is a simple example. I just abused the pageinspect contrib module
to demonstrate the problem.

```
diff --git a/contrib/pageinspect/pageinspect--1.5.sql b/contrib/pageinspect/pageinspect--1.5.sql
index 1e40c3c97e..f04d70d1c1 100644
--- a/contrib/pageinspect/pageinspect--1.5.sql
+++ b/contrib/pageinspect/pageinspect--1.5.sql
@@ -277,3 +277,9 @@ CREATE FUNCTION gin_leafpage_items(IN page bytea,
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'gin_leafpage_items'
 LANGUAGE C STRICT PARALLEL SAFE;
+
+-- sample table
+CREATE TABLE pf_testtab (a int, b int);
+-- sample policy
+CREATE POLICY p1 ON pf_testtab
+FOR SELECT USING (true);
```

If I now take a dump of a database with the pageinspect extension
created, the dump contains the following.

```

--
-- Name: pageinspect; Type: EXTENSION; Schema: -; Owner:
--

CREATE EXTENSION IF NOT EXISTS pageinspect WITH SCHEMA public;

--
-- Name: pf_testtab p1; Type: POLICY; Schema: public; Owner: pavan
--

CREATE POLICY p1 ON public.pf_testtab FOR SELECT USING (true);

```

That's a problem. The CREATE POLICY statement fails during restore
because CREATE EXTENSION already creates the policy.

Are we missing recording a dependency on the extension for row level
policies? Or should pg_dump somehow skip dumping those policies?
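
One way to check the first theory (a hedged diagnostic against the
example above; p1 and pf_testtab come from the modified pageinspect
script): look for an extension dependency on the policy in pg_depend.
If this returns no row with deptype = 'e', pg_dump has no way to tell
that the policy belongs to the extension.

```
-- List pg_depend entries for the policy created by the extension
-- script.  An extension-owned object should have a row with
-- deptype = 'e' referencing pg_extension.
SELECT d.deptype,
       d.refclassid::regclass AS refclass,
       d.refobjid
FROM pg_policy p
JOIN pg_depend d
  ON d.classid = 'pg_policy'::regclass
 AND d.objid = p.oid
WHERE p.polname = 'p1';
```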

Thanks,
Pavan

--
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Missing grammar production for WITH TIES

2020-05-19 Thread Michael Paquier
On Tue, May 19, 2020 at 12:41:39AM -0400, Tom Lane wrote:
> Michael Paquier  writes:
>> This has been committed just after beta1 has been stamped.  So it
>> means that it won't be included in it, right?
> 
> Right.

Still, wouldn't it be better to wait until the version is tagged?  My
understanding is that we had better not commit anything on a branch
planned for release between the moment the version is stamped and the
moment the tag is pushed, so that we have a couple of days to address
any complaints from -packagers.  Here, we are in a state where we have,
between stamp time and tag time, an extra commit not related to a
packaging issue.  So, if it happens that we have an issue from
-packagers to address, then we would have to include c301c2e in
beta1.  Looking at the patch committed, that's not much of an issue,
but I think that we had better avoid that.
--
Michael


signature.asc
Description: PGP signature


Re: Add A Glossary

2020-05-19 Thread Laurenz Albe
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> cluster/instance: PG (mainly) consists of a group of processes that commonly
> act on shared buffers. The processes are very closely related to each other
> and to the buffers. They exist altogether or not at all. They use a common
> initialization file and are incarnated by one command. Everything exists
> solely in RAM and therefore has a fluctuating nature. In summary: they build
> a unit and this unit needs to have a name of its own. In some pages we used
> to use the term *instance* - sometimes in extended forms: *database instance*,
> *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> or *remote instance*.  For me, the term *instance* makes sense, the extensions
> *standby instance* and *remote instance* in their context too.

FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
perhaps distinguishing between a "started cluster" and a "stopped cluster".
After all, "cluster" refers to "a cluster of databases", which are there
regardless of whether you start the server or not.

The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.

The static part of a cluster to me is the "data directory".

> server/host: We need a term to describe the underlying hardware respectively
> the virtual machine or container, where PG is running. I suggest to use both
> *server* and *host*. In computer science, both have their eligibility and are
> widely used. Everybody understands *client/server architecture* or *host* in
> TCP/IP configuration. We cannot change such matter of course. I suggest to
> use both depending on the context, but with the same meaning: "real hardware,
> a container, or a virtual machine".

On this I have a strong opinion because of my Unix mindset.
"machine" and "host" are synonyms, and it doesn't matter to the database if they
are virtualized or not.  You can always disambiguate by adding "virtual" or
"physical".

A "server" is a piece of software that responds to client requests, never a
machine.  In my book, this is purely Windows jargon.  The term "client-server
architecture" that you quote emphasizes that.

Perhaps "machine" would be the preferable term, because "host" is more prone to
misunderstandings (except in a networking context).

Yours,
Laurenz Albe





Re: could not stat promote trigger file leads to shutdown

2020-05-19 Thread Michael Paquier
On Wed, Dec 04, 2019 at 11:52:33AM +0100, Peter Eisentraut wrote:
> Is it possible to do this in a mostly bullet-proof way?  Just because the
> directory exists and looks pretty good otherwise, doesn't mean we can read a
> file created in it later in a way that doesn't fall afoul of the existing
> error checks.  There could be something like SELinux lurking, for example.
> 
> Maybe some initial checking would be useful, but I think we still need to
> downgrade the error check at use time a bit to not crash in the cases that
> we miss.

That thread has been in my backlog for some time, and I was not able to
come back to it until now.  Reading the thread again, it seems to me
that using a LOG would make the promote file handling more consistent
with what we do for the SSL context reload.  Still, one downside I can
see here is that this causes the backend to create a new LOG entry each
time the promote file is checked, aka each time we check if WAL is
available.  Couldn't that log bloat be a problem?  During the SSL
reload, we only generate LOG entries for each backend on SIGHUP.
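
If log volume is the blocker, one option is to log the failure only
once per failure episode.  A minimal sketch, assuming a hypothetical
helper in the startup process (names and placement are illustrative,
not the actual xlog.c code):

#include "postgres.h"

#include <errno.h>
#include <sys/stat.h>

/* Hypothetical flag: true while a stat() failure has already been logged */
static bool promote_file_error_logged = false;

/*
 * Sketch: degrade stat() failures on the promote trigger file to LOG,
 * emitting the message only once per failure episode so that periodic
 * WAL-availability checks do not flood the server log.
 */
static bool
PromoteTriggerFileExists(const char *path)
{
	struct stat stat_buf;

	if (stat(path, &stat_buf) == 0)
	{
		promote_file_error_logged = false;	/* reset once it works again */
		return true;
	}

	if (errno != ENOENT && !promote_file_error_logged)
	{
		ereport(LOG,
				(errcode_for_file_access(),
				 errmsg("could not stat promote trigger file \"%s\": %m",
						path)));
		promote_file_error_logged = true;
	}

	return false;
}

That would keep one LOG line per incident rather than one per check
cycle, which may be enough to address the bloat concern.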
--
Michael


signature.asc
Description: PGP signature