Re: explicit_bzero for sslpassword
On Tue, May 19, 2020 at 02:33:40PM +0200, Daniel Gustafsson wrote:
> Since commit 74a308cf5221f we use explicit_bzero on pgpass and connhost
> password in libpq, but not sslpassword which seems an oversight. The attached
> performs an explicit_bzero before freeing like the pattern for other password
> variables.

Good catch, let's fix that. I would like to apply your suggested fix, but let's see first if others have any comments.
--
Michael

signature.asc
Description: PGP signature
Re: SyncRepLock acquired exclusively in default configuration
On Tue, May 19, 2020 at 08:56:13AM -0700, Ashwin Agrawal wrote:
> Sure, add it to commit fest.
> Seems though it should be backpatched to relevant branches as well.

It does not seem to be listed yet. Are you planning to add it under the section for bug fixes?
--
Michael
Re: Problem with pg_atomic_compare_exchange_u64 at 32-bit platforms
Hi,

On May 19, 2020 8:05:00 PM PDT, Noah Misch wrote:
>On Tue, May 19, 2020 at 04:07:29PM +0300, Konstantin Knizhnik wrote:
>> The definition of pg_atomic_compare_exchange_u64 requires alignment of the
>> "expected" pointer on an 8-byte boundary:
>>
>> pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
>>                                uint64 *expected, uint64 newval)
>> {
>> #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
>>     AssertPointerAlignment(ptr, 8);
>>     AssertPointerAlignment(expected, 8);
>> #endif
>>
>> I wonder if there are platforms where such a restriction is actually needed.
>
>In general, sparc Linux does SIGBUS on unaligned access. Other platforms
>function but suffer performance penalties.

Indeed. Cross-cacheline atomics are e.g. really expensive on x86, essentially requiring a full-blown bus lock IIRC.

>> And if so, it looks like our ./src/test/regress/regress.c is working only
>> occasionally:
>>
>> static void
>> test_atomic_uint64(void)
>> {
>>     pg_atomic_uint64 var;
>>     uint64 expected;
>>     ...
>>     if (!pg_atomic_compare_exchange_u64(&var, &expected, 1))
>>
>> because there is no guarantee that the "expected" variable will be aligned
>> on the stack at an 8-byte boundary (at least on Win32).
>
>src/tools/msvc sets ALIGNOF_LONG_LONG_INT=8, so it believes that win32 does
>guarantee 8-byte alignment of both automatic variables. Is it wrong?

Generally, the definition of the atomics should ensure the required alignment, e.g. by using alignment attributes on the struct.

Andres
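Andres's point about baking the alignment into the type can be sketched in standard C11 (illustrative only; PostgreSQL's real definitions live under src/include/port/atomics/ and `my_atomic_uint64` is a made-up name):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/*
 * Forcing 8-byte alignment on the member forces it on the struct, so any
 * instance -- including an automatic variable on the stack, as in
 * test_atomic_uint64() -- satisfies AssertPointerAlignment(ptr, 8)
 * regardless of what the platform ABI guarantees for plain uint64.
 */
typedef struct my_atomic_uint64
{
	alignas(8) volatile uint64_t value;
} my_atomic_uint64;

static int
is_aligned8(const void *p)
{
	return ((uintptr_t) p % 8) == 0;
}
```

Note this only covers the `ptr` argument; a plain `uint64 expected` local would still need its own `alignas(8)` (or an ABI that happens to provide it, as msvc claims to).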
Re: MultiXact\SLRU buffers configuration
At Fri, 15 May 2020 14:01:46 +0500, "Andrey M. Borodin" wrote in
> > On 15 May 2020, at 05:03, Kyotaro Horiguchi wrote:
> >
> > At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" wrote in
> >>> GetMultiXactIdMembers believes that 4 is successfully done if 2
> >>> returned valid offset, but actually that is not obvious.
> >>>
> >>> If we add a single giant lock just to isolate, say,
> >>> GetMultiXactIdMember and RecordNewMultiXact, it reduces concurrency
> >>> unnecessarily. Perhaps we need a finer-grained locking key for standby
> >>> that works similarly to buffer locks on primary, and doesn't cause
> >>> conflicts between irrelevant mxids.
> >>>
> >> We can just replay members before offsets. If the offset is already there,
> >> the members are there too.
> >> But I'd be happy if we could mitigate those 1000us too - with a hint about
> >> the last mxid state in a shared MX state, for example.
> >
> > Generally in such cases, condition variables would work. In the
> > attached PoC, the reader side gets no penalty in the "likely" code
> > path. The writer side always calls ConditionVariableBroadcast, but the
> > waiter list is empty in almost all cases. But I couldn't cause the
> > situation where the 1000us sleep is reached.
>
> Thanks! That really looks like a good solution without magic timeouts.
> Beautiful!
> I think I can create a temporary extension which calls the MultiXact API
> and tests edge cases like this 1000us wait.
> This extension will also be useful for me to assess the impact of bigger
> buffers, reduced read locking (as in my 2nd patch) and other tweaks.

Happy to hear that. It would need to use a timeout just in case, though.

> >> Actually, if we read an empty mxid array instead of something that is
> >> replayed just yet - it's not a problem of inconsistency, because a
> >> transaction in this mxid could not commit before we started. ISTM.
> >> So instead of a fix, we can probably just add a comment, if this
> >> reasoning is correct.
> > The step 4 of the reader side reads the members of the target mxid. It
> > is already written if the offset of the *next* mxid is filled in.
>
> Most often - yes, but members are not guaranteed to be filled in order.
> Those who win MXMemberControlLock will write first.
> But nobody can read the members of an MXID before it is returned. And its
> members will be written before returning the MXID.

Yeah, right. Otherwise an assertion failure happens.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Is it useful to record whether plans are generic or custom?
At Tue, 19 May 2020 22:56:17 +0900, Atsushi Torikoshi wrote in
> On Sat, May 16, 2020 at 6:01 PM legrand legrand wrote:
> > > To track executed plan types, I think execution layer hooks
> > > are appropriate. These hooks, however, take QueryDesc as a param
> > > and it does not include cached plan information.
> >
> > It seems that the same QueryDesc entry is reused when executing
> > a generic plan. For example, marking queryDesc->plannedstmt->queryId
> > (outside pg_stat_statements) with a pseudo tag during ExecutorStart
> > reappears in later executions with generic plans ...
> >
> > Is this QueryDesc reusable by a custom plan? If not, maybe a solution
> > could be to add a flag in queryDesc->plannedstmt?
>
> Thanks for your proposal!
>
> I first thought it was a good idea and tried to add a flag to QueryDesc,
> but the comments on QueryDesc say it encapsulates everything that
> the executor needs to execute a query.
>
> Whether a plan is generic or custom is not something the executor needs
> to know to run queries, so now I hesitate to do so.
>
> Instead, I'm now considering using a static hash for prepared queries
> (static HTAB *prepared_queries).
>
> BTW, I'd also appreciate other opinions about recording the number
> of generic and custom plans in pg_stat_statements.

If you/we just want to know how a prepared statement is executed, couldn't we show that information in the pg_prepared_statements view?

=# select * from pg_prepared_statements;
-[ RECORD 1 ]---+
name            | stmt1
statement       | prepare stmt1 as select * from t where b = $1;
prepare_time    | 2020-05-20 12:01:55.733469+09
parameter_types | {text}
from_sql        | t
exec_custom     | 5    <- existing num_custom_plans
exec_total      | 40   <- new member of CachedPlanSource

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Trouble with hashagg spill I/O pattern and costing
On Tue, 2020-05-19 at 19:53 +0200, Tomas Vondra wrote:
> And is there a way to pre-allocate larger chunks? Presumably we could
> assign the blocks to a tape in larger chunks (e.g. 128kB, i.e. 16 x 8kB)
> instead of just a single block. I haven't seen anything like that in
> tape.c, though ...

It turned out to be simple (at least as a POC), so I threw together a patch. I just added a 32-element array of block numbers to each tape. When we need a new block, we retrieve a block number from that array; or, if it's empty, we fill it by calling ltsGetFreeBlock() 32 times.

I reproduced the problem on a smaller scale (330M groups, ~30GB of memory on a 16GB box), with work_mem=64MB. The query is a simple distinct.

Unpatched master:
  Sort: 250s
  HashAgg: 310s

Patched master:
  Sort: 245s
  HashAgg: 262s

That's a nice improvement for such a simple patch. We can tweak the number of blocks to preallocate, or do other things like doubling from a small number up to a maximum. Also, a proper patch would probably release the blocks back as free when the tape is rewound.

As long as the number of block numbers to preallocate is not too large, I don't think we need to change the API. It seems fine for sort to do the same thing, even though there's no benefit there.
Regards,
	Jeff Davis

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index bc8d56807e2..f44594191e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -110,6 +110,7 @@ typedef struct TapeBlockTrailer
 #define TapeBlockSetNBytes(buf, nbytes) \
 	(TapeBlockGetTrailer(buf)->next = -(nbytes))
 
+#define TAPE_WRITE_N_PREALLOC 32
 
 /*
  * This data structure represents a single "logical tape" within the set
@@ -151,6 +152,15 @@ typedef struct LogicalTape
 	int			max_size;		/* highest useful, safe buffer_size */
 	int			pos;			/* next read/write position in buffer */
 	int			nbytes;			/* total # of valid bytes in buffer */
+
+	/*
+	 * When multiple tapes are being written to concurrently (as in HashAgg),
+	 * it's important to preallocate some block numbers to make writes more
+	 * sequential. This doesn't preallocate the blocks themselves; only the
+	 * block numbers.
+	 */
+	int			nprealloc;
+	long		prealloc[TAPE_WRITE_N_PREALLOC];
 } LogicalTape;
 
 /*
@@ -557,6 +567,7 @@ ltsInitTape(LogicalTape *lt)
 	lt->max_size = MaxAllocSize;
 	lt->pos = 0;
 	lt->nbytes = 0;
+	lt->nprealloc = 0;
 }
 
 /*
@@ -733,7 +744,13 @@ LogicalTapeWrite(LogicalTapeSet *lts, int tapenum,
 		 * First allocate the next block, so that we can store it in the
 		 * 'next' pointer of this block.
 		 */
-		nextBlockNumber = ltsGetFreeBlock(lts);
+		if (lt->nprealloc == 0)
+		{
+			lt->nprealloc = TAPE_WRITE_N_PREALLOC;
+			for (int i = lt->nprealloc; i > 0; i--)
+				lt->prealloc[i - 1] = ltsGetFreeBlock(lts);
+		}
+		nextBlockNumber = lt->prealloc[--lt->nprealloc];
 
 		/* set the next-pointer and dump the current block. */
 		TapeBlockGetTrailer(lt->buffer)->next = nextBlockNumber;
Re: Warn when parallel restoring a custom dump without data offsets
I started fooling with this at home while our ISP is broke (pardon my brevity).

Maybe you also saw commit b779ea8a9a2dc3a089b3ac152b1ec4568bfeb26f, "Fix pg_restore so parallel restore doesn't fail when the input file doesn't contain data offsets (which it won't, if pg_dump thought its output wasn't seekable)..."

...which I guess should actually say "doesn't NECESSARILY fail", since it also adds this comment: "This could fail if we are asked to restore items out-of-order." So this is a known issue and not a regression.

I think the PG11 commit you mentioned (548e5097) happens to make some databases fail in parallel restore that previously worked (I didn't check). Possibly also some databases (or some pre-existing dumps) which used to fail might now succeed.

Your patch adds a warning if unseekable output might fail during parallel restore. I'm not opposed to that, but can we just make pg_restore work in that case? If the input is unseekable, then we can never do a parallel restore at all. If it *is* seekable, could we make _PrintTocData rewind to the beginning (fseeko(fh, 0, SEEK_SET)) if it gets to EOF and re-scan again from the start? Would you want to try that?
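The rewind-and-rescan idea can be sketched as follows. This is a hypothetical toy, not pg_restore code: `find_record` is a made-up stand-in for the lookup _PrintTocData would do, searching a seekable stream for an item and falling back to one rewind on EOF:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Scan a seekable file of long-sized record ids for want_id. If the scan
 * starts mid-file (the out-of-order case) and hits EOF without a match,
 * rewind once and scan again from the beginning. Returns the byte offset
 * of the record, or -1 if it is not present at all.
 */
static long
find_record(FILE *fh, long want_id)
{
	for (int pass = 0; pass < 2; pass++)
	{
		long		id;

		while (fread(&id, sizeof(id), 1, fh) == 1)
		{
			if (id == want_id)
				return ftell(fh) - (long) sizeof(id);
		}
		/* EOF without a match: rewind and retry once */
		if (fseek(fh, 0, SEEK_SET) != 0)
			break;
	}
	return -1;
}
```

The two-pass bound guarantees termination: every record is examined at most twice, and an unseekable stream (where fseek fails) simply gives up, mirroring the "parallel restore needs seekable input" constraint.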
Re: Parallel Seq Scan vs kernel read ahead
On Wed, May 20, 2020 at 2:23 PM Amit Kapila wrote:
> Good experiment. IIRC, we have discussed a similar idea during the
> development of this feature but we haven't seen any better results by
> allocating in ranges on the systems we have tried. So, we went with
> the current approach which is more granular and seems to allow better
> parallelism. I feel we need to ensure that we don't regress
> parallelism in existing cases; otherwise, the idea sounds promising to
> me.

Yeah, Linux seems to do pretty well at least with smallish numbers of workers, and when you use large numbers you can probably tune your way out of the problem. ZFS seems to do fine. I wonder how well the other OSes cope.
Re: Problem with pg_atomic_compare_exchange_u64 at 32-bit platforms
On Tue, May 19, 2020 at 04:07:29PM +0300, Konstantin Knizhnik wrote:
> The definition of pg_atomic_compare_exchange_u64 requires alignment of the
> "expected" pointer on an 8-byte boundary:
>
> pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
>                                uint64 *expected, uint64 newval)
> {
> #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
>     AssertPointerAlignment(ptr, 8);
>     AssertPointerAlignment(expected, 8);
> #endif
>
> I wonder if there are platforms where such a restriction is actually needed.

In general, sparc Linux does SIGBUS on unaligned access. Other platforms function but suffer performance penalties.

> And if so, it looks like our ./src/test/regress/regress.c is working only
> occasionally:
>
> static void
> test_atomic_uint64(void)
> {
>     pg_atomic_uint64 var;
>     uint64 expected;
>     ...
>     if (!pg_atomic_compare_exchange_u64(&var, &expected, 1))
>
> because there is no guarantee that the "expected" variable will be aligned
> on the stack at an 8-byte boundary (at least on Win32).

src/tools/msvc sets ALIGNOF_LONG_LONG_INT=8, so it believes that win32 does guarantee 8-byte alignment of both automatic variables. Is it wrong?
Re: Parallel Seq Scan vs kernel read ahead
On Wed, May 20, 2020 at 7:24 AM Thomas Munro wrote:
>
> Hello hackers,
>
> Parallel sequential scan relies on the kernel detecting sequential
> access, but we don't make the job easy. The resulting striding
> pattern works terribly on strict next-block systems like FreeBSD UFS,
> and degrades rapidly when you add too many workers on sliding window
> systems like Linux.
>
> Demonstration using FreeBSD on UFS on a virtual machine, taking ball
> park figures from iostat:
>
> create table t as select generate_series(1, 2)::int i;
>
> set max_parallel_workers_per_gather = 0;
> select count(*) from t;
> -> execution time 13.3s, average read size = ~128kB, ~500MB/s
>
> set max_parallel_workers_per_gather = 1;
> select count(*) from t;
> -> execution time 24.9s, average read size = ~32kB, ~250MB/s
>
> Note the small read size, which means that there was no read
> clustering happening at all: that's the logical block size of this
> filesystem.
>
> That explains some complaints I've heard about PostgreSQL performance
> on that filesystem: parallel query destroys I/O performance.
>
> As a quick experiment, I tried teaching the block allocator to
> allocate ranges of up to 64 blocks at a time, ramping up incrementally,
> and ramping down at the end, and I got:

Good experiment. IIRC, we have discussed a similar idea during the development of this feature but we haven't seen any better results by allocating in ranges on the systems we have tried. So, we went with the current approach which is more granular and seems to allow better parallelism. I feel we need to ensure that we don't regress parallelism in existing cases; otherwise, the idea sounds promising to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Parallel Seq Scan vs kernel read ahead
Hello hackers,

Parallel sequential scan relies on the kernel detecting sequential access, but we don't make the job easy. The resulting striding pattern works terribly on strict next-block systems like FreeBSD UFS, and degrades rapidly when you add too many workers on sliding window systems like Linux.

Demonstration using FreeBSD on UFS on a virtual machine, taking ballpark figures from iostat:

create table t as select generate_series(1, 2)::int i;

set max_parallel_workers_per_gather = 0;
select count(*) from t;
-> execution time 13.3s, average read size = ~128kB, ~500MB/s

set max_parallel_workers_per_gather = 1;
select count(*) from t;
-> execution time 24.9s, average read size = ~32kB, ~250MB/s

Note the small read size, which means that there was no read clustering happening at all: that's the logical block size of this filesystem. That explains some complaints I've heard about PostgreSQL performance on that filesystem: parallel query destroys I/O performance.

As a quick experiment, I tried teaching the block allocator to allocate ranges of up to 64 blocks at a time, ramping up incrementally, and ramping down at the end, and I got:

set max_parallel_workers_per_gather = 1;
select count(*) from t;
-> execution time 7.5s, average read size = ~128kB, ~920MB/s

set max_parallel_workers_per_gather = 3;
select count(*) from t;
-> execution time 5.2s, average read size = ~128kB, ~1.2GB/s

I've attached the quick and dirty patch I used for that.

From 65726902f42a19c319623eb7194b0040517008a6 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 19 May 2020 09:16:07 +1200
Subject: [PATCH] Use larger step sizes for Parallel Seq Scan.

Instead of handing out a single block at a time, which confuses the read-ahead heuristics of many systems, use an increasing step size.
XXX Proof of concept grade code only, needs better ramp down algorithm and
contains a magic number 16
---
 src/backend/access/heap/heapam.c   | 22 +--
 src/backend/access/table/tableam.c | 34 +++---
 src/include/access/relscan.h       | 13 +++-
 src/include/access/tableam.h       |  2 ++
 4 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0d4ed602d7..8982d59ff0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -520,12 +520,14 @@ heapgettup(HeapScanDesc scan,
 		{
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+			ParallelBlockTableScanWork pbscanwork =
+			(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 			table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 
 			/* Other processes might have already finished the scan. */
 			if (page == InvalidBlockNumber)
@@ -720,9 +722,11 @@ heapgettup(HeapScanDesc scan,
 		{
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+			ParallelBlockTableScanWork pbscanwork =
+			(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 			finished = (page == InvalidBlockNumber);
 		}
 		else
@@ -834,12 +838,14 @@ heapgettup_pagemode(HeapScanDesc scan,
 		{
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+			ParallelBlockTableScanWork pbscanwork =
+			(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 			table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 
 			/* Other processes might have already finished the scan. */
 			if (page == InvalidBlockNumber)
@@ -1019,9 +1025,11 @@ heapgettup_pagemode(HeapScanDesc scan,
 		{
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+			ParallelBlockTableScanWork pbscanwork =
+			(ParallelBlockTableScanWork) scan->rs_base.rs_parallel_work;
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 pbscan);
+													 pbscanwork, pbscan);
 			finished = (page == InvalidBlockNumber);
 		}
 		else
@@ -1155,6 +1163,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	scan->rs_base.rs_nkeys = nkeys;
 	scan->rs_base.rs_flags = flags;
 	scan->rs_base.rs_parallel = parallel_scan;
+	scan->rs_base.rs_parallel_work =
+		palloc0(sizeof(ParallelBlockTableScanWorkData));
 	scan->rs_strategy = NULL;	/* set in initscan */
 
 	/*
diff --git a/src/backend/access/table/tableam.c
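The attached patch is truncated before the tableam.c hunk that holds the actual step-size logic, so here is a single-threaded toy model of the ramp-up idea only. This is not the patch's code: `ScanWork`, `claim_next_block`, and `MAX_STEP` are made-up names, and a real implementation would use an atomic fetch-add on the shared counter and ramp back down near the end of the relation, as the commit message says.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STEP 64				/* cap on the per-worker range size */

/* Per-worker scan state: a claimed range of blocks plus the next step. */
typedef struct ScanWork
{
	uint32_t	next;			/* next block this worker will return */
	uint32_t	end;			/* one past the last block in its range */
	uint32_t	step;			/* size of the next range to claim */
} ScanWork;

/*
 * Hand out the next block for this worker. When its current range is
 * exhausted, claim a fresh contiguous range from the shared counter,
 * doubling the range size up to MAX_STEP so the kernel sees sequential
 * reads within each range. Returns UINT32_MAX when the scan is done.
 */
static uint32_t
claim_next_block(ScanWork *w, uint32_t *shared_next, uint32_t nblocks)
{
	if (w->next >= w->end)
	{
		uint32_t	start = *shared_next;

		if (start >= nblocks)
			return UINT32_MAX;
		if (w->step < MAX_STEP)
			w->step *= 2;
		w->next = start;
		w->end = start + w->step;
		if (w->end > nblocks)
			w->end = nblocks;
		*shared_next = w->end;
	}
	return w->next++;
}
```

A worker initialized as `{0, 0, 1}` claims 2 blocks first, then 4, 8, and so on up to MAX_STEP, which is the trade-off the thread discusses: larger contiguous ranges for read-ahead versus the finer granularity that the current one-block-at-a-time scheme gives parallelism.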
Re: pg_stat_wal_receiver and flushedUpto/writtenUpto
On 2020/05/20 8:31, Michael Paquier wrote:
> On Tue, May 19, 2020 at 11:38:52PM +0900, Fujii Masao wrote:
>> I found that "received_lsn" is still used in high-availability.sgml.
>> We should apply the following change in high-availability?
>>
>> - view's received_lsn indicates that WAL is being
>> + view's flushed_lsn indicates that WAL is being
>
> Oops, thanks. Will fix.

Thanks for the fix!

>> BTW, we have pg_last_wal_receive_lsn() that returns the same lsn as
>> pg_stat_wal_receiver.flushed_lsn. Previously both used the term "receive"
>> in their names, but currently not. IMO it's better to use the same term in
>> those names for consistency, but it's not a good idea to rename
>> pg_last_wal_receive_lsn() to something like pg_last_wal_receive_lsn().
>> I have no better idea for now. So I'm ok with the current names.
>
> I think you mean renaming pg_last_wal_receive_lsn() to something like
> pg_last_wal_flushed_lsn(), no?

No, that's not a good idea, as I told upthread.

> This name may become confusing because we lose the "receive" idea in the
> function, that we have with the "receiver" part of pg_stat_wal_receiver.
> Maybe something like that, though that's long:
> - pg_last_wal_receive_flushed_lsn()
> - pg_last_wal_receive_written_lsn()

Yes, that's long.

> Anyway, a rename of this function does not strike me as strongly
> necessary, as that's less tied with the shared memory structure, and we
> document that pg_last_wal_receive_lsn() tracks the current LSN received
> and flushed. I am actually wondering if in the future it may not be
> better to remove this function, but it has no maintenance cost either so
> I would just let it as-is.

Yeah, agreed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: factorial function/phase out postfix operators?
I wrote:
> However, we do have to have a benefit to show those people whose
> queries we break. Hence my insistence on having a working AS fix
> (or some other benefit) before not after.

I experimented with this a bit more, and came up with the attached. It's not a working patch, just a set of grammar changes that Bison is happy with. (Getting to a working patch would require fixing the various build infrastructure that knows about the keyword classification, which seems straightforward but tedious.)

As Robert theorized, it works to move a fairly small number of unreserved keywords into a new slightly-reserved category. However, as the patch stands, only the remaining fully-unreserved keywords can be used as bare column labels. I'd hoped to be able to also use col_name keywords in that way (which'd make the set of legal bare column labels mostly the same as ColId). The col_name keywords that cause problems are, it appears, only PRECISION, CHARACTER, and CHAR_P. So in principle we could move those three into yet another keyword category and then let the remaining col_name keywords be included in BareColLabel. I kind of think that that's more complication than it's worth, though.
			regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a24b30f..0b034b6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -542,13 +542,14 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type Sconst comment_text notify_payload
 %type RoleId opt_boolean_or_string
 %type var_list
-%type ColId ColLabel var_name type_function_name param_name
+%type ColId ColLabel BareColLabel
 %type NonReservedWord NonReservedWord_or_Sconst
+%type var_name type_function_name param_name
 %type createdb_opt_name
 %type var_value zone_value
 %type auth_ident RoleSpec opt_granted_by
-%type unreserved_keyword type_func_name_keyword
+%type unreserved_keyword non_label_keyword type_func_name_keyword
 %type col_name_keyword reserved_keyword
 %type TableConstraint TableLikeClause
@@ -744,7 +745,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %nonassoc	'<' '>' '=' LESS_EQUALS GREATER_EQUALS NOT_EQUALS
 %nonassoc	BETWEEN IN_P LIKE ILIKE SIMILAR NOT_LA
 %nonassoc	ESCAPE			/* ESCAPE must be just above LIKE/ILIKE/SIMILAR */
-%left		POSTFIXOP		/* dummy for postfix Op rules */
 /*
  * To support target_el without AS, we must give IDENT an explicit priority
  * between POSTFIXOP and Op. We can safely assign the same priority to
@@ -3908,6 +3908,7 @@ PartitionSpec: PARTITION BY part_strategy '(' part_params ')'
 part_strategy:	IDENT					{ $$ = $1; }
 			| unreserved_keyword	{ $$ = pstrdup($1); }
+			| non_label_keyword		{ $$ = pstrdup($1); }
 		;
 
 part_params:	part_elem				{ $$ = list_make1($1); }
@@ -13230,8 +13231,6 @@ a_expr:		c_expr								{ $$ = $1; }
 				{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
 			| qual_Op a_expr				%prec Op
 				{ $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
-			| a_expr qual_Op				%prec POSTFIXOP
-				{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
 			| a_expr AND a_expr
 				{ $$ = makeAndExpr($1, $3, @2); }
@@ -13645,8 +13644,6 @@ b_expr:		c_expr
 				{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
 			| qual_Op b_expr				%prec Op
 				{ $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
-			| b_expr qual_Op				%prec POSTFIXOP
-				{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
 			| b_expr IS DISTINCT FROM b_expr		%prec IS
 				{
 					$$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);
@@ -14910,7 +14907,7 @@ target_el:	a_expr AS ColLabel
 				 * as an infix expression, which we accomplish by assigning
 				 * IDENT a precedence higher than POSTFIXOP.
 				 */
-			| a_expr IDENT
+			| a_expr BareColLabel
 				{
 					$$ = makeNode(ResTarget);
 					$$->name = $2;
@@ -15228,6 +15225,7 @@ role_list:	RoleSpec
  */
 ColId:		IDENT								{ $$ = $1; }
 			| unreserved_keyword				{ $$ = pstrdup($1); }
+			| non_label_keyword					{ $$ = pstrdup($1); }
 			| col_name_keyword					{ $$ = pstrdup($1); }
 		;
@@ -15235,6 +15233,7 @@ ColId:		IDENT								{ $$ = $1; }
  */
 type_function_name:	IDENT						{ $$ = $1; }
 			| unreserved_keyword				{ $$ = pstrdup($1); }
+			| non_label_keyword					{ $$ = pstrdup($1); }
 			| type_func_name_keyword			{ $$ = pstrdup($1); }
 		;
@@ -15242,15 +15241,23 @@ type_function_name:	IDENT						{ $$ = $1; }
  */
 NonReservedWord:	IDENT						{ $$ = $1; }
 			| unreserved_keyword				{ $$ = pstrdup($1); }
+			| non_label_keyword					{ $$ = pstrdup($1); }
 			| col_name_keyword					{ $$ = pstrdup($1); }
 			| type_func_name_keyword			{ $$ = pstrdup($1); }
 		;
 
+/* Bare column label --- names that can be column labels without writing "AS".
+ */
+BareColLabel:	IDENT							{ $$ = $1; }
+			| unreserved_keyword				{ $$ =
Re: pg_stat_wal_receiver and flushedUpto/writtenUpto
On Tue, May 19, 2020 at 11:38:52PM +0900, Fujii Masao wrote:
> I found that "received_lsn" is still used in high-availability.sgml.
> We should apply the following change in high-availability?
>
> - view's received_lsn indicates that WAL is being
> + view's flushed_lsn indicates that WAL is being

Oops, thanks. Will fix.

> BTW, we have pg_last_wal_receive_lsn() that returns the same lsn as
> pg_stat_wal_receiver.flushed_lsn. Previously both used the term "receive"
> in their names, but currently not. IMO it's better to use the same term in
> those names for the consistency, but it's not good idea to rename
> pg_last_wal_receive_lsn() to something like pg_last_wal_receive_lsn().
> I have no better idea for now. So I'm ok with the current names.

I think you mean renaming pg_last_wal_receive_lsn() to something like pg_last_wal_flushed_lsn(), no? This name may become confusing because we lose the "receive" idea in the function, that we have with the "receiver" part of pg_stat_wal_receiver. Maybe something like that, though that's long:
- pg_last_wal_receive_flushed_lsn()
- pg_last_wal_receive_written_lsn()

Anyway, a rename of this function does not strike me as strongly necessary, as that's less tied with the shared memory structure, and we document that pg_last_wal_receive_lsn() tracks the current LSN received and flushed. I am actually wondering if in the future it may not be better to remove this function, but it has no maintenance cost either so I would just let it as-is.
--
Michael
Re: Extension ownership and misuse of SET ROLE/SET SESSION AUTHORIZATION
> On 19 May 2020, at 17:34, Tom Lane wrote:
>
> Daniel Gustafsson writes:
>>> On 13 Feb 2020, at 23:55, Tom Lane wrote:
>>> Given the current behavior of SET ROLE and SET SESSION AUTHORIZATION,
>>> I don't actually see any way that we could get these features to
>>> play together.
>
>> Is this being worked on for the 13 cycle such that it should be an open item?
>
> I didn't have it on my list, but yeah maybe we should add it to the
> "pre-existing issues" list.
>
>>> The quick-and-dirty answer is to disallow these switches from being
>>> used together in pg_restore, and I'm inclined to think maybe we should
>>> do that in the back branches.
>
>> ..or should we do this for v13 and back-branches and leave fixing it for 14?
>> Considering the potential invasiveness of the fix I think the latter sounds
>> rather appealing at this point in the cycle. Something like the attached
>> should be enough IIUC.
>
> pg_dump and pg_dumpall also have that switch no?

They do, but there the switches actually work as intended and the combination should be allowed AFAICT. Since SET ROLE is sent to the source server and not the output, we can use it for starting the dump without being a superuser.

cheers ./daniel
Re: Two fsync related performance issues?
On Wed, May 20, 2020 at 12:51 AM Robert Haas wrote:
> On Mon, May 11, 2020 at 8:43 PM Paul Guo wrote:
> > I have this concern since I saw an issue in a real product environment
> > where the startup process needs 10+ seconds to start wal replay after
> > relaunch due to elog(PANIC) (it was seen on the postgres-based product
> > Greenplum, but it is a common issue in postgres also). I highly suspect
> > the delay was mostly due to this. Also, it is noticed that on public
> > clouds fsync is much slower than on local storage, so the slowness
> > should be more severe in the cloud. If we at least disable fsync on the
> > table directories we could skip a lot of file fsyncs - this may save a
> > lot of seconds during crash recovery.
>
> I've seen this problem be way worse than that. Running fsync() on all
> the files and performing the unlogged table cleanup steps can together
> take minutes or, I think, even tens of minutes. What I think sucks
> most in this area is that we don't even emit any log messages if the
> process takes a long time, so the user has no idea why things are
> apparently hanging. I think we really ought to try to figure out some
> way to give the user a periodic progress indication when this kind of
> thing is underway, so that they at least have some idea what's
> happening.
>
> As Tom says, I don't think there's any realistic way that we can
> disable it altogether, but maybe there's some way we could make it
> quicker, like some kind of parallelism, or by overlapping it with
> other things. It seems to me that we have to complete the fsync pass
> before we can safely checkpoint, but I don't know that it needs to be
> done any sooner than that... not sure though.

I suppose you could do with the whole directory tree what register_dirty_segment() does, for the pathnames that you recognise as regular md.c segment names.
Then it'd be done as part of the next checkpoint, though you might want to bring the pre_sync_fname() stuff back into it somehow to get more I/O parallelism on Linux (elsewhere it does nothing). Of course that syscall could block, and the checkpointer queue can fill up and then you have to do it synchronously anyway, so you'd have to look into whether that's enough.

The whole concept of SyncDataDirectory() bothers me a bit, though, because although it's apparently trying to be safe by being conservative, it assumes a model of write-back error handling that we now know to be false on Linux. And then it thrashes the inode cache to make it more likely that error state is forgotten, just for good measure.

What would a precise version of this look like? Maybe we really only need to fsync relation files that recovery modifies (as we already do), plus those that it would have touched but didn't because of the page LSN (a new behaviour to catch segment files that were written by the last run but not yet flushed, which I guess in practice would only happen with full_page_writes=off)?

(If you were paranoid enough to believe that the buffer cache were actively trying to trick you and marked dirty pages clean and lost the error state so you'll never hear about it, you might even want to rewrite such pages once.)
Re: factorial function/phase out postfix operators?
Robert Haas writes:
> On Tue, May 19, 2020 at 2:30 PM Tom Lane wrote:
>> Anyway, the bottom-line conclusion remains the same: let's make sure
>> we know what we'd do after getting rid of postfix ops, before we do
>> that.

> Well, I don't think we really need to get too conservative here.
> ... It seems to me that
> the first thing that we need to do here is get a deprecation notice
> out, so that people know that we're planning to break this.

No, I disagree with that, because from what I've seen so far it's not really clear to me that we have a full solution to the AS problem excepting only postfix ops. I don't want to deprecate postfix ops before it's completely clear that we can get something out of it. Otherwise, we'll either end up un-deprecating them, which makes us look silly, or removing a feature for zero benefit.

Stephen's nearby proposal to deprecate only after a patch has been committed doesn't seem all that unreasonable, if you're only intending to allow one cycle's worth of notice. In particular, I could easily see us committing a fix sometime this summer and then sticking deprecation notices into the back branches before v13 goes gold. But let's have the complete fix in hand first.

> I'm still interested in hearing what people think about hard-coding !
> as a postfix operator vs. removing postfix operators altogether. I
> think Vik and Tom are against keeping just !, Kenneth Marshall are for
> it, and I'm not sure I understand Pavel's position.

Yes, I'm VERY strongly against keeping just !. I think it'd be a ridiculous, and probably very messy, backwards-compatibility hack; and the fact that it will break valid use-cases that we don't need to break seems to me to well outweigh the possibility that someone would rather not change their queries to use factorial() or !!.

However, we do have to have a benefit to show those people whose queries we break. Hence my insistence on having a working AS fix (or some other benefit) before not after.
regards, tom lane
Re: factorial function/phase out postfix operators?
On Tue, May 19, 2020 at 2:30 PM Tom Lane wrote:
> Might work. My main concern would be if we have to forbid those keywords
> as column names --- for words like "year", in particular, that'd be a
> disaster. If the net effect is only that they can't be AS-less col labels,
> it won't break any cases that worked before.

ISTM that all we have to do to avoid that is switch from a four-way classification to a five-way classification: just split unreserved_keyword into totally_unreserved_keyword and very_slightly_reserved_keyword.

> Our existing four-way keyword classification is not something that was
> handed down on stone tablets. I wonder whether postfix-ectomy changes
> the situation enough that a complete rethinking would be helpful.

I don't see that it does, but I might be missing something. I think there's an excellent argument for adding one new category, but it's not clear to me why it should reshape the landscape any more than that.

> I also continue to think that more lookahead and token-merging would
> be interesting to pursue. It'd hardly surprise anybody if the
> token pair "character varying" were always treated as a type name,
> for instance.

I think that line of attack will not buy very much. The ability to avoid unexpected consequences is entirely contingent on the unlikeliness of the keywords appearing adjacent to each other in some other context, and the only argument for that here is that neither of those words is a terribly likely column name. I think that when you try to solve interesting problems with this, though, you very quickly run into cases where that's not true, and you'll need a technique that has some knowledge of the parser state to actually do something that works well.

I read a paper some years ago that proposed a solution to this problem: if the parser generator sees a shift/reduce conflict, it checks whether the conflict can be resolved by looking ahead one or more additional tokens. If so, it can build a little DFA that gets run when you enter that state, with edges labeled with lookahead tokens, and it runs that DFA whenever you reach the problematic state. Since, hopefully, such states are relatively rarely encountered, the overhead is low, yet it still gives you a way out of conflicts in many practical cases. Unfortunately, the chances of bison implementing such a thing do not seem very good.

> Anyway, the bottom-line conclusion remains the same: let's make sure
> we know what we'd do after getting rid of postfix ops, before we do
> that.

Well, I don't think we really need to get too conservative here. I've studied this issue enough over the years to be pretty darn sure that this is a necessary prerequisite to doing something about the "AS unreserved_keyword" issue, and that it is by far the most significant issue in doing something about that problem. Sure, there are other issues, but I think they are basically matters of politics or policy. For example, if some key people DID think that the four-way keyword classification was handed down on stone tablets, that could be quite a problem; but if we're willing to take the view that solving the "AS unreserved_keyword" problem is pretty important and we need to find a way to get it done, then I think we can do that.

It seems to me that the first thing that we need to do here is get a deprecation notice out, so that people know that we're planning to break this. I think we should go ahead and make that happen now, or at least pretty soon.

I'm still interested in hearing what people think about hard-coding ! as a postfix operator vs. removing postfix operators altogether. I think Vik and Tom are against keeping just !, Kenneth Marshall is for it, and I'm not sure I understand Pavel's position. I'm about +0.3 for keeping just ! myself. Maybe we'll get some other votes. If you're willing to be persuaded that keeping only ! is a sensible thing to consider, I could probably draft a very rough patch showing that it would still be sufficient to get us out from under the "AS unreserved_keyword" problem, but if you and/or enough other people hate that direction with a fiery passion, I won't bother. I'm pretty sure it's technically possible, but the issue is more about what people actually want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
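The lookahead-DFA idea from the paper described above can be sketched in a few lines. This is a toy illustration only, not bison or PostgreSQL code; the state name, tokens, and action strings are all invented for the example:

```python
# Toy version of the lookahead-DFA trick: when the parser enters a
# state that has a shift/reduce conflict, run a small DFA whose edges
# are labeled with lookahead tokens to choose the action, instead of
# deciding on the single current token alone.
CONFLICT_DFA = {
    # hypothetical conflicted state: the parser just saw CHARACTER
    "after_CHARACTER": {
        "varying": "shift (merge into two-word type name)",
        "default": "reduce (CHARACTER alone is the type name)",
    },
}

def resolve(state, lookahead):
    """Pick a parser action for a conflicted state via one-token lookahead."""
    edges = CONFLICT_DFA[state]
    return edges.get(lookahead, edges["default"])

print(resolve("after_CHARACTER", "varying"))
print(resolve("after_CHARACTER", "FROM"))
```

Since the DFA only runs when the conflicted state is actually entered, the common parsing path pays nothing for it, which matches the "overhead is low" observation in the message.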
Re: factorial function/phase out postfix operators?
Robert Haas writes:
> On Tue, May 19, 2020 at 11:32 AM Tom Lane wrote:
>> Before we go much further on this, we should have some proof
>> that there's actually material benefit to be gained. I spent some
>> time just now trying to relax the AS restriction by ripping out
>> postfix ops, and the results were not too promising.

> I came to similar conclusions a couple of years ago:
> https://www.postgresql.org/message-id/ca+tgmoyzpvt7uihjwgktytivhhlncp0ylavcoipe-lyg3w2...@mail.gmail.com

Ah, right.

> What I proposed at the time was creating a new category of keywords.

Might work. My main concern would be if we have to forbid those keywords as column names --- for words like "year", in particular, that'd be a disaster. If the net effect is only that they can't be AS-less col labels, it won't break any cases that worked before.

Our existing four-way keyword classification is not something that was handed down on stone tablets. I wonder whether postfix-ectomy changes the situation enough that a complete rethinking would be helpful.

I also continue to think that more lookahead and token-merging would be interesting to pursue. It'd hardly surprise anybody if the token pair "character varying" were always treated as a type name, for instance.

Anyway, the bottom-line conclusion remains the same: let's make sure we know what we'd do after getting rid of postfix ops, before we do that.

regards, tom lane
Re: factorial function/phase out postfix operators?
On Tue, May 19, 2020 at 11:32 AM Tom Lane wrote:
> Before we go much further on this, we should have some proof
> that there's actually material benefit to be gained. I spent some
> time just now trying to relax the AS restriction by ripping out
> postfix ops, and the results were not too promising. Indeed the
> postfix-ops problem goes away, but then you find out that SQL's
> random syntax choices for type names become the stumbling block.
> An example here is that given
>
> SELECT 'foo'::character varying
>
> it's not clear if "varying" is supposed to be part of the type name or a
> column label. It looks to me like we'd have to increase the reserved-ness
> of VARYING, PRECISION, and about half a dozen currently-unreserved
> keywords involved in INTERVAL syntax, including such popular column names
> as "month", "day", and "year".
>
> Plus I got conflicts on WITHIN, GROUP, and FILTER from ordered-set
> aggregate syntax; those are currently unreserved keywords, but they
> can't be allowed as AS-less column labels.

I came to similar conclusions a couple of years ago:

https://www.postgresql.org/message-id/ca+tgmoyzpvt7uihjwgktytivhhlncp0ylavcoipe-lyg3w2...@mail.gmail.com

What I proposed at the time was creating a new category of keywords.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Trouble with hashagg spill I/O pattern and costing
On Tue, May 19, 2020 at 09:27:34AM -0700, Jeff Davis wrote:
> On Tue, 2020-05-19 at 17:12 +0200, Tomas Vondra wrote:
>> I think there are two related problems - with costing and with
>> excessive I/O due to using logical tapes.
>
> Thank you for the detailed analysis. I am still digesting this
> information.
>
>> This kinda makes me question whether logical tapes are the right tool
>> for hashagg. I've read the explanation in logtape.c why it's about the
>> same amount of I/O as using separate files, but IMO that only really
>> works for I/O patterns similar to merge sort - the more I think about
>> this, the more I'm convinced we should just do what hashjoin is doing.
>
> Fundamentally, sort writes sequentially and reads randomly; while
> HashAgg writes randomly and reads sequentially.

Not sure. I think the charts and stats of iosnoop data show that an awful lot of reads during sort is actually pretty sequential. Moreover, sort manages to read the data in much larger blocks - 128kB instead of just 8kB (which is what hashagg seems to be doing). I wonder why that is and if we could achieve that for hashagg too ...

> If the random writes of HashAgg end up fragmented too much on disk,
> then clearly the sequential reads are not so sequential anyway. The
> only way to avoid fragmentation on disk is to preallocate for the
> tape/file.

And is there a way to pre-allocate larger chunks? Presumably we could assign the blocks to tape in larger chunks (e.g. 128kB, i.e. 16 x 8kB) instead of just a single block. I haven't seen anything like that in tape.c, though ...

> BufFile (relying more on the OS) would probably do a better job of
> preallocating the disk space in a useful way; whereas logtape.c makes
> it easier to manage buffers and the overall number of files created
> (thereby allowing higher fanout of partitions).
>
> We have a number of possibilities here:
>
> 1. Improve costing to reflect that HashAgg is creating more random IOs
> than Sort.

I think we'll need to do something about this, but I think we should try improving the behavior first and then model the costing based on that.

> 2. Reduce the partition fanout in the hopes that the OS does a better
> job with readahead.

I doubt this will make a significant difference. I think the problem is the partitions end up interleaved way too much in the temp file, and I don't see how a lower fanout would fix that.

BTW what do you mean when you say "fanout"? Do you mean how fast we increase the number of partitions, or some parameter in particular?

> 3. Switch back to BufFile, in which case we probably need to reduce
> the fanout for other reasons.

Maybe, although that seems pretty invasive post beta1.

> 4. Change logtape.c to allow preallocation or to write in larger
> blocks.

I think this is what I suggested above (allocating 16 blocks at a time, or something). I wonder how wasteful this would be, but I think not very much. Essentially, with 1024 partitions and pre-allocating space in 128kB chunks, that means 128MB may end up unused, which seems ok-ish, and I guess we could further restrict that by starting with a lower value and gradually increasing the number. Or something like that ...

> 5. Change BufFile to allow more control over buffer usage, and switch
> to that.

Maybe. I don't recall what exactly is the issue with buffer usage, but I think it has the same invasiveness issue as (3). OTOH it's what hashjoin does, and we've lived with it for ages ...

> #1 or #2 are the least invasive, and I think we can get a satisfactory
> solution by combining those.

OK. I think tweaking the costing (and essentially reverting to what 12 does for those queries) is perfectly reasonable. But if we can actually get some speedup thanks to hashagg, even better.

> I saw good results with the high fanout and low work_mem when there is
> still a lot of system memory. That's a nice benefit, but perhaps it's
> safer to use a lower fanout (which will lead to recursion) until we
> get a better handle on the IO patterns.

I don't know how much we can rely on that - once we push some of the data from page cache, it has the issues I described. The trouble is people may not have enough memory to keep everything in cache, otherwise they might just as well bump up work_mem and not spill at all.

> Perhaps you can try recompiling with a lower max partitions and rerun
> the query? How much would we have to lower it for either the cost to
> approach reality or the OS readahead to become effective?

I can try that, of course. Which parameters should I tweak / how? I can also try running it with BufFile, in case you prepare a WIP patch.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
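The back-of-the-envelope estimate in the message above (1024 partitions, 128kB chunks, roughly 128MB potentially unused) checks out; here is the worst-case arithmetic as a quick sketch, assuming the chunk size is 16 x 8kB blocks as suggested:

```python
# Worst-case disk space left unused if each of HashAgg's tape
# partitions preallocates space in fixed-size chunks: at most one
# partially-filled chunk per partition.
BLOCK = 8 * 1024           # 8kB logtape block
CHUNK_BLOCKS = 16          # preallocate 16 blocks at a time
PARTITIONS = 1024          # number of spill partitions

chunk_bytes = BLOCK * CHUNK_BLOCKS             # bytes per chunk
worst_case_waste = PARTITIONS * chunk_bytes    # one unused chunk each

print(f"chunk size: {chunk_bytes // 1024} kB")
print(f"worst-case unused: {worst_case_waste // (1024 * 1024)} MB")
```

Starting with a smaller chunk size and growing it, as suggested, would shrink this bound further for queries with many mostly-empty partitions.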
Re: PostgreSQL 13 Beta 1 Release Announcement Draft
On Mon, May 18, 2020 at 7:29 PM Jonathan S. Katz wrote:
> Attached is a draft of the release announcement for the PostgreSQL 13
> Beta 1 release this week.

We could call out the additional commits that Tom has done for wait event renaming, re: compatibility - next to "Rename some recovery-related wait events". These are the relevant commits, I think:

Rename SLRU structures and associated LWLocks.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5da14938f7bfb96b648ee3c47e7ea2afca5bcc4a

Rename assorted LWLock tranches.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=36ac359d3621578cefc2156a3917024cdd3b1829

Drop the redundant "Lock" suffix from LWLock wait event names.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=14a91010912632cae322b06fce0425faedcf7353

Mop-up for wait event naming issues.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3048898e73c75f54bb259323382e0e7f6368cb6f

Thanks,
Lukas

-- 
Lukas Fittl
Re: Trouble with hashagg spill I/O pattern and costing
On Tue, 2020-05-19 at 17:12 +0200, Tomas Vondra wrote:
> I think there are two related problems - with costing and with
> excessive I/O due to using logical tapes.

Thank you for the detailed analysis. I am still digesting this information.

> This kinda makes me question whether logical tapes are the right tool
> for hashagg. I've read the explanation in logtape.c why it's about the
> same amount of I/O as using separate files, but IMO that only really
> works for I/O patterns similar to merge sort - the more I think about
> this, the more I'm convinced we should just do what hashjoin is doing.

Fundamentally, sort writes sequentially and reads randomly; while HashAgg writes randomly and reads sequentially. If the random writes of HashAgg end up fragmented too much on disk, then clearly the sequential reads are not so sequential anyway. The only way to avoid fragmentation on disk is to preallocate for the tape/file.

BufFile (relying more on the OS) would probably do a better job of preallocating the disk space in a useful way; whereas logtape.c makes it easier to manage buffers and the overall number of files created (thereby allowing higher fanout of partitions).

We have a number of possibilities here:

1. Improve costing to reflect that HashAgg is creating more random IOs than Sort.
2. Reduce the partition fanout in the hopes that the OS does a better job with readahead.
3. Switch back to BufFile, in which case we probably need to reduce the fanout for other reasons.
4. Change logtape.c to allow preallocation or to write in larger blocks.
5. Change BufFile to allow more control over buffer usage, and switch to that.

#1 or #2 are the least invasive, and I think we can get a satisfactory solution by combining those.

I saw good results with the high fanout and low work_mem when there is still a lot of system memory. That's a nice benefit, but perhaps it's safer to use a lower fanout (which will lead to recursion) until we get a better handle on the IO patterns.

Perhaps you can try recompiling with a lower max partitions and rerun the query? How much would we have to lower it for either the cost to approach reality or the OS readahead to become effective?

Regards,
Jeff Davis
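Option #1 above (charging HashAgg spill reads as more random than Sort's) can be illustrated with a toy calculation. This is not PostgreSQL's actual cost model; only the two default planner constants are real values, while the page count and random fractions are invented for illustration:

```python
# Toy comparison: cost of reading spilled pages when a given fraction
# of the reads is charged at the random-page rate.  Sort's spill reads
# are mostly sequential; fragmented HashAgg reads behave more randomly.
SEQ_PAGE_COST = 1.0      # PostgreSQL default seq_page_cost
RANDOM_PAGE_COST = 4.0   # PostgreSQL default random_page_cost

def spill_read_cost(pages, random_fraction):
    """Cost of reading spilled pages, a given fraction charged as random."""
    return (pages * random_fraction * RANDOM_PAGE_COST
            + pages * (1 - random_fraction) * SEQ_PAGE_COST)

pages = 100_000                                # hypothetical spill size
sort_like = spill_read_cost(pages, 0.1)        # mostly sequential reads
hashagg_like = spill_read_cost(pages, 0.9)     # fragmented, mostly random

print(f"sort-like cost: {sort_like:.0f}")
print(f"hashagg-like cost: {hashagg_like:.0f}")
```

Even this crude model shows why a plan that looks cheaper on paper can lose badly once its spill reads are effectively random.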
Re: SyncRepLock acquired exclusively in default configuration
On Mon, May 18, 2020 at 7:41 PM Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote:
> This item is for PG14, right? If so I'd like to add this item to the
> next commit fest.

Sure, add it to commit fest. Seems though it should be backpatched to relevant branches as well.
Re: Missing grammar production for WITH TIES
On 2020-May-19, Tom Lane wrote:
> Yeah, that would have been better per project protocol: if a tarball
> re-wrap becomes necessary then it would be messy not to include this
> change along with fixing whatever urgent bug there might be.
>
> However, I thought the case for delaying this fix till post-wrap was kind
> of thin anyway, so if that does happen I won't be too fussed about it.
> Otherwise I would've said something earlier on this thread.

In the end, it's a judgement call. In this case, my assessment was that the risk was small enough that I could push after I saw the tarballs announced. In other cases I've judged differently and waited for longer. If the fix had been even simpler, I would have pushed it right away, but my confidence with grammar changes is not as high as I would like.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Extension ownership and misuse of SET ROLE/SET SESSION AUTHORIZATION
Daniel Gustafsson writes:
>> On 13 Feb 2020, at 23:55, Tom Lane wrote:
>> Given the current behavior of SET ROLE and SET SESSION AUTHORIZATION,
>> I don't actually see any way that we could get these features to
>> play together.

> Is this being worked on for the 13 cycle such that it should be an open item?

I didn't have it on my list, but yeah, maybe we should add it to the "pre-existing issues" list.

>> The quick-and-dirty answer is to disallow these switches from being
>> used together in pg_restore, and I'm inclined to think maybe we should
>> do that in the back branches.

> ..or should we do this for v13 and back-branches and leave fixing it for 14?
> Considering the potential invasiveness of the fix I think the latter sounds
> rather appealing at this point in the cycle. Something like the attached
> should be enough IIUC.

pg_dump and pg_dumpall also have that switch, no?

regards, tom lane
Re: factorial function/phase out postfix operators?
Vik Fearing writes:
> I'm -1 on keeping ! around as a hard-coded postfix operator.

Before we go much further on this, we should have some proof that there's actually material benefit to be gained. I spent some time just now trying to relax the AS restriction by ripping out postfix ops, and the results were not too promising. Indeed the postfix-ops problem goes away, but then you find out that SQL's random syntax choices for type names become the stumbling block. An example here is that given

    SELECT 'foo'::character varying

it's not clear if "varying" is supposed to be part of the type name or a column label. It looks to me like we'd have to increase the reserved-ness of VARYING, PRECISION, and about half a dozen currently-unreserved keywords involved in INTERVAL syntax, including such popular column names as "month", "day", and "year".

Plus I got conflicts on WITHIN, GROUP, and FILTER from ordered-set aggregate syntax; those are currently unreserved keywords, but they can't be allowed as AS-less column labels.

We could possibly minimize the damage by inventing another keyword classification besides the four we have now. Or maybe we should think harder about using more lookahead between the lexer and grammar. But this is going to be a lot more ticklish than I would've hoped, and possibly not cost-free, so we might well end up never pulling the trigger on such a change.

So right at the moment I'm agreeing with Stephen's nearby opinion: let's not deprecate these until we've got a patch that gets some concrete benefit from removing them.

regards, tom lane
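The `SELECT 'foo'::character varying` ambiguity described above can be made concrete with a toy enumerator. This is a sketch, not the real grammar; it only knows about two multi-word type names and ignores everything else in the statement:

```python
# Enumerate the readings of the tail of a cast like
# 'foo'::character varying
# A trailing unreserved keyword can either complete a two-word type
# name or act as an implicit (AS-less) column label.
TWO_WORD_TYPES = {("character", "varying"), ("double", "precision")}

def parses(tokens):
    """Return (type_name, column_label) interpretations of a cast tail."""
    # reading 1: one-word type, next token (if any) is a column label
    out = [((tokens[0],), tokens[1] if len(tokens) > 1 else None)]
    # reading 2: both tokens form a known two-word type, no label
    if len(tokens) >= 2 and (tokens[0], tokens[1]) in TWO_WORD_TYPES:
        out.append((tuple(tokens[:2]), None))
    return out

for type_name, label in parses(["character", "varying"]):
    print(" ".join(type_name), "| label:", label)
```

Both readings survive, which is exactly why the real grammar would have to make VARYING (and friends) more reserved, or disallow them as AS-less labels, to pick one.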
Re: Extension ownership and misuse of SET ROLE/SET SESSION AUTHORIZATION
> On 13 Feb 2020, at 23:55, Tom Lane wrote:

Is this being worked on for the 13 cycle such that it should be an open item?

> Given the current behavior of SET ROLE and SET SESSION AUTHORIZATION,
> I don't actually see any way that we could get these features to
> play together. SET SESSION AUTHORIZATION insists on the originally
> authenticated user being a superuser, so that the documented point of
> --role (to allow you to start the restore from a not-superuser role)
> isn't going to work. I thought about starting to use SET ROLE for
> both purposes, but it checks whether you have role privilege based
> on the session userid, so that a previous SET ROLE doesn't get you
> past that check even if it was a successful SET ROLE to a superuser.
>
> The quick-and-dirty answer is to disallow these switches from being
> used together in pg_restore, and I'm inclined to think maybe we should
> do that in the back branches.

..or should we do this for v13 and back-branches and leave fixing it for 14? Considering the potential invasiveness of the fix I think the latter sounds rather appealing at this point in the cycle. Something like the attached should be enough IIUC.

cheers ./daniel

pg_restore_role.patch
Description: Binary data
Re: factorial function/phase out postfix operators?
Greetings,

* Vik Fearing (v...@postgresfriends.org) wrote:
> On 5/19/20 4:03 AM, Tom Lane wrote:
>> Peter Eisentraut writes:
>>> What are the thoughts about then marking the postfix operator deprecated
>>> and eventually removing it?
>>
>> If we do this it'd require a plan. We'd have to also warn about the
>> feature deprecation in (at least) the CREATE OPERATOR man page, and
>> we'd have to decide how many release cycles the deprecation notices
>> need to stand for.
>
> I have never come across any custom postfix operators in the wild, and
> I've never even seen ! used in practice.
>
> So I would suggest a very short deprecation period. Deprecate now in
> 13, let 14 go by, and rip it all out for 15. That should give us enough
> time to extend the deprecation period if we need to, or go back on it
> entirely (like I seem to remember we did with VACUUM FREEZE).
>
>> If that's the intention, though, it'd be good to get those deprecation
>> notices published in v13 not v14.
>
> +1

I agree with putting notices into v13 saying they're deprecated, but then actually removing them in v14.

For that matter, I'd vote that we generally accept a system whereby when we commit something that removes a feature in the next major version, we put out some kind of notice that it's been deprecated and won't be in v14. We don't want to run the risk of saying XYZ has been deprecated and then it staying around for a few years, nor trying to say "it'll be removed in v14" before we actually know that it's been committed for v14. In other words, wait to deprecate until the commit has happened for v14 (and maybe wait a couple days in case someone wasn't watching and argues to revert, but not longer than any normal commit), and then go back and mark it as "deprecated and removed in v14" for all back-branches. Users will continue to have 5 years (by upgrading to v13, or whatever the last release was before their favorite feature was removed, if they really need to) to update their systems to deal with the change.

We do not do ourselves nor our users a real service by carrying forward deprecated code/interfaces/views/etc. across major versions; instead they tend to live on in infamy, with some users actually updating and some not, ever, and then complaining when we suggest actually removing it (we have lots of good examples of that too), and then we have to have the debate again about removing it and, in some cases, we end up un-deprecating it, which is confusing for users and a bit ridiculous. Let's not do that.

Thanks,

Stephen

signature.asc
Description: PGP signature
Re: factorial function/phase out postfix operators?
Robert Haas writes:
> The ambiguity doesn't come from the mere existence of postfix
> operators. It comes from the fact that, when we lex the input, we
> can't tell whether a particular operator that we happen to encounter
> is prefix, infix, or postfix. So hard-coding, for example, a rule that
> '!' is always a postfix operator and anything else is never a postfix
> operator is sufficient to solve the key problems.

If we were willing to say that '!' could *only* be a postfix operator, then maybe the ambiguity would go away. Or maybe it wouldn't; if you're seriously proposing this, I think it'd be incumbent on you to demonstrate that we could still simplify the grammar to the same extent. But that will incur its own set of compatibility problems, because there's no reason to assume that nobody has made prefix or infix '!' operators. In any case, it's hard to decide that that's a less klugy solution than getting rid of postfix ops altogether. There's a reason why few programming languages have those.

In general, I put this on about the same level as when we decided to remove ';' and ':' as operators (cf 259489bab, 766fb7f70). Somebody thought it was cute that it was possible to have that, which maybe it was, but it wasn't really sane in the big picture. And as I recall, the amount of pushback we got was nil.

regards, tom lane
Re: Performance penalty when requesting text values in binary format
On Mon, May 18, 2020 at 7:07 AM Laurenz Albe wrote:
> Did you profile your benchmark?
> It would be interesting to know where the time is spent.

Unfortunately, I have not. Fortunately, it appears that Tom Lane recognized this as a part of another issue and has prepared a patch:

https://www.postgresql.org/message-id/6648.1589819885%40sss.pgh.pa.us

Thanks,
Jack
Re: pg_stat_wal_receiver and flushedUpto/writtenUpto
On 2020/05/17 10:08, Michael Paquier wrote:
> On Sat, May 16, 2020 at 10:15:47AM +0900, Michael Paquier wrote:
>> Thanks. If there are no objections, I'll revisit that tomorrow and
>> apply it with those changes, just in time for beta1.
>
> Okay, done this part then.

I found that "received_lsn" is still used in high-availability.sgml. Shouldn't we apply the following change in high-availability.sgml?

-   view's received_lsn indicates that WAL is being
+   view's flushed_lsn indicates that WAL is being

BTW, we have pg_last_wal_receive_lsn() that returns the same lsn as pg_stat_wal_receiver.flushed_lsn. Previously both used the term "receive" in their names, but currently not. IMO it's better to use the same term in those names for consistency, but it's not a good idea to rename pg_last_wal_receive_lsn() to something like pg_last_wal_flush_lsn(). I have no better idea for now, so I'm OK with the current names.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: factorial function/phase out postfix operators?
On 5/19/20 4:22 PM, Robert Haas wrote:
> On Tue, May 19, 2020 at 9:51 AM Tom Lane wrote:
>> Uh ... what exactly would be the point of that? The real reason to do
>> this at all is not that we have it in for '!', but that we want to
>> drop the possibility of postfix operators from the grammar altogether,
>> which will remove a boatload of ambiguity.
>
> The ambiguity doesn't come from the mere existence of postfix
> operators. It comes from the fact that, when we lex the input, we
> can't tell whether a particular operator that we happen to encounter
> is prefix, infix, or postfix. So hard-coding, for example, a rule that
> '!' is always a postfix operator and anything else is never a postfix
> operator is sufficient to solve the key problems. Then "SELECT a ! b"
> can only be a postfix operator application followed by a column
> labeling, and "SELECT a + b" can only be the application of an infix
> operator.

So if I make a complex UDT where a NOT operator makes a lot of sense[*], why wouldn't I be allowed to make a prefix operator ! for it? All for what? That one person in the corner over there who doesn't want to rewrite their query to use factorial() instead?

I'm -1 on keeping ! around as a hard-coded postfix operator.

[*] I don't have a concrete example in mind, just this abstract one.

-- 
Vik Fearing
Re: ldap tls test fails in some environments
Re: Thomas Munro > In your transcript for test 20, it looks like the client (PostgreSQL) > is hanging up without even sending a TLS ClientHello: Maybe tests 19 and 20 are failing because 18 was already bad. (But probably not.) > I wonder how to figure out why... does this tell you anything? > > $ENV{'LDAPCONF'} = $ldap_conf; > +$ENV{"GNUTLS_DEBUG_LEVEL"} = '99'; Sorry, I had not seen your message until now. Logs attached. Even ldapsearch ... -ZZ is failing with that slapd config, so it's not a PG problem. $ GNUTLS_DEBUG_LEVEL=99 ldapsearch -h localhost -p 56118 -s base -b dc=example,dc=net -D cn=Manager,dc=example,dc=net -y /home/myon/postgresql-13-13~~devel~20200515.0434/build/src/test/ldap/tmp_check/ldappassword -n 'objectclass=*' -ZZ gnutls[2]: Enabled GnuTLS 3.6.13 logging... gnutls[2]: getrandom random generator was detected gnutls[2]: Intel SSSE3 was detected gnutls[2]: Intel AES accelerator was detected gnutls[2]: Intel GCM accelerator was detected gnutls[2]: cfg: unable to access: /etc/gnutls/config: 2 5ec3ec38 daemon: activity on 1 descriptor 5ec3ec38 daemon: activity on: 5ec3ec38 slap_listener_activate(6): 5ec3ec38 daemon: epoll: listen=6 busy 5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL 5ec3ec38 >>> slap_listener(ldap://localhost:56118) 5ec3ec38 daemon: accept() = 10 5ec3ec38 daemon: listen=6, new connection on 10 5ec3ec38 daemon: activity on 1 descriptor 5ec3ec38 daemon: activity on: 5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL 5ec3ec38 daemon: added 10r (active) listener=(nil) 5ec3ec38 daemon: activity on 1 descriptor 5ec3ec38 daemon: activity on: 10r 5ec3ec38 daemon: read active on 10 5ec3ec38 daemon: epoll: listen=6 active_threads=0 
tvp=NULL 5ec3ec38 connection_get(10) 5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL 5ec3ec38 connection_get(10): got connid=1000 5ec3ec38 connection_read(10): checking for input on id=1000 5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL 5ec3ec38 daemon: activity on 1 descriptor ber_get_next 5ec3ec38 daemon: activity on: ldap_read: want=8, got=8 : 30 1d 02 01 01 77 18 800w.. 5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL ldap_read: want=23, got=23 : 16 31 2e 33 2e 36 2e 31 2e 34 2e 31 2e 31 34 36 .1.3.6.1.4.1.146 0010: 36 2e 32 30 30 33 37 6.20037 ber_get_next: tag 0x30 len 29 contents: ber_dump: buf=0x7f0c5bc0 ptr=0x7f0c5bc0 end=0x7f0c5bdd len=29 : 02 01 01 77 18 80 16 31 2e 33 2e 36 2e 31 2e 34 ...w...1.3.6.1.4 0010: 2e 31 2e 31 34 36 36 2e 32 30 30 33 37.1.1466.20037 5ec3ec38 op tag 0x77, time 1589898296 ber_get_next ldap_read: want=8 error=Resource temporarily unavailable 5ec3ec38 conn=1000 op=0 do_extended 5ec3ec38 daemon: activity on 1 descriptor 5ec3ec38 daemon: activity on: ber_scanf fmt ({m) ber: ber_dump: buf=0x7f0c5bc0 ptr=0x7f0c5bc3 end=0x7f0c5bdd len=26 5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL : 77 18 80 16 31 2e 33 2e 36 2e 31 2e 34 2e 31 2e w...1.3.6.1.4.1. 
0010: 31 34 36 36 2e 32 30 30 33 37 1466.20037 5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL 5ec3ec38 do_extended: oid=1.3.6.1.4.1.1466.20037 5ec3ec38 send_ldap_extended: err=0 oid= len=0 5ec3ec38 send_ldap_response: msgid=1 tag=120 err=0 ber_flush2: 14 bytes to sd 10 : 30 0c 02 01 01 78 07 0a 01 00 04 00 04 00 0x ldap_write: want=14, written=14 : 30 0c 02 01 01 78 07 0a 01 00 04 00 04 00 0x gnutls[2]: added 6 protocols, 29 ciphersuites, 19 sig algos and 10 groups into priority list gnutls[3]: ASSERT: ../../../lib/x509/verify-high2.c[gnutls_x509_trust_list_add_trust_file]:361 ldap_start_tls: Connect error (-11) 5ec3ec38 daemon: activity on 1 descriptor 5ec3ec38 daemon: activity on: 10r 5ec3ec38 daemon: read active on 10 5ec3ec38 daemon: epoll: listen=6 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=7 active_threads=0 tvp=NULL 5ec3ec38 daemon: epoll: listen=8 active_threads=0 tvp=NULL 5ec3ec38 connection_get(10) 5ec3ec38 connection_get(10): got connid=1000 5ec3ec38 daemon: epoll: listen=9 active_threads=0 tvp=NULL 5ec3ec38 connection_read(10): checking for input on id=1000 tls_read: want=5, got=5 : 30 05 02 01 02
Re: factorial function/phase out postfix operators?
On Tue, May 19, 2020 at 9:51 AM Tom Lane wrote: > Uh ... what exactly would be the point of that? The real reason to do > this at all is not that we have it in for '!', but that we want to > drop the possibility of postfix operators from the grammar altogether, > which will remove a boatload of ambiguity. The ambiguity doesn't come from the mere existence of postfix operators. It comes from the fact that, when we lex the input, we can't tell whether a particular operator that we happen to encounter is prefix, infix, or postfix. So hard-coding, for example, a rule that '!' is always a postfix operator and anything else is never a postfix operator is sufficient to solve the key problems. Then "SELECT a ! b" can only be a postfix operator application followed by a column label, and "SELECT a + b" can only be the application of an infix operator. The parser ambiguities could also be removed if the source of the information were a GUC or a catalog lookup; there are good reasons not to go that way, but my point is that the problem is not that postfix operators are per se evil, but that the information we need is not available at the right phase of the process. We can only make use of the information in pg_operator after we start assigning type information, which has to happen after we parse, but to avoid the ambiguity here, we need the information before we parse - i.e. at the lexing stage. > In my non-caffeinated state, I don't recall exactly which things are > blocked by the existence of postfix ops; but I think for instance it might > become possible to remove the restriction of requiring AS before column > aliases that happen to be unreserved keywords. Right - which would be a huge win. > I would also argue that having a feature that is available to > built-in operators but not user-defined ones is pretty antithetical > to Postgres philosophy. That I think is the policy question before us. 
I believe that any rule that tells us which operators are postfix and which are not at the lexing stage is good enough. I think here you are arguing for the empty set, which will work, but I believe any other fixed set also works, such as { '!' }. I don't think we're going to break a ton of user code no matter which one we pick, but I do think that it's possible to pick either one and still achieve our goals here, so that's the issue that I wanted to raise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
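As a concrete illustration of the lexing-stage problem Robert describes (a sketch, not taken from the thread):

```sql
-- Today, how this statement reads depends on what '!' turns out to be
-- in pg_operator, which the lexer cannot consult:
SELECT a ! b FROM t;
--   if '!' is a postfix operator:  (a!) with output column named "b"
--                                  (an implicit column alias)
--   if '!' is an infix operator:   the single expression a ! b
-- A fixed rule decided before parsing (no postfix operators at all, or
-- "only '!' is ever postfix") removes the ambiguity at the lexing stage.
```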
Re: Is it useful to record whether plans are generic or custom?
On Sat, May 16, 2020 at 6:01 PM legrand legrand wrote: > > To track executed plan types, I think execution layer hooks > > are appropriate. > > These hooks, however, take QueryDesc as a param and it does > > not include cached plan information. > > It seems that the same QueryDesc entry is reused when executing > a generic plan. > For example, marking queryDesc->plannedstmt->queryId (outside > pg_stat_statements) with a pseudo tag during ExecutorStart > reappears in later executions with generic plans ... > > Is this QueryDesc reusable by a custom plan? If not, maybe a solution > could be to add a flag in queryDesc->plannedstmt ? > Thanks for your proposal! I first thought it was a good idea and tried to add a flag to QueryDesc, but the comments on QueryDesc say it encapsulates everything that the executor needs to execute a query. Whether a plan is generic or custom is not something the executor needs to know to run a query, so now I hesitate to do so. Instead, I'm now considering using a static hash for prepared queries (static HTAB *prepared_queries). BTW, I'd also appreciate other opinions about recording the number of generic and custom plans in pg_stat_statements. Regards, -- Atsushi Torikoshi
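For context on what would be counted, the generic/custom distinction is already observable from SQL (a sketch; `plan_cache_mode` exists since PostgreSQL 12):

```sql
-- Illustrative: watch the plan choice for a prepared statement.
PREPARE q(int) AS SELECT count(*) FROM pg_class WHERE relpages > $1;

SET plan_cache_mode = force_generic_plan;
EXPLAIN (COSTS OFF) EXECUTE q(10);   -- the filter should show "$1" (generic)

SET plan_cache_mode = force_custom_plan;
EXPLAIN (COSTS OFF) EXECUTE q(10);   -- the filter should show "10" (custom)
```

The proposal in this thread is about counting these two outcomes per statement, which neither EXPLAIN output nor pg_stat_statements currently exposes.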
Re: factorial function/phase out postfix operators?
Robert Haas writes: > I think it's generally a good idea, though perhaps we should consider > continuing to allow '!' as a postfix operator and just removing > support for any other. Uh ... what exactly would be the point of that? The real reason to do this at all is not that we have it in for '!', but that we want to drop the possibility of postfix operators from the grammar altogether, which will remove a boatload of ambiguity. > I won't lose a lot of sleep if we decide to rip out '!' as well, but I > don't think that continuing to support it would cost us much. AFAICS, it would cost us the entire point of this change. In my non-caffeinated state, I don't recall exactly which things are blocked by the existence of postfix ops; but I think for instance it might become possible to remove the restriction of requiring AS before column aliases that happen to be unreserved keywords. If we lobotomize CREATE OPERATOR but don't remove built-in postfix ops, then none of those improvements will be available. That seems like the worst possible choice. I would also argue that having a feature that is available to built-in operators but not user-defined ones is pretty antithetical to Postgres philosophy. regards, tom lane
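For reference, the lone built-in postfix operator under discussion is factorial (syntax valid in the PostgreSQL 13 era; illustrative):

```sql
SELECT 5!;             -- built-in postfix factorial operator -> 120
SELECT !!5;            -- the equivalent prefix form
SELECT factorial(5);   -- function form that survives removal of the operator
```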
Re: Expand the use of check_canonical_path() for more GUCs
Michael Paquier writes: > While digging into my backlog, I have found this message from Peter E > mentioning about $subject: > https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9a...@2ndquadrant.com > It seems to me that it would be a good idea to make those checks more > consistent, and attached is a patch. Hm, I'm pretty certain that data_directory does not need this because canonicalization is done elsewhere; the most that you could accomplish there is to cause problems. Dunno about the rest. regards, tom lane
Re: factorial function/phase out postfix operators?
> > I won't lose a lot of sleep if we decide to rip out '!' as well, but I > don't think that continuing to support it would cost us much. > +1 for keeping ! and nuking the rest, if possible. Regards, Ken
Re: Missing grammar production for WITH TIES
Michael Paquier writes: > On Tue, May 19, 2020 at 12:41:39AM -0400, Tom Lane wrote: >> Michael Paquier writes: >>> This has been committed just after beta1 has been stamped. So it >>> means that it won't be included in it, right? >> Right. > Still, wouldn't it be better to wait until the version is tagged? Yeah, that would have been better per project protocol: if a tarball re-wrap becomes necessary then it would be messy not to include this change along with fixing whatever urgent bug there might be. However, I thought the case for delaying this fix till post-wrap was kind of thin anyway, so if that does happen I won't be too fussed about it. Otherwise I would've said something earlier on this thread. regards, tom lane
Problem with pg_atomic_compare_exchange_u64 at 32-bit platforms
The definition of pg_atomic_compare_exchange_u64 requires alignment of the expected pointer on an 8-byte boundary:

pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
                               uint64 *expected, uint64 newval)
{
#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
    AssertPointerAlignment(ptr, 8);
    AssertPointerAlignment(expected, 8);
#endif

I wonder if there are platforms where such a restriction is actually needed. And if so, it looks like our ./src/test/regress/regress.c is working only occasionally:

static void
test_atomic_uint64(void)
{
    pg_atomic_uint64 var;
    uint64      expected;
    ...
    if (!pg_atomic_compare_exchange_u64(&var, &expected, 1))

because there is no guarantee that the "expected" variable will be aligned on the stack at an 8-byte boundary (at least on Win32).
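The concern can be demonstrated outside PostgreSQL. This is an illustrative sketch (not PostgreSQL code): on some ILP32 ABIs, such as i386 System V, a plain 8-byte local may end up only 4-byte aligned on the stack, so code handing &expected to an 8-byte compare-exchange should force the alignment explicitly, as C11 alignas (or a compiler alignment attribute) does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdalign.h>

/* Wrapping the value in a struct with an explicit alignment specifier
 * guarantees 8-byte alignment even where the ABI would only give 4. */
typedef struct
{
	alignas(8) uint64_t value;	/* C11 way to demand 8-byte alignment */
} aligned_u64;

/* Check helper: true when p sits on an 8-byte boundary. */
static int
is_aligned8(const void *p)
{
	return ((uintptr_t) p % 8) == 0;
}
```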
Re: Warn when parallel restoring a custom dump without data offsets
On Sat, May 16, 2020 at 04:57:46PM -0400, David Gilman wrote: > If pg_dump can't seek on its output stream when writing a dump in the > custom archive format (possibly because you piped its stdout to a file) > it can't update that file with data offsets. These files will often > break parallel restoration. Warn when the user is doing pg_restore on > such a file to give them a hint as to why their restore is about to > fail. You didn't say so, but I gather this is related to this other thread (which seems to represent two separate issues). https://www.postgresql.org/message-id/flat/1582010626326-0.post%40n3.nabble.com#0891d77011cdb6ca3ad8ab7904a2ed63 > Tom, if you or anyone else with PostgreSQL would appreciate the > pg_dump file I can send it to you out of band, it's only a few > megabytes. I have pg_restore with debug symbols too if you want me to > try anything. Would you send it to me or post a link to a file-sharing site and I'll try to reproduce it? So far no luck. You should include your diagnosis from that thread here, or add it to a commit message, and mention the suspect commit (548e50976). Eventually add a patch for the next commitfest. https://commitfest.postgresql.org/ I guess you're also involved in this conversation: https://dba.stackexchange.com/questions/257398/pg-restore-with-jobs-flag-results-in-pg-restore-error-a-worker-process-di -- Justin
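For background, the condition being warned about comes from how the archive is written (illustrative commands requiring a running server; `mydb` and `newdb` are placeholder names):

```shell
# Custom-format archive written through a pipe: pg_dump cannot seek back
# to patch data offsets into the table of contents.
pg_dump -Fc mydb > dump.custom

# Written directly to a file, pg_dump can update the TOC with offsets,
# which parallel restore relies on:
pg_dump -Fc -f dump.custom mydb
pg_restore -j 4 -d newdb dump.custom
```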
Re: some grammar refactoring
On Tue, May 19, 2020 at 2:43 AM Peter Eisentraut wrote: > Here is a series of patches to do some refactoring in the grammar around > the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... > ADD/DROP. In the grammar, these commands (with some exceptions) > basically just take a reference to an object and later look it up in C > code. Some of that was already generalized individually for each > command (drop_type_any_name, drop_type_name, etc.). This patch combines > it into common lists for all these commands. I haven't reviewed the code, but +1 for the idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Two fsync related performance issues?
On Mon, May 11, 2020 at 8:43 PM Paul Guo wrote: > I have this concern since I saw an issue in a real product environment that > the startup process needs 10+ seconds to start wal replay after relaunch due > to elog(PANIC) (it was seen on postgres based product Greenplum but it is a > common issue in postgres also). I highly suspect the delay was mostly due to > this. Also it is noticed that on public clouds fsync is much slower than that > on local storage so the slowness should be more severe on cloud. If we at > least disable fsync on the table directories we could skip a lot of file > fsync - this may save a lot of seconds during crash recovery. I've seen this problem be way worse than that. Running fsync() on all the files and performing the unlogged table cleanup steps can together take minutes or, I think, even tens of minutes. What I think sucks most in this area is that we don't even emit any log messages if the process takes a long time, so the user has no idea why things are apparently hanging. I think we really ought to try to figure out some way to give the user a periodic progress indication when this kind of thing is underway, so that they at least have some idea what's happening. As Tom says, I don't think there's any realistic way that we can disable it altogether, but maybe there's some way we could make it quicker, like some kind of parallelism, or by overlapping it with other things. It seems to me that we have to complete the fsync pass before we can safely checkpoint, but I don't know that it needs to be done any sooner than that... not sure though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
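Robert's suggestion of a periodic progress indication could be sketched along these lines. This is a hypothetical illustration, not PostgreSQL's implementation: the function names and the reporting interval are invented here, and errors are deliberately ignored for brevity.

```c
#define _XOPEN_SOURCE 500		/* for nftw() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static long files_synced = 0;

/* nftw() callback: fsync each regular file, logging progress every
 * 1000 files so a long pre-recovery fsync pass is visible to the user. */
static int
sync_one(const char *path, const struct stat *sb, int typeflag,
		 struct FTW *ftwbuf)
{
	(void) sb;
	(void) ftwbuf;
	if (typeflag == FTW_F)
	{
		int			fd = open(path, O_RDONLY);

		if (fd >= 0)
		{
			(void) fsync(fd);	/* errors ignored in this sketch */
			close(fd);
			if (++files_synced % 1000 == 0)
				fprintf(stderr, "fsync pass: %ld files done\n", files_synced);
		}
	}
	return 0;					/* nonzero would stop the walk */
}

/* Walk the tree under dir; returns the number of files synced, -1 on error. */
static long
sync_tree(const char *dir)
{
	files_synced = 0;
	if (nftw(dir, sync_one, 16, FTW_PHYS) != 0)
		return -1;
	return files_synced;
}
```

Parallelizing such a walk, or overlapping it with other startup work as Robert suggests, would be a separate design question; the sketch only shows the progress-reporting idea.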
Re: factorial function/phase out postfix operators?
On Tue, May 19, 2020 at 14:27, Robert Haas wrote: > On Mon, May 18, 2020 at 10:42 AM Peter Eisentraut > wrote: > > What are the thoughts about then marking the postfix operator deprecated > > and eventually removing it? > > I wrote a little bit about this last year: > > > http://postgr.es/m/CA+TgmoarLfSQcLCh7jx0737SZ28qwbuy+rUWT6rSHAO=b-6...@mail.gmail.com > > I think it's generally a good idea, though perhaps we should consider > continuing to allow '!' as a postfix operator and just removing > support for any other. That would probably allow us to have a very > short deprecation period, since real-world use of user-defined postfix > operators seems to be nil -- and it would also make this into a change > that only affects the lexer and parser, which might make it simpler. > > I won't lose a lot of sleep if we decide to rip out '!' as well, but I > don't think that continuing to support it would cost us much. > This is a little bit of an obscure feature. It can be removed, and relatively quickly. Maybe a warning if somebody uses it would be good (for Postgres 13). Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
explicit_bzero for sslpassword
Since commit 74a308cf5221f we use explicit_bzero on pgpass and connhost password in libpq, but not sslpassword which seems an oversight. The attached performs an explicit_bzero before freeing like the pattern for other password variables. cheers ./daniel sslpassword_bzero.patch Description: Binary data
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, May 15, 2020 at 2:48 PM Dilip Kumar wrote: > > On Wed, May 13, 2020 at 4:50 PM Amit Kapila wrote: > > > > > 3. > > And, during catalog scan we can check the status of the xid and > > + * if it is aborted we will report a specific error that we can ignore. We > > + * might have already streamed some of the changes for the aborted > > + * (sub)transaction, but that is fine because when we decode the abort we > > will > > + * stream abort message to truncate the changes in the subscriber. > > + */ > > +static inline void > > +SetupCheckXidLive(TransactionId xid) > > > > In the above comment, I don't think it is right to say that we ignore > > the error raised due to the aborted transaction. We need to say that > > we discard the already streamed changes on such an error. > > Done. > In the same comment, there is a typo (/messageto/message to). > > 4. > > +static inline void > > +SetupCheckXidLive(TransactionId xid) > > +{ > > /* > > - * If this transaction has no snapshot, it didn't make any changes to the > > - * database, so there's nothing to decode. Note that > > - * ReorderBufferCommitChild will have transferred any snapshots from > > - * subtransactions if there were any. > > + * setup CheckXidAlive if it's not committed yet. We don't check if the xid > > + * aborted. That will happen during catalog access. Also reset the > > + * sysbegin_called flag. > > */ > > - if (txn->base_snapshot == NULL) > > + if (!TransactionIdDidCommit(xid)) > > { > > - Assert(txn->ninvalidations == 0); > > - ReorderBufferCleanupTXN(rb, txn); > > - return; > > + CheckXidAlive = xid; > > + bsysscan = false; > > } > > > > I think this function is inline as it needs to be called for each > > change. If that is the case and otherwise also, isn't it better that > > we check if passed xid is the same as CheckXidAlive before checking > > TransactionIdDidCommit as TransactionIdDidCommit can be costly and > > calling it for each change might not be a good idea? 
> > Done, Also I think it is good to check TransactionIdIsInProgress > instead of !TransactionIdDidCommit. I have changed that as well. > What if it is aborted just before this check? I think the decode API won't be able to detect that and sys* API won't care to check because CheckXidAlive won't be set for that case. > > 5. > > setup CheckXidAlive if it's not committed yet. We don't check if the xid > > + * aborted. That will happen during catalog access. Also reset the > > + * sysbegin_called flag. > > > > /if the xid aborted/if the xid is aborted. missing comma after Also. > > Done > You forgot to change as per the second part of the comment (missing comma after Also). > > > 8. > > @@ -1588,8 +1766,6 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId > > xid, > > * use as a normal record. It'll be cleaned up at the end > > * of INSERT processing. > > */ > > - if (specinsert == NULL) > > - elog(ERROR, "invalid ordering of speculative insertion changes"); > > > > You have removed this check but all other handling of specinsert is > > same as far as this patch is concerned. Why so? > > Seems like a merge issue, or the leftover from the old design of the > toast handling where we were streaming with the partial tuple. > fixed now. > > > 9. > > @@ -1676,8 +1860,6 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId > > xid, > > * freed/reused while restoring spooled data from > > * disk. > > */ > > - Assert(change->data.tp.newtuple != NULL); > > - > > dlist_delete(&change->node); > > > > Why is this Assert removed? > > Same cause as above so fixed. > > > 10. > > @@ -1753,7 +1935,15 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId > > xid, > > relations[nrelations++] = relation; > > } > > > > - rb->apply_truncate(rb, txn, nrelations, relations, change); > > + if (streaming) > > + { > > + rb->stream_truncate(rb, txn, nrelations, relations, change); > > + > > + /* Remember that we have sent some data. 
*/ > > + change->txn->any_data_sent = true; > > + } > > + else > > + rb->apply_truncate(rb, txn, nrelations, relations, change); > > > > Can we encapsulate this in a separate function like > > ReorderBufferApplyTruncate or something like that? Basically, rather > > than having streaming check in this function, lets do it in some other > > internal function. And we can likewise do it for all the streaming > > checks in this function or at least whereever it is feasible. That > > will make this function look clean. > > Done for truncate and change. I think we can create a few more such > functions for > start/stop and cleanup handling on error. I will work on that. > Yeah, I think that would be better. One minor comment change suggestion: /* + * start stream or begin the transaction. If this is the first + * change in the current stream. + */ We can write the above comment as "Start the stream or begin the transaction for the first change in the current stream." -- With Regards, Amit Kapila. EnterpriseDB:
Re: factorial function/phase out postfix operators?
On Mon, May 18, 2020 at 10:42 AM Peter Eisentraut wrote: > What are the thoughts about then marking the postfix operator deprecated > and eventually removing it? I wrote a little bit about this last year: http://postgr.es/m/CA+TgmoarLfSQcLCh7jx0737SZ28qwbuy+rUWT6rSHAO=b-6...@mail.gmail.com I think it's generally a good idea, though perhaps we should consider continuing to allow '!' as a postfix operator and just removing support for any other. That would probably allow us to have a very short deprecation period, since real-world use of user-defined postfix operators seems to be nil -- and it would also make this into a change that only affects the lexer and parser, which might make it simpler. I won't lose a lot of sleep if we decide to rip out '!' as well, but I don't think that continuing to support it would cost us much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, May 15, 2020 at 2:47 PM Dilip Kumar wrote: > > On Tue, May 12, 2020 at 4:39 PM Amit Kapila wrote: > > > > > 4. > > +static void > > +stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn) > > +{ > > + LogicalDecodingContext *ctx = cache->private_data; > > + LogicalErrorCallbackState state; > > + ErrorContextCallback errcallback; > > + > > + Assert(!ctx->fast_forward); > > + > > + /* We're only supposed to call this when streaming is supported. */ > > + Assert(ctx->streaming); > > + > > + /* Push callback + info on the error context stack */ > > + state.ctx = ctx; > > + state.callback_name = "stream_start"; > > + /* state.report_location = apply_lsn; */ > > > > Why can't we supply the report_location here? I think here we need to > > report txn->first_lsn if this is the very first stream and > > txn->final_lsn if it is any consecutive one. > > Done > Now after your change in stream_start_cb_wrapper, we assign report_location as the first_lsn passed as input to the function, but write_location is still txn->first_lsn. Shouldn't we assign the passed-in first_lsn to write_location? It seems assigning txn->first_lsn won't be correct for streams other than the first one. > > 5. > > +static void > > +stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn) > > +{ > > + LogicalDecodingContext *ctx = cache->private_data; > > + LogicalErrorCallbackState state; > > + ErrorContextCallback errcallback; > > + > > + Assert(!ctx->fast_forward); > > + > > + /* We're only supposed to call this when streaming is supported. */ > > + Assert(ctx->streaming); > > + > > + /* Push callback + info on the error context stack */ > > + state.ctx = ctx; > > + state.callback_name = "stream_stop"; > > + /* state.report_location = apply_lsn; */ > > > > Can't we report txn->final_lsn here > > We are already setting this to txn->final_lsn in the 0006 patch, but I > have moved it into this patch now. 
> Similar to the previous point, here also I think we need to assign report and write location as the last_lsn passed to this API. > > > > v20-0005-Implement-streaming-mode-in-ReorderBuffer > > - > > 10. > > Theoretically, we could get rid of the k-way merge, and append the > > changes to the toplevel xact directly (and remember the position > > in the list in case the subxact gets aborted later). > > > > I don't think this part of the commit message is correct as we > > sometimes need to spill even during streaming. Please check the > > entire commit message and update according to the latest > > implementation. > > Done > You seem to have forgotten to remove the other part of the message ("This adds a second iterator for the streaming case"), which is not relevant now. > > 11. > > - * HeapTupleSatisfiesHistoricMVCC. > > + * tqual.c's HeapTupleSatisfiesHistoricMVCC. > > + * > > + * We do build the hash table even if there are no CIDs. That's > > + * because when streaming in-progress transactions we may run into > > + * tuples with the CID before actually decoding them. Think e.g. about > > + * INSERT followed by TRUNCATE, where the TRUNCATE may not be decoded > > + * yet when applying the INSERT. So we build a hash table so that > > + * ResolveCminCmaxDuringDecoding does not segfault in this case. > > + * > > + * XXX We might limit this behavior to streaming mode, and just bail > > + * out when decoding transaction at commit time (at which point it's > > + * guaranteed to see all CIDs). > > */ > > static void > > ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn) > > @@ -1350,9 +1498,6 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer > > *rb, ReorderBufferTXN *txn) > > dlist_iter iter; > > HASHCTL hash_ctl; > > > > - if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids)) > > - return; > > - > > > > I don't understand this change. Why would "INSERT followed by > > TRUNCATE" lead to a tuple which can come for decode before its > > CID? 
The patch has made changes based on this assumption in > > HeapTupleSatisfiesHistoricMVCC which appears to be very risky as the > > behavior could be dependent on whether we are streaming the changes > > for in-progress xact or at the commit of a transaction. We might want > > to generate a test to once validate this behavior. > > > > Also, the comment refers to tqual.c which is wrong as this API is now > > in heapam_visibility.c. > > Done. > + * INSERT. So in such cases we assume the CIDs is from the future command + * and return as unresolve. + */ + if (tuplecid_data == NULL) + return false; + Here, let's reword the last line of the comment as ". So in such cases we assume the CID is from the future command." -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Add A Glossary
On 2020-05-19 08:17, Laurenz Albe wrote: The term "cluster" is unfortunate, because to most people it suggests a group of machines, so the term "instance" is better, but that ship has sailed long ago. I don't see what would stop us from renaming some things, with some care. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, May 19, 2020 at 3:31 PM Dilip Kumar wrote: > > On Tue, May 19, 2020 at 2:34 PM Amit Kapila wrote: > > > > On Mon, May 18, 2020 at 5:57 PM Amit Kapila wrote: > > > > > > > > > 3. > > > + /* > > > + * If streaming is enable and we have serialized this transaction because > > > + * it had incomplete tuple. So if now we have got the complete tuple we > > > + * can stream it. > > > + */ > > > + if (ReorderBufferCanStream(rb) && can_stream && > > > rbtxn_is_serialized(toptxn) > > > + && !(rbtxn_has_toast_insert(txn)) && !(rbtxn_has_spec_insert(txn))) > > > + { > > > > > > This comment is just saying what you are doing in the if-check. I > > > think you need to explain the rationale behind it. I don't like the > > > variable name 'can_stream' because it matches ReorderBufferCanStream > > > whereas it is for a different purpose, how about naming it as > > > 'change_complete' or something like that. The check has many > > > conditions, can we move it to a separate function to make the code > > > here look clean? > > > > > > > Do we really need this? Immediately after this check, we are calling > > ReorderBufferCheckMemoryLimit which will anyway stream the changes if > > required. > > Actually, ReorderBufferCheckMemoryLimit is only meant for checking > whether we need to stream the changes due to the memory limit. But > suppose when memory limit exceeds that time we could not stream the > transaction because there was only incomplete toast insert so we > serialized. Now, when we get the tuple which makes the changes > complete but now it is not crossing the memory limit as changes were > already serialized. So I am not sure whether it is a good idea to > stream the transaction as soon as we get the complete changes or we > shall wait till next time memory limit exceed and that time we select > the suitable candidate. > I think it is better to wait till next time we exceed the memory threshold. 
> Ideally, if we are in streaming mode and > the transaction is serialized, that means it was already a candidate for > streaming but could not be streamed due to the incomplete changes, so > shouldn't we stream it immediately as soon as its changes are complete > even though we are now within the memory limit. > The only time we need to stream or spill is when we exceed the memory threshold. In the above case, it is possible that next time there is some other candidate transaction that we can stream. > > > > More comments on v20-0010-Bugfix-handling-of-incomplete-toast-tuple: > > > > + else if (rbtxn_has_toast_insert(txn) && > > + ChangeIsInsertOrUpdate(change->action)) > > + { > > + toptxn->txn_flags &= ~RBTXN_HAS_TOAST_INSERT; > > + can_stream = true; > > + } > > .. > > +#define ChangeIsInsertOrUpdate(action) \ > > + (((action) == REORDER_BUFFER_CHANGE_INSERT) || \ > > + ((action) == REORDER_BUFFER_CHANGE_UPDATE) || \ > > + ((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)) > > > > How can we clear the RBTXN_HAS_TOAST_INSERT flag on > > REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT action? > > Partial toast insert means we have inserted in the toast but not in > the main table. So even if it is a spec insert we can form the complete > tuple; however, we still cannot stream it because we haven't got > spec_confirm, but for that, we are marking another flag. So if the > insert is a spec insert the toast insert will also be a spec insert, and > as part of those toast spec inserts we are marking the partial tuple, so > cleaning that flag should happen when the spec insert is done for the > main table, right? > Sounds reasonable. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Expand the use of check_canonical_path() for more GUCs
On 2020-05-19 09:09, Michael Paquier wrote: While digging into my backlog, I have found this message from Peter E mentioning about $subject: https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9a...@2ndquadrant.com It seems to me that it would be a good idea to make those checks more consistent, and attached is a patch. That thread didn't resolve why check_canonical_path() is necessary there. Maybe the existing uses could be removed? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Optimizer docs typos
On Mon, May 18, 2020 at 7:45 PM Richard Guo wrote: > In this same README doc, another suspicious typo to me, which happens in > section "Optimizer Functions", is in the prefix to query_planner(), > we should have three dashes, rather than two, since query_planner() is > called within grouping_planner(). > > diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README > index 7dcab9a..bace081 100644 > --- a/src/backend/optimizer/README > +++ b/src/backend/optimizer/README > @@ -315,7 +315,7 @@ set up for recursive handling of subqueries >preprocess target list for non-SELECT queries >handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates, > ORDER BY, DISTINCT, LIMIT > ---query_planner() > +---query_planner() > make list of base relations used in query > split up the qual into restrictions (a=1) and joins (b=c) > find qual clauses that enable merge and hash joins Yeah, you are right. Another one would be in the prefix to standard_join_search(); I think it might be better to have six dashes, rather than five, because standard_join_search() is called within make_rel_from_joinlist(). Best regards, Etsuro Fujita
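Concretely, the fix Fujita-san suggests would follow the same pattern as the diff above (the first character of each hunk line is the diff marker; hunk header elided):

```diff
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
------standard_join_search()
+------standard_join_search()
```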
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, May 19, 2020 at 2:34 PM Amit Kapila wrote: > > On Mon, May 18, 2020 at 5:57 PM Amit Kapila wrote: > > > > > > 3. > > + /* > > + * If streaming is enable and we have serialized this transaction because > > + * it had incomplete tuple. So if now we have got the complete tuple we > > + * can stream it. > > + */ > > + if (ReorderBufferCanStream(rb) && can_stream && > > rbtxn_is_serialized(toptxn) > > + && !(rbtxn_has_toast_insert(txn)) && !(rbtxn_has_spec_insert(txn))) > > + { > > > > This comment is just saying what you are doing in the if-check. I > > think you need to explain the rationale behind it. I don't like the > > variable name 'can_stream' because it matches ReorderBufferCanStream > > whereas it is for a different purpose, how about naming it as > > 'change_complete' or something like that. The check has many > > conditions, can we move it to a separate function to make the code > > here look clean? > > > > Do we really need this? Immediately after this check, we are calling > ReorderBufferCheckMemoryLimit which will anyway stream the changes if > required. Actually, ReorderBufferCheckMemoryLimit is only meant for checking whether we need to stream the changes due to the memory limit. But suppose that when the memory limit was exceeded we could not stream the transaction because there was only an incomplete toast insert, so we serialized. Now we get the tuple which makes the changes complete, but it is not crossing the memory limit because the changes were already serialized. So I am not sure whether it is a good idea to stream the transaction as soon as we get the complete changes, or whether we should wait till the next time the memory limit is exceeded and select a suitable candidate then. 
Ideally, if we are in streaming mode and the transaction is serialized, that means it was already a candidate for streaming but could not be streamed due to the incomplete changes, so shouldn't we stream it immediately as soon as its changes are complete even though we are now within the memory limit? Because our target is to stream, not spill, we should try to stream the spilled changes at the first opportunity. > Can we move the changes related to the detection of incomplete data to a separate function? Ok. > > More comments on v20-0010-Bugfix-handling-of-incomplete-toast-tuple: > > + else if (rbtxn_has_toast_insert(txn) && > + ChangeIsInsertOrUpdate(change->action)) > + { > + toptxn->txn_flags &= ~RBTXN_HAS_TOAST_INSERT; > + can_stream = true; > + } > .. > +#define ChangeIsInsertOrUpdate(action) \ > + (((action) == REORDER_BUFFER_CHANGE_INSERT) || \ > + ((action) == REORDER_BUFFER_CHANGE_UPDATE) || \ > + ((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)) > > How can we clear the RBTXN_HAS_TOAST_INSERT flag on > REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT action? Partial toast insert means we have inserted in the toast but not in the main table. So even if it is a spec insert we can form the complete tuple; however, we still cannot stream it because we haven't got spec_confirm, but for that, we are marking another flag. So if the insert is a spec insert the toast insert will also be a spec insert, and as part of those toast spec inserts we are marking the partial tuple, so cleaning that flag should happen when the spec insert is done for the main table, right? 
> If none of the TXNs have complete changes then we > choose the largest (sub)transaction and spill the same to make the > in-memory changes below logical_decoding_work_mem threshold. This > idea can work but the strategy to choose the transaction is suboptimal > for cases where TXNs have some changes which are complete followed by > an incomplete toast or speculative tuple. I was having an offlist > discussion with Robert on this problem and he suggested that it would > be better if we track the complete part of changes separately and then > we can avoid the drawback mentioned above. I have thought about this > and I think it can work if we track the size and LSN of completed > changes. I think we need to ensure that if there is concurrent abort > then we discard all changes for current (sub)transaction not only up > to completed changes LSN whereas if the streaming is successful then > we can truncate the changes only up to completed changes LSN. What do > you think? > > I wonder why you have done this as 0010 in the patch series, it should > be as 0006 after the > 0005-Implement-streaming-mode-in-ReorderBuffer.patch. If we can do > that way then it would be easier for me to review. Is there a reason > for not doing so? No reason, I can do that. Actually, later we can merge the changes into 0005 itself; I kept them separate for review. Anyway, in the next
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, May 18, 2020 at 5:57 PM Amit Kapila wrote: > > > 3. > + /* > + * If streaming is enable and we have serialized this transaction because > + * it had incomplete tuple. So if now we have got the complete tuple we > + * can stream it. > + */ > + if (ReorderBufferCanStream(rb) && can_stream && rbtxn_is_serialized(toptxn) > + && !(rbtxn_has_toast_insert(txn)) && !(rbtxn_has_spec_insert(txn))) > + { > > This comment is just saying what you are doing in the if-check. I > think you need to explain the rationale behind it. I don't like the > variable name 'can_stream' because it matches ReorderBufferCanStream > whereas it is for a different purpose, how about naming it as > 'change_complete' or something like that. The check has many > conditions, can we move it to a separate function to make the code > here look clean? > Do we really need this? Immediately after this check, we are calling ReorderBufferCheckMemoryLimit which will anyway stream the changes if required. Can we move the changes related to the detection of incomplete data to a separate function? Another comment on v20-0010-Bugfix-handling-of-incomplete-toast-tuple: + else if (rbtxn_has_toast_insert(txn) && + ChangeIsInsertOrUpdate(change->action)) + { + toptxn->txn_flags &= ~RBTXN_HAS_TOAST_INSERT; + can_stream = true; + } .. +#define ChangeIsInsertOrUpdate(action) \ + (((action) == REORDER_BUFFER_CHANGE_INSERT) || \ + ((action) == REORDER_BUFFER_CHANGE_UPDATE) || \ + ((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)) How can we clear the RBTXN_HAS_TOAST_INSERT flag on REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT action? IIUC, the basic idea used to handle incomplete changes (which is possible in case of toast tuples and speculative inserts) is to mark such TXNs as containing incomplete changes and then while finding the largest top-level TXN for streaming, we ignore such TXN's and move to the next largest TXN. 
If none of the TXNs have complete changes then we choose the largest (sub)transaction and spill the same to make the in-memory changes below logical_decoding_work_mem threshold. This idea can work but the strategy to choose the transaction is suboptimal for cases where TXNs have some changes which are complete followed by an incomplete toast or speculative tuple. I was having an offlist discussion with Robert on this problem and he suggested that it would be better if we track the complete part of changes separately and then we can avoid the drawback mentioned above. I have thought about this and I think it can work if we track the size and LSN of completed changes. I think we need to ensure that if there is concurrent abort then we discard all changes for current (sub)transaction not only up to completed changes LSN whereas if the streaming is successful then we can truncate the changes only up to completed changes LSN. What do you think? I wonder why you have done this as 0010 in the patch series, it should be as 0006 after the 0005-Implement-streaming-mode-in-ReorderBuffer.patch. If we can do that way then it would be easier for me to review. Is there a reason for not doing so? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
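The bookkeeping proposed above (track the size and LSN of the last complete change, discard everything on a concurrent abort, but truncate only up to the completed-changes LSN after a successful stream) could be sketched roughly as follows. All struct fields and function names here are hypothetical, made up for illustration; this is not the actual ReorderBuffer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* stand-in for the real WAL pointer type */

/* Hypothetical per-transaction bookkeeping, not the real ReorderBufferTXN */
typedef struct TxnProgress
{
    uint64_t    total_size;         /* size of all queued changes */
    uint64_t    complete_size;      /* size up to the last complete change */
    XLogRecPtr  complete_lsn;       /* LSN of the last complete change */
} TxnProgress;

/* Record a change; "complete" means no partial toast/spec insert is pending */
static void
txn_add_change(TxnProgress *txn, uint64_t sz, XLogRecPtr lsn, bool complete)
{
    txn->total_size += sz;
    if (complete)
    {
        txn->complete_size = txn->total_size;
        txn->complete_lsn = lsn;
    }
}

/* On successful streaming, only the completed prefix may be truncated */
static uint64_t
txn_stream_truncate(TxnProgress *txn)
{
    uint64_t    streamed = txn->complete_size;

    txn->total_size -= txn->complete_size;
    txn->complete_size = 0;
    return streamed;
}

/* On concurrent abort, all changes are discarded, complete or not */
static void
txn_abort_discard(TxnProgress *txn)
{
    txn->total_size = 0;
    txn->complete_size = 0;
    txn->complete_lsn = 0;
}
```

The point of the split counters is exactly the asymmetry discussed above: streaming consumes only the complete prefix, while an abort wipes everything.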
RE: PostgresSQL project
Hi Peter Thanks for the prompt response. Yes, this is an open source project which will be shared with the community. We will also consider hiring appropriate consultants. At a summary level, we have a proven approach for how a relational database can provide comprehensive logical insert, update and delete functionality through an append-only paradigm which effectively delivers a perfect audit, i.e. a means to access the database at any point in the audit (time) and for all data to be in a relationally correct state. In so doing, we are removing the need for the use of update and delete code (as presently used) with enhancements to the insert code module. We presently achieve this effect by use of a generator which creates an application-specific database environment for append only (with a normal current view schema being the input for the generator). The specification for the requirements of the insert code module is very detailed. Our challenge is to identify appropriate PostgreSQL architects/programmers who have experience of the PostgreSQL database kernel. More specifically, to outline the general approach and work packages to go about this. Regards Luke -Original Message- From: Peter Eisentraut Sent: 18 May 2020 22:40 To: Luke Porter ; pgsql-hackers@lists.postgresql.org Subject: Re: PostgresSQL project On 2020-05-18 18:21, Luke Porter wrote: > I am a member of a small UK based team with extensive database > experience. We are considering a project using PostgresSQL source code > which only uses the insert data capabilities. > > Is there a contact who we could speak with and discuss our project > aims in principal. If yours is an open-source project that you eventually want to share with the community, then you can discuss it on this mailing list. If it is a closed-source, proprietary, or in-house project, then the community isn't the right place to discuss it, but you could hire professional consultants to help you, depending on your needs. 
-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Expand the use of check_canonical_path() for more GUCs
Hi all, While digging into my backlog, I have found this message from Peter E mentioning $subject: https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9a...@2ndquadrant.com It seems to me that it would be a good idea to make those checks more consistent, and attached is a patch. Thoughts? -- Michael

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2f3e0a70e0..2be59caf60 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3781,7 +3781,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		, "",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -3903,7 +3903,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		_krb_server_keyfile, PG_KRB_SRVTAB,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4197,7 +4197,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		_directory, NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4208,7 +4208,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		, NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4219,7 +4219,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		, NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4230,7 +4230,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		, NULL,
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4266,7 +4266,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		_cert_file, "server.crt",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4276,7 +4276,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		_key_file, "server.key",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4286,7 +4286,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		_ca_file, "",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4296,7 +4296,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		_crl_file, "",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
@@ -4369,7 +4369,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		_dh_params_file, "",
-		NULL, NULL, NULL
+		check_canonical_path, NULL, NULL
 	},
 	{
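For readers unfamiliar with what check_canonical_path buys here: it runs the GUC value through canonicalize_path, which normalizes path separators and trims trailing ones (the real function in src/port/path.c also handles Windows drive letters and "."/".." components). A minimal sketch of that kind of normalization, not the actual implementation:

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified sketch of path canonicalization: convert backslashes to
 * forward slashes and strip trailing separators in place.  The real
 * canonicalize_path() does considerably more than this.
 */
static void
canonicalize_path_sketch(char *path)
{
    size_t  len;

    /* normalize Windows-style separators */
    for (char *p = path; *p; p++)
        if (*p == '\\')
            *p = '/';

    /* strip trailing separators, but keep a bare "/" intact */
    len = strlen(path);
    while (len > 1 && path[len - 1] == '/')
        path[--len] = '\0';
}
```

Applying such a check consistently means that, e.g., `ssl_cert_file = 'C:\pgdata\server.crt'` and a value with a trailing slash both end up in one canonical spelling before the server uses them.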
Re: Add A Glossary
I think there needs to be a careful analysis of the language and a formal effort to stabilise it for the future. In the context of, say, an Oracle T series, which is partitioned into multiple domains (virtual machines), each of these has multiple CPUs and can run an instance of the OS which hosts multiple virtual instances of the same or different OSes. Some domains might do this while others do not! A host could be a domain, one of many virtual machines, or it could be one of many hosts on that VM, but even these hosts could be virtual machines that each runs several virtual servers! Of course, PostgreSQL can run on any tier of this regime, but the documentation at least needs to be consistent about language. A "machine" should probably refer to hardware, although I would accept that a domain might count as "virtual hardware", while a host should probably refer to a single instance of an OS. Of course it is possible for a single instance of an OS to run multiple instances of PostgreSQL, and people do this. (I have in the past.) Slightly more confusingly, it would appear possible for a single instance of an OS to have multiple IP addresses, and if there are multiple instances of PostgreSQL, they may serve different IP addresses uniquely, or share them. I think this case suggests that a host probably best describes an OS instance. I might be wrong. The word "server" might be an instance of any of the above, or a waiter with a bowl of soup. It is best reserved for situations where clarity is not required. If you are new to all this, I am sure it is very confusing, and inconsistent language is not going to help. Andrew AFAICT On Tue, 19 May 2020 at 07:17, Laurenz Albe wrote: > On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote: > > cluster/instance: PG (mainly) consists of a group of processes that > commonly > > act on shared buffers. The processes are very closely related to each > other > > and with the buffers. They exist altogether or not at all. 
They use a > common > > initialization file and are incarnated by one command. Everything exists > > solely in RAM and therefor has a fluctuating nature. In summary: they > build > > a unit and this unit needs to have a name of itself. In some pages we > used > > to use the term *instance* - sometimes in extended forms: *database > instance*, > > *PG instance*, *standby instance*, *standby server instance*, *server > instance*, > > or *remote instance*. For me, the term *instance* makes sense, the > extensions > > *standby instance* and *remote instance* in their context too. > > FWIW, I feel somewhat like Alvaro on that point; I use those terms > synonymously, > perhaps distinguishing between a "started cluster" and a "stopped cluster". > After all, "cluster" refers to "a cluster of databases", which are there, > regardless > if you start the server or not. > > The term "cluster" is unfortunate, because to most people it suggests a > group of > machines, so the term "instance" is better, but that ship has sailed long > ago. > > The static part of a cluster to me is the "data directory". > > > server/host: We need a term to describe the underlying hardware > respectively > > the virtual machine or container, where PG is running. I suggest to use > both > > *server* and *host*. In computer science, both have their eligibility > and are > > widely used. Everybody understands *client/server architecture* or > *host* in > > TCP/IP configuration. We cannot change such matter of course. I suggest > to > > use both depending on the context, but with the same meaning: "real > hardware, > > a container, or a virtual machine". > > On this I have a strong opinion because of my Unix mindset. > "machine" and "host" are synonyms, and it doesn't matter to the database > if they > are virtualized or not. You can always disambiguate by adding "virtual" > or "physical". > > A "server" is a piece of software that responds to client requests, never > a machine. 
> In my book, this is purely Windows jargon. The term "client-server > architecture" > that you quote emphasized that. > > Perhaps "machine" would be the preferable term, because "host" is more > prone to > misunderstandings (except in a networking context). > > Yours, > Laurenz Albe
some grammar refactoring
Here is a series of patches to do some refactoring in the grammar around the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... ADD/DROP. In the grammar, these commands (with some exceptions) basically just take a reference to an object and later look it up in C code. Some of that was already generalized individually for each command (drop_type_any_name, drop_type_name, etc.). This patch combines it into common lists for all these commands. Advantages:

- Avoids having to list each object type at least four times.
- Object types not supported by security labels or extensions are now explicitly listed and give a proper error message. Previously, this was just encoded in the grammar itself and specifying a non-supported object type would just give a parse error.
- Reduces lines of code in gram.y.
- Removes some old cruft.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From bdd2440c66292eb2464a49fdc57ff5fa86838f0a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 10 Apr 2020 11:22:19 +0200
Subject: [PATCH v1 1/6] Remove redundant grammar symbols

access_method, database_name, and index_name are all just name, and they are not used consistently for their alleged purpose, so remove them. They have been around since ancient times but have no current reason for existing. Removing them can simplify future grammar refactoring. 
--- src/backend/parser/gram.y| 86 +++- src/interfaces/ecpg/preproc/ecpg.trailer | 4 +- 2 files changed, 41 insertions(+), 49 deletions(-) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 3c78f2d1b5..378390af5d 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -352,9 +352,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type enable_trigger %type copy_file_name - database_name access_method_clause access_method attr_name + access_method_clause attr_name table_access_method_clause name cursor_name file_name - index_name opt_index_name cluster_index_specification + opt_index_name cluster_index_specification %typefunc_name handler_name qual_Op qual_all_Op subquery_Op opt_class opt_inline_handler opt_validator validator_clause @@ -1184,7 +1184,7 @@ AlterRoleStmt: opt_in_database: /* EMPTY */ { $$ = NULL; } - | IN_P DATABASE database_name { $$ = $3; } + | IN_P DATABASE name{ $$ = $3; } ; AlterRoleSetStmt: @@ -3950,7 +3950,7 @@ part_elem: ColId opt_collate opt_class ; table_access_method_clause: - USING access_method { $$ = $2; } + USING name { $$ = $2; } | /*EMPTY*/ { $$ = NULL; } ; @@ -3975,7 +3975,7 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; } | /*EMPTY*/ { $$ = NULL; } ; -ExistingIndex: USING INDEX index_name{ $$ = $3; } +ExistingIndex: USING INDEX name { $$ = $3; } ; /* @@ -4640,7 +4640,7 @@ AlterExtensionContentsStmt: n->object = (Node *) $6; $$ = (Node *)n; } - | ALTER EXTENSION name add_drop OPERATOR CLASS any_name USING access_method + | ALTER EXTENSION name add_drop OPERATOR CLASS any_name USING name { AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt); n->extname = $3; @@ -4649,7 +4649,7 @@ AlterExtensionContentsStmt: n->object = (Node *) lcons(makeString($9), $7); $$ = (Node *)n; } - | ALTER EXTENSION name add_drop OPERATOR FAMILY any_name USING access_method + | ALTER EXTENSION name add_drop OPERATOR FAMILY any_name USING name { 
AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt);
pg_dump dumps row level policies on extension tables
Hi, I noticed that if a row level policy is defined on an extension object, even in the extension creation script, pg_dump dumps a separate CREATE POLICY statement for such policies. That makes the dump unrestorable because the CREATE EXTENSION and CREATE POLICY then conflict. Here is a simple example. I just abused the pageinspect contrib module to demonstrate the problem.

```
diff --git a/contrib/pageinspect/pageinspect--1.5.sql b/contrib/pageinspect/pageinspect--1.5.sql
index 1e40c3c97e..f04d70d1c1 100644
--- a/contrib/pageinspect/pageinspect--1.5.sql
+++ b/contrib/pageinspect/pageinspect--1.5.sql
@@ -277,3 +277,9 @@ CREATE FUNCTION gin_leafpage_items(IN page bytea,
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'gin_leafpage_items'
 LANGUAGE C STRICT PARALLEL SAFE;
+
+-- sample table
+CREATE TABLE pf_testtab (a int, b int);
+-- sample policy
+CREATE POLICY p1 ON pf_testtab
+FOR SELECT USING (true);
```

If I now take a dump of a database with the pageinspect extension created, the dump has the following.

```
--
-- Name: pageinspect; Type: EXTENSION; Schema: -; Owner:
--
CREATE EXTENSION IF NOT EXISTS pageinspect WITH SCHEMA public;

--
-- Name: pf_testtab p1; Type: POLICY; Schema: public; Owner: pavan
--
CREATE POLICY p1 ON public.pf_testtab FOR SELECT USING (true);
```

That's a problem. The CREATE POLICY statement fails during restore because CREATE EXTENSION already creates the policy. Are we missing recording a dependency on the extension for row-level policies? Or should pg_dump somehow skip dumping those policies? Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
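For context, pg_dump normally skips separate DDL for objects that pg_depend records as members of an extension (its checkExtensionMembership logic); a policy created inside the extension script apparently ends up with no such dependency row, so it looks like a standalone object. A toy model of that filtering decision, with made-up names (this is not pg_dump's actual code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a dumpable object; not pg_dump's real DumpableObject */
typedef struct ToyObject
{
    const char *name;
    bool        ext_member;     /* has a pg_depend row tying it to an extension? */
} ToyObject;

/*
 * Sketch of the decision checkExtensionMembership makes: objects that
 * belong to an extension are recreated by CREATE EXTENSION, so a normal
 * dump must not emit separate DDL for them.
 */
static bool
should_dump_separately(const ToyObject *obj)
{
    return !obj->ext_member;
}
```

In this model the bug reported above corresponds to the policy's `ext_member` flag being false even though the extension script created it, so pg_dump emits a duplicate CREATE POLICY.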
Re: Missing grammar production for WITH TIES
On Tue, May 19, 2020 at 12:41:39AM -0400, Tom Lane wrote: > Michael Paquier writes: >> This has been committed just after beta1 has been stamped. So it >> means that it won't be included in it, right? > > Right. Still, wouldn't it be better to wait until the version is tagged? My understanding is that we had better not commit anything on a branch planned for release between the moment the version is stamped and the moment the tag is pushed, so that we have a couple of days to address any complaints from -packagers. Here, we are in a state where we have, between the stamp time and the tag time, an extra commit not related to a packaging issue. So, if it happens that we have an issue from -packagers to address, then we would have to include c301c2e in the beta1. Looking at the patch committed, that's not much of an issue, but I think that we had better avoid that. -- Michael
Re: Add A Glossary
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote: > cluster/instance: PG (mainly) consists of a group of processes that commonly > act on shared buffers. The processes are very closely related to each other > and with the buffers. They exist altogether or not at all. They use a common > initialization file and are incarnated by one command. Everything exists > solely in RAM and therefore has a fluctuating nature. In summary: they build > a unit and this unit needs to have a name of itself. In some pages we used > to use the term *instance* - sometimes in extended forms: *database instance*, > *PG instance*, *standby instance*, *standby server instance*, *server > instance*, > or *remote instance*. For me, the term *instance* makes sense, the extensions > *standby instance* and *remote instance* in their context too. FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously, perhaps distinguishing between a "started cluster" and a "stopped cluster". After all, "cluster" refers to "a cluster of databases", which are there regardless of whether you start the server or not. The term "cluster" is unfortunate, because to most people it suggests a group of machines, so the term "instance" is better, but that ship has sailed long ago. The static part of a cluster to me is the "data directory". > server/host: We need a term to describe the underlying hardware respectively > the virtual machine or container, where PG is running. I suggest to use both > *server* and *host*. In computer science, both have their eligibility and are > widely used. Everybody understands *client/server architecture* or *host* in > TCP/IP configuration. We cannot change such matter of course. I suggest to > use both depending on the context, but with the same meaning: "real hardware, > a container, or a virtual machine". On this I have a strong opinion because of my Unix mindset. 
"machine" and "host" are synonyms, and it doesn't matter to the database if they are virtualized or not. You can always disambiguate by adding "virtual" or "physical". A "server" is a piece of software that responds to client requests, never a machine. In my book, this is purely Windows jargon. The term "client-server architecture" that you quote emphasized that. Perhaps "machine" would be the preferable term, because "host" is more prone to misunderstandings (except in a networking context). Yours, Laurenz Albe
Re: could not stat promote trigger file leads to shutdown
On Wed, Dec 04, 2019 at 11:52:33AM +0100, Peter Eisentraut wrote: > Is it possible to do this in a mostly bullet-proof way? Just because the > directory exists and looks pretty good otherwise, doesn't mean we can read a > file created in it later in a way that doesn't fall afoul of the existing > error checks. There could be something like SELinux lurking, for example. > > Maybe some initial checking would be useful, but I think we still need to > downgrade the error check at use time a bit to not crash in the cases that > we miss. I have had that thread in my backlog for some time, and was not able to come back to it. Reading the thread again, it seems to me that using a LOG would make the promote file handling more consistent with what we do for the SSL context reload. Still, one downside I can see here is that this causes the backend to create a new LOG entry each time the promote file is checked, that is, each time we check if WAL is available. Couldn't that bloat be a problem? During the SSL reload, we only generate LOG entries for each backend on SIGHUP. -- Michael
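A sketch of the downgraded error handling being discussed: instead of treating any stat() failure on the promote trigger file as fatal, treat ENOENT as "no promotion requested" and merely log other failures, so a transient problem does not take the standby down. The function name and message wording are illustrative, not the actual startup-process code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Illustrative check of a promote trigger file: ENOENT simply means no
 * promotion was requested; any other stat() failure is logged rather
 * than treated as fatal, and reported as "not promoted" so the check
 * can be retried on the next WAL-availability loop.
 */
static bool
promote_file_exists(const char *path)
{
    struct stat statbuf;

    if (stat(path, &statbuf) == 0)
        return true;

    if (errno != ENOENT)
        fprintf(stderr, "LOG: could not stat promote trigger file \"%s\": %s\n",
                path, strerror(errno));
    return false;
}
```

The log-bloat concern raised above shows up directly here: this path runs on every WAL-availability check, so a persistent non-ENOENT failure would emit one LOG line per iteration unless the message is additionally rate-limited or emitted only on state change.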