Re: pg_dump multi VALUES INSERT
On Wed, 23 Jan 2019 at 02:13, Surafel Temesgen wrote:
> okay .thank you for explaining. i attach a patch corrected as such

I did a bit of work on this to fix a bunch of things:

1. Docs for --rows-per-insert didn't mention anything about a parameter.
2. You'd not followed the alphabetical order in which the parameters are documented.
3. Various parts of the docs claimed that --inserts just inserted 1 row per statement. Those needed to be updated.
4. New options were out of order in --help. The rest were in alphabetical order.
5. The DumpOptions struct variable was not in the right place. It was grouped in with some parameterless options.
6. The code in dumpTableData_insert() was convoluted. I'm not sure what you had added end_of_statement for, or why you were checking PQntuples(res) == 0. You'd also made the number_of_row variable 1-based and set it to 1 when we had added 0 rows. You then checked for the existence of 1 row by checking the variable was > 1... That made very little sense to me. I've pretty much put back the code that I had sent to you previously, just without the part where I split the row-building code out into another function.
7. A comment in dumpTableData_insert() claimed that the insertStmt would end in "VALUES(", but it'll end in "VALUES ". I had updated that in my last version, but you must have missed it.
8. I've made --rows-per-insert imply --inserts. This is aligned with how --column-inserts behaves.

I changed a few other things. I simplified the condition that raises an error when someone uses --on-conflict-do-nothing without the --inserts option. There was no need to check for the other options that imply --inserts, since --inserts will already be enabled if one of those options was given. I also removed most of the error checking you'd added to ensure that the --rows-per-insert parameter was a number. I'd have kept this, but I saw that we do nothing that special for the compression option, and I didn't see why --rows-per-insert was any more special.
It was quite a bit of code for very little reward.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pg_dump-rows-per-insert-option_v10.patch
Description: Binary data
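The row-batching logic under discussion (a 0-based row counter, a statement flushed every rows-per-insert rows, and no statement at all for an empty table) can be pictured outside of pg_dump's C code. This is a minimal Python sketch of that logic, not pg_dump's implementation; the function name is hypothetical:

```python
def make_insert_statements(table, rows, rows_per_insert=100):
    """Batch rows into multi-row INSERT statements: flush a statement
    every rows_per_insert rows, plus one final statement for any
    leftover rows.  An empty table emits no statement at all."""
    statements = []
    values = []
    for row in rows:
        values.append("(" + ", ".join(str(v) for v in row) + ")")
        if len(values) == rows_per_insert:
            statements.append("INSERT INTO %s VALUES %s;" % (table, ", ".join(values)))
            values = []
    if values:  # partial final batch
        statements.append("INSERT INTO %s VALUES %s;" % (table, ", ".join(values)))
    return statements
```

The 0-based counter (here, `len(values)`) is what makes the empty-table and partial-final-batch cases fall out naturally, which is the simplification described above.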
Re: Protect syscache from bloating with negative cache entries
At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp>:

> An option is an additional PGPROC member and interface functions.
>
> struct PGPROC
> {
> ...
> int syscahe_usage_track_interval; /* track interval, 0 to disable */
>
> =# select syscahce_usage_track_add(, [, ]);
> =# select syscahce_usage_track_remove(2134);
>
> Or, just provide an one-shot triggering function.
>
> =# select syscahce_take_usage_track();
>
> This can use both a similar PGPROC variable or SendProcSignal()
> but the former doesn't fire while idle time unless using timer.

Attached is a revised version of this patchset, in which the third patch is the remote setting feature. It uses static shared memory.

=# select pg_backend_catcache_stats(, );

This activates or changes the catcache stats feature on the backend with the given PID. (The name should be changed to .._syscache_stats, though.) It is far smaller than the remote-GUC feature. (It contains a part that should be in the previous patch. I will fix that later.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 067f8ad60f259453271d2bf8323505beb5b9e0a9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 16 Oct 2018 13:04:30 +0900
Subject: [PATCH 1/3] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons, and it is not desirable that they eat up memory. This patch adds consideration of removing entries that haven't been used for a certain time before enlarging the hash array.
--- doc/src/sgml/config.sgml | 38 ++ src/backend/access/transam/xact.c | 5 + src/backend/utils/cache/catcache.c| 166 -- src/backend/utils/misc/guc.c | 23 src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/utils/catcache.h | 28 - 6 files changed, 254 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b6f5822b84..af3c52b868 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1662,6 +1662,44 @@ include_dir 'conf.d' + + syscache_memory_target (integer) + + syscache_memory_target configuration parameter + + + + +Specifies the maximum amount of memory to which syscache is expanded +without pruning. The value defaults to 0, indicating that pruning is +always considered. After exceeding this size, syscache pruning is +considered according to +. If you need to keep +certain amount of syscache entries with intermittent usage, try +increase this setting. + + + + + + syscache_prune_min_age (integer) + + syscache_prune_min_age configuration parameter + + + + +Specifies the minimum amount of unused time in seconds at which a +syscache entry is considered to be removed. -1 indicates that syscache +pruning is disabled at all. The value defaults to 600 seconds +(10 minutes). The syscache entries that are not +used for the duration can be removed to prevent syscache bloat. This +behavior is suppressed until the size of syscache exceeds +. 
+ + + + max_stack_depth (integer) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 18467d96d2..dbffec8067 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -733,7 +733,12 @@ void SetCurrentStatementStartTimestamp(void) { if (!IsParallelWorker()) + { stmtStartTimestamp = GetCurrentTimestamp(); + + /* Set this timestamp as aproximated current time */ + SetCatCacheClock(stmtStartTimestamp); + } else Assert(stmtStartTimestamp != 0); } diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 8152f7e21e..ee40093553 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -72,9 +72,24 @@ #define CACHE6_elog(a,b,c,d,e,f,g) #endif +/* + * GUC variable to define the minimum size of hash to cosider entry eviction. + * This variable is shared among various cache mechanisms. + */ +int cache_memory_target = 0; + +/* GUC variable to define the minimum age of entries that will be cosidered to + * be evicted in seconds. This variable is shared among various cache + * mechanisms. + */ +int cache_prune_min_age = 600; + /* Cache management header --- pointer is NULL until created */ static CatCacheHeader *CacheHdr = NULL; +/* Timestamp used for any operation on caches. */ +TimestampTz catcacheclock = 0; + static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, @@ -491,6 +506,7 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct
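The pruning policy the patch's GUCs describe (consider eviction only once the cache exceeds syscache_memory_target, evict entries unused for longer than syscache_prune_min_age, and disable pruning entirely at -1) can be modeled in a few lines. This is a toy Python sketch with hypothetical names, not the catcache.c implementation; the real patch tracks bytes rather than entry counts and reuses the statement-start timestamp as a cheap approximate clock:

```python
import time

class PrunableCache:
    """Toy model of age-based cache pruning: entries carry a last-access
    timestamp; prune() evicts entries unused for longer than
    prune_min_age seconds, but only once the cache holds more than
    memory_target entries, and not at all when prune_min_age is -1."""

    def __init__(self, memory_target=0, prune_min_age=600):
        self.memory_target = memory_target
        self.prune_min_age = prune_min_age
        self.entries = {}                    # key -> (value, last_access)

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.entries[key] = (value, now)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        value, _ = self.entries[key]
        self.entries[key] = (value, now)     # bump the access clock
        return value

    def prune(self, now=None):
        """Would be called before enlarging the hash table."""
        now = time.monotonic() if now is None else now
        if self.prune_min_age < 0 or len(self.entries) <= self.memory_target:
            return 0
        stale = [k for k, (_, last) in self.entries.items()
                 if now - last > self.prune_min_age]
        for k in stale:
            del self.entries[k]
        return len(stale)
```

The point of checking the size threshold first is the one the docs above make: a workload that intermittently touches a known set of entries can keep them resident by raising the target, while pruning still bounds growth past it.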
Analyze all plans
Hi,

I'm working on an extension which analyzes all possible plans generated by the planner. I believe this extension would be useful for benchmarking the planner (e.g. the performance of the estimation and the cost model) and for better understanding the cases where the planner makes a suboptimal move.

Here are my procedures:

1. Enumerate all the plans
   1.1 Create a hook for add_path so that it won't discard the expensive paths from the planner's point of view.
   1.2 Collect all the paths from final_rel->pathlist, and turn them into PlannedStmt nodes by patching standard_planner.
   1.3 Mark the cheapest path from the planner's point of view.
2. Explain all the plans
   2.1 First explain the cheapest plan
       2.1.1 If analyzing, collect the execution time and use it to set a timer interrupt.
   2.2 Explain the remaining plans
       2.2.1 The other plans can be disastrous; they may never finish in a reasonable amount of time. If analyzing, the timer interrupt shall stop the executor.
       2.2.2 Move on to the next plan.

Are those procedures reasonable? I'm able to implement all the steps except for 2.2.1.

- Attempt 1: Our most common way of handling timeouts is to kill the process. However, that would terminate the entire explain statement while there are still plans to be explained.

- Attempt 2: Fork before explain, so it would be possible to kill the child process without disrupting the explain statement. However, simply allocating shared memory for the QueryDesc does not work (PANIC: failed to re-find shared lock object). To me this plan is more doable, but it seems there are other mechanisms I need to be aware of in order to execute a plan in the child process and report the result in the parent process.

What do you think? I will appreciate any feedback.

Thank you,
Donald Dong
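One way to picture the timer-interrupt idea in steps 2.1 and 2.2 is an alarm-driven loop: time the cheapest plan first, derive a budget from it, and let an alarm signal abort any other plan that overruns. The sketch below is Python with hypothetical names, Unix-only (it relies on signal.alarm), and is not the actual extension code; note that an in-process timer cannot interrupt code stuck in uninterruptible native code, which is one argument for the fork approach of Attempt 2:

```python
import signal
import time

class PlanTimeout(Exception):
    pass

def explain_all(plans, run_plan, budget_factor=10.0):
    """Run plans[0] (the planner's cheapest) untimed, then give every
    remaining plan a SIGALRM budget derived from the cheapest runtime.
    A plan that overruns is recorded as None and we move on (2.2.2)."""
    def on_alarm(signum, frame):
        raise PlanTimeout

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    results = {}
    try:
        start = time.monotonic()
        results[0] = run_plan(plans[0])              # 2.1: cheapest plan
        elapsed = time.monotonic() - start
        budget = max(1, int(elapsed * budget_factor))  # alarm() wants whole seconds
        for i, plan in enumerate(plans[1:], start=1):
            signal.alarm(budget)                     # 2.2.1: arm the timer
            try:
                results[i] = run_plan(plan)          # may be disastrous
            except PlanTimeout:
                results[i] = None                    # never finished in budget
            finally:
                signal.alarm(0)                      # disarm before next plan
    finally:
        signal.signal(signal.SIGALRM, old_handler)
    return results
```

The budget_factor and the one-second floor are arbitrary illustration choices, not anything from the proposal.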
yet another comment typo patch
An assorted set of comment typos is fixed in the attached patch.

-- 
Fabien.

diff --git a/contrib/pgcrypto/pgp-decrypt.c b/contrib/pgcrypto/pgp-decrypt.c index 7d31e5354b..96c344c30b 100644 --- a/contrib/pgcrypto/pgp-decrypt.c +++ b/contrib/pgcrypto/pgp-decrypt.c @@ -132,7 +132,7 @@ pgp_parse_pkt_hdr(PullFilter *src, uint8 *tag, int *len_p, int allow_ctx) int res; uint8 *p; - /* EOF is normal here, thus we dont use GETBYTE */ + /* EOF is normal here, thus we don't use GETBYTE */ res = pullf_read(src, 1, &p); if (res < 0) return res; diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..961838ed93 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2771,7 +2771,7 @@ estimate_path_cost_size(PlannerInfo *root, run_cost += cpu_tuple_cost * numGroups; run_cost += ptarget->cost.per_tuple * numGroups; - /* Accout for the eval cost of HAVING quals, if any */ + /* Account for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) { QualCost remote_cost; diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index fb37e2b6c4..b18ae2b3ed 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -442,7 +442,7 @@ restartScanEntry: /* * Lock the entry leaf page. This is more coarse-grained than * necessary, because it will conflict with any insertions that - * land on the same leaf page, not only the exacty key we searched + * land on the same leaf page, not only the exact key we searched * for. But locking an individual tuple would require updating * that lock whenever it moves because of insertions or vacuums, * which seems too complicated. 
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index d793efdd9c..5a7206f588 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1873,7 +1873,7 @@ CheckOpSlotCompatibility(ExprEvalStep *op, TupleTableSlot *slot) /* * Should probably fixed at some point, but for now it's easier to allow - * buffer and heap tuples to be used interchangably. + * buffer and heap tuples to be used interchangeably. */ if (slot->tts_ops == &TTSOpsBufferHeapTuple && op->d.fetch.kind == &TTSOpsHeapTuple) diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index d6cfd28ddc..ceea3d6d72 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -1049,7 +1049,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate, MemoryContextAllocZero(planstate->state->es_query_cxt, sizeof(JitInstrumentation)); combined = planstate->state->es_jit_worker_instr; - /* Accummulate all the workers' instrumentations. */ + /* Accumulate all the workers' instrumentations. */ for (n = 0; n < shared_jit->num_workers; ++n) InstrJitAgg(combined, &shared_jit->jit_instr[n]); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 0c0891b33e..e773f20d9f 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2243,7 +2243,7 @@ check_log_duration(char *msec_str, bool was_logged) /* * Do not log if log_statement_sample_rate = 0. Log a sample if - * log_statement_sample_rate <= 1 and avoid unecessary random() call + * log_statement_sample_rate <= 1 and avoid unnecessary random() call * if log_statement_sample_rate = 1. But don't compute any of this * unless needed. 
*/ diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c index ba28611d8c..f09e3a9aff 100644 --- a/src/backend/utils/cache/inval.c +++ b/src/backend/utils/cache/inval.c @@ -8,7 +8,7 @@ * When a tuple is updated or deleted, our standard visibility rules * consider that it is *still valid* so long as we are in the same command, * ie, until the next CommandCounterIncrement() or transaction commit. - * (See acces/heap/heapam_visibility.c, and note that system catalogs are + * (See access/heap/heapam_visibility.c, and note that system catalogs are * generally scanned under the most current snapshot available, rather than * the transaction snapshot.) At the command boundary, the old tuple stops * being valid and the new version, if any, becomes valid. Therefore, diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index bca2dc118d..2fb7dd3b8e 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -151,7 +151,7 @@ typedef struct RecordCacheEntry /* * To deal with non-anonymous record types that are exchanged by backends - * involved in a parallel query, we also need a shared verion of the above. + * involved in a parallel query, we also need a shared version of the above. */ struct SharedRecordTypmodRegistry { diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pg
Re: Analyze all plans
On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote:
>
> 1. Enumerate all the plans
>

Not sure this is going to work, because the total number of possible plans is somewhere around O(n!), if I'm not mistaken: the number of joined relations, times the available access methods, times the possible join strategies. So enumerating all possible plans stops being practical for even slightly complicated queries.

Regards,

-- 
Alex
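The combinatorial blow-up Alex describes is easy to check with a back-of-envelope calculation. A small Python sketch (the bushy-tree count n! * Catalan(n-1) = (2n-2)!/(n-1)! is a standard result; this is before even multiplying by access methods and join strategies):

```python
from math import factorial

def left_deep_orders(n):
    """Join orders when only left-deep trees are considered: n!."""
    return factorial(n)

def bushy_orders(n):
    """All ordered binary (bushy) join trees over n relations:
    n! * Catalan(n-1) = (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (2, 4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```

Already at 8 relations there are over 17 million bushy join orderings, which is why exhaustive enumeration stops being practical so quickly.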
Re: pg_dump multi VALUES INSERT
Hello David & Surafel,

About this v10: the patch applies and compiles cleanly, and local & global "make check" are ok.

A few comments, possibly redundant with some already in the thread.

The rows-per-insert option is out of abc-order in the getopt struct.

The help string does not specify the expected argument.

I still think that the added rows_per_insert field is useless: ISTM that "int dump_inserts" can be reused, and you could also drop the boolean got_rows_per_insert.

The feature is not tested anywhere. I still think that there should be a test on empty/small/larger-than-rows-per-insert tables, possibly added to existing TAP tests.

-- 
Fabien.
inherited primary key misbehavior
Hi,

It seems that ATExecDetachPartition fails to mark a child's primary key constraint entry (if any) as local (conislocal=true, coninhcount=0), which causes this:

create table p (a int primary key) partition by list (a);
create table p2 partition of p for values in (2);
alter table p detach partition p2;
alter table p2 drop constraint p2_pkey;
ERROR:  cannot drop inherited constraint "p2_pkey" of relation "p2"

select conname, conislocal, coninhcount from pg_constraint where conrelid = 'p2'::regclass;
 conname │ conislocal │ coninhcount
─────────┼────────────┼─────────────
 p2_pkey │ f          │           1
(1 row)

How about the attached patch?

Thanks,
Amit

From 42cb636ce6acee717fa19f6b69bf40aacc402852 Mon Sep 17 00:00:00 2001
From: amit
Date: Wed, 23 Jan 2019 18:33:09 +0900
Subject: [PATCH] Detach primary key constraint when detaching partition

--- src/backend/commands/tablecmds.c | 11 +++ src/test/regress/expected/indexing.out | 9 + src/test/regress/sql/indexing.sql | 9 + 3 files changed, 29 insertions(+) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 28a137bb53..897f74119d 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15094,6 +15094,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name) foreach(cell, indexes) { Oid idxid = lfirst_oid(cell); + Oid constrOid; Relationidx; if (!has_superclass(idxid)) @@ -15106,6 +15107,16 @@ ATExecDetachPartition(Relation rel, RangeVar *name) IndexSetParentIndex(idx, InvalidOid); update_relispartition(classRel, idxid, false); index_close(idx, NoLock); + + /* Detach any associated constraint as well. 
*/ + if (!idx->rd_index->indisprimary) + continue; + constrOid = get_relation_idx_constraint_oid(RelationGetRelid(partRel), + idxid); + if (!OidIsValid(constrOid)) + elog(ERROR, "missing pg_constraint entry of index %u of %u", +idxid, RelationGetRelid(partRel)); + ConstraintSetParentConstraint(constrOid, InvalidOid); } table_close(classRel, RowExclusiveLock); diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index 118f2c78df..ebf86d7bfc 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -1414,3 +1414,12 @@ DETAIL: Key (a)=(4) already exists. create unique index on covidxpart (b) include (a); -- should fail ERROR: insufficient columns in UNIQUE constraint definition DETAIL: UNIQUE constraint on table "covidxpart" lacks column "a" which is part of the partition key. +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +ERROR: cannot drop inherited constraint "parted_pk_detach_test1_pkey" of relation "parted_pk_detach_test1" +drop table parted_pk_detach_test; diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index d4a64c18c7..c110f0bdb8 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -757,3 +757,12 @@ alter table covidxpart attach partition covidxpart4 for values in (4); insert into covidxpart values (4, 1); insert into covidxpart values (4, 1); create unique index on 
covidxpart (b) include (a); -- should fail + +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +drop table parted_pk_detach_test; -- 2.11.0
Re: Proposal for Signal Detection Refactoring
Attached is a new signal handling patch. The path is corrected and moved. The documentation is slightly streamlined in some places and expanded in others. Feedback requested.

-- 
Best Regards,
Chris Travers
Head of Database
Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

signal_readme_patch.diff
Description: Binary data
Almost bug in COPY FROM processing of GB18030 encoded input
Hi,

I happened to notice that when CopyReadLineText() calls mblen(), it passes only the first byte of multi-byte characters. However, pg_gb18030_mblen() looks at the first and the second byte. CopyReadLineText() always passes \0 as the second byte, so pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded characters as 2. It works out fine, though, because the second half of a 4-byte encoded character always looks like another 2-byte encoded character in GB18030. CopyReadLineText() is looking for delimiter and escape characters and newlines, and only single-byte characters are supported for those, so treating a 4-byte character as two 2-byte characters is harmless.

Attached is a patch to explain that in the comments. Grepping for mblen(), I didn't find any other callers that used mblen() like that.

- Heikki

>From b0a09c240a2b4d1120433828ca052a2542ff362d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 23 Jan 2019 13:04:05 +0200
Subject: [PATCH 1/1] Fix comments that claimed that mblen() only looks at first byte.

GB18030's mblen() function looks at the first and the second byte of the multibyte character, to determine its length. copy.c had made the assumption that mblen() only looks at the first byte, but it turns out to work out fine, because of the way the GB18030 encoding works. COPY will see a 4-byte encoded character as two 2-byte encoded characters, which is enough for COPY's purposes. It cannot mix those up with delimiter or escaping characters, because only single-byte characters are supported as delimiters or escape characters. 
--- src/backend/commands/copy.c | 7 ++- src/backend/utils/mb/wchar.c | 32 +--- 2 files changed, 31 insertions(+), 8 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index a61a628471..5074ce0daa 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -4073,9 +4073,14 @@ not_end_of_copy: { int mblen; + /* + * It is enough to look at the first byte in all our encodings, to + * get the length. (GB18030 is a bit special, but still works for + * our purposes; see comment in pg_gb18030_mblen()) + */ mblen_str[0] = c; - /* All our encodings only read the first byte to get the length */ mblen = pg_encoding_mblen(cstate->file_encoding, mblen_str); + IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1); IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1); raw_buf_ptr += mblen - 1; diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c index a5fdda456e..8e5116dfc1 100644 --- a/src/backend/utils/mb/wchar.c +++ b/src/backend/utils/mb/wchar.c @@ -15,16 +15,23 @@ /* - * conversion to pg_wchar is done by "table driven." - * to add an encoding support, define mb2wchar_with_len(), mblen(), dsplen() - * for the particular encoding. Note that if the encoding is only - * supported in the client, you don't need to define - * mb2wchar_with_len() function (SJIS is the case). + * Operations on multi-byte encodings are driven by a table of helper + * functions. + * + * To add an encoding support, define mblen(), dsplen() and verifier() for + * the encoding. For server-encodings, also define mb2wchar() and wchar2mb() + * conversion functions. * * These functions generally assume that their input is validly formed. * The "verifier" functions, further down in the file, have to be more - * paranoid. We expect that mblen() does not need to examine more than - * the first byte of the character to discover the correct length. + * paranoid. 
+ * + * We expect that mblen() does not need to examine more than the first byte + * of the character to discover the correct length. GB18030 is an exception + * to that rule, though, as it also looks at second byte. But even that + * behaves in a predictable way, if you only pass the first byte: it will + * treat 4-byte encoded characters as two 2-byte encoded characters, which is + * good enough for all current uses. * * Note: for the display output of psql to work properly, the return values * of the dsplen functions must conform to the Unicode standard. In particular @@ -1073,6 +1080,17 @@ pg_uhc_dsplen(const unsigned char *s) * GB18030 * Added by Bill Huang , */ + +/* + * Unlike all other mblen() functions, this also looks at the second byte of + * the input. However, if you only pass the first byte of a multi-byte + * string, and \0 as the second byte, this still works in a predictable way: + * a 4-byte character will be reported as two 2-byte characters. That's + * enough for all current uses, as a client-only encoding. It works that + * way, because in any valid 4-byte GB18030-encoded character, the third and + * fourth byte look like a 2-byte encoded character, when looked at + * separately. + */ static int pg_gb18030_mblen(const unsigned char *s) { -- 2.20.1
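The property the comments describe can be demonstrated with Python's own gb18030 codec. This toy re-implementation of the length rule is a sketch, not the C function; it follows the encoding rule that a lead byte below 0x81 is single-byte, and an 0x81-0xFE lead byte starts a 4-byte character exactly when the second byte is a digit 0x30-0x39:

```python
def gb18030_mblen(s: bytes) -> int:
    """Toy version of the rule pg_gb18030_mblen() follows.  When the
    caller supplies only the first byte (COPY effectively passes \\0 as
    the second byte), a 4-byte character is reported as 2 bytes."""
    if s[0] < 0x81:
        return 1
    second = s[1] if len(s) > 1 else 0   # missing second byte acts like \0
    return 4 if 0x30 <= second <= 0x39 else 2

# Any character outside the BMP takes 4 bytes in GB18030.
c = "\U0001D11E".encode("gb18030")
assert len(c) == 4
assert gb18030_mblen(c) == 4         # with both bytes visible: correct
assert gb18030_mblen(c[:1]) == 2     # first byte only: reported as 2...
assert gb18030_mblen(c[2:3]) == 2    # ...and the second half as another 2
```

Stepping 2 + 2 bytes lands on the start of the next character either way, which is why COPY's single-byte delimiter and escape scanning stays correct.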
Re: postgres_fdw: oddity in costing aggregate pushdown paths
(2019/01/04 20:33), Etsuro Fujita wrote:
> Here is a new version of the patch.

Attached is an updated version of the patch. Changes:

* Fixed a stupid bug in the case when use_remote_estimate
* Fixed a typo in a comment I added
* Modified comments I added a little bit further
* Added the commit message

If there are no objections, I'll push that.

Best regards,
Etsuro Fujita

>From 3ff65a6f05a330140bf5de94fbcf6ab2a04f040b Mon Sep 17 00:00:00 2001
From: Etsuro Fujita
Date: Wed, 23 Jan 2019 20:30:30 +0900
Subject: [PATCH] postgres_fdw: Account for tlist eval costs in estimate_path_cost_size()

Previously, estimate_path_cost_size() didn't account for tlist eval costs, except when costing a foreign-grouping path using local statistics, but such costs should be accounted for when costing that path using remote estimates, because some of the tlist expressions might be evaluated locally. Also, such costs should be accounted for in the case of a foreign-scan or foreign-join path, because currently, postgres_fdw evaluates PlaceHolderVars (if any) locally. This also fixes an oversight in my commit f8f6e44676ef38fee7a5bbe4f256a34ea7799ac1. Like that commit, apply this to HEAD only to avoid destabilizing existing plan choices. 
Author: Etsuro Fujita Discussion: https://postgr.es/m/5BFD3EAD.2060301%40lab.ntt.co.jp --- contrib/postgres_fdw/postgres_fdw.c | 41 +++-- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..f460fd2ec4 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2499,6 +2499,9 @@ estimate_path_cost_size(PlannerInfo *root, Cost total_cost; Cost cpu_per_tuple; + /* Make sure the core code has set up the relation's reltarget */ + Assert(foreignrel->reltarget); + /* * If the table or the server is configured to use remote estimates, * connect to the foreign server and execute EXPLAIN to estimate the @@ -2576,6 +2579,24 @@ estimate_path_cost_size(PlannerInfo *root, cost_qual_eval(&local_cost, local_param_join_conds, root); startup_cost += local_cost.startup; total_cost += local_cost.per_tuple * retrieved_rows; + + /* + * Add in tlist eval cost for each output row. In case of an + * aggregate, some of the tlist expressions such as grouping + * exressions will be evaluated remotely, so adjust the costs. 
+ */ + startup_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.per_tuple * rows; + if (IS_UPPER_REL(foreignrel)) + { + QualCost tlist_cost; + + cost_qual_eval(&tlist_cost, fdw_scan_tlist, root); + startup_cost -= tlist_cost.startup; + total_cost -= tlist_cost.startup; + total_cost -= tlist_cost.per_tuple * rows; + } } else { @@ -2674,19 +2695,19 @@ estimate_path_cost_size(PlannerInfo *root, nrows = clamp_row_est(nrows * fpinfo->joinclause_sel); run_cost += nrows * remote_conds_cost.per_tuple; run_cost += fpinfo->local_conds_cost.per_tuple * retrieved_rows; + + /* Add in tlist eval cost for each output row */ + startup_cost += foreignrel->reltarget->cost.startup; + run_cost += foreignrel->reltarget->cost.per_tuple * rows; } else if (IS_UPPER_REL(foreignrel)) { PgFdwRelationInfo *ofpinfo; - PathTarget *ptarget = foreignrel->reltarget; AggClauseCosts aggcosts; double input_rows; int numGroupCols; double numGroups = 1; - /* Make sure the core code set the pathtarget. */ - Assert(ptarget != NULL); - /* * This cost model is mixture of costing done for sorted and * hashed aggregates in cost_agg(). We are not sure which @@ -2750,26 +2771,22 @@ estimate_path_cost_size(PlannerInfo *root, * Startup cost includes: * 1. Startup cost for underneath input relation * 2. Cost of performing aggregation, per cost_agg() - * 3. Startup cost for PathTarget eval *- */ startup_cost = ofpinfo->rel_startup_cost; startup_cost += aggcosts.transCost.startup; startup_cost += aggcosts.transCost.per_tuple * input_rows; startup_cost += (cpu_operator_cost * numGroupCols) * input_rows; - startup_cost += ptarget->cost.startup; /*- * Run time cost includes: * 1. Run time cost of underneath input relation * 2. Run time cost of performing aggregation, per cost_agg() - * 3. 
PathTarget eval cost for each output row *- */ run_cost = ofpinfo->rel_total_cost - ofpinfo->rel_startup_cost; run_cost += aggcosts.finalCost * numGroups; run_cost += cpu_tuple_cost * numGroups; - run_cost += ptarget->cost.per_tuple * numGroups; /* Accout for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) @@ -2784,6 +2801,10 @@ estimate_path_cost_size(PlannerInfo *root, startup_cost += fpinfo->local_conds_cost.startup; run_cost +
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote:
>
> An assorted set of comment typos is fixed in the attached patch.

The patch looks good to me. I will take another pass over it and commit.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: yet another comment typo patch
On 23/01/2019 13:51, Amit Kapila wrote:
> On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote:
>> An assorted set of comment typos is fixed in the attached patch.
>
> The patch looks good to me. I will take another pass over it and commit.

I just committed this. Sorry, I didn't see your mail until just afterwards.

- Heikki
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 5:28 PM Heikki Linnakangas wrote:
> On 23/01/2019 13:51, Amit Kapila wrote:
> > On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote:
> >>
> >> An assorted set of comment typos is fixed in the attached patch.
> >
> > The patch looks good to me. I will take another pass over it and commit.
>
> I just committed this. Sorry, I didn't see your mail until just afterwards.

No problem, thanks!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Jan 20, 2019 at 5:19 AM John Naylor wrote:
>
> I have a test for in-range and out-of-range for each relation fork.
>

I think the first two patches, (a) the removal of dead code in bootstrap and (b) the core patch to avoid creation of the FSM file for small tables, are good now. I have prepared the patches along with commit messages. There is no change except for some changes in the README and the commit message of the second patch. Kindly let me know what you think about them.

I think these two patches can go in even without the upgrade patch (during pg_upgrade, conditionally skip transfer of FSMs), which is still under discussion. However, I am not in a hurry if you or others think that the upgrade patch must be committed along with the second patch. I think the upgrade patch is generally on track but might need some more review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch
Description: Binary data

v17-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data
Re: postgres on a non-journaling filesystem
On 23/01/2019 01:03, maayan mordehai wrote:
> hello, I'm Maayan, I'm in a DBA team that uses postgresql. I saw in the
> documentation on WALs: https://www.postgresql.org/docs/10/wal-intro.html
> in the tip box that it's better not to use a journaling filesystem, and I
> wanted to ask how it works? can't we get corruption that we can't recover
> from? I mean, what if postgres is in the middle of a write to a WAL and
> there is a crash, and it didn't finish? I'm assuming it will detect it
> when we start postgres and write that it was rolled back, am I right?

Yep, any half-written transactions will be rolled back.

> and how does it work at the data level? if some of the 8k block is
> written but not all of it, and then there is a crash, how does postgres
> deal with it?

The first time a block is modified after a checkpoint, a copy of the block is written to the WAL. At crash recovery, the block is restored from the WAL. This mechanism is called "full page writes". The WAL works just like the journal in a journaling filesystem. That's why it's not necessary to have journaling at the filesystem level.

- Heikki
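The full-page-writes mechanism Heikki describes can be modeled as a tiny replay loop. This is a toy Python sketch of the idea only (record shapes and names are hypothetical, not PostgreSQL's WAL format): the first record touching a page after a checkpoint carries a full-page image, so a torn (half-written) page on disk is simply overwritten during recovery.

```python
def replay(pages, wal):
    """Toy WAL replay: each record is either ('fpi', page_no, image),
    a full-page image written on the page's first modification after a
    checkpoint, or ('update', page_no, offset, data), an incremental
    change applied on top of it."""
    for rec in wal:
        if rec[0] == "fpi":
            _, page_no, image = rec
            pages[page_no] = bytearray(image)          # whole-page restore
        else:
            _, page_no, offset, data = rec
            pages[page_no][offset:offset + len(data)] = data
    return pages
```

Because recovery never reads the possibly-torn on-disk contents of a page whose first post-checkpoint record is a full-page image, the filesystem's journal adds nothing for data-page consistency.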
Re: pg_dump multi VALUES INSERT
On Wed, 23 Jan 2019 at 22:08, Fabien COELHO wrote:
> Out of abc-order rows-per-inserts option in getopt struct.

I missed that. Thanks. Fixed in attached.

> help string does not specify the expected argument.

Also fixed.

> I still think that the added rows_per_insert field is useless, ISTM that
> "int dump_inserts" can be reused, and you could also drop boolean
> got_rows_per_insert.

I thought about this and looked into it, but I decided it didn't look like a smart thing to do. The reason is that if --inserts sets dump_inserts to 1 and then --rows-per-insert sets it to something else, that's likely fine; but if that happens in the opposite order, then the --rows-per-insert value gets overwritten with 1. The bad news is that the order in which that happens is defined by the order of the command-line args. It might be possible to make it work by having --inserts set some other variable, then set dump_inserts to 1 if it's set to 0 and the other variable is set to >= 1... but that requires another variable, which is what you want to avoid... I think it's best to have a variable per argument.

I could get rid of the got_rows_per_insert variable, but it would require setting the default value for rows_per_insert in the main() function rather than in InitDumpOptions(). I thought InitDumpOptions() looked like just the place to do this, so went with that option. To make it work without got_rows_per_insert, rows_per_insert would have to be 0 by default and we'd know we saw a --rows-per-insert command-line arg by the fact that rows_per_insert was non-zero. Would you rather have it that way?

> The feature is not tested anywhere. I still think that there should be a
> test on empty/small/larger-than-rows-per-insert tables, possibly added to
> existing TAP-tests.

I was hoping to get away with not having to do that... mainly because I've no idea how. Please have at it if you know how it's done.
FWIW I looked at 002_pg_dump.pl and did add a test, but I was unable to make it pass because I couldn't really figure out how the regex matching is meant to work. It does not seem to be explained very well in the comments and I lack patience for Perl. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v11.patch Description: Binary data
Re: postgres on a non-journaling filesystem
Thank you!!

On Wed, Jan 23, 2019, 2:20 PM Heikki Linnakangas wrote:
> [...]
Re: Analyze all plans
Oleksandr Shulgin writes: > On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: >> 1. Enumerate all the plans > So enumerating all possible plans stops being practical for even slightly > complicated queries. Yes. You can *not* disable the planner's aggressive pruning of losing paths and subpaths without ending up with something that's completely impractical for anything beyond toy queries. That's just from the standpoint of planner runtime. Adding on the cost of actually creating a finished plan and then running it for long enough to get a reliable runtime for each case would ... well, let's just say you'd be safely dead before you got any interesting answers. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
> On 2019-01-17 09:29:04 -0800, Andres Freund wrote:
> On 2019-01-17 10:23:39 -0500, Tom Lane wrote:
> > I don't buy the argument that this would move the goalposts in terms
> > of how much work it is to add a new argument. You'd still end up
> > touching every call site.
>
> Why? A lot of arguments that'd be potentially added or removed would not
> be set by each callsite.
>
> If you e.g. look at [...] you can see that a lot of changes were like
>
>     ArchiveEntry(fout, nilCatalogId, createDumpId(),
>                  "pg_largeobject", NULL, NULL, "",
>     -            false, "pg_largeobject", SECTION_PRE_DATA,
>     +            "pg_largeobject", SECTION_PRE_DATA,
>                  loOutQry->data, "", NULL,
>                  NULL, 0,
>                  NULL, NULL);
>
> i.e. just removing a 'false' argument. In like 70+ callsites. With the
> above scheme, we'd instead just have removed a single .withoids = true
> from dumpTableSchema()'s ArchiveEntry() call.

To make this discussion a bit more specific, I've created a patch showing how it could look. All the arguments except Archive, CatalogId, and DumpId I've moved into the ArchiveOpts structure. Not all of them could be empty before, but anyway it seems better for consistency and readability. Some of the arguments had an empty string as a default value; I haven't changed anything here yet (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). As Andres mentioned above, for 578b229718e / the WITH OIDS removal and the pg_dump modification from the pluggable storage thread, this patch reduces the number of ArchiveEntry-related changes from 70+ to just one.

0001-ArchiveOpts-structure.patch Description: Binary data
Re: pg_dump multi VALUES INSERT
Hello David,

> I thought about this and looked into it, but I decided it didn't look
> like a smart thing to do. The reason is that if --inserts sets
> dump_inserts to 1 then --rows-per-insert sets it to something else
> that's likely fine, but if that happens in the opposite order then the
> --rows-per-insert gets overwritten with 1.

You can test before doing that!

    case X:
        if (opt.dump_inserts == 0)
            opt.dump_inserts = 1;
        // otherwise option is already set
        break;

> The bad news is the order that happens is defined by the order of the
> command line args. It might be possible to make it work by having
> --inserts set some other variable,

ISTM that it is enough to test whether the variable is zero.

> then set dump_inserts to 1 if it's set to 0 and the other variable is
> set to >= 1... but that requires another variable, which is what you
> want to avoid...

I still do not understand the need for another variable.

    int ninserts = 0;  // default is to use copy
    while (getopt...)
    {
        switch (...)
        {
            case "--inserts":
                if (ninserts == 0)
                    ninserts = 1;
                break;
            case "--rows-per-insert":
                ninserts = arg_value;
                checks...
                break;
            ...

> I think it's best to have a variable per argument.

I disagree, because it adds complexity where none is needed: here the new option is an extension of a previous one, so the previous one just becomes a particular case. It seems simpler to manage it as the particular case it is rather than as a special case; with two variables you create the need to check their consistency.

> I could get rid of the got_rows_per_insert variable, but it would
> require setting the default value for rows_per_insert in the main()
> function rather than in InitDumpOptions(). I thought InitDumpOptions()
> looked like just the place to do this, so went with that option. To
> make it work without got_rows_per_insert, rows_per_insert would have to
> be 0 by default and we'd know we saw a --rows-per-insert command line
> arg by the fact that rows_per_insert was non-zero. Would you rather
> have it that way?

Yep, esp. as rows_per_insert & dump_inserts could be the same.

> > The feature is not tested anywhere. I still think that there should be
> > a test on empty/small/larger-than-rows-per-insert tables, possibly
> > added to existing TAP-tests.
>
> I was hoping to get away with not having to do that... mainly because
> I've no idea how.

Hmmm. That is another question! Maybe someone will help.

-- Fabien.
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote:
> I think the first two patches (a) removal of dead code in bootstrap
> and (b) the core patch to avoid creation of FSM file for the small
> table are good now. I have prepared the patches along with commit
> message. There is no change except for some changes in README and
> commit message of the second patch. Kindly let me know what you think
> about them?

Good to hear! The additional language is fine. In "Once the FSM is created for heap", I would just change that to "...for a heap".

> I think these two patches can go even without the upgrade patch
> (during pg_upgrade, conditionally skip transfer of FSMs.) which is
> still under discussion. However, I am not in a hurry if you or other
> thinks that upgrade patch must be committed along with the second
> patch. I think the upgrade patch is generally going on track but
> might need some more review.

The pg_upgrade piece is a nice-to-have feature and not essential, so can go in later. Additional review is also welcome.

-- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: postgres on a non-journaling filesystem
On 2019-01-23 14:20:52 +0200, Heikki Linnakangas wrote:
> On 23/01/2019 01:03, maayan mordehai wrote:
> > hello,
> >
> > I'm Maayan, I'm in a DBA team that uses postgresql.
> > I saw in the documentation on wals:
> > https://www.postgresql.org/docs/10/wal-intro.html
> > In the tip box that, it's better not to use a journaling filesystem. and I
> > wanted to ask how it works?
> > can't we get corruption that we can't recover from?
> > I mean what if postgres in the middle of a write to a wal and there is a
> > crash, and it didn't finish.
> > I'm assuming it will detect it when we will start postgres and write that
> > it was rolled back, am I right?
>
> Yep, any half-written transactions will be rolled back.
>
> > and how does it work in the data level? if some of the 8k block is written
> > but not all of it, and then there is a crash, how postgres deals with it?
>
> The first time a block is modified after a checkpoint, a copy of the block
> is written to the WAL. At crash recovery, the block is restored from the
> WAL. This mechanism is called "full page writes".
>
> The WAL works just like the journal in a journaling filesystem. That's why
> it's not necessary to have journaling at the filesystem level.

But note that not having journaling at the FS level often makes the OS start *painfully* slowly after a crash, because fsck or similar will be run, and that's often necessary for the internal consistency of the FS. Note also that even with journaling enabled, most filesystems by default don't journal data, so you can get those partial writes anyway.

Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Hi, On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: > On Tue, Jan 22, 2019 at 05:49:46PM -0800, Andres Freund wrote: > > On 2019-01-23 14:43:15 +1300, Thomas Munro wrote: > >> The function name comments are similar, though less consistent so I'm > >> too lazy to write a script to find one that is actually wrong (with > >> which to trigger Andres's let's-delete-them-all response :-D). > > I am not sure if anybody uses them for anything automatically, still I > find myself from time to time looking at them to remember on which > path the file is located when opened in emacs. I often want to know that too - isn't it easier to just meta-x f and see the directory displayed? Because that doesn't require jumping to the head of the file... Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On 10/12/2018 23:37, Dmitry Dolgov wrote:
> On Thu, Nov 29, 2018 at 6:48 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > On Tue, Oct 2, 2018 at 4:53 AM Michael Paquier wrote:
> > > On Mon, Aug 06, 2018 at 06:00:54PM +0900, Yoshimi Ichiyanagi wrote:
> > > > libpmem's pmem_map_file() supports 2M/1G (huge page size) alignment,
> > > > since it can reduce the number of page faults. In addition, libpmem's
> > > > pmem_memcpy_nodrain() is a function that copies data using
> > > > single-instruction, multiple-data (SIMD) instructions and NT store
> > > > instructions (MOVNT). As a result, using these APIs is faster than
> > > > using the old mmap()/memcpy(). Please see the PGCon2018
> > > > presentation[1] for the details.
> > > > [1] https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf
> > >
> > > So you say that this represents a 3% gain based on the presentation?
> > > That may be interesting to dig into. Could you provide fresher
> > > performance numbers? I am moving this patch to the next CF 2018-10 for
> > > now, waiting for input from the author.
> >
> > Unfortunately, the patch has some conflicts now, so probably not only
> > fresher performance numbers are necessary, but also a rebased version.
>
> I believe the idea behind this patch is quite important (thanks to CMU DG
> for the inspiring lectures), so I decided to put in some effort and rebase
> it to keep it from rotting. At the same time I have a vague impression
> that the patch itself suggests a rather narrow way of using PMDK.

Thanks. To re-iterate what I said earlier in this thread, I think the next step here is to write a patch that modifies xlog.c to use plain old mmap()/msync() to memory-map the WAL files, to replace the WAL buffers. Let's see what the performance of that is, with or without NVM hardware. I think that might actually make the code simpler. There's a bunch of really hairy code around locking the WAL buffers, which could be made simpler if each backend memory-mapped the WAL segment files independently.
One thing to watch out for, is that if you read() a file, and there's an I/O error, you have a chance to ereport() it. If you try to read from a memory-mapped file, and there's an I/O error, the process is killed with SIGBUS. So I think we have to be careful with using memory-mapped I/O for reading files. But for writing WAL files, it seems like a good fit. Once we have a reliable mmap()/msync() implementation running, it should be straightforward to change it to use MAP_SYNC and the special CPU instructions for the flushing. - Heikki
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 16:12:15 +0100, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch of how it > can > look like. Thanks. > All the arguments, except Archive, CatalogId and DumpId I've moved > into the ArchiveOpts structure. Not all of them could be empty before, but > anyway it seems better for consistency and readability. Some of the arguments > had empty string as a default value, I haven't changed anything here yet > (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). Probably worth changing at the same time, if we decide to go for it. To me this does look like it'd be more maintainable going forward. Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Hi, On 2019-01-23 18:45:42 +0200, Heikki Linnakangas wrote: > To re-iterate what I said earlier in this thread, I think the next step here > is to write a patch that modifies xlog.c to use plain old mmap()/msync() to > memory-map the WAL files, to replace the WAL buffers. Let's see what the > performance of that is, with or without NVM hardware. I think that might > actually make the code simpler. There's a bunch of really hairy code around > locking the WAL buffers, which could be made simpler if each backend > memory-mapped the WAL segment files independently. > > One thing to watch out for, is that if you read() a file, and there's an I/O > error, you have a chance to ereport() it. If you try to read from a > memory-mapped file, and there's an I/O error, the process is killed with > SIGBUS. So I think we have to be careful with using memory-mapped I/O for > reading files. But for writing WAL files, it seems like a good fit. > > Once we have a reliable mmap()/msync() implementation running, it should be > straightforward to change it to use MAP_SYNC and the special CPU > instructions for the flushing. FWIW, I don't think we should go there as the sole implementation. I'm fairly convinced that we're going to need to go to direct-IO in more cases here, and that'll not work well with mmap. I think this'd be a worthwhile experiment, but I'm doubtful it'd end up simplifying our code. Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Andres Freund writes: > On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: >> I am not sure if anybody uses them for anything automatically, still I >> find myself from time to time looking at them to remember on which >> path the file is located when opened in emacs. > I often want to know that too - isn't it easier to just meta-x f and see > the directory displayed? Because that doesn't require jumping to the > head of the file... My own workflow often involves editing copies that are not in the original directory, so that answer doesn't work for me. In general it won't work anytime you are looking at a file out-of-context. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
Hello

On 2019-Jan-23, Andres Freund wrote:
> > All the arguments, except Archive, CatalogId and DumpId I've moved
> > into the ArchiveOpts structure. Not all of them could be empty before,
> > but anyway it seems better for consistency and readability. Some of the
> > arguments had empty string as a default value, I haven't changed
> > anything here yet (although this mixture of NULL and "" in ArchiveEntry
> > looks a bit confusing).
>
> Probably worth changing at the same time, if we decide to go for it.
>
> To me this does look like it'd be more maintainable going forward.

It does. How does pgindent behave with it? I'd use ArchiveEntryOpts as the struct name; ArchiveOpts sounds wrong. Also, the struct members could use better names -- "defn" for example could perhaps be "createStmt" (to match dropStmt/copyStmt), and "desc" could be expanded to "description".

-- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 10:12 AM, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch of how > it can look like. A little bit of vararg-macro action can make such a design look even tidier, cf. [1]. Or are compilers without vararg macros still in the supported mix? -Chap [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 The macros in [1] are not defined to create a function call, but only the argument structure because there might be several functions to pass it to, so a call would be written like func(&SEQ_MK_CHN(NOTEON, ...)). In ArchiveEntry's case, if there's only one function involved, there'd be no reason not to have a macro produce the whole call.
Re: ArchiveEntry optional arguments refactoring
Chapman Flack writes: > Or are compilers without vararg macros still in the supported mix? No. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote: > I'd use ArchiveEntryOpts as struct name; ArchiveOpts sounds wrong. Brevity would be of some advantage IMO, because it'll probably determine how pgindent indents the arguments, because the struct name will be in the arguments. > Also, the struct members could use better names -- "defn" for example > could perhaps be "createStmt" (to match dropStmt/copyStmt), and expand > "desc" to "description". True. Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 12:05:10 -0500, Chapman Flack wrote: > On 1/23/19 10:12 AM, Dmitry Dolgov wrote: > > To make this discussion a bit more specific, I've created a patch of how > > it can look like. > A little bit of vararg-macro action can make such a design look > even tidier, cf. [1]. > [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 > > The macros in [1] are not defined to create a function call, but only > the argument structure because there might be several functions to pass > it to, so a call would be written like func(&SEQ_MK_CHN(NOTEON, ...)). > > In ArchiveEntry's case, if there's only one function involved, there'd > be no reason not to have a macro produce the whole call. I'm not really seeing this being more than obfuscation in this case. The only point of the macro is to set the .tag and .op elements to something without adding redundancies due to the struct name. Which we'd not have. Greetings, Andres Freund
Re: Referential Integrity Checks with Statement-level Triggers
Attached is a patch that refactors DELETE triggers to fire at the statement level. I chose delete triggers partly out of simplicity, and partly because there is some before/after row linkage in the ON UPDATE CASCADE cases where statement-level triggers might not be feasible as we have currently implemented them. After having done the work, I think INSERT triggers would be similarly straightforward, but I wanted to limit scope. Also, after having stripped the delete cases out of the update-or-delete functions, it became obvious that the on-update-set-null and on-update-set-default cases differed by only 3-4 lines, so those functions were combined. On a vagrant VM running on my desktop machine, I'm seeing a speed-up of about 25% in the benchmark provided. I think that figure is cloudy and below my expectations. Perhaps we'd get a much better picture of whether or not this is worth it on a bare-metal machine, or at least a VM better suited to benchmarking. Currently 4 make-check tests are failing. Two of them appear to be false positives (the tests make assumptions about triggers that are no longer true), and the other two are outside the scope of this benchmark, so I'll revisit them if we go forward. ri-set-logic.sql is an edited benchmark script adapted from Kevin Grittner's benchmark that he ran against hand-rolled triggers and posted on 2016-11-02. ri_test.out is a copy-paste of two runs of the benchmark script. Many thanks to everyone who helped, often despite their own objections to the overall reasoning behind the endeavor. I'm aware that a large contingent of highly experienced people would very much like to replace our entire trigger architecture, or at least divorce RI checks from triggers. Maybe this patch spurs on that change. Even if nothing comes of it, it's been a great learning experience.
On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli wrote:
> > It is far from a premature optimization IMO, it is super useful and
> > something I was hoping would happen ever since I heard about transition
> > tables being worked on.
>
> Me too. Never-ending DELETEs are a common pain point especially for
> people migrated from MySQL which creates indexes for foreign keys
> automatically.

From 8a73f9233211076421a565b5c90ecd029b5e6581 Mon Sep 17 00:00:00 2001
From: vagrant
Date: Wed, 23 Jan 2019 16:59:17 +
Subject: [PATCH] Change Delete RI triggers to Statement-Level Triggers

---
 src/backend/commands/tablecmds.c    |   9 +-
 src/backend/commands/trigger.c      |   2 +
 src/backend/utils/adt/ri_triggers.c | 779
 3 files changed, 566 insertions(+), 224 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 28a137bb53..21f5bf94a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8954,6 +8954,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 {
 	CreateTrigStmt *fk_trigger;
 
+	TriggerTransition *del = makeNode(TriggerTransition);
+	del->name = "pg_deleted_transition_table";
+	del->isNew = false;
+	del->isTable = true;
+
 	/*
 	 * Build and execute a CREATE CONSTRAINT TRIGGER statement for the ON
 	 * DELETE action on the referenced table.
@@ -8961,11 +8966,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 	fk_trigger = makeNode(CreateTrigStmt);
 	fk_trigger->trigname = "RI_ConstraintTrigger_a";
 	fk_trigger->relation = NULL;
-	fk_trigger->row = true;
+	fk_trigger->row = false;
 	fk_trigger->timing = TRIGGER_TYPE_AFTER;
 	fk_trigger->events = TRIGGER_TYPE_DELETE;
 	fk_trigger->columns = NIL;
-	fk_trigger->transitionRels = NIL;
+	fk_trigger->transitionRels = list_make1(del);
 	fk_trigger->whenClause = NULL;
 	fk_trigger->isconstraint = true;
 	fk_trigger->constrrel = NULL;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7ffaeaffc6..080587215f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -510,7 +510,9 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 	 *
 	 * Currently this is enforced by the grammar, so just Assert here.
 	 */
+	/*
 	Assert(!stmt->isconstraint);
+	*/
 
 	if (tt->isNew)
 	{
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index e1aa3d0044..6f89ab4c77 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -194,9 +194,10 @@ static int ri_constraint_cache_valid_count = 0;
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 			HeapTuple old_row,
 			const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
-static Datum ri_setnull(TriggerData *trigdata);
-static Datum ri_setdefault(TriggerData *trigdata);
+static Datum ri_on_update_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action);
+static Datu
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 12:10 PM, Andres Freund wrote:
> On 2019-01-23 12:05:10 -0500, Chapman Flack wrote:
> > [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709
>
> I'm not really seeing this being more than obfuscation in this case. The
> only point of the macro is to set the .tag and .op elements to something
> without adding redundancies due to the struct name. Which we'd not have.

Granted, that example is more elaborate than this case, but writing

    ArchiveEntry(fout, dbCatId, dbDumpId, .tag = datname, .owner = dba,
                 .desc = "DATABASE", .section = SECTION_PRE_DATA,
                 .defn = creaQry->data, .dropStmt = delQry->data);

instead of

    ArchiveEntry(fout, dbCatId, dbDumpId, &(ArchiveOpts){.tag = datname,
                 .owner = dba, .desc = "DATABASE",
                 .section = SECTION_PRE_DATA, .defn = creaQry->data,
                 .dropStmt = delQry->data});

would be easy, and still save a bit of visual noise.

Regards, -Chap
Re: ArchiveEntry optional arguments refactoring
On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote:
> Hello
>
> On 2019-Jan-23, Andres Freund wrote:
> > > All the arguments, except Archive, CatalogId and DumpId I've moved
> > > into the ArchiveOpts structure. Not all of them could be empty before,
> > > but anyway it seems better for consistency and readability. Some of the
> > > arguments had empty string as a default value, I haven't changed
> > > anything here yet (although this mixture of NULL and "" in ArchiveEntry
> > > looks a bit confusing).
> >
> > Probably worth changing at the same time, if we decide to go for it.
> >
> > To me this does look like it'd be more maintainable going forward.
>
> It does. How does pgindent behave with it?

It craps out:

    Error@3649: Unbalanced parens
    Warning@3657: Extra )

But that can be worked around with something like

    te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
                      ARCHIVE_ARGS(.tag = tbinfo->dobj.name,
                                   .namespace = tbinfo->dobj.namespace->dobj.name,
                                   .owner = tbinfo->rolname,
                                   .desc = "TABLE DATA",
                                   .section = SECTION_DATA,
                                   .copyStmt = copyStmt,
                                   .deps = &(tbinfo->dobj.dumpId),
                                   .nDeps = 1,
                                   .dumpFn = dumpFn,
                                   .dumpArg = tdinfo,
                                   ));

which looks mildly simpler too.

Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
On 2019-01-23 12:22:23 -0500, Chapman Flack wrote: > On 1/23/19 12:10 PM, Andres Freund wrote: > > On 2019-01-23 12:05:10 -0500, Chapman Flack wrote: > >> [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 > > > I'm not really seeing this being more than obfuscation in this case. The > > only point of the macro is to set the .tag and .op elements to something > > without adding redundancies due to the struct name. Which we'd not have. > > Granted, that example is more elaborate than this case, but writing > > > ArchiveEntry(fout, dbCatId, dbDumpId, .tag = datname, .owner = dba, > .desc = "DATABASE", .section = SECTION_PRE_DATA, > .defn = creaQry->data, .dropStmt = delQry->data); > > instead of > > ArchiveEntry(fout, dbCatId, dbDumpId, &(ArchiveOpts){.tag = datname, > .owner = dba, .desc = "DATABASE", > .section = SECTION_PRE_DATA, .defn = creaQry->data, > .dropStmt = delQry->data}); > > would be easy, and still save a bit of visual noise. IDK, it'd be harder to parse correctly as a C programmer though. I'm up with a wrapper macro like #define ARCHIVE_ARGS(...) &(ArchiveOpts){__VA_ARGS__} but weirdly mixing struct arguments and normal function arguments seems quite confusing. Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
Andres Freund writes: > On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote: >> It does. How does pgindent behave with it? > It craps out: > Error@3649: Unbalanced parens > Warning@3657: Extra ) > But that can be worked around with something like > te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId, > ARCHIVE_ARGS(.tag = tbinfo->dobj.name, >.namespace = > tbinfo->dobj.namespace->dobj.name, >.owner = tbinfo->rolname, >.desc = "TABLE DATA", >.section = SECTION_DATA, >.copyStmt = copyStmt, >.deps = &(tbinfo->dobj.dumpId), >.nDeps = 1, >.dumpFn = dumpFn, >.dumpArg = tdinfo, >)); > which looks mildly simpler too. That looks fairly reasonable from here, but I'd suggest ARCHIVE_OPTS rather than ARCHIVE_ARGS. Can we omit the initial dots if we use a wrapper macro? Would it be a good idea to do so (I'm not really sure)? regards, tom lane
Re: ArchiveEntry optional arguments refactoring
On 2019-01-23 12:32:06 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote:
> > > It does. How does pgindent behave with it?
> >
> > It craps out:
> >     Error@3649: Unbalanced parens
> >     Warning@3657: Extra )
> >
> > But that can be worked around with something like
> >
> >     te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
> >                       ARCHIVE_ARGS(.tag = tbinfo->dobj.name,
> >                                    .namespace = tbinfo->dobj.namespace->dobj.name,
> >                                    .owner = tbinfo->rolname,
> >                                    .desc = "TABLE DATA",
> >                                    .section = SECTION_DATA,
> >                                    .copyStmt = copyStmt,
> >                                    .deps = &(tbinfo->dobj.dumpId),
> >                                    .nDeps = 1,
> >                                    .dumpFn = dumpFn,
> >                                    .dumpArg = tdinfo,
> >                                    ));
> >
> > which looks mildly simpler too.
>
> That looks fairly reasonable from here, but I'd suggest ARCHIVE_OPTS
> rather than ARCHIVE_ARGS.

WFM. Seems quite possible that we'd grow a few more of these over time, so establishing some common naming seems good. Btw, do you have an opinion on keeping catId / dumpId outside/inside the argument struct?

> Can we omit the initial dots if we use a wrapper macro? Would it be a
> good idea to do so (I'm not really sure)?

Not easily, if at all, I think. We'd have to do a fair bit of weird macro magic (and then still end up with limitations) to "process" each argument individually. And even if it were easy, I don't think it's particularly advantageous.

Greetings, Andres Freund
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing wrote: > I don't want a situation like this: > CREATE INDEX CONCURRENTLY ... > DROP INDEX CONCURRENTLY ... > REINDEX INDEX (CONCURRENTLY) ... > > All three should be the same, and my suggestion is to add the > parenthesized version to CREATE and DROP and not add the unparenthesized > version to REINDEX. +1 for all three being the same. I could see allowing only the unparenthesized format for all three, or allowing both forms for all three, but I think having only one form for each and having them not agree will be too confusing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Jan 23, 2019, 19:17 Robert Haas wrote: > On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing > wrote: > > I don't want a situation like this: > > CREATE INDEX CONCURRENTLY ... > > DROP INDEX CONCURRENTLY ... > > REINDEX INDEX (CONCURRENTLY) ... > > > > All three should be the same, and my suggestion is to add the > > parenthesized version to CREATE and DROP and not add the unparenthesized > > version to REINDEX. > > +1 for all three being the same. I could see allowing only the > unparenthesized format for all three, or allowing both forms for all > three, but I think having only one form for each and having them not > agree will be too confusing. > +1 Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: ArchiveEntry optional arguments refactoring
Andres Freund writes: > Btw, do you have an opionion on keeping catId / dumpId outside/inside > the argument struct? I'd go for outside, since they're not optional. Not dead set on that though. regards, tom lane
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Hi, On 2019-01-23 13:17:26 -0500, Robert Haas wrote: > On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing > wrote: > > I don't want a situation like this: > > CREATE INDEX CONCURRENTLY ... > > DROP INDEX CONCURRENTLY ... > > REINDEX INDEX (CONCURRENTLY) ... > > > > All three should be the same, and my suggestion is to add the > > parenthesized version to CREATE and DROP and not add the unparenthesized > > version to REINDEX. > > +1 for all three being the same. I could see allowing only the > unparenthesized format for all three, or allowing both forms for all > three, but I think having only one form for each and having them not > agree will be too confusing. It seems quite unnecessarily confusing to me to require parens for REINDEX CONCURRENTLY when we've historically not required that for CREATE/DROP INDEX CONCURRENTLY. Besides that, training people that it's the correct form to use parens for CIC/DIC creates an unnecessary version dependency. I think it actually makes sense to see the CONCURRENTLY versions as somewhat separate types of statements from the non-concurrent versions. They have significantly different transactional behaviour (like not being able to be run within one, and leaving gunk behind in case of error). For me it semantically makes sense to have that denoted at the toplevel; it's a related but different type of DDL statement. Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 12:25 PM, Andres Freund wrote: > On 2019-01-23 12:22:23 -0500, Chapman Flack wrote: >> ArchiveEntry(fout, dbCatId, dbDumpId, .tag = datname, .owner = dba, >> .desc = "DATABASE", .section = SECTION_PRE_DATA, >> .defn = creaQry->data, .dropStmt = delQry->data); > IDK, it'd be harder to parse correctly as a C programmer though. ... > weirdly mixing struct arguments and normal function arguments seems > quite confusing. Hmm, I guess the rubric I think with goes something like "is a C programmer who encounters this in a source file for the first time likely to guess wrong about what it means?", and in the case above, I can scarcely imagine it. ISTM that these days, many people are familiar with several languages that allow a few mandatory, positional parameters followed by optional named ones, and so a likely reaction would be "hey look, somebody used a macro here to make C look more like ." On 1/23/19 12:32 PM, Tom Lane wrote: > Can we omit the initial dots if we use a wrapper macro? That, I think, is hard. Getting to the form above is downright easy; making the dots go away, even if achievable, seems way further down the path of diminishing returns. Regards, -Chap
Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
On Mon, Jan 14, 2019 at 2:37 PM Peter Geoghegan wrote: > The fix here must be to normalize index tuples that are compressed > within amcheck, both during initial fingerprinting, and during > subsequent probes of the Bloom filter in bt_tuple_present_callback(). I happened to talk to Andres about this in person yesterday. He thought that there was reason to be concerned about the need for logical normalization beyond TOAST issues. Expression indexes were a particular concern, because they could in principle have a change in the on-disk representation without a change of logical values -- false positives could result. He suggested that the long term solution was to bring hash operator class hash functions into Bloom filter hashing, at least where available. I wasn't very enthused about this idea, because it will be expensive and complicated for an uncertain benefit. There are hardly any btree operator classes that can ever have bitwise distinct datums that are equal, anyway (leaving aside issues with TOAST). For the cases that do exist (e.g. numeric_ops display scale), we may not really want to normalize the differences away. Having an index tuple with a numeric_ops datum containing the wrong display scale but with everything else correct still counts as corruption. It now occurs to me that if we wanted to go further than simply normalizing away TOAST differences, my pending nbtree patch could enable a simpler and more flexible way of doing that than bringing hash opclasses into it, at least on the master branch. We could simply do an index look-up for the exact tuple of interest in the event of a Bloom filter probe indicating its apparent absence (corruption) -- even heap TID can participate in the search. In addition, that would cover the whole universe of logical differences, known and unknown (e.g. it wouldn't matter if somebody initialized alignment padding to something non-zero, since that doesn't cause wrong answers to queries). 
We might even want to offer an option where the Bloom filter is bypassed (we go straight to probing the indexes) some proportion of the time, or when a big misestimation when sizing the Bloom filter is detected (i.e. almost all bits in the Bloom filter bitset are set at the time we start probing the filter). -- Peter Geoghegan
Re: Analyze all plans
On Jan 23, 2019, at 6:46 AM, Tom Lane wrote: > > Oleksandr Shulgin writes: >> On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: >>> 1. Enumerate all the plans > >> So enumerating all possible plans stops being practical for even slightly >> complicated queries. > > Yes. You can *not* disable the planner's aggressive pruning of losing > paths and subpaths without ending up with something that's completely > impractical for anything beyond toy queries. That's just from the > standpoint of planner runtime. Adding on the cost of actually creating > a finished plan and then running it for long enough to get a reliable > runtime for each case would ... well, let's just say you'd be safely dead > before you got any interesting answers. Thank you for the feedback. I didn't think about this carefully. I thought the planner would use GEQO when the number of joins is large, but indeed it's still n! for normal path selection. Now I keep the top-k cheapest subplans, where k is configured by users. If k is set properly, setting a timeout may not be necessary. However, if I do want to set a timeout, I would run into the issues I had in step 2. I'm looking forward to hearing more thoughts :) Thanks again!
Re: inherited primary key misbehavior
Hello On 2019-Jan-23, Amit Langote wrote: > It seems that ATExecDetachPartition misses to mark a child's primary key > constraint entry (if any) as local (conislocal=true, coninhcount=0), which > causes this: > > create table p (a int primary key) partition by list (a); > create table p2 partition of p for values in (2); > alter table p detach partition p2; > alter table p2 drop constraint p2_pkey ; > ERROR: cannot drop inherited constraint "p2_pkey" of relation "p2" Ugh. I suppose unique keys would have the same problem, so the check for indisprimary is wrong. Other than that, looks correct. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: problems with foreign keys on partitioned tables
On 2019-Jan-22, Amit Langote wrote: > On 2019/01/22 8:30, Alvaro Herrera wrote: > > Hi Amit, > > > > Will you please rebase 0002? Please add your proposed tests cases to > > it, too. > > Done. See the attached patches for HEAD and PG 11. I'm not quite sure I understand why the one in DefineIndex needs the deps but nothing else does. I fear that you added that one just to appease the existing test that breaks otherwise, and I worry that with that addition we're papering over some other, more fundamental bug. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila wrote: > Also, another case to think in this regard is the upgrade for standby > servers, if you read below paragraph from the user manual [1], you > will see what I am worried about? > > "What this does is to record the links created by pg_upgrade's link > mode that connect files in the old and new clusters on the primary > server. It then finds matching files in the standby's old cluster and > creates links for them in the standby's new cluster. Files that were > not linked on the primary are copied from the primary to the standby. > (They are usually small.)" > > [1] - https://www.postgresql.org/docs/devel/pgupgrade.html I am still not able to get the upgraded standby to go into recovery without resorting to pg_basebackup, but in another attempt to investigate your question I tried the following (data1 = old cluster, data2 = new cluster): mkdir -p data1 data2 standby echo 'heap' > data1/foo echo 'fsm' > data1/foo_fsm # simulate streaming replication rsync --archive data1 standby # simulate pg_upgrade, skipping FSM ln data1/foo -t data2/ rsync --archive --delete --hard-links --size-only --no-inc-recursive data1 data2 standby # result ls standby/data1 ls standby/data2 The result is that foo_fsm is not copied to standby/data2, contrary to what the docs above imply for other unlinked files. Can anyone shed light on this? -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Thread-unsafe coding in ecpg
On 1/22/19 12:50 PM, Andrew Dunstan wrote: > On 1/21/19 10:00 PM, Andrew Dunstan wrote: >> On 1/21/19 3:25 PM, Tom Lane wrote: >>> "Joshua D. Drake" writes: On 1/21/19 12:05 PM, Tom Lane wrote: > Is there a newer version of mingw that does have this functionality? Apparently this can be done with the 64-bit version: https://stackoverflow.com/questions/33647271/how-to-use-configthreadlocale-in-mingw >>> Hmm, the followup question makes it sound like it still didn't work :-(. >>> >>> However, since the mingw build is autoconf-based, seems like we can >>> install a configure check instead of guessing. Will make it so. >>> >>> Task for somebody else: run a MinGW64 buildfarm member. >>> >>> >> I could set that up in just about two shakes of a lamb's tail - I have a >> script to do so all tested on vagrant/aws within the last few months. >> >> >> What I don't have is resources. My Windows resources are pretty much >> tapped out. I would need either some modest hardware or a Windows VM >> somewhere in the cloud - could be anywhere but I'm most at home on AWS. >> > > Incidentally, jacana *is* running mingw64, but possibly not a young > enough version. I just ran a test on an up to date version and it found > the config setting happily, and passed the make stage. > > > I can look at upgrading jacana to a more modern compiler. The version > it's running is 4.9.1, apparently built around Oct 2014, so it's not > that old. > > I have just spent a large amount of time testing the committed fix with a number of versions of gcc. It blows up on any compiler modern enough to know about _configthreadlocale. Essentially what happens is this: == running regression test queries == test compat_informix/dec_test ... ok test compat_informix/charfuncs ... ok test compat_informix/rfmtdate ... ok test compat_informix/rfmtlong ... ok test compat_informix/rnull ... stdout stderr FAILED test compat_informix/sqlda ...
stdout stderr FAILED (test process exited with exit code 1) test compat_informix/describe ... stdout stderr FAILED (test process exited with exit code 1) test compat_informix/test_informix ... At this point the regression tests hang and the disk starts filling up rapidly. This has been tested with versions 5.4.0, 6.4.0, 7.3.0, 8.1.0 and 8.2.1. The first 4 are downloaded direct from the Mingw64 repo, the last is from Msys2's mingw64 toolchain. I'm trying to find out more info, but what we have right now is pretty broken. Reverting commits ee27584c4a and 8eb4a9312 cures[*] the problem. cheers andrew [*] FSVO "cure". I am still getting intermittent errors but no hangs. -- Andrew Dunstan  https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Thread-unsafe coding in ecpg
Andrew Dunstan writes: > I have just spent a large amount of time testing the committed fix with > a number of versions of gcc. It blows up on any compiler modern enough > to know about _configthreadlocale Bleah. Since the regular Windows buildfarm members seem happy, this evidently means that MinGW's _configthreadlocale is broken in some way. I suppose we could just remove the autoconf test and build without _configthreadlocale on MinGW, but that's kind of sad ... Perhaps there's some sort of setup that MinGW's version needs that we're not doing? regards, tom lane
Re: index_build does not need its isprimary argument
On Tue, Jan 22, 2019 at 05:08:52PM +0900, Michael Paquier wrote: > While working on REINDEX CONCURRENTLY, I have noticed that > index_build() does not need its argument isprimary. Perhaps it is > not worth bothering, but for the REINDEX CONCURRENTLY business this > removes the need to open an index when triggering a concurrent > build. > > The flag was introduced in 3fdeb189, but f66e8bf actually forgot to > finish the cleanup as index_update_stats() has simplified its > interface. And committed as of 289198c, and back to the real deal.. -- Michael signature.asc Description: PGP signature
Re: Protect syscache from bloating with negative cache entries
On Wed, Jan 23, 2019 at 05:35:02PM +0900, Kyotaro HORIGUCHI wrote: > At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI > wrote in > <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp> > > An option is an additional PGPROC member and interface functions. > > > > struct PGPROC > > { > > ... > > int syscache_usage_track_interval; /* track interval, 0 to disable */ > > > > =# select syscache_usage_track_add(, [, ]); > > =# select syscache_usage_track_remove(2134); > > > > > > Or, just provide a one-shot triggering function. > > > > =# select syscache_take_usage_track(); > > > > This can use either a similar PGPROC variable or SendProcSignal() > > but the former doesn't fire while idle time unless using timer. > > The attached is a revised version of this patchset, where the third > patch is the remote setting feature. It uses static shared memory. > > =# select pg_backend_catcache_stats(, ); > > Activates or changes catcache stats feature on the backend with > PID. (The name should be changed to .._syscache_stats, though.) > It is far smaller than the remote-GUC feature. (It contains a > part that should be in the previous patch. I will fix it later.) I have a few questions to make sure we have not made the API too complex. First, for syscache_prune_min_age, that is the minimum age that we prune, and entries could last twice that long. Is there any value to doing the scan at 50% of the age so that the syscache_prune_min_age is the max age? For example, if our age cutoff is 10 minutes, we could scan every 5 minutes so 10 minutes would be the maximum age kept. Second, when would you use syscache_memory_target != 0? If you had syscache_prune_min_age really fast, e.g. 10 seconds? What is the use-case for this? You have a query that touches 10k objects, and then the connection stays active but doesn't touch many of those 10k objects, and you want it cleaned up in seconds instead of minutes?
(I can't see why you would not clean up all unreferenced objects after _minutes_ of disuse, but removing them after seconds of disuse seems undesirable.) What are the odds you would retain the entries you want with a fast target? What is the value of being able to change a specific backend's stat interval? I don't remember any other setting having this ability. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: pg_dump multi VALUES INSERT
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO wrote: > I still do not understand the need for another variable. > >int ninserts = 0; // default is to use copy >while (getopt...) >{ > switch (...) { >case "--inserts": > if (ninserts == 0) ninserts = 1; > break; >case "--rows-per-insert": > ninserts = arg_value; > checks... > break; > ... I didn't think of that. Attached is a version that changes it to work along those lines. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v12.patch Description: Binary data
Re: Thread-unsafe coding in ecpg
On 1/23/19 6:01 PM, Tom Lane wrote: > Andrew Dunstan writes: >> I have just spent a large amount of time testing the committed fix with >> a number of versions of gcc. It blows up on any compiler modern enough >> to know about _configthreadlocale > Bleah. Since the regular Windows buildfarm members seem happy, this > evidently means that MinGW's _configthreadlocale is broken in some way. > > I suppose we could just remove the autoconf test and build without > _configthreadlocale on MinGW, but that's kind of sad ... > > Perhaps there's some sort of setup that MinGW's version needs that > we're not doing? This seems to suggest something worse: https://reviews.llvm.org/D40181 Not sure I fully understand what's happening here, though. cheers andrew -- Andrew Dunstan  https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during INSERT and UPDATE
On Tue, Jan 22, 2019 at 4:31 PM David Rowley wrote: > Thanks. I've attached a rebased version. Hmm, now instead of an 85x speedup over master in the 10k partition case, I only get 20x. Anyone else see this? -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during INSERT and UPDATE
On Thu, 24 Jan 2019 at 13:38, John Naylor wrote: > > On Tue, Jan 22, 2019 at 4:31 PM David Rowley > wrote: > > Thanks. I've attached a rebased version. > > Hmm, now instead of an 85x speedup over master in the 10k partition > case, I only get 20x. Anyone else see this? What's it like with fsync off? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: inherited primary key misbehavior
On 2019/01/24 6:10, Alvaro Herrera wrote: > Hello > > On 2019-Jan-23, Amit Langote wrote: > >> It seems that ATExecDetachPartition misses to mark a child's primary key >> constraint entry (if any) as local (conislocal=true, coninhcount=0), which >> causes this: >> >> create table p (a int primary key) partition by list (a); >> create table p2 partition of p for values in (2); >> alter table p detach partition p2; >> alter table p2 drop constraint p2_pkey ; >> ERROR: cannot drop inherited constraint "p2_pkey" of relation "p2" > > Ugh. > > I suppose unique keys would have the same problem, so the check for > indisprimary is wrong. Fixed in the attached. We don't support inheriting EXCLUSION constraints yet, so no need to consider their indexes. Thanks, Amit From d794906e8d2a0839f548f20be5817f306b833b15 Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 23 Jan 2019 18:33:09 +0900 Subject: [PATCH v2] Detach primary key constraint when detaching partition --- src/backend/commands/tablecmds.c | 16 src/test/regress/expected/indexing.out | 15 +++ src/test/regress/sql/indexing.sql | 14 ++ 3 files changed, 45 insertions(+) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 28a137bb53..e3baf73e7b 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15094,6 +15094,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name) foreach(cell, indexes) { Oid idxid = lfirst_oid(cell); + Oid constrOid; Relationidx; if (!has_superclass(idxid)) @@ -15106,6 +15107,21 @@ ATExecDetachPartition(Relation rel, RangeVar *name) IndexSetParentIndex(idx, InvalidOid); update_relispartition(classRel, idxid, false); index_close(idx, NoLock); + + /* +* Detach any constraints associated with the index too. Only UNIQUE +* and PRIMARY KEY index constraints can be inherited. 
+*/ + if (!idx->rd_index->indisprimary && !idx->rd_index->indisunique) + continue; + + constrOid = get_relation_idx_constraint_oid(RelationGetRelid(partRel), + idxid); + if (!OidIsValid(constrOid)) + elog(ERROR, "missing pg_constraint entry of index %u of %u", +idxid, RelationGetRelid(partRel)); + + ConstraintSetParentConstraint(constrOid, InvalidOid); } table_close(classRel, RowExclusiveLock); diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index 118f2c78df..b344351ee7 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -1414,3 +1414,18 @@ DETAIL: Key (a)=(4) already exists. create unique index on covidxpart (b) include (a); -- should fail ERROR: insufficient columns in UNIQUE constraint definition DETAIL: UNIQUE constraint on table "covidxpart" lacks column "a" which is part of the partition key. +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +ERROR: cannot drop inherited constraint "parted_pk_detach_test1_pkey" of relation "parted_pk_detach_test1" +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +drop table parted_pk_detach_test, parted_pk_detach_test1; +create table parted_uniq_detach_test (a int unique) partition by list (a); +create table parted_uniq_detach_test1 partition of parted_uniq_detach_test for values in (1); +alter table parted_uniq_detach_test1 drop constraint parted_uniq_detach_test1_a_key; -- should fail +ERROR: cannot drop inherited constraint "parted_uniq_detach_test1_a_key" of relation "parted_uniq_detach_test1" +alter table parted_uniq_detach_test detach 
partition parted_uniq_detach_test1; +alter table parted_uniq_detach_test1 drop constraint parted_uniq_detach_test1_a_key; +drop table parted_uniq_detach_test, parted_uniq_detach_test1; diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index d4a64c18c7..3d43a57da2 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -757,3 +757,17 @@ alter table covidxpart attach partition covidxpart4 for values in (4); insert into covidxpart values (4, 1); insert into covidxpart values (4, 1); create unique index on c
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Wed, 23 Jan 2019 at 12:46, David Rowley wrote: > (Stopped in statext_mcv_build(). Need to take a break) Continuing... 27. statext_mcv_build() could declare the int j,k variables in the scope that they're required in. 28. "an array" * build array of SortItems for distinct groups and counts matching items 29. No need to set isnull to false in statext_mcv_load() 30. Wondering about the reason in statext_mcv_serialize() that you're not passing the collation to sort the array. You have: ssup[dim].ssup_collation = DEFAULT_COLLATION_OID; should it not be: ssup[dim].ssup_collation = stats[dim]->attrcollid; ? 31. uint32 should use %u, not %d: if (mcvlist->magic != STATS_MCV_MAGIC) elog(ERROR, "invalid MCV magic %d (expected %d)", mcvlist->magic, STATS_MCV_MAGIC); and if (mcvlist->type != STATS_MCV_TYPE_BASIC) elog(ERROR, "invalid MCV type %d (expected %d)", mcvlist->type, STATS_MCV_TYPE_BASIC); and ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("invalid length (%d) item array in MCVList", mcvlist->nitems))); I don't think %ld is the correct format for VARSIZE_ANY_EXHDR. %u or %d seem more suited. I see that value is quite often assigned to int, so probably can't argue much with %d. elog(ERROR, "invalid MCV size %ld (expected %zu)", VARSIZE_ANY_EXHDR(data), expected_size); 32. I think the format is wrong here too: elog(ERROR, "invalid MCV size %ld (expected %ld)", VARSIZE_ANY_EXHDR(data), expected_size); I'd expect "invalid MCV size %d (expected %zu)" 33. How do you allocate a single chunk non-densely? * Allocate one large chunk of memory for the intermediate data, needed * only for deserializing the MCV list (and allocate densely to minimize * the palloc overhead). 34. 
I thought I saw a few issues with pg_stats_ext_mcvlist_items() so tried to test it: create table ab (a int, b int); insert into ab select x,x from generate_serieS(1,10)x; create statistics ab_ab_stat (mcv) on a,b from ab; analyze ab; select pg_mcv_list_items(stxmcv) from pg_Statistic_ext where stxmcv is not null; ERROR: cache lookup failed for type 2139062143 The issues I saw were: You do: appendStringInfoString(&itemValues, "{"); appendStringInfoString(&itemNulls, "{"); but never append '}' after building the string. (can use appendStringInfoChar() BTW) also: if (i == 0) { appendStringInfoString(&itemValues, ", "); appendStringInfoString(&itemNulls, ", "); } I'd have expected you to append the ", " only when i > 0. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Tue, Jan 22, 2019 at 9:59 PM Haribabu Kommi wrote: > > > Thanks for the latest patch. I have some more minor comments. Thank you for reviewing the patch. > > + Execute index vacuum and cleanup index in parallel with > > Better to use vacuum index and cleanup index? This is in same with > the description of vacuum phases. It is better to follow same notation > in the patch. Agreed. I've changed it to "Vacuum index and cleanup index in parallel with ...". > > > + dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers); > > With the change, the lazy_space_alloc takes care of initializing the > parallel vacuum, can we write something related to that in the comments. > Agreed. > > + initprog_val[2] = dead_tuples->max_dead_tuples; > > dead_tuples variable may need rename for better reading? > I might not get your comment correctly but I've tried to fix it. Please review it. > > > + if (lvshared->indstats[idx].updated) + result = &(lvshared->indstats[idx].stats); + else + copy_result = true; > > > I don't see a need for copy_result variable, how about directly using > the updated flag to decide whether to copy or not? Once the result is > copied update the flag. > You're right. Fixed. > > +use Test::More tests => 34; > > I don't find any new tests are added in this patch. Fixed. > > I am thinking of performance penalty if we use the parallel option of > vacuum on a small sized table? Hm, unlike other parallel operations, the parallel vacuum executes multiple index vacuums simultaneously and can therefore avoid contention. I think there is a performance penalty but it would not be big. Attached the latest patches.
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center From 53a21a6983d6ef2787bc6e104d0ce62ed905a4d3 Mon Sep 17 00:00:00 2001 From: Masahiko Sawada Date: Tue, 18 Dec 2018 14:48:34 +0900 Subject: [PATCH v13 1/2] Add parallel option to VACUUM command In parallel vacuum, we do both index vacuum and cleanup vacuum in parallel with parallel worker processes if the table has more than one index. All processes including the leader process process indexes one by one. Parallel vacuum can be performed by specifying like VACUUM (PARALLEL 2) tbl, meaning that performing vacuum with 2 parallel worker processes. Or the setting parallel_workers reloption more than 0 invokes parallel vacuum. The parallel vacuum degree is limited by both the number of indexes the table has and max_parallel_maintenance_workers. --- doc/src/sgml/config.sgml | 24 +- doc/src/sgml/ref/vacuum.sgml | 28 ++ src/backend/access/heap/vacuumlazy.c | 892 +- src/backend/access/transam/parallel.c | 4 + src/backend/commands/vacuum.c | 78 +-- src/backend/nodes/equalfuncs.c| 6 +- src/backend/parser/gram.y | 73 ++- src/backend/postmaster/autovacuum.c | 13 +- src/backend/tcop/utility.c| 4 +- src/include/access/heapam.h | 5 +- src/include/commands/vacuum.h | 2 +- src/include/nodes/parsenodes.h| 19 +- src/test/regress/expected/vacuum.out | 2 + src/test/regress/sql/vacuum.sql | 3 + 14 files changed, 945 insertions(+), 208 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b6f5822..b77a2bd 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2185,18 +2185,18 @@ include_dir 'conf.d' Sets the maximum number of parallel workers that can be - started by a single utility command. Currently, the only - parallel utility command that supports the use of parallel - workers is CREATE INDEX, and only when - building a B-tree index. Parallel workers are taken from the - pool of processes established by , limited by . 
Note that the requested - number of workers may not actually be available at run time. - If this occurs, the utility operation will run with fewer - workers than expected. The default value is 2. Setting this - value to 0 disables the use of parallel workers by utility - commands. + started by a single utility command. Currently, the parallel + utility commands that support the use of parallel workers are + CREATE INDEX and VACUUM + without FULL option, and only when building + a B-tree index. Parallel workers are taken from the pool of + processes established by , + limited by . + Note that the requested number of workers may not actually be + available at run time. If this occurs, the utility operation + will run with fewer workers than expected. The default value + is 2. Setting this value to 0 disables the use of parallel + workers by utility commands. diff --git a/doc/src/sgml/ref/vacuum
Re: inherited primary key misbehavior
On 2019-Jan-24, Amit Langote wrote: > Fixed in the attached. We don't support inheriting EXCLUSION constraints > yet, so no need to consider their indexes. Looks good, pushed. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Thread-unsafe coding in ecpg
Andrew Dunstan writes: > On 1/23/19 6:01 PM, Tom Lane wrote: >> Perhaps there's some sort of setup that MinGW's version needs that >> we're not doing? > This seems to suggest something worse: https://reviews.llvm.org/D40181 > Not sure I fully understand what's happening here, though. Me either, but I noted a couple of interesting extracts from that page: Normal mingw that uses msvcrt.dll doesn't have per-thread locales so it won't really work in any case (but I think it does define some sort of dummy functions that at least will allow it to build). Nowadays, mingw can be built to target ucrtbase.dll as well though, and there it should be possible to make it work just like for MSVC although it might need some patches. ... Looked into MinGW-w64 sources and this is indeed the case. _configthreadlocale will return -1 and will not do anything. This suggests that, rather than throwing up our hands if the initial _configthreadlocale call returns -1, we should act as though the function doesn't exist, and just soldier on the same as before. The code in there assumes that -1 is a can't-happen case and doesn't try to recover, but apparently that's over-optimistic. regards, tom lane
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 9:18 PM John Naylor wrote: > > On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote: > > I think the first two patches (a) removal of dead code in bootstrap > > and (b) the core patch to avoid creation of FSM file for the small > > table are good now. I have prepared the patches along with commit > > message. There is no change except for some changes in README and > > commit message of the second patch. Kindly let me know what you think > > about them? > > Good to hear! The additional language is fine. In "Once the FSM is > created for heap", I would just change that to "...for a heap". > Sure, apart from this I have run pgindent on the patches and made some changes accordingly. Latest patches attached (only the second patch has changes). I will take one more pass on Monday morning (28th Jan) and will commit unless you or others see any problem. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch Description: Binary data v18-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: Thread-unsafe coding in ecpg
I wrote: > This suggests that, rather than throwing up our hands if the initial > _configthreadlocale call returns -1, we should act as though the function > doesn't exist, and just soldier on the same as before. The code in there > assumes that -1 is a can't-happen case and doesn't try to recover, > but apparently that's over-optimistic. I pushed a patch to fix that. It looks to me like the reason that the ecpg tests went into an infinite loop is that compat_informix/test_informix.pgc has not considered the possibility of repeated statement failures:

	while (1)
	{
		$fetch forward c into :i, :j, :c;
		if (sqlca.sqlcode == 100) break;
		else if (sqlca.sqlcode != 0) printf ("Error: %ld\n", sqlca.sqlcode);

		if (risnull(CDECIMALTYPE, (char *)&j))
			printf("%d NULL\n", i);
		else
		{
			int a;

			dectoint(&j, &a);
			printf("%d %d \"%s\"\n", i, a, c);
		}
	}

I know zip about ecpg coding practices, but wouldn't it be a better idea to break out of the loop on seeing an error? regards, tom lane
Re: inherited primary key misbehavior
On 2019/01/24 12:03, Alvaro Herrera wrote: > On 2019-Jan-24, Amit Langote wrote: > >> Fixed in the attached. We don't support inheriting EXCLUSION constraints >> yet, so no need to consider their indexes. > > Looks good, pushed. Thank you. Regards, Amit
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 3:39 AM John Naylor wrote: > > On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila wrote: > > Also, another case to think in this regard is the upgrade for standby > > servers, if you read below paragraph from the user manual [1], you > > will see what I am worried about? > > > > "What this does is to record the links created by pg_upgrade's link > > mode that connect files in the old and new clusters on the primary > > server. It then finds matching files in the standby's old cluster and > > creates links for them in the standby's new cluster. Files that were > > not linked on the primary are copied from the primary to the standby. > > (They are usually small.)" > > > > [1] - https://www.postgresql.org/docs/devel/pgupgrade.html > > I am still not able to get the upgraded standby to go into recovery > without resorting to pg_basebackup, but in another attempt to > investigate your question I tried the following (data1 = old cluster, > data2 = new cluster): > > > mkdir -p data1 data2 standby > > echo 'heap' > data1/foo > echo 'fsm' > data1/foo_fsm > > # simulate streaming replication > rsync --archive data1 standby > > # simulate pg_upgrade, skipping FSM > ln data1/foo -t data2/ > > rsync --archive --delete --hard-links --size-only --no-inc-recursive > data1 data2 standby > > # result > ls standby/data1 > ls standby/data2 > > > The result is that foo_fsm is not copied to standby/data2, contrary to > what the docs above imply for other unlinked files. Can anyone shed > light on this? > Is foo_fsm present in standby/data1? I think what doc means to say is that it copies any unlinked files present in primary's new cluster (which in your case will be data2). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Thread-unsafe coding in ecpg
Oh, I just noticed something else: several of the ecpg test programs contain

#ifdef WIN32
#ifdef _MSC_VER			/* requires MSVC */
	_configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
#endif
#endif

Surely this is a kluge that we could now remove? We've protected the only setlocale calls that the ecpg tests could reach in threaded mode. If there is still a problem, it'd behoove us to find it, because ecpglib should not expect that the calling app has done this for it. regards, tom lane
Re: RTLD_GLOBAL (& JIT inlining)
Hello. At Mon, 26 Feb 2018 23:50:41 +0200, Ants Aasma wrote in > On Mon, Feb 26, 2018 at 11:28 PM, Andres Freund wrote: > > So RTLD_LOCAL is out of the question, but I think we can get a good bit > > of the benefit by either specifying -Wl,-Bsymbolic at shlib build time, > > or RTLD_DEEPBIND at dlopen() time. Either leads to the opened shared > > library effectively being put at the beginning of the search path, > > therefore avoiding the issue that an earlier loaded shared library or > > symbols from the main binary can accidentally overwrite things in the > > shared library itself. Which incidentally also makes loading a bit > > faster. > > I think this would also fix oracle_fdw crashing when postgres is > compiled with --with-ldap. At least RTLD_DEEPBIND helped. [1] > > [1] > https://www.postgresql.org/message-id/CA%2BCSw_tPDYgnzCYW0S4oU0mTUoUhZ9pc7MRBPXVD-3Zbiwni9w%40mail.gmail.com I have seen several cases of this misbinding, specifically among extensions using different versions of the same library, or where part of one extension was simply transplanted from another. Now I'm seeing a nearby case again and found this thread. Some PL libraries (plpython does, but plperl doesn't for me) require their symbols to be exported for external modules. So RTLD_GLOBAL is needed for such extensions, but not in most other cases. We need a means to indicate whether a module wants to expose its symbols or not. The extension control file doesn't seem to be the place; the most appropriate way seems to be another piece of magic data. With the attached patch, external modules are loaded with RTLD_LOCAL by default. PG_MODULE_EXPORT_SYMBOL in one of a module's source files causes it to be loaded with RTLD_GLOBAL. I suppose this is the change with the least impact on existing modules, because I believe most of them don't need to export symbols. I think this works for all platforms that have dlopen or shl_load. Windows doesn't have an equivalent, but there we should explicitly declare exports with dllexport anyway. Any opinions or thoughts? regards. 
-- Kyotaro Horiguchi NTT Open Source Software Center >From 114692655060bf74d01e0f5452e89f3bd332dfa1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 24 Jan 2019 13:35:19 +0900 Subject: [PATCH] Add control on wheter symbols in external module is exported We load exteral modlues with RTLD_GLOBAL, which is problematic when two or more modules share the same symbol. On the other hand some modules mandates to export its symbols. This patch enables modules to decide whether to export them or not. They is defaultly hidden. plpython module is modified along with as it needs to export symbols. --- doc/src/sgml/xfunc.sgml| 8 src/backend/utils/fmgr/dfmgr.c | 13 - src/include/fmgr.h | 8 src/pl/plpython/plpy_main.c| 1 + 4 files changed, 29 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index e18272c33a..9a65e76eda 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -1875,6 +1875,14 @@ PG_MODULE_MAGIC; (Presently, unloads are disabled and will never occur, but this may change in the future.) + + The dynamic loader loads a file as its symbols are hidden from external + modules by default. A file is loaded exporting the symbols by writhing + this in one of the module source files. 
+ +PG_MODULE_EXPORT_SYMBOL; + + diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c index 456297a531..98ac1cc438 100644 --- a/src/backend/utils/fmgr/dfmgr.c +++ b/src/backend/utils/fmgr/dfmgr.c @@ -236,7 +236,7 @@ internal_load_library(const char *libname) #endif file_scanner->next = NULL; - file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_GLOBAL); + file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_LOCAL); if (file_scanner->handle == NULL) { load_error = dlerror(); @@ -281,6 +281,17 @@ internal_load_library(const char *libname) errhint("Extension libraries are required to use the PG_MODULE_MAGIC macro."))); } + /* Check if the module wants to export symbols */ + if (dlsym(file_scanner->handle, PG_MODULE_EXPORT_SYMBOL_NAME_STRING)) + { + elog(DEBUG3, + "loaded dynamic-link library \"%s\" with RTLD_GLOBAL", libname); + /* want to export, reload it the way */ + dlclose(file_scanner->handle); + file_scanner->handle = dlopen(file_scanner->filename, + RTLD_NOW | RTLD_GLOBAL); + } + /* * If the library has a _PG_init() function, call it. */ diff --git a/src/include/fmgr.h b/src/include/fmgr.h index ead17f0e44..c072850646 100644 --- a/src/include/fmgr.h +++ b/src/include/fmgr.h @@ -456,6 +456,14 @@ PG_MAGIC_FUNCTION_NAME(void) \ } \ extern int no_such_variable +#define PG_MODULE_EXPORT_SYMBOL_NAME Pg_module_exrpot_symbols +#define PG_MODULE_EXPORT_SYMBOL_NAME_STRING "Pg_module_exrpot_symbols" + +#define PG_MODULE_EXPORT_SYMBOL \ +extern PGDLLEXPORT void PG_MODULE_EXPORT_SYMBOL
Re: RTLD_GLOBAL (& JIT inlining)
Kyotaro HORIGUCHI writes: > With the attached patch, external modules are loaded RTLD_LOCAL > by default. PG_MODULE_EXPORT_SYMBOL in a module source files let > it loaded RTLD_GLOBAL. I suppose this is the change with the > least impact on existing modules because I believe most of them > don't need to export symbols. This seems really hacky :-( In the PL cases, at least, it's not so much the PL itself that wants to export symbols as the underlying language library (libpython, libperl, etc). You're assuming that an action on the PL .so will propagate to the underlying language .so, which seems like a pretty shaky assumption. I wonder also whether closing and reopening the DL handle will really do anything at all for already-loaded libraries ... regards, tom lane
RE: speeding up planning with partitions
On Wed, Jan 23, 2019 at 1:35 AM, Amit Langote wrote: > Rebased due to the heap_open/close() -> table_open/close() change. Maybe there are not many things I can point out through reviewing the patch, so I ran a performance test against the v17 patches instead of reviewing the code. There are already a lot of tests covering the partition pruning case, and we confirmed performance improves in those cases, so this time I tested the case of accessing all partitions. I tested with master, master + 0001, master + 0001 + 0002, ..., master + 0001 + 0002 + 0003 + 0004. I ran pgbench 3 times in each test case, and the results below are the average of those runs.

[postgresql.conf]
max_parallel_workers = 0
max_parallel_workers_per_gather = 0

[partition table definitions (8192 partitions case)]
create table rt (a int, b int, c int) partition by range (a);
create table rt_1 partition of rt for values from (1) to (2);
...
create table rt_8192 partition of rt for values from (8191) to (8192);

[pgbench commands]
pgbench -n -f update.sql -T 30 postgres

[update.sql (updating partkey case)]
update rt set a = 1;

[update.sql (updating non-partkey case)]
update rt set b = 1;

[results]
updating partkey case:

part-num  master   0001     0002     0003     0004
1         8215.34  7924.99  7931.15  8407.40  8475.65
2         7137.49  7026.45  7128.84  7583.08  7593.73
4         5880.54  5896.47  6014.82  6405.33  6398.71
8         4222.96  4446.40  4518.54  4802.43  4785.82
16        2634.91  2891.51  2946.99  3085.81  3087.91
32         935.12  1125.28  1169.17  1199.44  1202.04
64         352.37   405.27   417.09   425.78   424.53
128        236.26   310.01   307.70   315.29   312.81
256         65.36    86.84    87.67    84.39    89.27
512         18.34    24.84    23.55    23.91    23.91
1024         4.83     6.93     6.51     6.45     6.49

updating non-partkey case:

part-num  master   0001     0002     0003     0004
1         8862.58  8421.49  8575.35  9843.71  10065.30
2         7715.05  7575.78  7654.28  8800.84   8720.60
4         6249.95  6321.32  6470.26  7278.14   7280.10
8         4514.82  4730.48  4823.37  5382.93   5341.10
16        2815.21  3123.27  3162.51  3422.36   3393.94
32         968.45  1702.47  1722.38  1809.89   1799.88
64         364.17   420.48   432.87   440.20    435.31
128        119.94   148.77   150.47   152.18    143.35
256         45.09    46.35    46.93    48.30     45.85
512          8.74    10.59    10.23    10.27     10.13
1024         2.28     2.60     2.56     2.57      2.51

Looking at the results, if we only apply 0001 or 0001 + 0002, and if the number of partitions is small, like 1 or 2, performance degrades compared to master (the maximum reduction is about 5%, which is 8863 -> 8421). In all other cases, performance improves compared to master. -- Yoshikazu Imai
Re: postgres_fdw: oddity in costing aggregate pushdown paths
(2019/01/23 20:35), Etsuro Fujita wrote: > Attached is an updated version of the patch. > > Changes: > * Fixed a stupid bug in the case when use_remote_estimate > * Fixed a typo in a comment I added > * Modified comments I added a little bit further > * Added the commit message > > If there are no objections, I'll push that. Pushed after fixing another typo in a comment. Best regards, Etsuro Fujita
It was quite a bit of code for very little reward. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v10.patch Description: Binary data
Re: Protect syscache from bloating with negative cache entries
At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp>

> An option is an additional PGPROC member and interface functions.
>
> struct PGPROC
> {
> ...
>   int syscahe_usage_track_interval; /* track interval, 0 to disable */
>
> =# select syscahce_usage_track_add(, [, ]);
> =# select syscahce_usage_track_remove(2134);
>
> Or, just provide an one-shot triggering function.
>
> =# select syscahce_take_usage_track();
>
> This can use both a similar PGPROC variable or SendProcSignal()
> but the former doesn't fire while idle time unless using timer.

The attached is a revised version of this patchset, where the third patch is the remote setting feature. It uses static shared memory.

=# select pg_backend_catcache_stats(, );

Activates or changes the catcache stats feature on the backend with PID. (The name should be changed to .._syscache_stats, though.) It is far smaller than the remote-GUC feature. (It contains a part that should be in the previous patch. I will fix it later.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 067f8ad60f259453271d2bf8323505beb5b9e0a9 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 16 Oct 2018 13:04:30 +0900 Subject: [PATCH 1/3] Remove entries that haven't been used for a certain time Catcache entries can be left alone for several reasons. It is not desirable that they eat up memory. With this patch, This adds consideration of removal of entries that haven't been used for a certain time before enlarging the hash array. 
--- doc/src/sgml/config.sgml | 38 ++ src/backend/access/transam/xact.c | 5 + src/backend/utils/cache/catcache.c| 166 -- src/backend/utils/misc/guc.c | 23 src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/utils/catcache.h | 28 - 6 files changed, 254 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b6f5822b84..af3c52b868 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1662,6 +1662,44 @@ include_dir 'conf.d' + + syscache_memory_target (integer) + + syscache_memory_target configuration parameter + + + + +Specifies the maximum amount of memory to which syscache is expanded +without pruning. The value defaults to 0, indicating that pruning is +always considered. After exceeding this size, syscache pruning is +considered according to +. If you need to keep +certain amount of syscache entries with intermittent usage, try +increase this setting. + + + + + + syscache_prune_min_age (integer) + + syscache_prune_min_age configuration parameter + + + + +Specifies the minimum amount of unused time in seconds at which a +syscache entry is considered to be removed. -1 indicates that syscache +pruning is disabled at all. The value defaults to 600 seconds +(10 minutes). The syscache entries that are not +used for the duration can be removed to prevent syscache bloat. This +behavior is suppressed until the size of syscache exceeds +. 
+ + + + max_stack_depth (integer) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 18467d96d2..dbffec8067 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -733,7 +733,12 @@ void SetCurrentStatementStartTimestamp(void) { if (!IsParallelWorker()) + { stmtStartTimestamp = GetCurrentTimestamp(); + + /* Set this timestamp as aproximated current time */ + SetCatCacheClock(stmtStartTimestamp); + } else Assert(stmtStartTimestamp != 0); } diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 8152f7e21e..ee40093553 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -72,9 +72,24 @@ #define CACHE6_elog(a,b,c,d,e,f,g) #endif +/* + * GUC variable to define the minimum size of hash to cosider entry eviction. + * This variable is shared among various cache mechanisms. + */ +int cache_memory_target = 0; + +/* GUC variable to define the minimum age of entries that will be cosidered to + * be evicted in seconds. This variable is shared among various cache + * mechanisms. + */ +int cache_prune_min_age = 600; + /* Cache management header --- pointer is NULL until created */ static CatCacheHeader *CacheHdr = NULL; +/* Timestamp used for any operation on caches. */ +TimestampTz catcacheclock = 0; + static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, @@ -491,6 +506,7 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct
Analyze all plans
Hi, I'm working on an extension which analyzes all possible plans generated by the planner. I believe this extension would become useful for benchmarking the planner (e.g. the performance of the estimation and the cost model) and for better understanding the cases where the planner would make a suboptimal move.

Here are my procedures:

1. Enumerate all the plans
  1.1 Create a hook for add_path so that it won't discard the expensive paths from the planner's point of view.
  1.2 Collect all the paths from final_rel->pathlist, and turn them into PlannedStmt nodes by patching standard_planner.
  1.3 Mark the cheapest path from the planner's point of view.
2. Explain all the plans
  2.1 First explain the cheapest plan
    2.1.1 If analyzing, collect the execution time and use it to set a timer interrupt.
  2.2 Explain the remaining plans
    2.2.1 The other plans can be disastrous; the plans may never finish in a reasonable amount of time. If analyzing, the timer interrupt shall stop the executor.
    2.2.2 Move on to the next plan

Are those procedures reasonable? I'm able to implement all the steps except for 2.2.1.

- Attempt 1
Our most common way of handling timeouts is to kill the process. However, that would terminate the entire explain statement, yet there are still plans to be explained.

- Attempt 2
Fork before explain so it would be possible to kill the child process without disrupting the explain statement. However, simply allocating shared memory for the QueryDesc would not work (PANIC: failed to re-find shared lock object). To me, this plan is more doable, but it seems there exist other mechanisms I need to be aware of to execute a plan in the child process and report the result in the parent process.

What do you think? I will appreciate any feedback. Thank you, Donald Dong
yet another comment typo patch
An assorted set of comment typos is fixed in the attached patch. -- Fabien.

diff --git a/contrib/pgcrypto/pgp-decrypt.c b/contrib/pgcrypto/pgp-decrypt.c index 7d31e5354b..96c344c30b 100644 --- a/contrib/pgcrypto/pgp-decrypt.c +++ b/contrib/pgcrypto/pgp-decrypt.c @@ -132,7 +132,7 @@ pgp_parse_pkt_hdr(PullFilter *src, uint8 *tag, int *len_p, int allow_ctx) int res; uint8 *p; - /* EOF is normal here, thus we dont use GETBYTE */ + /* EOF is normal here, thus we don't use GETBYTE */ res = pullf_read(src, 1, &p); if (res < 0) return res; diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..961838ed93 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2771,7 +2771,7 @@ estimate_path_cost_size(PlannerInfo *root, run_cost += cpu_tuple_cost * numGroups; run_cost += ptarget->cost.per_tuple * numGroups; - /* Accout for the eval cost of HAVING quals, if any */ + /* Account for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) { QualCost remote_cost; diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index fb37e2b6c4..b18ae2b3ed 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -442,7 +442,7 @@ restartScanEntry: /* * Lock the entry leaf page. This is more coarse-grained than * necessary, because it will conflict with any insertions that - * land on the same leaf page, not only the exacty key we searched + * land on the same leaf page, not only the exact key we searched * for. But locking an individual tuple would require updating * that lock whenever it moves because of insertions or vacuums, * which seems too complicated. 
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index d793efdd9c..5a7206f588 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1873,7 +1873,7 @@ CheckOpSlotCompatibility(ExprEvalStep *op, TupleTableSlot *slot) /* * Should probably fixed at some point, but for now it's easier to allow - * buffer and heap tuples to be used interchangably. + * buffer and heap tuples to be used interchangeably. */ if (slot->tts_ops == &TTSOpsBufferHeapTuple && op->d.fetch.kind == &TTSOpsHeapTuple) diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index d6cfd28ddc..ceea3d6d72 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -1049,7 +1049,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate, MemoryContextAllocZero(planstate->state->es_query_cxt, sizeof(JitInstrumentation)); combined = planstate->state->es_jit_worker_instr; - /* Accummulate all the workers' instrumentations. */ + /* Accumulate all the workers' instrumentations. */ for (n = 0; n < shared_jit->num_workers; ++n) InstrJitAgg(combined, &shared_jit->jit_instr[n]); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 0c0891b33e..e773f20d9f 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2243,7 +2243,7 @@ check_log_duration(char *msec_str, bool was_logged) /* * Do not log if log_statement_sample_rate = 0. Log a sample if - * log_statement_sample_rate <= 1 and avoid unecessary random() call + * log_statement_sample_rate <= 1 and avoid unnecessary random() call * if log_statement_sample_rate = 1. But don't compute any of this * unless needed. 
*/ diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c index ba28611d8c..f09e3a9aff 100644 --- a/src/backend/utils/cache/inval.c +++ b/src/backend/utils/cache/inval.c @@ -8,7 +8,7 @@ * When a tuple is updated or deleted, our standard visibility rules * consider that it is *still valid* so long as we are in the same command, * ie, until the next CommandCounterIncrement() or transaction commit. - * (See acces/heap/heapam_visibility.c, and note that system catalogs are + * (See access/heap/heapam_visibility.c, and note that system catalogs are * generally scanned under the most current snapshot available, rather than * the transaction snapshot.) At the command boundary, the old tuple stops * being valid and the new version, if any, becomes valid. Therefore, diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index bca2dc118d..2fb7dd3b8e 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -151,7 +151,7 @@ typedef struct RecordCacheEntry /* * To deal with non-anonymous record types that are exchanged by backends - * involved in a parallel query, we also need a shared verion of the above. + * involved in a parallel query, we also need a shared version of the above. */ struct SharedRecordTypmodRegistry { diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pg
Re: Analyze all plans
On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: > > 1. Enumerate all the plans > Not sure this is going to work, because the total number of possible plans is somewhere around O(n!), if I'm not mistaken, in terms of the number of joined relations times the available access methods times the possible join strategies. So enumerating all possible plans stops being practical for even slightly complicated queries. Regards, -- Alex
Re: pg_dump multi VALUES INSERT
Hello David & Surafel, About this v10: the patch applies and compiles cleanly, local & global "make check" ok. A few comments, possibly redundant with some already in the thread. The rows-per-insert option is out of abc-order in the getopt struct. The help string does not specify the expected argument. I still think that the added rows_per_insert field is useless: ISTM that "int dump_inserts" can be reused, and you could also drop the boolean got_rows_per_insert. The feature is not tested anywhere. I still think that there should be a test on empty/small/larger-than-rows-per-insert tables, possibly added to existing TAP tests. -- Fabien.
inherited primary key misbehavior
Hi, It seems that ATExecDetachPartition misses to mark a child's primary key constraint entry (if any) as local (conislocal=true, coninhcount=0), which causes this: create table p (a int primary key) partition by list (a); create table p2 partition of p for values in (2); alter table p detach partition p2; alter table p2 drop constraint p2_pkey ; ERROR: cannot drop inherited constraint "p2_pkey" of relation "p2" select conname, conislocal, coninhcount from pg_constraint where conrelid = 'p2'::regclass; conname │ conislocal │ coninhcount ─┼┼─ p2_pkey │ f │ 1 (1 row) How about the attached patch? Thanks, Amit From 42cb636ce6acee717fa19f6b69bf40aacc402852 Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 23 Jan 2019 18:33:09 +0900 Subject: [PATCH] Detach primary key constraint when detaching partition --- src/backend/commands/tablecmds.c | 11 +++ src/test/regress/expected/indexing.out | 9 + src/test/regress/sql/indexing.sql | 9 + 3 files changed, 29 insertions(+) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 28a137bb53..897f74119d 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15094,6 +15094,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name) foreach(cell, indexes) { Oid idxid = lfirst_oid(cell); + Oid constrOid; Relationidx; if (!has_superclass(idxid)) @@ -15106,6 +15107,16 @@ ATExecDetachPartition(Relation rel, RangeVar *name) IndexSetParentIndex(idx, InvalidOid); update_relispartition(classRel, idxid, false); index_close(idx, NoLock); + + /* Detach any associated constraint as well. 
*/ + if (!idx->rd_index->indisprimary) + continue; + constrOid = get_relation_idx_constraint_oid(RelationGetRelid(partRel), + idxid); + if (!OidIsValid(constrOid)) + elog(ERROR, "missing pg_constraint entry of index %u of %u", +idxid, RelationGetRelid(partRel)); + ConstraintSetParentConstraint(constrOid, InvalidOid); } table_close(classRel, RowExclusiveLock); diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index 118f2c78df..ebf86d7bfc 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -1414,3 +1414,12 @@ DETAIL: Key (a)=(4) already exists. create unique index on covidxpart (b) include (a); -- should fail ERROR: insufficient columns in UNIQUE constraint definition DETAIL: UNIQUE constraint on table "covidxpart" lacks column "a" which is part of the partition key. +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +ERROR: cannot drop inherited constraint "parted_pk_detach_test1_pkey" of relation "parted_pk_detach_test1" +drop table parted_pk_detach_test; diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index d4a64c18c7..c110f0bdb8 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -757,3 +757,12 @@ alter table covidxpart attach partition covidxpart4 for values in (4); insert into covidxpart values (4, 1); insert into covidxpart values (4, 1); create unique index on 
covidxpart (b) include (a); -- should fail + +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +drop table parted_pk_detach_test; -- 2.11.0
Re: Proposal for Signal Detection Refactoring
Attached is a new signal handling patch. The patch is corrected and moved. The documentation is slightly streamlined in some places and expanded in others. Feedback requested. -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin signal_readme_patch.diff Description: Binary data
Almost bug in COPY FROM processing of GB18030 encoded input
Hi, I happened to notice that when CopyReadLineText() calls mblen(), it passes only the first byte of the multi-byte characters. However, pg_gb18030_mblen() looks at the first and the second byte. CopyReadLineText() always passes \0 as the second byte, so pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded characters as 2. It works out fine, though, because the second half of the 4-byte encoded character always looks like another 2-byte encoded character, in GB18030. CopyReadLineText() is looking for delimiter and escape characters and newlines, and only single-byte characters are supported for those, so treating a 4-byte character as two 2-byte characters is harmless. Attached is a patch to explain that in the comments. Grepping for mblen(), I didn't find any other callers that used mblen() like that. - Heikki >From b0a09c240a2b4d1120433828ca052a2542ff362d Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 23 Jan 2019 13:04:05 +0200 Subject: [PATCH 1/1] Fix comments that claimed that mblen() only looks at first byte. GB18030's mblen() function looks at the first and the second byte of the multibyte character, to determine its length. copy.c had made the assumption that mblen() only looks at the first byte, but it turns out to work out fine, because of the way the GB18030 encoding works. COPY will see a 4-byte encoded character as two 2-byte encoded characters, which is enough for COPY's purposes. It cannot mix those up with delimiter or escaping characters, because only single-byte characters are supported as delimiters or escape characters. 
--- src/backend/commands/copy.c | 7 ++- src/backend/utils/mb/wchar.c | 32 +--- 2 files changed, 31 insertions(+), 8 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index a61a628471..5074ce0daa 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -4073,9 +4073,14 @@ not_end_of_copy: { int mblen; + /* + * It is enough to look at the first byte in all our encodings, to + * get the length. (GB18030 is a bit special, but still works for + * our purposes; see comment in pg_gb18030_mblen()) + */ mblen_str[0] = c; - /* All our encodings only read the first byte to get the length */ mblen = pg_encoding_mblen(cstate->file_encoding, mblen_str); + IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1); IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1); raw_buf_ptr += mblen - 1; diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c index a5fdda456e..8e5116dfc1 100644 --- a/src/backend/utils/mb/wchar.c +++ b/src/backend/utils/mb/wchar.c @@ -15,16 +15,23 @@ /* - * conversion to pg_wchar is done by "table driven." - * to add an encoding support, define mb2wchar_with_len(), mblen(), dsplen() - * for the particular encoding. Note that if the encoding is only - * supported in the client, you don't need to define - * mb2wchar_with_len() function (SJIS is the case). + * Operations on multi-byte encodings are driven by a table of helper + * functions. + * + * To add an encoding support, define mblen(), dsplen() and verifier() for + * the encoding. For server-encodings, also define mb2wchar() and wchar2mb() + * conversion functions. * * These functions generally assume that their input is validly formed. * The "verifier" functions, further down in the file, have to be more - * paranoid. We expect that mblen() does not need to examine more than - * the first byte of the character to discover the correct length. + * paranoid. 
+ * + * We expect that mblen() does not need to examine more than the first byte + * of the character to discover the correct length. GB18030 is an exception + * to that rule, though, as it also looks at the second byte. But even that + * behaves in a predictable way, if you only pass the first byte: it will + * treat 4-byte encoded characters as two 2-byte encoded characters, which is + * good enough for all current uses. * * Note: for the display output of psql to work properly, the return values * of the dsplen functions must conform to the Unicode standard. In particular @@ -1073,6 +1080,17 @@ pg_uhc_dsplen(const unsigned char *s) * GB18030 * Added by Bill Huang , */ + +/* + * Unlike all other mblen() functions, this also looks at the second byte of + * the input. However, if you only pass the first byte of a multi-byte + * string, and \0 as the second byte, this still works in a predictable way: + * a 4-byte character will be reported as two 2-byte characters. That's + * enough for all current uses, as a client-only encoding. It works that + * way, because in any valid 4-byte GB18030-encoded character, the third and + * fourth byte look like a 2-byte encoded character, when looked at + * separately. + */ static int pg_gb18030_mblen(const unsigned char *s) { -- 2.20.1
Re: postgres_fdw: oddity in costing aggregate pushdown paths
(2019/01/04 20:33), Etsuro Fujita wrote: > Here is a new version of the patch. Attached is an updated version of the patch. Changes: * Fixed a stupid bug in the case when use_remote_estimate * Fixed a typo in a comment I added * Modified comments I added a little bit further * Added the commit message If there are no objections, I'll push that. Best regards, Etsuro Fujita >From 3ff65a6f05a330140bf5de94fbcf6ab2a04f040b Mon Sep 17 00:00:00 2001 From: Etsuro Fujita Date: Wed, 23 Jan 2019 20:30:30 +0900 Subject: [PATCH] postgres_fdw: Account for tlist eval costs in estimate_path_cost_size() Previously, estimate_path_cost_size() didn't account for tlist eval costs, except when costing a foreign-grouping path using local statistics, but such costs should be accounted for when costing that path using remote estimates, because some of the tlist expressions might be evaluated locally. Also, such costs should be accounted for in the case of a foreign-scan or foreign-join path, because currently, postgres_fdw evaluates PlaceHolderVars (if any) locally. This also fixes an oversight in my commit f8f6e44676ef38fee7a5bbe4f256a34ea7799ac1. Like that commit, apply this to HEAD only to avoid destabilizing existing plan choices. 
Author: Etsuro Fujita Discussion: https://postgr.es/m/5BFD3EAD.2060301%40lab.ntt.co.jp --- contrib/postgres_fdw/postgres_fdw.c | 41 +++-- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..f460fd2ec4 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2499,6 +2499,9 @@ estimate_path_cost_size(PlannerInfo *root, Cost total_cost; Cost cpu_per_tuple; + /* Make sure the core code has set up the relation's reltarget */ + Assert(foreignrel->reltarget); + /* * If the table or the server is configured to use remote estimates, * connect to the foreign server and execute EXPLAIN to estimate the @@ -2576,6 +2579,24 @@ estimate_path_cost_size(PlannerInfo *root, cost_qual_eval(&local_cost, local_param_join_conds, root); startup_cost += local_cost.startup; total_cost += local_cost.per_tuple * retrieved_rows; + + /* + * Add in tlist eval cost for each output row. In case of an + * aggregate, some of the tlist expressions such as grouping + * expressions will be evaluated remotely, so adjust the costs. 
+ */ + startup_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.per_tuple * rows; + if (IS_UPPER_REL(foreignrel)) + { + QualCost tlist_cost; + + cost_qual_eval(&tlist_cost, fdw_scan_tlist, root); + startup_cost -= tlist_cost.startup; + total_cost -= tlist_cost.startup; + total_cost -= tlist_cost.per_tuple * rows; + } } else { @@ -2674,19 +2695,19 @@ estimate_path_cost_size(PlannerInfo *root, nrows = clamp_row_est(nrows * fpinfo->joinclause_sel); run_cost += nrows * remote_conds_cost.per_tuple; run_cost += fpinfo->local_conds_cost.per_tuple * retrieved_rows; + + /* Add in tlist eval cost for each output row */ + startup_cost += foreignrel->reltarget->cost.startup; + run_cost += foreignrel->reltarget->cost.per_tuple * rows; } else if (IS_UPPER_REL(foreignrel)) { PgFdwRelationInfo *ofpinfo; - PathTarget *ptarget = foreignrel->reltarget; AggClauseCosts aggcosts; double input_rows; int numGroupCols; double numGroups = 1; - /* Make sure the core code set the pathtarget. */ - Assert(ptarget != NULL); - /* * This cost model is mixture of costing done for sorted and * hashed aggregates in cost_agg(). We are not sure which @@ -2750,26 +2771,22 @@ estimate_path_cost_size(PlannerInfo *root, * Startup cost includes: * 1. Startup cost for underneath input relation * 2. Cost of performing aggregation, per cost_agg() - * 3. Startup cost for PathTarget eval *- */ startup_cost = ofpinfo->rel_startup_cost; startup_cost += aggcosts.transCost.startup; startup_cost += aggcosts.transCost.per_tuple * input_rows; startup_cost += (cpu_operator_cost * numGroupCols) * input_rows; - startup_cost += ptarget->cost.startup; /*- * Run time cost includes: * 1. Run time cost of underneath input relation * 2. Run time cost of performing aggregation, per cost_agg() - * 3. 
PathTarget eval cost for each output row *- */ run_cost = ofpinfo->rel_total_cost - ofpinfo->rel_startup_cost; run_cost += aggcosts.finalCost * numGroups; run_cost += cpu_tuple_cost * numGroups; - run_cost += ptarget->cost.per_tuple * numGroups; /* Accout for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) @@ -2784,6 +2801,10 @@ estimate_path_cost_size(PlannerInfo *root, startup_cost += fpinfo->local_conds_cost.startup; run_cost +
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote: > > > An assorted set of command typos is fixed in the attached patch. > The patch looks good to me. I will take another pass over it and commit. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: yet another comment typo patch
On 23/01/2019 13:51, Amit Kapila wrote: On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote: An assorted set of command typos is fixed in the attached patch. The patch looks good to me. I will take another pass over it and commit. I just committed this. Sorry, I didn't see your mail until just afterwards. - Heikki
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 5:28 PM Heikki Linnakangas wrote: > > On 23/01/2019 13:51, Amit Kapila wrote: > > On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote: > >> > >> > >> An assorted set of command typos is fixed in the attached patch. > > > > The patch looks good to me. I will take another pass over it and commit. > > I just committed this. Sorry, I didn't see your mail until just afterwards. > No problem, thanks! -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Jan 20, 2019 at 5:19 AM John Naylor wrote: > > I have a test for in-range and out-of-range for each relation fork. > I think the first two patches (a) removal of dead code in bootstrap and (b) the core patch to avoid creation of FSM file for the small table are good now. I have prepared the patches along with commit message. There is no change except for some changes in README and commit message of the second patch. Kindly let me know what you think about them. I think these two patches can go even without the upgrade patch (during pg_upgrade, conditionally skip transfer of FSMs.) which is still under discussion. However, I am not in a hurry if you or others think that the upgrade patch must be committed along with the second patch. I think the upgrade patch is generally on track but might need some more review. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch Description: Binary data v17-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: postgres on a non-journaling filesystem
On 23/01/2019 01:03, maayan mordehai wrote: hello, I'm Maayan, I'm in a DBA team that uses postgresql. I saw in the documentation on wals: https://www.postgresql.org/docs/10/wal-intro.html In the tip box that, it's better not to use a journaling filesystem. and I wanted to ask how it works? can't we get corruption that we can't recover from? I mean what if postgres in the middle of a write to a wal and there is a crash, and it didn't finish. I'm assuming it will detect it when we will start postgres and write that it was rolled back, am I right? Yep, any half-written transactions will be rolled back. and how does it work in the data level? if some of the 8k block is written but not all of it, and then there is a crash, how postgres deals with it? The first time a block is modified after a checkpoint, a copy of the block is written to the WAL. At crash recovery, the block is restored from the WAL. This mechanism is called "full page writes". The WAL works just like the journal in a journaling filesystem. That's why it's not necessary to have journaling at the filesystem level. - Heikki
Re: pg_dump multi VALUES INSERT
On Wed, 23 Jan 2019 at 22:08, Fabien COELHO wrote: > Out of abc-order rows-per-inserts option in getopt struct. I missed that. Thanks. Fixed in attached. > help string does not specify the expected argument. Also fixed. > I still think that the added rows_per_insert field is useless, ISTM That > "int dump_inserts" can be reused, and you could also drop boolean > got_rows_per_insert. I thought about this and looked into it, but I decided it didn't look like a smart thing to do. The reason is that if --inserts sets dump_inserts to 1 then --rows-per-insert sets it to something else that's likely fine, but if that happens in the opposite order then the --rows-per-insert gets overwritten with 1. The bad news is the order that happens is defined by the order of the command line args. It might be possible to make it work by having --inserts set some other variable, then set dump_inserts to 1 if it's set to 0 and the other variable is set to >= 1... but that requires another variable, which is what you want to avoid... I think it's best to have a variable per argument. I could get rid of the got_rows_per_insert variable, but it would require setting the default value for rows_per_insert in the main() function rather than in InitDumpOptions(). I thought InitDumpOptions() looked like just the place to do this, so went with that option. To make it work without got_rows_per_insert, rows_per_insert would have to be 0 by default and we'd know we saw a --rows-per-insert command line arg by the fact that rows_per_insert was non-zero. Would you rather have it that way? > The feature is not tested anywhere. I still think that there should be a > test on empty/small/larger-than-rows-per-insert tables, possibly added to > existing TAP-tests. I was hoping to get away with not having to do that... mainly because I've no idea how. Please have at it if you know how it's done. 
FWIW I looked at 002_pg_dump.pl and did add a test, but I was unable to make it pass because I couldn't really figure out how the regex matching is meant to work. It does not seem to be explained very well in the comments and I lack patience for Perl. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v11.patch Description: Binary data
Re: postgres on a non-journaling filesystem
Thank you!! On Wed, Jan 23, 2019, 2:20 PM Heikki Linnakangas On 23/01/2019 01:03, maayan mordehai wrote: > > hello, > > > > I'm Maayan, I'm in a DBA team that uses postgresql. > > I saw in the documentation on wals: > > https://www.postgresql.org/docs/10/wal-intro.html > > In the tip box that, it's better not to use a journaling filesystem. > and I > > wanted to ask how it works? > > can't we get corruption that we can't recover from? > > I mean what if postgres in the middle of a write to a wal and there is a > > crash, and it didn't finish. > > I'm assuming it will detect it when we will start postgres and write that > > it was rolled back, am I right? > > Yep, any half-written transactions will be rolled back. > > > and how does it work in the data level? if some of the 8k block is > written > > but not all of it, and then there is a crash, how postgres deals with it? > > The first time a block is modified after a checkpoint, a copy of the > block is written to the WAL. At crash recovery, the block is restored > from the WAL. This mechanism is called "full page writes". > > The WAL works just like the journal in a journaling filesystem. That's > why it's not necessary to have journaling at the filesystem level. > > - Heikki >
Re: Analyze all plans
Oleksandr Shulgin writes: > On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: >> 1. Enumerate all the plans > So enumerating all possible plans stops being practical for even slightly > complicated queries. Yes. You can *not* disable the planner's aggressive pruning of losing paths and subpaths without ending up with something that's completely impractical for anything beyond toy queries. That's just from the standpoint of planner runtime. Adding on the cost of actually creating a finished plan and then running it for long enough to get a reliable runtime for each case would ... well, let's just say you'd be safely dead before you got any interesting answers. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
> On 2019-01-17 09:29:04 -0800, Andres Freund wrote: > On 2019-01-17 10:23:39 -0500, Tom Lane wrote: > > I don't buy the argument that this would move the goalposts in terms > > of how much work it is to add a new argument. You'd still end up > > touching every call site. > > Why? A lot of arguments that'd be potentially added or removed would not > be set by each callsite. > > If you e.g. look at > > you can see that a lot of changes were like > ArchiveEntry(fout, nilCatalogId, createDumpId(), > "pg_largeobject", NULL, NULL, "", > -false, "pg_largeobject", SECTION_PRE_DATA, > +"pg_largeobject", SECTION_PRE_DATA, > loOutQry->data, "", NULL, > NULL, 0, > NULL, NULL); > > i.e. just removing a 'false' argument. In like 70+ callsites. With the > above scheme, we'd instead just have removed a single .withoids = true, > from dumpTableSchema()'s ArchiveEntry() call. To make this discussion a bit more specific, I've created a patch showing how it could look. All the arguments except Archive, CatalogId and DumpId I've moved into the ArchiveOpts structure. Not all of them could be empty before, but anyway it seems better for consistency and readability. Some of the arguments had an empty string as a default value; I haven't changed anything here yet (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). As Andres mentioned above, for 578b229718e / the WITH OID removal and pg_dump modification from the pluggable storage thread, this patch reduces the number of changes related to ArchiveEntry from 70+ to just one. 0001-ArchiveOpts-structure.patch Description: Binary data
Re: pg_dump multi VALUES INSERT
Hello David,

> I thought about this and looked into it, but I decided it didn't look
> like a smart thing to do. The reason is that if --inserts sets
> dump_inserts to 1 then --rows-per-insert sets it to something else
> that's likely fine, but if that happens in the opposite order then the
> --rows-per-insert gets overwritten with 1.

You can test before doing that!

    case X:
        if (opt.dump_inserts == 0)
            opt.dump_inserts = 1;
        // otherwise option is already set

> The bad news is the order that happens is defined by the order of the
> command line args. It might be possible to make it work by having
> --inserts set some other variable,

ISTM that it is enough to test whether the variable is zero.

> then set dump_inserts to 1 if it's set to 0 and the other variable is
> set to >= 1... but that requires another variable, which is what you
> want to avoid...

I still do not understand the need for another variable.

    int ninserts = 0; // default is to use copy
    while (getopt...)
    {
        switch (...)
        {
            case "--inserts":
                if (ninserts == 0)
                    ninserts = 1;
                break;
            case "--rows-per-insert":
                ninserts = arg_value;
                checks...
                break;
            ...

> I think it's best to have a variable per argument.

I disagree, because it adds complexity where none is needed: here the new option is an extension of a previous one, thus the previous one just becomes a particular case, so it seems simpler to manage it as the particular case it is rather than a special case, creating the need for checking the consistency and so if two variables are used.

> I could get rid of the got_rows_per_insert variable, but it would
> require setting the default value for rows_per_insert in the main()
> function rather than in InitDumpOptions(). I thought InitDumpOptions()
> looked like just the place to do this, so went with that option. To
> make it work without got_rows_per_insert, rows_per_insert would have to
> be 0 by default and we'd know we saw a --rows-per-insert command line
> arg by the fact that rows_per_insert was non-zero. Would you rather
> have it that way?
Yep, esp as rows_per_insert & dump_inserts could be the same.

>> The feature is not tested anywhere. I still think that there should be
>> a test on empty/small/larger-than-rows-per-insert tables, possibly
>> added to existing TAP-tests.

> I was hoping to get away with not having to do that... mainly because
> I've no idea how.

Hmmm. That is another question! Maybe someone will help.

-- Fabien.
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote: > I think the first two patches (a) removal of dead code in bootstrap > and (b) the core patch to avoid creation of FSM file for the small > table are good now. I have prepared the patches along with commit > message. There is no change except for some changes in README and commit > message of the second patch. Kindly let me know what you think > about them. Good to hear! The additional language is fine. In "Once the FSM is created for heap", I would just change that to "...for a heap". > I think these two patches can go even without the upgrade patch > (during pg_upgrade, conditionally skip transfer of FSMs.) which is > still under discussion. However, I am not in a hurry if you or others > think that the upgrade patch must be committed along with the second > patch. I think the upgrade patch is generally on track but > might need some more review. The pg_upgrade piece is a nice-to-have feature and not essential, so can go in later. Additional review is also welcome. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: postgres on a non-journaling filesystem
On 2019-01-23 14:20:52 +0200, Heikki Linnakangas wrote: > On 23/01/2019 01:03, maayan mordehai wrote: > > hello, > > > > I'm Maayan, I'm in a DBA team that uses postgresql. > > I saw in the documentation on wals: > > https://www.postgresql.org/docs/10/wal-intro.html > > In the tip box that, it's better not to use a journaling filesystem. and I > > wanted to ask how it works? > > can't we get corruption that we can't recover from? > > I mean what if postgres in the middle of a write to a wal and there is a > > crash, and it didn't finish. > > I'm assuming it will detect it when we will start postgres and write that > > it was rolled back, am I right? > > Yep, any half-written transactions will be rolled back. > > > and how does it work in the data level? if some of the 8k block is written > > but not all of it, and then there is a crash, how postgres deals with it? > > The first time a block is modified after a checkpoint, a copy of the block > is written to the WAL. At crash recovery, the block is restored from the > WAL. This mechanism is called "full page writes". > > The WAL works just like the journal in a journaling filesystem. That's why > it's not necessary to have journaling at the filesystem level. But note not having journaling on the FS level often makes the OS start *painfully* slowly after a crash, because fsck or similar will be run. And that's often necessary for the internal FS consistency. Note that even with journaling enabled, most filesystems by default don't journal data, so you can get those partial writes anyway. Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Hi, On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: > On Tue, Jan 22, 2019 at 05:49:46PM -0800, Andres Freund wrote: > > On 2019-01-23 14:43:15 +1300, Thomas Munro wrote: > >> The function name comments are similar, though less consistent so I'm > >> too lazy to write a script to find one that is actually wrong (with > >> which to trigger Andres's let's-delete-them-all response :-D). > > I am not sure if anybody uses them for anything automatically, still I > find myself from time to time looking at them to remember on which > path the file is located when opened in emacs. I often want to know that too - isn't it easier to just meta-x f and see the directory displayed? Because that doesn't require jumping to the head of the file... Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On 10/12/2018 23:37, Dmitry Dolgov wrote: On Thu, Nov 29, 2018 at 6:48 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: On Tue, Oct 2, 2018 at 4:53 AM Michael Paquier wrote: On Mon, Aug 06, 2018 at 06:00:54PM +0900, Yoshimi Ichiyanagi wrote: The libpmem's pmem_map_file() supported 2M/1G(the size of huge page) alignment, since it could reduce the number of page faults. In addition, libpmem's pmem_memcpy_nodrain() is the function to copy data using single instruction, multiple data(SIMD) instructions and NT store instructions(MOVNT). As a result, using these APIs is faster than using old mmap()/memcpy(). Please see the PGCon2018 presentation[1] for the details. [1] https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf So you say that this represents a 3% gain based on the presentation? That may be interesting to dig into it. Could you provide fresher performance numbers? I am moving this patch to the next CF 2018-10 for now, waiting for input from the author. Unfortunately, the patch has some conflicts now, so probably not only fresher performance numbers are necessary, but also a rebased version. I believe the idea behind this patch is quite important (thanks to CMU DG for inspiring lectures), so I decided to put some efforts and rebase it to prevent from rotting. At the same time I have a vague impression that the patch itself suggests quite narrow way of using of PMDK. Thanks. To re-iterate what I said earlier in this thread, I think the next step here is to write a patch that modifies xlog.c to use plain old mmap()/msync() to memory-map the WAL files, to replace the WAL buffers. Let's see what the performance of that is, with or without NVM hardware. I think that might actually make the code simpler. There's a bunch of really hairy code around locking the WAL buffers, which could be made simpler if each backend memory-mapped the WAL segment files independently. 
One thing to watch out for, is that if you read() a file, and there's an I/O error, you have a chance to ereport() it. If you try to read from a memory-mapped file, and there's an I/O error, the process is killed with SIGBUS. So I think we have to be careful with using memory-mapped I/O for reading files. But for writing WAL files, it seems like a good fit. Once we have a reliable mmap()/msync() implementation running, it should be straightforward to change it to use MAP_SYNC and the special CPU instructions for the flushing. - Heikki
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 16:12:15 +0100, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch showing > how it could look. Thanks. > All the arguments, except Archive, CatalogId and DumpId I've moved > into the ArchiveOpts structure. Not all of them could be empty before, but > anyway it seems better for consistency and readability. Some of the arguments > had empty string as a default value, I haven't changed anything here yet > (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). Probably worth changing at the same time, if we decide to go for it. To me this does look like it'd be more maintainable going forward. Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Hi, On 2019-01-23 18:45:42 +0200, Heikki Linnakangas wrote: > To re-iterate what I said earlier in this thread, I think the next step here > is to write a patch that modifies xlog.c to use plain old mmap()/msync() to > memory-map the WAL files, to replace the WAL buffers. Let's see what the > performance of that is, with or without NVM hardware. I think that might > actually make the code simpler. There's a bunch of really hairy code around > locking the WAL buffers, which could be made simpler if each backend > memory-mapped the WAL segment files independently. > > One thing to watch out for, is that if you read() a file, and there's an I/O > error, you have a chance to ereport() it. If you try to read from a > memory-mapped file, and there's an I/O error, the process is killed with > SIGBUS. So I think we have to be careful with using memory-mapped I/O for > reading files. But for writing WAL files, it seems like a good fit. > > Once we have a reliable mmap()/msync() implementation running, it should be > straightforward to change it to use MAP_SYNC and the special CPU > instructions for the flushing. FWIW, I don't think we should go there as the sole implementation. I'm fairly convinced that we're going to need to go to direct-IO in more cases here, and that'll not work well with mmap. I think this'd be a worthwhile experiment, but I'm doubtful it'd end up simplifying our code. Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Andres Freund writes: > On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: >> I am not sure if anybody uses them for anything automatically, still I >> find myself from time to time looking at them to remember on which >> path the file is located when opened in emacs. > I often want to know that too - isn't it easier to just meta-x f and see > the directory displayed? Because that doesn't require jumping to the > head of the file... My own workflow often involves editing copies that are not in the original directory, so that answer doesn't work for me. In general it won't work anytime you are looking at a file out-of-context. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
Hello On 2019-Jan-23, Andres Freund wrote: > > All the arguments, except Archive, CatalogId and DumpId I've moved > > into the ArchiveOpts structure. Not all of them could be empty before, but > > anyway it seems better for consistency and readability. Some of the > > arguments > > had empty string as a default value, I haven't changed anything here yet > > (although this mixture of NULL and "" in ArchiveEntry looks a bit > > confusing). > > Probably worth changing at the same time, if we decide to go for it. > > To me this does look like it'd be more maintainable going forward. It does. How does pgindent behave with it? I'd use ArchiveEntryOpts as struct name; ArchiveOpts sounds wrong. Also, the struct members could use better names -- "defn" for example could perhaps be "createStmt" (to match dropStmt/copyStmt), and expand "desc" to "description". -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 10:12 AM, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch showing > how it could look. A little bit of vararg-macro action can make such a design look even tidier, cf. [1]. Or are compilers without vararg macros still in the supported mix? -Chap [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 The macros in [1] are not defined to create a function call, but only the argument structure because there might be several functions to pass it to, so a call would be written like func(&SEQ_MK_CHN(NOTEON, ...)). In ArchiveEntry's case, if there's only one function involved, there'd be no reason not to have a macro produce the whole call.
Re: ArchiveEntry optional arguments refactoring
Chapman Flack writes: > Or are compilers without vararg macros still in the supported mix? No. regards, tom lane