Re: fixing more format truncation issues
On Wed, Feb 28, 2018 at 11:14:23PM -0500, Peter Eisentraut wrote:
> AFAICT, the issues addressed here either can't really happen without
> trying very hard, or would cause harmless output truncation.  Still, it
> seems good to clean this up properly and not rely on made-up buffer size
> guesses that turn out to be wrong, even if we don't want to adopt the
> warning options by default.

Good idea.

> One issue that is of external interest is that I increase BGW_MAXLEN
> from 64 to 96.  Apparently, the old value would cause the bgw_name of
> logical replication workers to be truncated in some circumstances.  I
> have also seen truncated background worker names with third-party
> packages, so giving some more room here would be useful.

OK, no complaints about that.

@@ -89,7 +89,7 @@ static Datum
 build_pgstattuple_type(pgstattuple_type *stat, FunctionCallInfo fcinfo)
 {
 #define NCOLUMNS	9
-#define NCHARS	32
+#define NCHARS	314

So this one is caused by the output of %.2f...

Enabling the warning options by default would generate some useless noise
if the patch is left as-is, as a couple of the warnings are not addressed.
Please see the full report attached.  Is that intentional?  I am using
GCC 7.3 here.
interval.c: In function ‘AppendSeconds’:
interval.c:759:22: warning: ‘%0*d’ directive output between 1 and 2147483648 bytes may exceed minimum required size of 4095 [-Wformat-overflow=]
   sprintf(cp, "%02d.%0*d", abs(sec), precision, (int) Abs(fsec));

pg_rusage.c:64:5: note: in expansion of macro ‘_’
     _("CPU: user: %d.%02d s, system: %d.%02d s, elapsed: %d.%02d s"),
     ^
pg_rusage.c:63:2: note: ‘snprintf’ output between 51 and 108 bytes into a destination of size 100
  snprintf(result, sizeof(result),
--
Michael

:0:0: note: this is the location of the previous definition
In file included from ../../../src/include/postgres.h:46:0,
                 from be-secure-openssl.c:17:
be-secure-openssl.c: In function ‘SSLerrmessage’:
../../../src/include/c.h:1009:20: warning: ‘%lu’ directive output may be truncated writing between 1 and 20 bytes into a region of size 17 [-Wformat-truncation=]
 #define gettext(x) (x)
                    ^
../../../src/include/c.h:1015:14: note: in expansion of macro ‘gettext’
 #define _(x) gettext(x)
              ^~~
be-secure-openssl.c:1023:35: note: in expansion of macro ‘_’
  snprintf(errbuf, sizeof(errbuf), _("SSL error code %lu"), ecode);
                                   ^
be-secure-openssl.c:1023:2: note: ‘snprintf’ output between 17 and 36 bytes into a destination of size 32
  snprintf(errbuf, sizeof(errbuf), _("SSL error code %lu"), ecode);
  ^~~~

postgres.c: In function ‘check_log_duration’:
postgres.c:2156:36: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
    snprintf(msec_str, 32, "%ld.%03d",
                                    ^
postgres.c:2156:4: note: ‘snprintf’ output between 6 and 33 bytes into a destination of size 32
    snprintf(msec_str, 32, "%ld.%03d",
    ^~
     secs * 1000 + msecs, usecs % 1000);
     ~~

formatting.c: In function ‘DCH_to_char’:
formatting.c:2455:17: warning: ‘%0*d’ directive output between 1 and 2147483648 bytes may exceed minimum required size of 4095 [-Wformat-overflow=]
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_hour >= 0) ? 2 : 3,
     ^~~~
formatting.c:2455:16: note: assuming directive output of 11 bytes
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_hour >= 0) ? 2 : 3,
                ^~
formatting.c:2463:17: warning: ‘%0*d’ directive output between 1 and 2147483648 bytes may exceed minimum required size of 4095 [-Wformat-overflow=]
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_hour >= 0) ? 2 : 3,
     ^~~~
formatting.c:2463:16: note: assuming directive output of 11 bytes
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_hour >= 0) ? 2 : 3,
                ^~
formatting.c:2470:17: warning: ‘%0*d’ directive output between 1 and 2147483648 bytes may exceed minimum required size of 4095 [-Wformat-overflow=]
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_min >= 0) ? 2 : 3,
     ^~~~
formatting.c:2470:16: note: assuming directive output of 11 bytes
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_min >= 0) ? 2 : 3,
                ^~
formatting.c:2477:17: warning: ‘%0*d’ directive output between 1 and 2147483648 bytes may exceed minimum required size of 4095 [-Wformat-overflow=]
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_sec >= 0) ? 2 : 3,
     ^~~~
formatting.c:2477:16: note: assuming directive output of 11 bytes
     sprintf(s, "%0*d", S_FM(n->suffix) ? 0 : (tm->tm_sec >= 0) ? 2 : 3,
                ^~
formatting.c:2538:19: warning: ‘%0*d’ directive output between 1 and 2147483648 bytes may exceed minimum req
Re: inserts into partitioned table may cause crash
Fujita-san,

Thanks for the updates, and sorry I couldn't reply sooner.

On 2018/03/06 21:26, Etsuro Fujita wrote:
> One thing I notice while working on this is this in ExecInsert/CopyFrom,
> which I moved to ExecPrepareTupleRouting as-is for the former:
>
>     /*
>      * If we're capturing transition tuples, we might need to convert from
>      * the partition rowtype to parent rowtype.
>      */
>     if (mtstate->mt_transition_capture != NULL)
>     {
>         if (resultRelInfo->ri_TrigDesc &&
>             (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
>              resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
>         {
>             /*
>              * If there are any BEFORE or INSTEAD triggers on the partition,
>              * we'll have to be ready to convert their result back to
>              * tuplestore format.
>              */
>             mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
>             mtstate->mt_transition_capture->tcs_map =
>                 TupConvMapForLeaf(proute, rootRelInfo, leaf_part_index);
>         }
>
> Do we need to consider INSTEAD triggers here?  The partition is either a
> plain table or a foreign table, so I don't think it can have those
> triggers.  Am I missing something?

I think you're right.  We don't need to consider INSTEAD OF triggers here
if we know we're dealing with a partition, which cannot have those.  Just
to make sure, a word from Thomas would help.

On 2018/03/12 12:25, Etsuro Fujita wrote:
> (2018/03/09 20:18), Etsuro Fujita wrote:
>> Here are updated patches for PG10 and HEAD.
>>
>> Other changes:
>> * Add regression tests based on your test cases shown upthread
>
> I added a little bit more regression tests and revised comments.  Please
> find attached an updated patch.

Thanks for adding the tests.

I was concerned that ExecMaterializeSlot will be called twice now -- first
in ExecPrepareTupleRouting and then in ExecInsert -- instead of only once
in ExecInsert as with the existing code, but perhaps it doesn't matter
much because most of the work would be done in the first call anyway.
Sorry for the nitpicking that I should've brought up before, but doesn't
ExecPrepareTupleRouting do all the work needed for routing a tuple, and
hence isn't the name a bit misleading?  Maybe ExecPerformTupleRouting?

Btw, I noticed that the patches place ExecPrepareTupleRouting (both the
declaration and the definition) at different relative locations within
nodeModifyTable.c in the HEAD branch vs. in the 10 branch.  It might be a
good idea to bring them to the same relative location to avoid hassle when
back-patching relevant patches in the future.  I did that in the attached
updated version (v4) of the patch for HEAD, which does not make any other
changes.  Although the patch for PG-10 is unchanged, I still changed its
file name to contain v4.

Regards,
Amit

*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** CopyFrom(CopyState cstate)
*** 2656,2668 ****
  			if (cstate->transition_capture != NULL)
  			{
  				if (resultRelInfo->ri_TrigDesc &&
! 					(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
! 					 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
  				{
  					/*
! 					 * If there are any BEFORE or INSTEAD triggers on the
! 					 * partition, we'll have to be ready to convert their
! 					 * result back to tuplestore format.
  					 */
  					cstate->transition_capture->tcs_original_insert_tuple = NULL;
  					cstate->transition_capture->tcs_map =
--- 2656,2667 ----
  			if (cstate->transition_capture != NULL)
  			{
  				if (resultRelInfo->ri_TrigDesc &&
! 					resultRelInfo->ri_TrigDesc->trig_insert_before_row)
  				{
  					/*
! 					 * If there are any BEFORE triggers on the partition,
! 					 * we'll have to be ready to convert their result back to
! 					 * tuplestore format.
  					 */
  					cstate->transition_capture->tcs_original_insert_tuple = NULL;
  					cstate->transition_capture->tcs_map =
*************** CopyFrom(CopyState cstate)
*** 2803,2814 ****
  			 * tuples inserted by an INSERT command.
  			 */
  			processed++;
!
Re: remove pg_class.relhaspkey
On Sat, Mar 10, 2018 at 01:52:56PM +0100, Tomas Vondra wrote:
> I agree with this sentiment - I don't think those flags are particularly
> helpful for client applications, and would vote +1 for removal.

OK, so I can see that we are moving toward a consensus here.

> For the other flags we would probably need to test what impact it would
> have (e.g. table with no indexes, many indexes on other tables, and
> something calling get_relation_info a lot).  But this patch proposes to
> remove only relhaspkey.

Yes, you are right here.  Peter, would you do that within this commit fest
or not?  As we are half-way through the commit fest, we could also cut the
apple in half and just remove relhaspkey for now, as that's a no-brainer.
So I would suggest doing just that and considering this patch as done.
Attached is a rebased patch; there were some conflicts in pg_class.h, by
the way.
--
Michael

From 1601b84f9ef2e8d0467b109c0b1d3df435edb59a Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 14 Mar 2018 14:46:43 +0900
Subject: [PATCH] Remove pg_class.relhaspkey

It's not used for anything internally, and it can't be relied on for
external uses, so it can just be removed.

Patch by Peter Eisentraut.
---
 doc/src/sgml/catalogs.sgml          |  9 -
 src/backend/catalog/heap.c          |  1 -
 src/backend/catalog/index.c         | 32 ++-
 src/backend/commands/vacuum.c       | 10 --
 src/backend/rewrite/rewriteDefine.c |  1 -
 src/include/catalog/pg_class.h      | 38 ++---
 6 files changed, 20 insertions(+), 71 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a0e6d7062b..30e6741305 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1848,15 +1848,6 @@ SCRAM-SHA-256$:&l
-     <row>
-      <entry><structfield>relhaspkey</structfield></entry>
-      <entry><type>bool</type></entry>
-      <entry></entry>
-      <entry>
-       True if the table has (or once had) a primary key
-      </entry>
-     </row>
-
      <row>
       <entry><structfield>relhasrules</structfield></entry>
       <entry><type>bool</type></entry>

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cf36ce4add..3d80ff9e5b 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -798,7 +798,6 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relnatts - 1] = Int16GetDatum(rd_rel->relnatts);
 	values[Anum_pg_class_relchecks - 1] = Int16GetDatum(rd_rel->relchecks);
 	values[Anum_pg_class_relhasoids - 1] = BoolGetDatum(rd_rel->relhasoids);
-	values[Anum_pg_class_relhaspkey - 1] = BoolGetDatum(rd_rel->relhaspkey);
 	values[Anum_pg_class_relhasrules - 1] = BoolGetDatum(rd_rel->relhasrules);
 	values[Anum_pg_class_relhastriggers - 1] = BoolGetDatum(rd_rel->relhastriggers);
 	values[Anum_pg_class_relrowsecurity - 1] = BoolGetDatum(rd_rel->relrowsecurity);

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 431bc31969..9e2dd0e729 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -125,7 +125,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
 				bool isvalid,
 				bool isready);
 static void index_update_stats(Relation rel,
-				   bool hasindex, bool isprimary,
+				   bool hasindex,
 				   double reltuples);
 static void IndexCheckExclusion(Relation heapRelation,
 					Relation indexRelation,
@@ -1162,7 +1162,6 @@ index_create(Relation heapRelation,
 	 */
 	index_update_stats(heapRelation,
 					   true,
-					   isprimary,
 					   -1.0);

 	/* Make the above update visible */
 	CommandCounterIncrement();
@@ -1364,21 +1363,6 @@ index_constraint_create(Relation heapRelation,
 									   InvalidOid, conOid, indexRelationId, true);
 	}

-	/*
-	 * If needed, mark the table as having a primary key.  We assume it can't
-	 * have been so marked already, so no need to clear the flag in the other
-	 * case.
-	 *
-	 * Note: this might better be done by callers.  We do it here to avoid
-	 * exposing index_update_stats() globally, but that wouldn't be necessary
-	 * if relhaspkey went away.
-	 */
-	if (mark_as_primary)
-		index_update_stats(heapRelation,
-						   true,
-						   true,
-						   -1.0);
-
 	/*
 	 * If needed, mark the index as primary and/or deferred in pg_index.
 	 *
@@ -2041,7 +2025,6 @@ FormIndexDatum(IndexInfo *indexInfo,
  * to ensure we can do all the necessary work in just one update.
  *
  * hasindex: set relhasindex to this value
- * isprimary: if true, set relhaspkey true; else no change
  * reltuples: if >= 0, set reltuples to this value; else no change
  *
  * If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -2058,7 +2041,6 @@ FormIndexDatum(IndexInfo *indexInfo,
 static void
 index_update_stats(Relation rel,
 				   bool hasindex,
-				   bool isprimary,
 				   double reltuples)
 {
 	Oid			relid = RelationGetRelid(rel);
@@ -2088,7 +2070,7 @@ index_update_stats(Relation rel,
 	 * It is safe to use a non-transactional update even though our
 	 * transaction could still fail before committing.  Set
Re: Faster inserts with mostly-monotonically increasing values
On Wed, Mar 14, 2018 at 10:58 AM, Claudio Freire wrote:
>
> I'm thinking there could be contention on some lock somewhere.
>
> Can you attach the benchmark script you're using so I can try to
> reproduce it?

I am using a very ad-hoc script, but here it is.  It assumes the presence
of a branch named "btree_rightmost" with the patched code.  You will need
to make the necessary adjustments, of course.

Thanks,
Pavan

--
Pavan Deolasee
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

start_options="-c shared_buffers=16GB -c max_wal_size=64GB -c min_wal_size=16GB -c checkpoint_timeout=60min -c bgwriter_delay=100 -c bgwriter_lru_maxpages=100 -c bgwriter_lru_multiplier=3 -c bgwriter_flush_after=256 -c checkpoint_completion_target=0.9"
PGDATA=/data/pgsql

echo "INSERT INTO testtab(b) SELECT generate_series(1,10);" > s1.sql

for b in master btree_rightmost; do
    git checkout $b > /dev/null 2>&1
    make -s clean > /dev/null 2>&1
    ./configure --prefix=$HOME/pg-install/$b
    make -s -j8 install > /dev/null 2>&1
    export PATH=$HOME/pg-install/$b/bin:$PATH
    for c in 1 2 4 8; do
        pg_ctl -D $PGDATA -w stop
        rm -rf $PGDATA
        initdb -D $PGDATA
        pg_ctl -D $PGDATA -o "$start_options" -w start -l logfile.$b
        psql -c "CREATE TABLE testtab (a bigserial UNIQUE, b bigint);" postgres
        echo "$b $c" >> results.txt
        num_txns_per_client=$((1024 / $c))
        time pgbench -n -l -c $c -j 1 -t $num_txns_per_client -f s1.sql postgres >> results.txt
    done
done
Re: add queryEnv to ExplainOneQuery_hook
Michael Paquier writes: > On Wed, Mar 14, 2018 at 05:36:26PM +1300, Thomas Munro wrote: >> Hmm. I suppose we could have invented a new extended hook with a >> different name and back-patched it so that PG10 would support both. >> Then binary compatibility with existing compiled extensions wouldn't >> be affected AFAICS, but you could use the new extended hook in (say) >> 10.4 or higher. Then for PG11 (or later) we could remove the old hook >> and just keep the new one. I suppose that option is still technically >> open to us, though I'm not sure of the committers' appetite for messing >> with back branches like that. > The interactions between both hooks would not be difficult to define: if > the original hook is not defined, just do not trigger the second. Still > that's too late for v10, so I would rather let it go. New features are > not backpatched. Yeah. There would be no good way for a v10 extension to know whether the additional hook is available --- it would have to know that at compile time, and it can't --- so I don't see that this is practical. Ideally we'd have noticed the problem before v10 got out ... so my own takeaway here is that this is a reminder to extension authors that they ought to test their stuff during beta period. regards, tom lane
Re: planner bug regarding lateral and subquery?
Hi David and Stephen,

On 2018/03/14 12:59, David G. Johnston wrote:
> On Tuesday, March 13, 2018, Tatsuro Yamada <yamada.tats...@lab.ntt.co.jp> wrote:
>> Hi Hackers,
>>
>> I found a bug, maybe.  If it is possible to get an explain command
>> result from the query below successfully, I think that means the query
>> is executable.
>
> There is a difference between being executable, being compilable, and
> being able to execute to completion, at runtime, on specific data.
> You've proven the former, but as the error indicates, specific data
> causes the complete execution of the query to fail.
>
> I can write "select cola / colb from tbl", and as long as there are no
> zeros in colb the query will complete, but if there is one you get a
> divide-by-zero runtime error.  This is similar.
>
> David J.

Thank you for the explanation.  I understand that it's a runtime error
that cannot be avoided in the parser or planner. :)

Thanks,
Tatsuro Yamada
Re: add queryEnv to ExplainOneQuery_hook
On Wed, Mar 14, 2018 at 05:36:26PM +1300, Thomas Munro wrote:
> Hmm.  I suppose we could have invented a new extended hook with a
> different name and back-patched it so that PG10 would support both.
> Then binary compatibility with existing compiled extensions wouldn't
> be affected AFAICS, but you could use the new extended hook in (say)
> 10.4 or higher.  Then for PG11 (or later) we could remove the old hook
> and just keep the new one.  I suppose that option is still technically
> open to us, though I'm not sure of the committers' appetite for messing
> with back branches like that.

The interactions between both hooks would not be difficult to define: if
the original hook is not defined, just do not trigger the second.  Still,
that's too late for v10, so I would rather let it go.  New features are
not backpatched.
--
Michael
Re: Faster inserts with mostly-monotonically increasing values
On Wed, Mar 14, 2018 at 1:36 AM, Pavan Deolasee wrote: > > > On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire > wrote: >> >> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee >> >> > >> > Yes, I will try that next - it seems like a good idea. So the idea would >> > be: >> > check if the block is still the rightmost block and the insertion-key is >> > greater than the first key in the page. If those conditions are >> > satisfied, >> > then we do a regular binary search within the page to find the correct >> > location. This might add an overhead of binary search when keys are >> > strictly >> > ordered and a single client is inserting the data. If that becomes a >> > concern, we might be able to look for that special case too and optimise >> > for >> > it too. >> >> Yeah, pretty much that's the idea. Beware, if the new item doesn't >> fall in the rightmost place, you still need to check for serialization >> conflicts. > > > So I've been toying with this idea since yesterday and I am quite puzzled > with the results. See the attached patch which compares the insertion key > with the last key inserted by this backend, if the cached block is still the > rightmost block in the tree. I initially only compared with the first key in > the page, but I tried this version because of the strange performance > regression which I still have no answers. > > For a small number of clients, the patched version does better. But as the > number of clients go up, the patched version significantly underperforms > master. I roughly counted the number of times the fastpath is taken and I > noticed that almost 98% inserts take the fastpath. I first thought that the > "firstkey" location in the page might be becoming a hot-spot for concurrent > processes and hence changed that to track the per-backend last offset and > compare against that the next time. But that did not help much. 
> +---------+-------------------------------+--------------------------------+
> | clients | Master - Avg load time in sec | Patched - Avg load time in sec |
> +---------+-------------------------------+--------------------------------+
> |       1 | 500.0725203                   | 347.632079                     |
> |       2 | 308.4580771                   | 263.9120163                    |
> |       4 | 359.4851779                   | 514.7187444                    |
> |       8 | 476.4062592                   | 780.2540855                    |
> +---------+-------------------------------+--------------------------------+
>
> The perf data does not show anything interesting either.  I mean there is
> a reduction in CPU time spent in btree-related code in the patched
> version, but the overall execution time to insert the same number of
> records goes up significantly.
>
> Perf (master):
> ==============
>
> +   72.59%     1.81%  postgres  postgres  [.] ExecInsert
> +   47.55%     1.27%  postgres  postgres  [.] ExecInsertIndexTuples
> +   44.24%     0.48%  postgres  postgres  [.] btinsert
> -   42.40%     0.87%  postgres  postgres  [.] _bt_doinsert
>    - 41.52% _bt_doinsert
>       + 21.14% _bt_search
>       + 12.57% _bt_insertonpg
>       + 2.03% _bt_binsrch
>         1.60% _bt_mkscankey
>         1.20% LWLockAcquire
>       + 1.03% _bt_freestack
>         0.67% LWLockRelease
>         0.57% _bt_check_unique
>    + 0.87% _start
> +   26.03%     0.95%  postgres  postgres  [.] ExecScan
> +   21.14%     0.82%  postgres  postgres  [.] _bt_search
> +   20.70%     1.31%  postgres  postgres  [.] ExecInterpExpr
> +   19.05%     1.14%  postgres  postgres  [.] heap_insert
> +   18.84%     1.16%  postgres  postgres  [.] nextval_internal
> +   18.08%     0.84%  postgres  postgres  [.] ReadBufferExtended
> +   17.24%     2.03%  postgres  postgres  [.] ReadBuffer_common
> +   12.57%     0.59%  postgres  postgres  [.] _bt_insertonpg
> +   11.12%     1.63%  postgres  postgres  [.] XLogInsert
> +    9.90%     0.10%  postgres  postgres  [.] _bt_relandgetbuf
> +    8.97%     1.16%  postgres  postgres  [.] LWLockAcquire
> +    8.42%     2.03%  postgres  postgres  [.] XLogInsertRecord
> +    7.26%     1.01%  postgres  postgres  [.] _bt_binsrch
> +    7.07%     1.20%  postgres  postgres  [.] RelationGetBufferForTuple
> +    6.27%     4.92%  postgres  postgres  [.] _bt_compare
> +    5.97%     0.63%  postgres  postgres  [.] read_seq_tuple.isra.3
> +    5.70%     4.89%  postgres  postgres  [.] hash_search_with_hash_value
> +    5.44%     5.44%  postgres  postgres  [.] LWLockAttemptLock
>
> Perf (Patched):
>
Re: [bug fix] Cascaded standby cannot start after a clean shutdown
On Tue, Feb 27, 2018 at 05:15:29AM +, Tsunakawa, Takayuki wrote:
> That was my first thought, and I gave it up.  As you say,
> XLogReadRecord() could allocate up to 1 GB of memory for a garbage.
> That allocation can fail due to memory shortage, which prevents the
> recovery from proceeding.

Even with that, the resulting patch is sort of ugly...  So it seems to me
that the conclusion to this thread is that there is no clear winner, and
that the problem is so unlikely to happen that it is not worth the
performance impact of zeroing all the WAL pages fetched.

Still, the attached has the advantage of not causing the startup process
to fail suddenly because of the allocation failure, but to fail afterwards
when validating the record's contents.  It has the disadvantage of
temporarily allocating up to 1 GB of memory for some of the garbage, but
that allocation is short-lived.  So that would switch the failure from a
FATAL allocation failure taking down Postgres to something which will
cause fetching of new WAL records to be tried again.  All in all, that
would be a win for the case reported on this thread.

Thoughts from anybody?
--
Michael

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 3a86f3497e..5bf50a2656 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -25,6 +25,10 @@
 #include "common/pg_lzcompress.h"
 #include "replication/origin.h"

+#ifndef FRONTEND
+#include "utils/memutils.h"
+#endif
+
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);

 static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
@@ -309,7 +313,14 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	 * rest of the header after reading it from the next page.  The xl_tot_len
 	 * check is necessary here to ensure that we enter the "Need to reassemble
 	 * record" code path below; otherwise we might fail to apply
-	 * ValidXLogRecordHeader at all.
+	 * ValidXLogRecordHeader at all.  Note that in much unlucky circumstances,
+	 * the random data read from a recycled segment could allow the set of
+	 * sanity checks to pass.  If the garbage data read in xl_tot_len is
+	 * higher than MaxAllocSize on the backend, then the startup process
+	 * would retry to fetch fresh WAL data.  If this is lower than
+	 * MaxAllocSize, then a more-than-necessary memory allocation will
+	 * happen for a short amount of time, until the WAL reader fails at
+	 * validating the next record's data.
 	 */
 	if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)
 	{
@@ -321,7 +332,11 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	else
 	{
 		/* XXX: more validation should be done here */
-		if (total_len < SizeOfXLogRecord)
+		if (total_len < SizeOfXLogRecord
+#ifndef FRONTEND
+			|| !AllocSizeIsValid(total_len)
+#endif
+			)
 		{
 			report_invalid_record(state,
 								  "invalid record length at %X/%X: wanted %u, got %u",
Re: planner bug regarding lateral and subquery?
Hi Stephen,

On 2018/03/14 12:36, Stephen Frost wrote:
> Greetings,
>
> * Tatsuro Yamada (yamada.tats...@lab.ntt.co.jp) wrote:
>> I found a bug, maybe.
>
> I don't think so...
>
>> * Result of Select: failed
>>
>> # select subq_1.c0 from test as ref_0,
>>     lateral (select subq_0.c0 as c0
>>              from (select ref_0.c2 as c0,
>>                           (select c1 from test) as c1
>>                    from test as ref_1
>>                    where (select c3 from test) is NULL) as subq_0
>>              right join test as ref_2 on (subq_0.c1 = ref_2.c1)) as subq_1;
>> ERROR:  more than one row returned by a subquery used as an expression
>
> You don't need LATERAL or anything complicated to reach that error,
> simply do:
>
> =*> select * from test where (select c1 from test) is null;
> ERROR:  more than one row returned by a subquery used as an expression
>
> The problem there is that the WHERE clause is trying to evaluate an
> expression, which is "(select c1 from test) is null", and you aren't
> allowed to have multiple rows returned from that subquery (otherwise,
> how would we know which row to compare in the expression..?).
>
> If you're actually intending to refer to the 'c3' column from test
> through the lateral join, you would just refer to it as 'ref_0.c3', as
> you do in another part of that query.

Thanks for your reply.

The query is not useful for me; it's just a test query for the planner,
generated by sqlsmith. :)

My question is: was it possible to handle the error only in the executor
phase?  I expected that it would be checked in the parsing or planning
phase.

Thanks,
Tatsuro Yamada
Re: Ambigous Plan - Larger Table on Hash Side
On Tue, Mar 13, 2018 at 4:32 PM, Narendra Pradeep U U wrote:
> Hi,
>       Thanks everyone for your suggestions.  I would like to add the
> explain analyze of both plans so that we can have a broader picture.
>
> I have a work_mem of 1000 MB.
>
> The plan which we get regularly, with the table analyzed:
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                        QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1)
>    Hash Cond: (tab2.b = tab1.a)
>    ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1)
>    ->  Hash  (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1)
>          Buckets: 33554432  Batches: 2  Memory Usage: 801126kB
>          ->  Seq Scan on tab1  (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1)
>  Planning time: 0.055 ms
>  Execution time: 20472.603 ms
> (8 rows)
>
> I reproduced the other plan by not analysing the smaller table.
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                        QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1)
>    Hash Cond: (tab1.a = tab2.b)
>    ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1)
>    ->  Hash  (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 3776kB
>          ->  Seq Scan on tab2  (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1)
>  Planning time: 0.053 ms
>  Execution time: 7592.688 ms
> (8 rows)

I am surprised to see the estimates very close to the actual values even
without analysing the small table.

> The actual plan seems to be slower.
> The smaller table (tab2) has each row duplicated exactly 8 times, and
> all the rows in the larger table (tab1) are distinct.  What may be the
> exact reason, and can we fix this?

After analysing the small table, the first plan is chosen as the cheapest.
This means that the plan with the smaller table being hashed has a higher
cost than the plan with the larger table being hashed.  We need to examine
that costing to see what went wrong.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] GUC for cleanup indexes threshold.
On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov wrote:
> On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada wrote:
>> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov wrote:
>>> 2) These parameters are reset during btbulkdelete() and set during
>>> btvacuumcleanup().
>>
>> Can't we set these parameters even during btbulkdelete()?  By keeping
>> them up to date, we will be able to avoid unnecessary cleanup vacuums
>> even after an index bulk-delete.
>
> We certainly can update cleanup-related parameters during btbulkdelete().
> However, in this case we would update the B-tree meta-page during each
> VACUUM cycle.  That may cause some overhead for non-append-only
> workloads.  I don't think this overhead would be noticeable, because in
> non-append-only scenarios VACUUM typically writes much more information.
> But I would like this patch, which is oriented to append-only workloads,
> to be as harmless as possible for other workloads.

Which overhead are you referring to here?  I guess the overhead is only
calculating the oldest btpo.xact, and I think that would be harmless.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: add queryEnv to ExplainOneQuery_hook
On Wed, Mar 14, 2018 at 2:57 PM, Jim Finnerty wrote: > Passing NULL in place of queryEnv in PG10 causes a failure in installcheck > tests portals and plpgsql. > > For PG10, if you want a both an ExplainOneQuery hook and clean run of > installcheck, is there a better workaround than to (a) pass NULL in place of > queryEnv, and (b) to comment out the portals and plpgsql tests? Hi Jim, I can't think of a good way right now. It's unfortunate that we couldn't back-patch 4d41b2e0 because 10 was out the door; but perhaps you can? Hmm. I suppose we could have invented a new extended hook with a different name and back-patched it so that PG10 would support both. Then binary compatibility with existing compiled extensions wouldn't be affected AFAICS, but you could use the new extended hook in (say) 10.4 or higher. Then for PG11 (or later) we could remove the old hook and just keep the new one. I suppose that option is still technically open to us, though I'm not sure of the committers' appetite for messing with back branches like that. -- Thomas Munro http://www.enterprisedb.com
Re: Faster inserts with mostly-monotonically increasing values
On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire wrote:
> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee wrote:
>>
>> Yes, I will try that next - it seems like a good idea.  So the idea
>> would be: check if the block is still the rightmost block and the
>> insertion-key is greater than the first key in the page.  If those
>> conditions are satisfied, then we do a regular binary search within the
>> page to find the correct location.  This might add an overhead of
>> binary search when keys are strictly ordered and a single client is
>> inserting the data.  If that becomes a concern, we might be able to
>> look for that special case too and optimise for it too.
>
> Yeah, pretty much that's the idea.  Beware, if the new item doesn't
> fall in the rightmost place, you still need to check for serialization
> conflicts.

So I've been toying with this idea since yesterday and I am quite puzzled
with the results.  See the attached patch, which compares the insertion
key with the last key inserted by this backend, if the cached block is
still the rightmost block in the tree.  I initially only compared with the
first key in the page, but I tried this version because of the strange
performance regression for which I still have no answers.

For a small number of clients, the patched version does better.  But as
the number of clients goes up, the patched version significantly
underperforms master.  I roughly counted the number of times the fastpath
is taken and noticed that almost 98% of inserts take the fastpath.  I
first thought that the "firstkey" location in the page might be becoming a
hot-spot for concurrent processes and hence changed that to track the
per-backend last offset and compare against that the next time.  But that
did not help much.
+---------+-------------------------------+--------------------------------+
| clients | Master - Avg load time in sec | Patched - Avg load time in sec |
+---------+-------------------------------+--------------------------------+
| 1       | 500.0725203                   | 347.632079                     |
+---------+-------------------------------+--------------------------------+
| 2       | 308.4580771                   | 263.9120163                    |
+---------+-------------------------------+--------------------------------+
| 4       | 359.4851779                   | 514.7187444                    |
+---------+-------------------------------+--------------------------------+
| 8       | 476.4062592                   | 780.2540855                    |
+---------+-------------------------------+--------------------------------+

The perf data does not show anything interesting either. I mean there is a
reduction in CPU time spent in btree related code in the patched version, but
the overall execution time to insert the same number of records goes up
significantly.

Perf (master):
===
+   72.59%  1.81%  postgres  postgres  [.] ExecInsert
+   47.55%  1.27%  postgres  postgres  [.] ExecInsertIndexTuples
+   44.24%  0.48%  postgres  postgres  [.] btinsert
-   42.40%  0.87%  postgres  postgres  [.] _bt_doinsert
   - 41.52% _bt_doinsert
      + 21.14% _bt_search
      + 12.57% _bt_insertonpg
      +  2.03% _bt_binsrch
         1.60% _bt_mkscankey
         1.20% LWLockAcquire
      +  1.03% _bt_freestack
         0.67% LWLockRelease
         0.57% _bt_check_unique
   +  0.87% _start
+   26.03%  0.95%  postgres  postgres  [.] ExecScan
+   21.14%  0.82%  postgres  postgres  [.] _bt_search
+   20.70%  1.31%  postgres  postgres  [.] ExecInterpExpr
+   19.05%  1.14%  postgres  postgres  [.] heap_insert
+   18.84%  1.16%  postgres  postgres  [.] nextval_internal
+   18.08%  0.84%  postgres  postgres  [.] ReadBufferExtended
+   17.24%  2.03%  postgres  postgres  [.] ReadBuffer_common
+   12.57%  0.59%  postgres  postgres  [.] _bt_insertonpg
+   11.12%  1.63%  postgres  postgres  [.] XLogInsert
+    9.90%  0.10%  postgres  postgres  [.] _bt_relandgetbuf
+    8.97%  1.16%  postgres  postgres  [.] LWLockAcquire
+    8.42%  2.03%  postgres  postgres  [.] XLogInsertRecord
+    7.26%  1.01%  postgres  postgres  [.] _bt_binsrch
+    7.07%  1.20%  postgres  postgres  [.] RelationGetBufferForTuple
+    6.27%  4.92%  postgres  postgres  [.] _bt_compare
+    5.97%  0.63%  postgres  postgres  [.] read_seq_tuple.isra.3
+    5.70%  4.89%  postgres  postgres  [.] hash_search_with_hash_value
+    5.44%  5.44%  postgres  postgres  [.] LWLockAttemptLock

Perf (Patched):
===
+   69.33%  2.36%  postgres  postgres  [.] ExecInsert
+   35.21%  0.64%  postgres  postgres  [.] ExecInsertIndexTuples
-   32.14%  0.45%  postgres  postgres  [.] btinser
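The fastpath condition discussed in this thread can be sketched in a few lines. This is a hedged Python sketch of the decision only (function and parameter names are hypothetical); the real logic lives in C in _bt_doinsert, and as Claudio notes a non-rightmost landing still needs a serialization-conflict check.

```python
def choose_insert_path(cached_block, rightmost_block, new_key, last_key_on_page):
    # Take the fastpath only when the backend's cached block is still the
    # rightmost leaf AND the new key sorts after the keys already on that
    # page; anything else falls back to a full descent from the root.
    if (cached_block is not None
            and cached_block == rightmost_block
            and new_key > last_key_on_page):
        return "fastpath"
    return "descend"
```

With monotonically increasing keys from a single client, almost every insert satisfies the condition, which matches the ~98% fastpath rate observed above.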
Re: planner bug regarding lateral and subquery?
On Tuesday, March 13, 2018, Tatsuro Yamada wrote:
> Hi Hackers,
>
> I found a bug, maybe.
> If it is able to get an explain command result from below query
> successfully, I think that it means the query is executable.

There is a difference between a query being executable (compilable) and being
able to execute to completion at runtime on specific data. You've proven the
former, but, as the error indicates, specific data causes the complete
execution of the query to fail. I can write "select cola / colb from tbl" and
as long as there are no zeros in colb the query will complete, but if there is
one you get a divide-by-zero runtime error. This is similar.

David J.
Re: planner bug regarding lateral and subquery?
Greetings, * Tatsuro Yamada (yamada.tats...@lab.ntt.co.jp) wrote: > I found a bug, maybe. I don't think so... > * Result of Select: failed > > # select > subq_1.c0 > from > test as ref_0, > lateral (select subq_0.c0 as c0 >from > (select ref_0.c2 as c0, > (select c1 from test) as c1 from test as ref_1 >where (select c3 from test) is NULL) as subq_0 >right join test as ref_2 >on (subq_0.c1 = ref_2.c1 )) as subq_1; > > ERROR: more than one row returned by a subquery used as an expression You don't need LATERAL or anything complicated to reach that error, simply do: =*> select * from test where (select c1 from test) is null; ERROR: more than one row returned by a subquery used as an expression The problem there is that the WHERE clause is trying to evaluate an expression, which is "(select c1 from test) is null" and you aren't allowed to have multiple rows returned from that subquery (otherwise, how would we know which row to compare in the expression..?). If you're actually intending to refer to the 'c3' column from the test through the lateral join, you would just refer to it as 'ref_0.c3', as you do in another part of that query. Thanks! Stephen signature.asc Description: PGP signature
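The rule Stephen describes — a subquery used as an expression may return at most one row, and zero rows yield NULL — can be mimicked in a short sketch. This is illustrative Python, not PostgreSQL code; the real check is the ereport in nodeSubplan.c quoted elsewhere in the thread.

```python
def scalar_subquery(rows):
    # Mimics the executor's check for a subquery used as an expression:
    # more than one row is a cardinality violation; zero rows yield NULL
    # (None here); exactly one row yields its single column's value.
    if len(rows) > 1:
        raise ValueError(
            "more than one row returned by a subquery used as an expression")
    return rows[0][0] if rows else None
```

So `(select c1 from test)` succeeds while `test` has one row, and starts failing as soon as a second row is inserted, independently of how the planner arranged the surrounding joins.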
Re:Re: Re: [GSOC 18] Performance Farm Project——Initialization Project
Hi Dave, I am willing to use React to re-create the front-end application. Since I plan to use separate front-end and back-end development methods, this means that there will be no html files in Django applications. Front-end and back-end applications will interact via restful api. I will use react to rewrite some existing html files if needed. Before initializing the react application, I want to learn more about your tendency toward front-end technology. - About React: Which version of React (15.x or 16) and react-router (2.x or 4) do you tend to use? - About Package Manager: Do you tend to use yarn or npm? - About the UI library: I guess you might prefer Bootstrap or Material-UI. At the same time I hope you can help me understand the functional requirements of this project more clearly. Here are some of my thoughts on PerfFarm: - I see this comment in the client folder (In line 15 of the "pgperffarm \ client \ benchmarks \ pgbench.py" file): ''' # TODO allow running custom scripts, not just the default ''' Will PerfFarm have many test items so that the test item search function or even the test item sort function is expected to be provided? - What value will be used to determine if a machine's performance is improving or declining? - After the user logs in to the site, I think it may be necessary to provide a page for the user to browse or manage his own machine. Is there any function that requires users to log in before they can use it? - When I registered a machine on BuildFarm, I did not receive the confirmation email immediately but several days later. In PerfFarm, when a user registers a machine, will the administrator review it before sending a confirmation email? - I see BuildFarm assigning an animal name to each registered machine. Will PerfFarm also have this interesting feature? My postgresql.org community account is: - Username: maleicacid - Email: cs_maleica...@163.com I hope to get the commit permission of the pgperffarm.git repository. 
I am willing to continue coding and complete the project on the existing code.Looking forward to your reply. Best Regards, Hongyuan Ma (cs_maleica...@163.com) At 2018-03-13 03:03:08, "Dave Page" wrote: Hi On Mon, Mar 12, 2018 at 9:57 AM, Hongyuan Ma wrote: Hi Dave, Thank you for your reminder. This is indeed my negligence. In fact, I have browsed the code in the pgperffarm.git repository, including the client folder and the web folder. However, I found that the web folder has some unfinished html files and does not contain model class files. And the project used Django 1.8 without importing the Django REST Framework (When I talked to Mark about the PerfFarm project, he told me he insisted on using Django 1.11 and considered using the REST framework). So I mistakenly thought that the code in the web folder had been shelved. Nope, not at all. It just wasn't updated to meet the latest PG infrastructure requirements yet (basically just the update to Django 11). The rest should be fine as-is. In the newly initialized Django application, I upgraded its version and assigned the directory to make the project structure clearer. I want to use a separate front-end development approach (The front-end applications will use vue.js for programming.). I hope you can allow me to use it instead of the old one. I am willing to integrate the auth module into this new application as soon as possible. I would much prefer to see jQuery + React, purely because there are likely more PostgreSQL developers (particularly from the pgAdmin team) that know those technologies. It is important to consider long term maintenance as well as ease of initial development with any project. Thanks. 
Regards, Hongyuan Ma (cs_maleica...@163.com) 在 2018-03-12 02:26:43,"Dave Page" 写道: Hi Maybe I’m missing something (I’ve been offline a lot recently for unavoidable reasons), but the perf farm project already has a Django backend initialised and configured to work with community auth, on community infrastructure. https://git.postgresql.org/gitweb/?p=pgperffarm.git;a=summary On Sunday, March 11, 2018, Hongyuan Ma wrote: Hello, mark I initialized a Django project and imported the Django REST Framework. Its github address is: https://github.com/PGPerfFarm/server-code I created some model classes and also wrote scripts in the dbtools folder to import simulation data for development. I am hesitant to use admin or xadmin as the administrator's backend for the site (I prefer to use xadmin). I also initialized the website's front-end application. Here is its address: https://github.com/PGPerfFarm/front-end-code.git I wrote the header component as shown: I hope this effect can enhance the future user experience:). This application uses vue.js and element-ui, but if you insist on using other js libraries or not using js, please let me know. I will empty this project and then rewrite it as you wish. My next step is to d
Re: [HACKERS] path toward faster partition pruning
On 14 March 2018 at 06:54, Jesper Pedersen wrote: > * "Add more expression types..." -- Are you planning to add more of > these ? Otherwise change the comment Run-time pruning will deal with Param types here. The comment there might be just to remind us all that the function must remain generic enough so we can support more node types later. I don't particularly need the comment for that patch, and I'm not quite sure if I should be removing it in that patch. My imagination is not stretching far enough today to think what we could use beyond Const and Param. I don't feel strongly about the comment either way. This is just to let you know that there are more up and coming things to do in that spot. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
planner bug regarding lateral and subquery?
Hi Hackers,

I found a bug, maybe.
If it is able to get an explain command result from below query successfully,
I think that it means the query is executable. However, I got an error by
executing the query without an explain command. I guess that the planner
makes a wrong plan. I share a reproduction procedure and query results on
3b7ab4380440d7b14ee390fabf39f6d87d7491e2.

* Reproduction

  create table test (c1 integer, c2 integer, c3 text);
  insert into test values (1, 3, 'a');
  insert into test values (2, 4, 'b');

  explain (costs off)
  select subq_1.c0
  from
    test as ref_0,
    lateral (select subq_0.c0 as c0
             from
               (select ref_0.c2 as c0,
                       (select c1 from test) as c1
                from test as ref_1
                where (select c3 from test) is NULL) as subq_0
               right join test as ref_2
                 on (subq_0.c1 = ref_2.c1)) as subq_1;

  select subq_1.c0
  from
    test as ref_0,
    lateral (select subq_0.c0 as c0
             from
               (select ref_0.c2 as c0,
                       (select c1 from test) as c1
                from test as ref_1
                where (select c3 from test) is NULL) as subq_0
               right join test as ref_2
                 on (subq_0.c1 = ref_2.c1)) as subq_1;

* Result of Explain: succeeded

                 QUERY PLAN
  ---------------------------------------------
   Nested Loop
     InitPlan 1 (returns $0)
       ->  Seq Scan on test
     InitPlan 2 (returns $1)
       ->  Seq Scan on test test_1
     ->  Seq Scan on test ref_0
     ->  Nested Loop Left Join
           Join Filter: ($1 = ref_2.c1)
           ->  Seq Scan on test ref_2
           ->  Materialize
                 ->  Result
                       One-Time Filter: ($0 IS NULL)
                       ->  Seq Scan on test ref_1

* Result of Select: failed

  ERROR:  more than one row returned by a subquery used as an expression

* The error message came from here

  ./src/backend/executor/nodeSubplan.c

    if (found &&
        (subLinkType == EXPR_SUBLINK ||
         subLinkType == MULTIEXPR_SUBLINK ||
         subLinkType == ROWCOMPARE_SUBLINK))
        ereport(ERROR,
                (errcode(ERRCODE_CARDINALITY_VIOLATION),
                 errmsg("more than one row returned by a subquery used as an expression")));

Thanks,
Tatsuro Yamada
Re: Fixes for missing schema qualifications
On Tue, Mar 13, 2018 at 07:30:23PM -0700, David G. Johnston wrote:
> On Tue, Mar 13, 2018 at 6:50 PM, Michael Paquier wrote:
>> To simplify user's life, we could also recommend just to users to issue
>> a ALTER FUNCTION SET search_path to fix the problem for all functions,
>> that's easier to digest.
>
> I'm unclear as to what scope you are suggesting the above advice (and
> option #1) apply to. All pg_catalog/information_schema functions or all
> functions including those created by users?

I am suggesting that in order to keep the release notes of the next minor
versions simple, while still patching information_schema.sql internally with
the changes based on operator(pg_catalog.foo) for all functions, so that new
deployments get the right call.
--
Michael

signature.asc Description: PGP signature
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
On Tue, Mar 6, 2018 at 11:27 PM, Alvaro Herrera wrote: > Hello > > I haven't read your respective patches yet, but both these threads > brought to memory a patch I proposed a few years ago that I never > completed: > > https://www.postgresql.org/message-id/flat/20130124215715.GE4528%40alvh.no-ip.org Thank you for sharing the thread. > > In that thread I posted a patch to implement a prioritisation scheme for > autovacuum, based on an equation which was still under discussion when > I abandoned it. Chris Browne proposed a crazy equation to mix in both > XID age and fraction of dead tuples; probably that idea is worth > studying further. I tried to implement that in my patch but I probably > didn't do it correctly (because, as I recall, it failed to work as > expected). Nowadays I think we would also consider the multixact freeze > age, too. > > Maybe that's worth giving a quick look in case some of the ideas there > are useful for the patches now being proposed. Yeah, that's definitely useful for the patches. I'll look at this and will discuss that here. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Fixes for missing schema qualifications
On Tue, Mar 13, 2018 at 6:50 PM, Michael Paquier wrote: > On Sat, Mar 10, 2018 at 08:36:34AM +, Noah Misch wrote: > > This qualifies some functions, but it leaves plenty of unqualified > operators. > > Yeah, I know that, and i don't have a perfect reply to offer to you. > There are a couple of methods that we could use to tackle that: > 1) For functions, enforce search_path with a SET search_path = > 'pg_catalog' command. However this has a performance impact. > 2) Enforce operators qualification with operator(pg_catalog.foo). This > has no impact on performance, but repeating that all over the place is > rather ugly, particularly for psql's describe.c and tab-completion.c. > 3) Tweak dynamically search_path before running a query: > - Save the existing search_path value by issuing SHOW search_path. > - Use ALWAYS_SECURE_SEARCH_PATH_SQL to enforce the path. > - Set back search_path based on the previous value. > This logic can happen in a dedicated wrapper, but this impacts > performance as it requires extra round trips to the server. > > For information_schema.sql, we are talking about tweaking 12 functions. > So I think that we could live with 2). That seems ideal. > To simplify user's life, we > could also recommend just to users to issue a ALTER FUNCTION SET > search_path to fix the problem for all functions, that's easier to > digest. > I'm unclear as to what scope you are suggesting the above advice (and option #1) apply to. All pg_catalog/information_schema functions or all functions including those created by users? > > For the rest, which basically concerns psql, I have been thinking that > actually using 2) would be the most painful approach, still something > which does not impact the user experience, while 3) is easier to > back-patch by minimizing the code footprint and avoids also any kind of > future problems. 
> In furtherance of option 2 is there some way to execute a query (at least in a development build) with no search_path in place - thus requiring every object reference to be schema-qualified - and in doing so all such unadorned operators/functions/relations would fail to be found quickly at parse time? Given the number of user-hours spent running describe commands and tab-completion the extra round-time solution is definitely less appealing in terms of long term time expended. David J.
Re: User defined data types in Logical Replication
On Wed, Mar 7, 2018 at 9:19 PM, Masahiko Sawada wrote:
> On Wed, Mar 7, 2018 at 2:52 AM, Alvaro Herrera wrote:
>> Masahiko Sawada wrote:
>>> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>
>>> > Therefore, I'm inclined to make this function raise a warning, then
>>> > return a substitute value (something like "unrecognized type XYZ").
>>> > [...]
>>>
>>> I agree with you about not hiding the actual reason for the error but
>>> if we raise a warning at logicalrep_typmap_gettypname don't we call
>>> slot_store_error_callback recursively?
>>
>> Hmm, now that you mention it, I don't really know. I think it's
>> supposed not to happen, since calling ereport() again opens a new
>> recursion level, but then maybe errcontext doesn't depend on the
>> recursion level ... I haven't checked. This is why the TAP test would
>> be handy :-)
>
> The calling ereport opens a new recursion level. The calling ereport
> with error doesn't return to caller but the calling with warning does.
> So the recursively calling ereport(WARNING) ends up with exceeding the
> errordata stack size. So it seems to me that we can set errcontext in
> logicalrep_typmap_gettypname() instead of raising warning or error.
>
>>
>>> Agreed. Will add a TAP test.
>>
>> Great. This patch waits on that, then.
>>
>
> Okay. I think the most simple and convenient way to reproduce this
> issue is to call an elog(LOG) in input function of a user-defined data
> type. So I'm thinking to create the test in src/test/module directory.

After more thought, I think that since the errors in
logicalrep_typmap_gettypname are not relevant to the actual error (i.e. the
type conversion error), it would not be a good idea to show the error message
of "could not find data type entry" in errcontext. It might be more
appropriate if we return a substitute value ("unrecognized type" or
"unrecognized built-in type") without raising either an error or a warning.
Thoughts?
Regarding to regression test, I added a new test module test_subscription that creates a new user-defined data type. In a subscription regression test, using test_subscription we make subscriber call slot_store_error_callback and check if the subscriber can correctly look up both remote and local data type strings. One downside of this regression test is that in a failure case, the duration of the test will be long (up to 180sec) because it has to wait for the polling timeout. Attached an updated patch with a regression test. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center fix_slot_store_error_callback_v13.patch Description: Binary data
Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types
On Fri, Mar 09, 2018 at 04:42:30PM -0500, Peter Eisentraut wrote: > It seems, however, that PGhost() has always been broken for hostaddr > use. In 9.6 (before the multiple-hosts stuff was introduced), when > connecting to "hostaddr=127.0.0.1", PGhost() returns "/tmp". Urgh. > > I think we should decide what PGhost() is supposed to mean when hostaddr > is in use, and then make a fix for that consistently across all versions. Hm. The only inconsistency I can see here is when "host" is not defined but "hostaddr" is, so why not make PQhost return the value of "hostaddr" in this case? -- Michael signature.asc Description: PGP signature
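Michael's suggestion amounts to a simple fallback order, which can be modelled as below. This is a hedged Python sketch only — the function name is hypothetical and the real logic is C inside libpq's PQhost; "/tmp" stands in for the compile-time default socket directory that 9.6 wrongly returned for hostaddr-only connections.

```python
DEFAULT_PGSOCKET_DIR = "/tmp"  # stand-in for the compiled-in default

def effective_pqhost(host, hostaddr):
    # Proposed behaviour: prefer "host" when set, otherwise fall back to
    # "hostaddr", and only report the default socket directory when
    # neither connection parameter was given.
    if host:
        return host
    if hostaddr:
        return hostaddr
    return DEFAULT_PGSOCKET_DIR
```

Under this rule, connecting with only "hostaddr=127.0.0.1" would report "127.0.0.1" rather than the socket directory.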
Re: ALTER TABLE ADD COLUMN fast default
> On Mar 14, 2018, at 10:58 AM, David Rowley > wrote: > > On 14 March 2018 at 11:36, Andrew Dunstan > wrote: >> Here are the benchmark results from the v15 patch. Fairly similar to >> previous results. I'm going to run some profiling again to see if I >> can identify any glaring hotspots. I do suspect that the "physical >> tlist" optimization sometimes turns out not to be one. It seems >> perverse to be able to improve a query's performance by dropping a >> column. > > Can you explain what "fdnmiss" is that appears in the results? > > It’s the patched code run against a materialized version of the table, i.e. one with no missing attributes. Cheers Andrew
Re: Fixes for missing schema qualifications
On Wed, Mar 14, 2018 at 10:50:38AM +0900, Michael Paquier wrote: > For the rest, which basically concerns psql, I have been thinking that > actually using 2) would be the most painful approach, still something > which does not impact the user experience, while 3) is easier to > back-patch by minimizing the code footprint and avoids also any kind of > future problems. Actually, there is also the case of pgbench, where there are two items to be careful about: 1) Tweak correctly parameters in built-in benchmark queries. 2) Make pgbench gain an extra option allowing one to run the benchmark on a wanted schema. 1) is essential to do, 2) perhaps less as custom benchmarks can always be designed so the query strings are secured. Most users don't run pgbench on multi-user, untrusted systems as well (right?), giving less value to 2). -- Michael signature.asc Description: PGP signature
Re: neqjoinsel versus "refresh materialized view concurrently"
On Tue, Mar 13, 2018 at 4:57 PM, Thomas Munro wrote: > On Wed, Mar 14, 2018 at 12:29 PM, Tom Lane wrote: > > Thomas Munro writes: > >> There is a fundamental and complicated estimation problem lurking here > >> of course and I'm not sure what to think about that yet. Maybe there > >> is a very simple fix for this particular problem: > > > > Ah, I see you thought of the same hack I did. > > > > I think this may actually be a good fix, and here's the reason: this plan > > is in fact being driven entirely off planner default estimates, because > > we don't have any estimation code that knows what to do with > > "wholerowvar *= wholerowvar". I'm suspicious that we could drop the > > preceding ANALYZE as being a waste of cycles, except maybe it's finding > > out the number of rows for us. In any case, LIMIT 1 is only a good idea > > to the extent that the planner knows what it's doing, and this is an > > example where it demonstrably doesn't and won't any time soon. > > Hmm. I wonder if the ANALYZE might have been needed to avoid the > nested loop plan at some point in history. > > Here's a patch to remove LIMIT 1, which fixes the plan for Jeff's test > scenario and some smaller and larger examples I tried. The query is > already executed with SPI_execute(..., 1) so it'll give up after one > row anyway. The regression test includes a case that causes a row to > be produced here and that's passing ('ERROR: new data for > materialized view "mvtest_mv" contains duplicate rows without any null > columns'). > Is there any good way to make the regression tests fail if the plan reverts to the bad one? The only thing I can think of would be to make the table bigger so the regression tests becomes "noticeably slower", but that is pretty vague and not user friendly to formally pass and just hope it is slow enough for someone to investigate. Cheers, Jeff
Re: Fixes for missing schema qualifications
On Tue, Mar 13, 2018 at 05:19:50PM -0700, David G. Johnston wrote:
> On Tue, Mar 13, 2018 at 5:11 PM, Tatsuo Ishii wrote:
>> select pg_catalog.count(*) from pg_catalog.pg_namespace where
>> pg_catalog.nameeq(nspname, '%s');
>>
>
> I'd rather write that:
>
> select [...] where nspname operator(pg_catalog.=) '%s'
>
> Introducing undocumented implementation functions to these queries is
> undesirable; and besides, indexing and equivalence relies on operators and
> not the underlying functions so there would be some risk of performance
> issues if the functions were used directly.

Directly calling the functions that operators use under the hood is a
potential solution, though I would discard it as it becomes harder for the
reader to understand that an operator is intended. Things become even more
confusing when dealing with input parameters of different types for simple
maths like addition, multiplication or division using several kinds of
integers, leading to more complications in maintaining this code in the
future. And the operator is careful about doing proper casting itself.
--
Michael

signature.asc Description: PGP signature
Re: Fixes for missing schema qualifications
On Sat, Mar 10, 2018 at 08:36:34AM +, Noah Misch wrote:
> This qualifies some functions, but it leaves plenty of unqualified operators.

Yeah, I know that, and I don't have a perfect reply to offer to you. There
are a couple of methods that we could use to tackle that:

1) For functions, enforce search_path with a SET search_path = 'pg_catalog'
command. However this has a performance impact.

2) Enforce operator qualification with operator(pg_catalog.foo). This has no
impact on performance, but repeating that all over the place is rather ugly,
particularly for psql's describe.c and tab-completion.c.

3) Tweak search_path dynamically before running a query:
- Save the existing search_path value by issuing SHOW search_path.
- Use ALWAYS_SECURE_SEARCH_PATH_SQL to enforce the path.
- Set back search_path based on the previous value.
This logic can happen in a dedicated wrapper, but this impacts performance as
it requires extra round trips to the server.

For information_schema.sql, we are talking about tweaking 12 functions. So I
think that we could live with 2). To simplify user's life, we could also
recommend to users to just issue an ALTER FUNCTION SET search_path to fix the
problem for all functions, that's easier to digest.

For the rest, which basically concerns psql, I have been thinking that
actually using 2) would be the most painful approach, still something which
does not impact the user experience, while 3) is easier to back-patch by
minimizing the code footprint and avoids also any kind of future problems.

Thoughts?
--
Michael

signature.asc Description: PGP signature
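Method 3) — save, pin, and restore search_path around each query — can be wrapped roughly as follows. This is a hedged Python sketch assuming a DB-API-style cursor; quoting of the restored value is deliberately simplified here, and the two extra statements are the round-trip cost Michael mentions.

```python
def run_with_secure_path(cur, query):
    # Save the caller's search_path, pin it to pg_catalog while the query
    # runs, then restore it -- costing two extra server round trips.
    cur.execute("SHOW search_path")
    old_path = cur.fetchone()[0]
    cur.execute("SET search_path = pg_catalog")
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        # NOTE: a real implementation must quote old_path safely.
        cur.execute("SET search_path = " + old_path)
```

The restore runs in a finally block so the session's search_path is put back even when the wrapped query raises.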
Re: ExplainProperty* and units
On 14 March 2018 at 13:27, Andres Freund wrote: > In the attached *POC* patch I've added a 'unit' parameter to the numeric > ExplainProperty* functions, which EXPLAIN_FORMAT_TEXT adds to the > output. This can avoid the above and other similar branches (of which > the JIT patch would add a number). > Comments? This seems like a good change. It would be nice to see us completely get rid of all the appendStringInfo* calls, apart from in the functions which meant to handle the behaviour specific to the explain format. This is a step towards that, so has my vote. Functions like show_hash_info are making it difficult to do more in this area. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Configurable file mode mask
On Tue, Mar 13, 2018 at 12:19:07PM -0400, David Steele wrote: > I'll attach new patches in a reply to [1] once I have made the changes > Tom requested. Cool, thanks for your patience. Looking forward to seeing those. I'll spend time on 0003 at the same time. Let's also put the additional umask calls for pg_rewind and pg_resetwal into a separate patch. -- Michael signature.asc Description: PGP signature
Re: PATCH: Configurable file mode mask
On Tue, Mar 13, 2018 at 01:28:17PM -0400, Tom Lane wrote: > David Steele writes: >> On 3/12/18 3:28 AM, Michael Paquier wrote: >>> In pg_rewind and pg_resetwal, isn't that also a portion which is not >>> necessary without the group access feature? > >> These seem like a good idea to me with or without patch 03. Some of our >> front-end tools (initdb, pg_upgrade) were setting umask and others >> weren't. I think it's more consistent (and safer) if they all do, at >> least if they are writing into PGDATA. > > +1 ... see a926eb84e for an example of how easy it is to screw up if > the process's overall umask is permissive. Okay. A suggestion that I have here would be to split those extra calls into a separate patch. That's a useful self-contained improvement. -- Michael signature.asc Description: PGP signature
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Tue, 13 Mar 2018 11:29:03 -0400 Tom Lane wrote: > David Gould writes: > > I have attached the patch we are currently using. It applies to 9.6.8. I > > have versions for older releases in 9.4, 9.5, 9.6. I fails to apply to 10, > > and presumably head but I can update it if there is any interest. > > > The patch has three main features: > > - Impose an ordering on the autovacuum workers worklist to avoid > > the need for rechecking statistics to skip already vacuumed tables. > > - Reduce the frequency of statistics refreshes > > - Remove the AutovacuumScheduleLock > > As per the earlier thread, the first aspect of that needs more work to > not get stuck when the worklist has long tasks near the end. I don't > think you're going to get away with ignoring that concern. I agree that the concern needs to be addressed. The other concern is that sorting by oid is fairly arbitrary, the essential part is that there is some determinate order. > Perhaps we could sort the worklist by decreasing table size? That's not > an infallible guide to the amount of time that a worker will need to > spend, but it's sure safer than sorting by OID. That is better. I'll modify it to also prioritize tables that have relpages and reltuples = 0 which usually means the table has no stats at all. Maybe use oid to break ties. > Alternatively, if we decrease the frequency of stats refreshes, how > much do we even need to worry about reordering the worklist? The stats refresh in the current scheme is needed to make sure that two different workers don't vacuum the same table in close succession. It doesn't actually work, and it costs the earth. The patch imposes an ordering to prevent workers trying to claim recently vacuumed tables. This removes the need for the stats refresh. > In any case, I doubt anyone will have any appetite for back-patching > such a change. 
> I'd recommend that you clean up your patch and rebase to HEAD, then submit
> it into the September commitfest (either on a new thread or a continuation
> of the old #13750 thread, not this one).

I had in mind to make a more comprehensive patch to try to make autovacuum
more responsive when there are lots of tables, but was a bit shy of the
reception. I'll try again with this one (in a new thread) based on the
suggestions. Thanks!

-dg

--
David Gould da...@sonic.net
If simplicity worked, the world would be overrun with insects.
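The ordering being converged on above — tables with no stats at all first, then by decreasing size, with OID as a deterministic tiebreaker — can be sketched as below. This is a hedged Python sketch with a hypothetical tuple layout, not the actual autovacuum worklist code.

```python
def order_worklist(tables):
    # tables: iterable of (oid, relpages, reltuples) tuples (assumed layout).
    # Never-analyzed tables (relpages == reltuples == 0) sort first, then
    # larger tables before smaller ones, with oid breaking any ties so the
    # ordering is determinate across workers.
    def key(t):
        oid, relpages, reltuples = t
        has_stats = not (relpages == 0 and reltuples == 0)
        return (has_stats, -relpages, oid)
    return sorted(tables, key=key)
```

A determinate shared ordering is what lets workers avoid re-fetching statistics just to skip tables another worker has recently vacuumed.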
Re: Fixes for missing schema qualifications
On Tue, Mar 13, 2018 at 5:26 PM, Tatsuo Ishii wrote: > Next question is, should we update the manual? There are bunch of > places where example queries are shown without schema qualifications. > > I hope that isn't deemed necessary. David J.
Re: ALTER TABLE ADD COLUMN fast default
Hi, On 2018-03-14 09:06:54 +1030, Andrew Dunstan wrote: > I do suspect that the "physical tlist" optimization sometimes turns > out not to be one. It seems perverse to be able to improve a query's > performance by dropping a column. Yea, it's definitely not always a boon. Especially w/ the newer v10+ project code. I suspect we should largely get rid of it in v12, which presumably will see a storage layer abstraction... It'll be even more worthwhile to get rid of it if we manage to avoid deforming columns that aren't needed but are lower than the currently required columns. I still think we should build bitmasks of required columns during planning. Greetings, Andres Freund
Re: ALTER TABLE ADD COLUMN fast default
On 14 March 2018 at 11:36, Andrew Dunstan wrote: > Here are the benchmark results from the v15 patch. Fairly similar to > previous results. I'm going to run some profiling again to see if I > can identify any glaring hotspots. I do suspect that the "physical > tlist" optimization sometimes turns out not to be one. It seems > perverse to be able to improve a query's performance by dropping a > column. Can you explain what "fdnmiss" is that appears in the results? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
ExplainProperty* and units
Hi, while adding EXPLAIN support for JITing (displaying time spent etc), I got annoyed by the amount of duplication required. There's a fair amount of if (es->format == EXPLAIN_FORMAT_TEXT) appendStringInfo(es->str, "Execution time: %.3f ms\n", 1000.0 * totaltime); else ExplainPropertyFloat("Execution Time", 1000.0 * totaltime, which is fairly redundant. In the attached *POC* patch I've added a 'unit' parameter to the numeric ExplainProperty* functions, which EXPLAIN_FORMAT_TEXT adds to the output. This can avoid the above and other similar branches (of which the JIT patch would add a number). The most valid counterargument I see is that in many cases, particularly inside plans, we have more specific output for text mode anyway. Which means there we'll not benefit much. But I think that's a) considerably done due to backward compatibility concerns b) verbosity concerns inside plans, which obviously can be complicated. Therefore I think it's perfectly reasonable to avoid specific branches for data that's only going to be displayed once per plan? We also could add separate ExplainProperty*Unit(...) functions, but I don't really see a need. Comments? 
Greetings,

Andres Freund

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 900fa74e85e..d5d1363d8e1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -123,8 +123,8 @@ static void ExplainSubPlans(List *plans, List *ancestors,
 				const char *relationship, ExplainState *es);
 static void ExplainCustomChildren(CustomScanState *css,
 				List *ancestors, ExplainState *es);
-static void ExplainProperty(const char *qlabel, const char *value,
-				bool numeric, ExplainState *es);
+static void ExplainProperty(const char *qlabel, const char *unit,
+				const char *value, bool numeric, ExplainState *es);
 static void ExplainDummyGroup(const char *objtype, const char *labelname,
 				ExplainState *es);
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
@@ -549,11 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	{
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
-		if (es->format == EXPLAIN_FORMAT_TEXT)
-			appendStringInfo(es->str, "Planning time: %.3f ms\n",
-							 1000.0 * plantime);
-		else
-			ExplainPropertyFloat("Planning Time", 1000.0 * plantime, 3, es);
+		ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es);
 	}
 
 	/* Print info about runtime of triggers */
@@ -585,14 +581,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	 * the output). By default, ANALYZE sets SUMMARY to true.
 	 */
 	if (es->summary && es->analyze)
-	{
-		if (es->format == EXPLAIN_FORMAT_TEXT)
-			appendStringInfo(es->str, "Execution time: %.3f ms\n",
-							 1000.0 * totaltime);
-		else
-			ExplainPropertyFloat("Execution Time", 1000.0 * totaltime,
-								 3, es);
-	}
+		ExplainPropertyFloat("Execution Time", "ms", 1000.0 * totaltime, 3,
+							 es);
 
 	ExplainCloseGroup("Query", NULL, true, es);
 }
@@ -764,8 +754,9 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 			ExplainPropertyText("Constraint Name", conname, es);
 		ExplainPropertyText("Relation", relname, es);
 		if (es->timing)
-			ExplainPropertyFloat("Time", 1000.0 * instr->total, 3, es);
-		ExplainPropertyFloat("Calls", instr->ntuples, 0, es);
+			ExplainPropertyFloat("Time", "ms", 1000.0 * instr->total, 3,
+								 es);
+		ExplainPropertyFloat("Calls", NULL, instr->ntuples, 0, es);
 	}
 
 	if (conname)
@@ -1280,10 +1271,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	}
 	else
 	{
-		ExplainPropertyFloat("Startup Cost", plan->startup_cost, 2, es);
-		ExplainPropertyFloat("Total Cost", plan->total_cost, 2, es);
-		ExplainPropertyFloat("Plan Rows", plan->plan_rows, 0, es);
-		ExplainPropertyInteger("Plan Width", plan->plan_width, es);
+		ExplainPropertyFloat("Startup Cost", NULL, plan->startup_cost, 2, es);
+		ExplainPropertyFloat("Total Cost", NULL, plan->total_cost, 2, es);
+		ExplainPropertyFloat("Plan Rows", NULL, plan->plan_rows, 0, es);
+		ExplainPropertyInteger("Plan Width", NULL, plan->plan_width, es);
 	}
 }
@@ -1323,11 +1314,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	{
 		if (es->timing)
 		{
-			ExplainPropertyFloat("Actual Startup Time", startup_sec, 3, es);
-			ExplainPropertyFloat("Actual Total Time", total_sec, 3, es);
+			ExplainPropertyFloat("Actual Startup Time", NULL, startup_sec, 3, es);
+			ExplainPropertyFloat("Actual Total Time", NULL, total_sec, 3, es);
 		}
-		ExplainPropertyFloat("Actual Rows", rows, 0, es);
-		ExplainPropertyFloat("Actual Loops", nloops, 0, es);
+		ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+		ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
 	}
 	else if (es->analyze)
@@ -1338,11 +1329,11
Re: Fixes for missing schema qualifications
>> select pg_catalog.count(*) from pg_catalog.pg_namespace where >> pg_catalog.nameeq(nspname, '%s'); >> >> > I'd rather write that: > > select [...] where nspname operator(pg_catalog.=) '%s' > > Introducing undocumented implementation functions to these queries is > undesirable; and besides, indexing and equivalence relies on operators and > not the underlying functions so there would be some risk of performance > issues if the functions were used directly. Thanks. Yours looks much better. Next question is, should we update the manual? There are bunch of places where example queries are shown without schema qualifications. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: Fixes for missing schema qualifications
On Tue, Mar 13, 2018 at 5:11 PM, Tatsuo Ishii wrote: > >>> +"select pg_catalog.count(*) " > >>> +"from pg_catalog.pg_namespace > where nspname = '%s'", > >> > >> This qualifies some functions, but it leaves plenty of unqualified > operators. > > Oops. I meant: > > select pg_catalog.count(*) from pg_catalog.pg_namespace where > pg_catalog.nameeq(nspname, '%s'); > > I'd rather write that: select [...] where nspname operator(pg_catalog.=) '%s' Introducing undocumented implementation functions to these queries is undesirable; and besides, indexing and equivalence relies on operators and not the underlying functions so there would be some risk of performance issues if the functions were used directly. David J.
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 14 March 2018 at 09:25, Robert Haas wrote: > What do you think about the idea of using a projection path as a proxy > path instead of inventing a new method? It seems simple enough to do: > > new_path = (Path *) create_projection_path(root, new_rel, old_path, > old_path->pathtarget); > > ...when we need a proxy path. I'm very open to finding a better way to do this, but does that not just handle the targetlist issue? The proxy path also carries information which allows the translation of Vars in other places in the plan from the old rel into the vars of the new rel. Join conditions, sort clauses etc. Wouldn't a ProjectionPath just need the same additional translation fields that I've bolted onto AppendPath to make it work properly? I've looked at the projection path code but got a bit confused with the dummypp field. I see where it gets set, but not where it gets checked. A comment in create_projection_plan() seems to explain that dummypp is pretty useless since targetlists are modified sometimes, so I'm a bit unsure what the point of it is. Maybe that just goes to show that my understanding of projection paths is not that great. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Fixes for missing schema qualifications
>>> +"select pg_catalog.count(*) " >>> +"from pg_catalog.pg_namespace where >>> nspname = '%s'", >> >> This qualifies some functions, but it leaves plenty of unqualified operators. > > So this should go something like this? > > select pg_catalog.count(*) from pg_catalog.pg_namespace where nameeq(nspname, > '%s'); Oops. I meant: select pg_catalog.count(*) from pg_catalog.pg_namespace where pg_catalog.nameeq(nspname, '%s'); Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: Fixes for missing schema qualifications
>> + "select pg_catalog.count(*) " >> + "from pg_catalog.pg_namespace where >> nspname = '%s'", > > This qualifies some functions, but it leaves plenty of unqualified operators. So this should go something like this? select pg_catalog.count(*) from pg_catalog.pg_namespace where nameeq(nspname, '%s'); Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: SQL/JSON: functions
On 14 Mar 2018 01:54, "Michael Paquier" wrote: On Tue, Mar 13, 2018 at 04:08:01PM +0300, Oleg Bartunov wrote: > The docs are here > https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md > > It's not easy to write docs for SQL/JSON in xml, so I decided to write in more > friendly way. We'll have time to convert it to postgres format. If you aim at getting a feature committed first without its documentation, and getting the docs written after the feature freeze using a dedicated open item or such, Exactly. SQL/JSON is rather complex thing and "converting" the standard to the user level understanding is a separate challenge and I'd like to continue to work on it. It's mostly written, we need to understand how to organize it. this is much acceptable in my opinion and the CF is running short in time. -- Michael
Re: neqjoinsel versus "refresh materialized view concurrently"
On Wed, Mar 14, 2018 at 12:29 PM, Tom Lane wrote: > Thomas Munro writes: >> There is a fundamental and complicated estimation problem lurking here >> of course and I'm not sure what to think about that yet. Maybe there >> is a very simple fix for this particular problem: > > Ah, I see you thought of the same hack I did. > > I think this may actually be a good fix, and here's the reason: this plan > is in fact being driven entirely off planner default estimates, because > we don't have any estimation code that knows what to do with > "wholerowvar *= wholerowvar". I'm suspicious that we could drop the > preceding ANALYZE as being a waste of cycles, except maybe it's finding > out the number of rows for us. In any case, LIMIT 1 is only a good idea > to the extent that the planner knows what it's doing, and this is an > example where it demonstrably doesn't and won't any time soon. Hmm. I wonder if the ANALYZE might have been needed to avoid the nested loop plan at some point in history. Here's a patch to remove LIMIT 1, which fixes the plan for Jeff's test scenario and some smaller and larger examples I tried. The query is already executed with SPI_execute(..., 1) so it'll give up after one row anyway. The regression test includes a case that causes a row to be produced here and that's passing ('ERROR: new data for materialized view "mvtest_mv" contains duplicate rows without any null columns'). -- Thomas Munro http://www.enterprisedb.com 0001-Fix-performance-regression-in-REFRESH-MATERIALIZED-V.patch Description: Binary data
Re: PATCH: Configurable file mode mask
Chapman, * Chapman Flack (c...@anastigmatix.net) wrote: > I'm not advocating the Sisyphean task of having PG incorporate > knowledge of all those possibilities. I'm advocating the conservative > approach: have PG be that well-behaved application that those extended > semantics are generally all designed to play well with, and just not > do stuff that obstructs or tramples over what the admin takes care > to set up. I think we get that you're advocating removing the checks and permissions-setting that the PG tools do, however... > I wonder how complicated it would have to be really. On any system > with a POSIX base, I guess it's possible for PGDATA to have an S_ISVTX > "sticky" bit in the mode, which does have an official significance (but > one that only affects whether non-postgres can rename or unlink things > in the directory, which might be of little practical significance). > Perhaps its meaning could be overloaded with "the admin is handling > the permissions, thank you", and postmaster and various command-line > utilities could see it, and refrain from any gratuitous chmods or > refusals to function. > > Or, if overloading S_ISVTX seems in poor taste, what would be wrong > with simply checking for an empty file PERMISSIONS-ARE-MANAGED in > PGDATA and responding the same way? > > Or, assuming some form of ACL is available, just let the admin > change the owner and group of PGDATA to other than postgres, > grant no access to other, and give rwx to postgres in the ACL? None of these suggestions really sound workable to me. 
I certainly don't think we should be overloading the meaning of a specific and entirely independent filesystem permission (and really don't even want to imagine what it would require to do something like that on Windows...), dropping an empty file in the directory strikes me as a ripe target for confusion, and we have specific checks to try and make sure we are running as the owner that owns the directory with some pretty good reasons around that (avoiding multiple postmasters running in the same PGDATA directory, specifically). > PG could then reason as follows: * I do not own this directory. > * I am not the group of this directory. * It grants no access to other. > * Yet, I find myself listing and accessing files in it without > difficulty. * The admin has set this up for me in a way I do not > understand. * I will refrain from messing with it. Removing the check that says "we aren't going to try to run PG in this directory if we aren't the owner of it" also doesn't seem like it's necessairly a great plan. > Three ideas off the top of my head. Probably more where they came from. None of them really seem workable though. On the other hand, let's consider what this patch actually ends up doing when POSIX ACLs are involved. This allows ACLs to be used where they weren't before and with appropriate defaults set even to work for new files being created, which wouldn't work before. 
Yes, if you create a default ACL which says "grant this other user write
access to files in the PG data directory" then that won't actually be
honored because we will chmod(640) the file and the POSIX ACL system
actually works with the user/group privilege system, like so:

Create the "data" dir, as initdb would:

➜  ~ mkdir xyz
➜  ~ chmod 750 xyz
➜  ~ ls -ld xyz
drwxr-x--- 2 sfrost sfrost 4096 Mar 13 19:07 xyz

Set a default ACL to allow the "daemon" user read/write access to files
created:

➜  ~ setfacl -dm u:daemon:rw xyz
➜  ~ getfacl xyz
# file: xyz
# owner: sfrost
# group: sfrost
user::rwx
group::r-x
other::---
default:user::rwx
default:user:daemon:rw-
default:group::r-x
default:mask::rwx
default:other::---

Create a file the way PG would:

➜  ~ touch xyz/a
➜  ~ chmod 640 xyz/a
➜  ~ getfacl xyz/a
# file: xyz/a
# owner: sfrost
# group: sfrost
user::rw-
user:daemon:rw-         #effective:r--
group::r-x              #effective:r--
mask::r--
other::---

The daemon user ends up with read-only access (note the '#effective',
which shows that the POSIX ACL system isn't overriding the "regular"
ACLs).

... but that's basically what we want. Multiple users having write access
to the data directory could be quite bad as you might possibly get two
postmasters running against the same data directory at the same time and
there's basically no case where that's a good thing to have happen.

This change does let users grant out read access to other users/groups,
even beyond what's possible using the traditional user/group system, so
this opens up a lot more possible options for advanced users, provided
they set the defaults appropriately at the directory level (which,
presumably, an administrator versed in POSIX ACLs and wishing to use them
would know, or would figure out quickly).

Yes, perhaps there's some argument to be made that we should have an
option where we don't force any privileges, but that can certainly be
considered a future capability and what's being implemented here doesn't
JIT compiling with LLVM v12
Hi,

I've pushed a revised and rebased version of my JIT patchset.
The git tree is at
  https://git.postgresql.org/git/users/andresfreund/postgres.git
in the jit branch
  https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/jit

There's nothing hugely exciting, mostly lots of cleanups.

- added some basic EXPLAIN output, displaying JIT options and time spent
  jitting (see todo below)

   JIT:
     Functions: 9
     Generation Time: 4.604
     Inlining: false
     Inlining Time: 0.000
     Optimization: false
     Optimization Time: 0.585
     Emission Time: 12.858

- Fixed bugs around alignment computations in tuple deforming. Wasn't
  able to trigger any bad consequences, but it was clearly wrong.
- Worked a lot on making code more pgindent safe. There's still some
  minor layout damage, but it's mostly ok now. For that I had to add a
  bunch of helpers that make the code shorter
- Freshly emitted functions now have proper attributes indicating
  architecture, floating point behaviour etc. That's what previously
  prevented the inliner of doing its job without forcing its hand. That
  yields a bit of a speedup.
- reduced size of code a bit by deduplicating code, in particular don't
  "manually" create signatures for function declarations anymore.
  Besides deduplicating, this also ensures code generation time errors
  when function signatures change.
- fixed a number of FIXMEs etc
- added a lot of comments
- portability fixes (OSX, freebsd)

Todo:
- some build issues with old clang versions pointed out by Thomas Munro
- when to take jit_expressions into account (both exec and plan or just
  latter)
- EXPLAIN for queries that are JITed should display units. Starting
  thread about effort to not duplicate code for that
- more explanations of type & function signature syncing
- GUC docs (including postgresql.conf.sample)

Thanks everyone, particularly Peter in this update, for helping me along!

Regards,

Andres
Re: [HACKERS] path toward faster partition pruning
By the way, I checked whether patch 0002 (additional tests) had an effect
on coverage, and couldn't detect any changes in terms of lines/functions.
Were you able to find any bugs in your code thanks to the new tests that
would not have been covered by existing tests?

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: neqjoinsel versus "refresh materialized view concurrently"
Thomas Munro writes: > There is a fundamental and complicated estimation problem lurking here > of course and I'm not sure what to think about that yet. Maybe there > is a very simple fix for this particular problem: Ah, I see you thought of the same hack I did. I think this may actually be a good fix, and here's the reason: this plan is in fact being driven entirely off planner default estimates, because we don't have any estimation code that knows what to do with "wholerowvar *= wholerowvar". I'm suspicious that we could drop the preceding ANALYZE as being a waste of cycles, except maybe it's finding out the number of rows for us. In any case, LIMIT 1 is only a good idea to the extent that the planner knows what it's doing, and this is an example where it demonstrably doesn't and won't any time soon. regards, tom lane
Re: neqjoinsel versus "refresh materialized view concurrently"
Thomas Munro writes: > This looks like an invisible correlation problem. Yeah --- the planner has no idea that the join rows satisfying newdata.* *= newdata2.* are likely to be exactly the ones not satisfying newdata.ctid <> newdata2.ctid. It's very accidental that we got a good plan before. I've not looked to see where this query is generated, but I wonder if we could make things better by dropping the LIMIT 1 and instead using an executor count parameter to stop early. regards, tom lane
Re: SQL/JSON: functions
On 2018-03-14 07:54:35 +0900, Michael Paquier wrote: > On Tue, Mar 13, 2018 at 04:08:01PM +0300, Oleg Bartunov wrote: > > The docs are here > > https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md > > > > It's not easy to write docs for SQL/JSON in xml, so I decided to write in > > more > > friendly way. We'll have time to convert it to postgres format. > > If you aim at getting a feature committed first without its > documentation, and getting the docs written after the feature freeze > using a dedicated open item or such, this is much acceptable in my > opinion and the CF is running short in time. Given that this patch still uses PG_TRY/CATCH around as wide paths of code as a whole ExecEvalExpr() invocation, basically has gotten no review, I don't see this going anywhere for v11. Greetings, Andres Freund
Re: neqjoinsel versus "refresh materialized view concurrently"
On Wed, Mar 14, 2018 at 11:34 AM, Thomas Munro wrote:
> LOG: duration: 26101.612 ms plan:
> Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16452 newdata WHERE
> newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16452
> newdata2 WHERE newdata2 IS NOT NULL AND newdata2
> OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid
> OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1
> Limit (cost=0.00..90.52 rows=1 width=28) (actual
> time=26101.608..26101.608 rows=0 loops=1)
>   ->  Nested Loop Semi Join (cost=0.00..225220.96 rows=2488 width=28)
> (actual time=26101.606..26101.606 rows=0 loops=1)
>         Join Filter: ((newdata2.ctid <> newdata.ctid) AND (newdata.* *=
> newdata2.*))
>         Rows Removed by Join Filter: 2500
>         ->  Seq Scan on pg_temp_16452 newdata (cost=0.00..73.00
> rows=4975 width=34) (actual time=0.022..15.448 rows=5000 loops=1)
>               Filter: (newdata.* IS NOT NULL)
>         ->  Materialize (cost=0.00..97.88 rows=4975 width=34) (actual
> time=0.000..0.500 rows=5000 loops=5000)
>               ->  Seq Scan on pg_temp_16452 newdata2 (cost=0.00..73.00
> rows=4975 width=34) (actual time=0.010..4.033 rows=5000 loops=1)
>                     Filter: (newdata2.* IS NOT NULL)

This plan is chosen because we're looking for just one row (LIMIT 1)
that has equal data but a different ctid. In this case we're not going
to find one, so we'll pay the full enormous cost of the nested loop,
but the startup cost is estimated as 0 and we think we are going to
find a row straight away. That's because we don't know that it's
unlikely for there to be a row with the same columns but a different
ctid.

There is a fundamental and complicated estimation problem lurking here
of course and I'm not sure what to think about that yet.

Maybe there is a very simple fix for this particular problem:

--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -660,7 +660,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 					 "(SELECT * FROM %s newdata2 WHERE newdata2 IS NOT NULL "
 					 "AND newdata2 OPERATOR(pg_catalog.*=) newdata "
 					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
-					 "newdata.ctid) LIMIT 1",
+					 "newdata.ctid)",
 					 tempname, tempname);
 	if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);

That gets me back to the sort-merge plan, but maybe it's too
superficial.

--
Thomas Munro
http://www.enterprisedb.com
Re: SQL/JSON: functions
On Tue, Mar 13, 2018 at 04:08:01PM +0300, Oleg Bartunov wrote: > The docs are here > https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md > > It's not easy to write docs for SQL/JSON in xml, so I decided to write in more > friendly way. We'll have time to convert it to postgres format. If you aim at getting a feature committed first without its documentation, and getting the docs written after the feature freeze using a dedicated open item or such, this is much acceptable in my opinion and the CF is running short in time. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:

> I will continue working on improving the comments / cleaning things up and
> post a revised version soon, but until then please look at the attached.

I tried to give this a read. It looks pretty neat stuff -- as far as I
can tell, it follows Robert's sketch for how this should work. The fact
that it's under-commented makes me unable to follow it too closely though
(I felt like adding a few "wtf?" comments here and there), so it's taking
me a bit to follow things in detail. Please do submit improved versions
as you have them.

I think you're using an old version of pg_bsd_indent.

In particular need of commentary:

* match_clause_to_partition_key() should indicate which params are output
  and what do they get

* get_steps_using_prefix already has a comment, but it doesn't really
  explain much. (I'm not sure why you use the term "tuple" here. I mean,
  mathematically it probably makes sense, but in the overall context it
  seems just confusing.)

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ALTER TABLE ADD COLUMN fast default
On Tue, Mar 13, 2018 at 10:58 PM, Andrew Dunstan wrote: > On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan > wrote: > >>> >>> Going by the commitfest app, the patch still does appear to be waiting >>> on Author. Never-the-less, I've made another pass over it and found a >>> few mistakes and a couple of ways to improve things: >>> >> >> working on these. Should have a new patch tomorrow. >> > > > Here is a patch that attends to most of these, except that I haven't > re-indented it. > > Your proposed changes to slot_getmissingattrs() wouldn't work, but I > have rewritten it in a way that avoids the double work you disliked. > > I'll rerun the benchmark tests I posted upthread and let you know the results. > Here are the benchmark results from the v15 patch. Fairly similar to previous results. I'm going to run some profiling again to see if I can identify any glaring hotspots. I do suspect that the "physical tlist" optimization sometimes turns out not to be one. It seems perverse to be able to improve a query's performance by dropping a column. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services results.t100r50k.v15 Description: Binary data results.t100r64.v15 Description: Binary data
Re: [submit code] I develop a tool for pgsql, how can I submit it
2018-03-13 12:19 GMT-03:00 leap :
> I develop a tool for pgsql, I want to submit it on pgsql.
> how can I submit it?

As Laurenz said, a tool doesn't necessarily have to be part of
PostgreSQL. Sometimes a lot of people ask for a tool, someone writes it,
and the community decides to maintain it. I'm not sure this is the case
for your tool. The pro is that the tool will be maintained by a group of
experienced programmers (I'm not saying you can't have this group outside
the PostgreSQL project. You may have enough interest if people are
excited by it). The con about this direction is that the tool development
cycle is tied to PostgreSQL (which means a 1-year cycle and slow
development over time).

Since I do a lot of debug it seems very useful for me.

If you decide to convince people about the usefulness of your tool, you
should: (i) remove some overlap between the tool and core, (ii) clean up
your code to follow the PostgreSQL guidelines, (iii) reuse core
functions, (iv) reuse constants instead of redefining them and (v)
document everything. Steps (i), (iii) and (iv) may require some code
rearrangement in PostgreSQL. Even if you decide to continue the
development outside PostgreSQL, those steps would improve your code.

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: neqjoinsel versus "refresh materialized view concurrently"
On Wed, Mar 14, 2018 at 8:07 AM, Jeff Janes wrote:
> The following commit has caused a devastating performance regression
> in concurrent refresh of MV:
>
> commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97
> Author: Tom Lane
> Date: Wed Nov 29 22:00:29 2017 -0500
>
>     Fix neqjoinsel's behavior for semi/anti join cases.
>
> The below reproduction goes from taking about 1 second to refresh, to taking
> an amount of time I don't have the patience to measure.
>
> drop table foobar2 cascade;
> create table foobar2 as select * from generate_series(1,20);
> create materialized view foobar3 as select * from foobar2;
> create unique index on foobar3 (generate_series );
> analyze foobar3;
> refresh materialized view CONCURRENTLY foobar3 ;
>
> When I interrupt the refresh, I get a message including this line:
>
> CONTEXT: SQL statement "SELECT newdata FROM pg_temp_3.pg_temp_16420 newdata
> WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16420
> newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=)
> newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1"
>
> So it makes sense that the commit in question could have caused a change in
> the execution plan. Because these are temp tables, I can't easily get my
> hands on them to investigate further.

Ouch. A quadratic join. This looks like an invisible correlation problem.

load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
drop table if exists t cascade;
create table t as select generate_series(1, 5000);
create materialized view mv as select * from t;
create unique index on mv(generate_series);
analyze mv;
refresh materialized view concurrently mv;

HEAD:

LOG: duration: 26101.612 ms plan:
Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16452 newdata WHERE
newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16452
newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=)
newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1
Limit (cost=0.00..90.52 rows=1 width=28) (actual time=26101.608..26101.608 rows=0 loops=1)
  ->  Nested Loop Semi Join (cost=0.00..225220.96 rows=2488 width=28) (actual time=26101.606..26101.606 rows=0 loops=1)
        Join Filter: ((newdata2.ctid <> newdata.ctid) AND (newdata.* *= newdata2.*))
        Rows Removed by Join Filter: 2500
        ->  Seq Scan on pg_temp_16452 newdata (cost=0.00..73.00 rows=4975 width=34) (actual time=0.022..15.448 rows=5000 loops=1)
              Filter: (newdata.* IS NOT NULL)
        ->  Materialize (cost=0.00..97.88 rows=4975 width=34) (actual time=0.000..0.500 rows=5000 loops=5000)
              ->  Seq Scan on pg_temp_16452 newdata2 (cost=0.00..73.00 rows=4975 width=34) (actual time=0.010..4.033 rows=5000 loops=1)
                    Filter: (newdata2.* IS NOT NULL)

And with commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97 reverted:

LOG: duration: 36.358 ms plan:
Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16470 newdata WHERE
newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16470
newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=)
newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1
Limit (cost=756.95..939.50 rows=1 width=28) (actual time=36.354..36.354 rows=0 loops=1)
  ->  Merge Semi Join (cost=756.95..2947.51 rows=12 width=28) (actual time=36.352..36.352 rows=0 loops=1)
        Merge Cond: (newdata.* *= newdata2.*)
        Join Filter: (newdata2.ctid <> newdata.ctid)
        Rows Removed by Join Filter: 5000
        ->  Sort (cost=378.48..390.91 rows=4975 width=34) (actual time=9.622..10.300 rows=5000 loops=1)
              Sort Key: newdata.* USING *<
              Sort Method: quicksort  Memory: 622kB
              ->  Seq Scan on pg_temp_16470 newdata (cost=0.00..73.00 rows=4975 width=34) (actual time=0.021..4.986 rows=5000 loops=1)
                    Filter: (newdata.* IS NOT NULL)
        ->  Sort (cost=378.48..390.91 rows=4975 width=34) (actual time=7.378..8.010 rows=5000 loops=1)
              Sort Key: newdata2.* USING *<
              Sort Method: quicksort  Memory: 622kB
              ->  Seq Scan on pg_temp_16470 newdata2 (cost=0.00..73.00 rows=4975 width=34) (actual time=0.017..3.034 rows=5000 loops=1)
                    Filter: (newdata2.* IS NOT NULL)

--
Thomas Munro
http://www.enterprisedb.com
Re: JIT compiling with LLVM v11
Hi,

On 2018-03-14 10:32:40 +1300, Thomas Munro wrote:
> I decided to try this on a CentOS 7.2 box. It has LLVM 3.9 in the
> 'epel' package repo, but unfortunately it only has clang 3.4.

That's a bit odd, given llvm and clang really live in the same repo...

> clang: error: unknown argument: '-fexcess-precision=standard'
> clang: error: unknown argument: '-flto=thin'
>
> Ok, so I hacked src/Makefile.global.in to remove -flto=thin.

I think I can actually get rid of that entirely.

> It looks like -fexcess-precision=standard is coming from a configure
> test that was run against ${CC}, not against ${CLANG}, so I hacked
> the generated src/Makefile.global to remove that too, just to see if
> I could get past that.

Yea, I'd hoped we could avoid duplicating all the configure tests, but
maybe not :(.

> Then I could build successfully and make check passed. I did see one
> warning:
>
> In file included from execExpr.c:39:
> ../../../src/include/jit/jit.h:36:3: warning: redefinition of typedef
> 'JitProviderCallbacks' is a C11 feature [-Wtypedef-redefinition]
> } JitProviderCallbacks;
>   ^
> ../../../src/include/jit/jit.h:22:37: note: previous definition is here
> typedef struct JitProviderCallbacks JitProviderCallbacks;
>                                     ^

Yep. Removed the second, superfluous typedef. Will push a heavily
rebased version in a bit; it will include a fix for this.

Greetings,

Andres Freund
Re: JIT compiling with LLVM v11
On Thu, Mar 1, 2018 at 9:02 PM, Andres Freund wrote:
> Biggest changes:
> - LLVM 3.9 - master are now supported. This includes a good chunk of
>   work by Pierre Ducroquet.

I decided to try this on a CentOS 7.2 box. It has LLVM 3.9 in the
'epel' package repo, but unfortunately it only has clang 3.4. I
suppose it's important to make this work for RHEL7 using only
dependencies that can be met by the vendor package repos? Maybe
someone who knows more about CentOS/RHEL could tell me if I'm mistaken
and there is a way to get a more modern clang from a reputable repo
that our packages could depend on, though I realize that clang is only
a build dependency, not a runtime one. I'm unsure how that constrains
things.

clang: "clang version 3.4.2 (tags/RELEASE_34/dot2-final)"
gcc and g++: "gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)"
llvm: "3.9.1"

First problem:

clang: error: unknown argument: '-fexcess-precision=standard'
clang: error: unknown argument: '-flto=thin'

Ok, so I hacked src/Makefile.global.in to remove -flto=thin. It looks
like -fexcess-precision=standard is coming from a configure test that
was run against ${CC}, not against ${CLANG}, so I hacked the generated
src/Makefile.global to remove that too, just to see if I could get
past that. I don't know if there was another way to control floating
point precision in ancient clang before they adopted the
GCC-compatible flag, but it would seem slightly fishy to have .o files
and .bc files compiled with different floating-point settings, because
then you could get different answers depending on whether your
expression is JITted.

Then I could build successfully and make check passed.
I did see one warning:

In file included from execExpr.c:39:
../../../src/include/jit/jit.h:36:3: warning: redefinition of typedef
'JitProviderCallbacks' is a C11 feature [-Wtypedef-redefinition]
} JitProviderCallbacks;
  ^
../../../src/include/jit/jit.h:22:37: note: previous definition is here
typedef struct JitProviderCallbacks JitProviderCallbacks;
                                    ^

That's a legit complaint.

-- 
Thomas Munro
http://www.enterprisedb.com
Re: parallel append vs. simple UNION ALL
On Tue, Mar 13, 2018 at 12:35 AM, Ashutosh Bapat wrote:
> It looks like it was not changed in all the places. make failed. I
> have fixed all the instances of these two functions in the attached
> patchset (only 0003 changes). Please check.

Oops. Thanks. I'm going to go ahead and commit 0001 here.

Any more thoughts on the rest?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: PATCH: Configurable file mode mask
On 03/13/2018 02:47 PM, Tom Lane wrote:
> Well, to be blunt, what we target is POSIX-compatible systems. If
> you're telling us to worry about non-POSIX filesystem semantics,
> and your name is not Microsoft, it's going to be a hard sell.
> We have enough to do just keeping up with that scope of target
> systems.

So, how many POSIX-compatible systems are available today (if any)
where you can actually safely assume that there are no additional
security/access-control-related considerations in effect beyond three
user bits/three group bits/three other bits, and not be wrong?

I'm not advocating the Sisyphean task of having PG incorporate
knowledge of all those possibilities. I'm advocating the conservative
approach: have PG be that well-behaved application that those extended
semantics are generally all designed to play well with, and just not
do stuff that obstructs or tramples over what the admin takes care to
set up.

On 03/13/2018 03:44 PM, Stephen Frost wrote:
> * Chapman Flack (c...@anastigmatix.net) wrote:
>> So my suggestion boils down to PG having at least an option, somehow,
>> to be well-behaved in that sense.
>
> I'm afraid that we haven't got any great answer to that "somehow". I
> was hoping you might have some other ideas beyond command-line
> switches which could leave the system in an inconsistent state a bit
> too easily.

I wonder how complicated it would have to be, really. On any system
with a POSIX base, I guess it's possible for PGDATA to have an S_ISVTX
"sticky" bit in its mode, which does have an official significance
(but one that only affects whether non-postgres users can rename or
unlink things in the directory, which might be of little practical
significance). Perhaps its meaning could be overloaded with "the admin
is handling the permissions, thank you", and the postmaster and
various command-line utilities could see it and refrain from any
gratuitous chmods or refusals to function.

Or, if overloading S_ISVTX seems in poor taste, what would be wrong
with simply checking for an empty file PERMISSIONS-ARE-MANAGED in
PGDATA and responding the same way?

Or, assuming some form of ACL is available, just let the admin change
the owner and group of PGDATA to other than postgres, grant no access
to other, and give rwx to postgres in the ACL? PG could then reason as
follows:

* I do not own this directory.
* I am not in the group of this directory.
* It grants no access to other.
* Yet, I find myself listing and accessing files in it without
  difficulty.
* The admin has set this up for me in a way I do not understand.
* I will refrain from messing with it.

Three ideas off the top of my head. Probably more where they came
from. :)

-Chap
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Feb 19, 2018 at 4:02 AM, David Rowley wrote:
> On 19 February 2018 at 18:01, David Rowley wrote:
>> On 19 February 2018 at 15:11, Tomas Vondra wrote:
>>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
>>> naming for boolean variables.
>>
>> You're right. I'll change that and post an updated patch. I'll also
>> reprocess your email again and try to improve some comments to make
>> the intent of the code more clear.
>
> I've made a few changes to the patch. "isproxy" is now "is_proxy".
> I've made the comments a bit more verbose at the top of the definition
> of struct AppendPath. Also shuffled some comments around in allpaths.c
>
> I've attached both a delta patch with just these changes, and also a
> completely new patch based on a recent master.

While I was working on
http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com
I noticed that a projection path is very close to being usable as a
proxy path, because we've already got code to strip out unnecessary
proxy paths at plan generation time. I noticed that there were a few
problems with that which I wrote patches to fix (see 0001, 0002
attached to that email), but they seem to be only minor issues.

What do you think about the idea of using a projection path as a proxy
path instead of inventing a new method? It seems simple enough to do:

new_path = (Path *) create_projection_path(root, new_rel, old_path,
                                           old_path->pathtarget);

...when we need a proxy path.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: PATCH: Configurable file mode mask
Greetings Chapman,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 03/13/2018 01:50 PM, Stephen Frost wrote:
> > PG will stat PGDATA and conclude that the system is saying that group
> > access has been granted on PGDATA and will do the same for subsequent
> > files created later on. ... The only issue that remains is that PG
> > doesn't understand or work with POSIX ACLs or Linux capabilities,
> > but that's not anything new.
>
> What's new is that it is now pretending even more extravagantly to
> understand what it doesn't understand. Where it would previously draw
> an incorrect conclusion and refuse to start, which is annoying but
> not very difficult to work around if need be, it would now draw the
> same incorrect conclusion and then actively go about making the real
> world embody the incorrect conclusion, contrary to the admin's efforts.

I have to say that I disagree about it being "easy to work around" PG
refusing to start. Also, we're not pretending any more or less; we're
sticking to exactly what we do understand, which is the 80's unix
permission system, as you put it.

The options that I see here are to stick with the user/group system
and our naive understanding of it, to go whole-hog and try to
completely understand everything (with lots of #ifdef's, as
discussed), or to completely remove all checks. We don't have a clear
proposal for the last, and it strikes me as at least unlikely to go
over well anyway, given all of the discussion here about trying to
simply change the one check we have.

> >> umask inherited by the postmaster. A --permission-transparency option
> >> would simply use open and mkdir in the traditional ways, allowing
> >> the chosen umask, ACL defaults, SELinux contexts, etc., to do their
> >> thing, and would avoid then stepping on the results with explicit
> >> chmods (and of course skip the
> >> I-refuse-to-start-because-I-misunderstand-your-setup check).
> >> ...
> > I'm a fan of this idea in general, but it's unclear how that
> > --permission-transparency option would work in practice. Are you
> > suggesting that it be a compile-time flag, which would certainly work
> > but also then have to be debated among packagers as to the right
> > setting and if there's any need to be concerned about users
> > misunderstanding it, or a flag for each program,
>
> I was thinking of a command-line option ...
>
> > which hardly seems ideal as some invocations
> > of programs might forget to specify that, leading to inconsistent
> > permissions, or something else..?
>
> ... but I see how that gets complicated with the various other
> command-line utilities included.

Indeed.

> > .. we'd have to build in complete
> > support for POSIX ACLs and Linux capabilities if we go down a route
>
> I'm wary of an argument that we can't do better except by building
> in complete support for POSIX ACLs, and capabilities (and NFSv4
> ACLs, and SELinux, and AppArmor, and #ifdef and #ifdef and #ifdef).

I don't think I meant to imply that we can't do better; I was just
trying to enumerate what I saw as the different options.

> It seems to me that, in most cases, the designers of these sorts of
> extensions to old traditional Unix behavior take great pains to design
> them such that they can still usefully function in the presence of
> programs that "don't pay attention to or understand or use" them, as
> long as those programs are in some sense well-behaved, and not going
> out of their way with active steps that insist on or impose permissions
> that aren't appropriate under the non-understood circumstances.
>
> So my suggestion boils down to PG having at least an option, somehow,
> to be well-behaved in that sense.

I'm afraid that we haven't got any great answer to that "somehow". I
was hoping you might have some other ideas beyond command-line
switches, which could leave the system in an inconsistent state a bit
too easily.
Unless there's a better way, the approach proposed by Tom (originally)
and implemented by David seems like the way to go, and is at least an
improvement over the current situation.

Thanks!

Stephen
Re: [submit code] I develop a tool for pgsql, how can I submit it
leap wrote:
> I develop a tool for pgsql, and I want to submit it on pgsql.
> How can I submit it?
>
> address: https://github.com/leapking/pgcheck

I would leave it on GitHub and add some documentation. A lot of great
tools for PostgreSQL are not part of the core distribution; that
doesn't mean that they are unloved.

Yours,
Laurenz Albe
Re: [HACKERS] taking stdbool.h into use
On 3/1/18 23:34, Michael Paquier wrote:
> Indeed, this is wrong. Peter, why did you suddenly switch this patch
> to ready for committer? The patch is waiting for your input, as you
> mentioned that the GIN portion of this patch series is not completely
> baked yet. So I have switched the patch to this state.

After more digging, there are more problems with having a bool that is
not 1 byte. For example, pg_control has a bool field, so with a
different bool size, pg_control would be laid out differently. That
would require changing all the mentions of bool to bool8 where they
end up on disk somehow, as I had already done for the system catalog
structures, but we don't know all the other places that would be
affected.

So I'm going back to my proposal from December: just use stdbool.h
when sizeof(bool) == 1, and add a static assertion to prevent other
configurations.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From daedc3ea6d044dde18cc0c630085a7f55e764794 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 19 Dec 2017 13:54:05 -0500
Subject: [PATCH v6 1/2] Add static assertions about size of GinTernaryValue
 vs bool

We need these to be the same because some code casts between pointers
to them.
---
 src/backend/utils/adt/tsginidx.c | 4 +++-
 src/include/access/gin.h         | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index de59e6417e..00e32b2570 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -309,7 +309,9 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
 	 * query.
 	 */
 	gcv.first_item = GETQUERY(query);
-	gcv.check = check;
+	StaticAssertStmt(sizeof(GinTernaryValue) == sizeof(bool),
+					 "sizes of GinTernaryValue and bool are not equal");
+	gcv.check = (GinTernaryValue *) check;
 	gcv.map_item_operand = (int *) (extra_data[0]);
 	gcv.need_recheck = recheck;

diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index 0acdb88241..3d8a130b69 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -51,8 +51,8 @@ typedef struct GinStatsData
 /*
  * A ternary value used by tri-consistent functions.
  *
- * For convenience, this is compatible with booleans. A boolean can be
- * safely cast to a GinTernaryValue.
+ * This must be of the same size as a bool because some code will cast a
+ * pointer to a bool to a pointer to a GinTernaryValue.
  */
 typedef char GinTernaryValue;

base-commit: 17bb62501787c56e0518e61db13a523d47afd724
-- 
2.16.2

>From 67f32f4195632fd1068bf3ad0bf93c12a865c7d4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 19 Dec 2017 14:24:43 -0500
Subject: [PATCH v6 2/2] Use stdbool.h if suitable

Using the standard bool type provided by C allows some recent
compilers and debuggers to give better diagnostics. Also, some
extension code and third-party headers are increasingly pulling in
stdbool.h, so it's probably saner if everyone uses the same
definition.

But PostgreSQL code is not prepared to handle bool of a size other
than 1, so we keep our own old definition if we encounter a stdbool.h
with a bool of a different size. (Apparently, that includes Power CPUs
and some very old compilers that declared bool to be an enum.)

We have static assertions in critical places that check that bool is
of the right size.
--- configure | 213 - configure.in | 7 ++ src/include/c.h| 14 ++- src/include/pg_config.h.in | 9 ++ src/pl/plperl/plperl.h | 10 +-- 5 files changed, 205 insertions(+), 48 deletions(-) diff --git a/configure b/configure index 3943711283..c670becd16 100755 --- a/configure +++ b/configure @@ -1999,116 +1999,116 @@ $as_echo "$ac_res" >&6; } } # ac_fn_c_check_func -# ac_fn_c_check_member LINENO AGGR MEMBER VAR INCLUDES -# -# Tries to find if the field MEMBER exists in type AGGR, after including -# INCLUDES, setting cache variable VAR accordingly. -ac_fn_c_check_member () +# ac_fn_c_check_type LINENO TYPE VAR INCLUDES +# --- +# Tests whether TYPE exists after having included INCLUDES, setting cache +# variable VAR accordingly. +ac_fn_c_check_type () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2.$3" >&5 -$as_echo_n "checking for $2.$3... " >&6; } -if eval \${$4+:} false; then : + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 +$as_echo_n "checking for $2... " >&6; } +if eval \${$3+:} false; then : $as_echo_n "(cached) " >&6 else + eval "$3=no" cat conf
Re: JIT compiling with LLVM v11
Andres Freund writes:
> On 2018-03-13 14:36:44 -0400, Robert Haas wrote:
>> I realize that EXPLAIN (JIT OFF) may sound like it's intended to
>> disable JIT itself

> Yea, that's what I'm concerned about.

>> , but I think it's pretty clear that EXPLAIN (BUFFERS OFF) does not
>> disable the use of actual buffers, only the display of buffer-related
>> information.

> Hm.

FWIW, I agree with Robert's preference for just JIT here. The "info"
bit isn't conveying anything. And we've never had any EXPLAIN options
that actually change the behavior of the explained command, only ones
that change the amount of info displayed. I don't see why we'd
consider JIT an exception to that.

			regards, tom lane
neqjoinsel versus "refresh materialized view concurrently"
The following commit has caused a devastating performance regression
in concurrent refresh of MVs:

commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97
Author: Tom Lane
Date:   Wed Nov 29 22:00:29 2017 -0500

    Fix neqjoinsel's behavior for semi/anti join cases.

The reproduction below goes from taking about 1 second to refresh to
taking an amount of time I don't have the patience to measure.

drop table foobar2 cascade;
create table foobar2 as select * from generate_series(1,20);
create materialized view foobar3 as select * from foobar2;
create unique index on foobar3 (generate_series);
analyze foobar3;
refresh materialized view CONCURRENTLY foobar3;

When I interrupt the refresh, I get a message including this line:

CONTEXT: SQL statement "SELECT newdata FROM pg_temp_3.pg_temp_16420
newdata WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM
pg_temp_3.pg_temp_16420 newdata2 WHERE newdata2 IS NOT NULL AND
newdata2 OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid
OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1"

So it makes sense that the commit in question could have caused a
change in the execution plan. Because these are temp tables, I can't
easily get my hands on them to investigate further.

Cheers,

Jeff
Re: Fix error in ECPG while connection handling
> Thanks for spotting and fixing. I will push the patch as soon as I'm > online again. > Thanks Michael for taking care of this. Regards, Jeevan Ladhe.
Re: PATCH: Configurable file mode mask
Chapman Flack writes:
> On 03/13/2018 01:50 PM, Stephen Frost wrote:
>> I'll point out that PG does run on quite a few other OS's beyond Linux.

> I'll second that, as I think it is making my argument. When I can
> supply three or four examples of semantic subtleties that undermine
> PG's assumptions under Linux alone, the picture when broadened to
> those quite-a-few other platforms as well certainly doesn't become
> simpler!

Well, to be blunt, what we target is POSIX-compatible systems. If
you're telling us to worry about non-POSIX filesystem semantics, and
your name is not Microsoft, it's going to be a hard sell. We have
enough to do just keeping up with that scope of target systems.

			regards, tom lane
Re: proposal: schema variables
2018-03-13 10:54 GMT+01:00 Pavel Luzanov:
> Pavel Stehule wrote
> > Now, there is one important question - storage - Postgres stores all
> > objects to files - only memory storage is not designed yet. This is
> > the part where I need help.
>
> Oh, I do not feel confident in such questions.
> Maybe you can get some ideas from an extension with similar
> functionality: https://github.com/postgrespro/pg_variables

Unfortunately not - it doesn't implement this functionality.

Regards

Pavel

> -
> Pavel Luzanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
Re: JIT compiling with LLVM v11
Hi,

On 2018-03-13 14:36:44 -0400, Robert Haas wrote:
> On Mon, Mar 12, 2018 at 5:04 PM, Andres Freund wrote:
> > Currently a handful of explain outputs in the regression tests change
> > output when compiled with JITing. Therefore I'm thinking of adding a
> > JITINFO or similar option, which can be set to false for those tests?
>
> Can we spell that JIT or at least JIT_INFO?

The latter works; I don't have a strong opinion on that. For now I've
just tied it to COSTS off.

> I realize that EXPLAIN (JIT OFF) may sound like it's intended to
> disable JIT itself

Yea, that's what I'm concerned about.

> , but I think it's pretty clear that EXPLAIN (BUFFERS OFF) does not
> disable the use of actual buffers, only the display of buffer-related
> information.

Hm.

Greetings,

Andres Freund
Re: JIT compiling with LLVM v11
On Mon, Mar 12, 2018 at 5:04 PM, Andres Freund wrote:
> Currently a handful of explain outputs in the regression tests change
> output when compiled with JITing. Therefore I'm thinking of adding a
> JITINFO or similar option, which can be set to false for those tests?

Can we spell that JIT or at least JIT_INFO? I realize that EXPLAIN
(JIT OFF) may sound like it's intended to disable JIT itself, but I
think it's pretty clear that EXPLAIN (BUFFERS OFF) does not disable
the use of actual buffers, only the display of buffer-related
information.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Feb 14, 2018 at 5:37 AM, Amit Kapila wrote:
> Your concern is valid, but doesn't the same problem exist in the other
> approach as well? In that one we can also call adjust_paths_for_srfs
> after generating a gather path, which means that we might lose some
> opportunity to reduce the relative cost of parallel paths due to
> tlists having SRFs. Also, a similar problem can happen in
> create_order_paths for the cases described in the example above.

You're right. I think cleaning all of this up for v11 is too much to
consider, but please tell me your opinion of the attached patch set.
Here, instead of ripping the problematic logic out of
apply_projection_to_path, what I've done is just removed several of
the callers of apply_projection_to_path. I think that the work of
removing the other callers of that function could be postponed to a
future release, but we'd still get some benefit now, and this shows
the direction I have in mind.

I'm going to explain what the patches do one by one, but out of order,
because I backed into the need for the earlier patches as a result of
troubleshooting the later ones in the series. Note that the patches
need to be applied in order even though I'm explaining them out of
order.

0003 introduces a new upper relation to represent the result of
applying the scan/join target to the topmost scan/join relation. I'll
explain further down why this seems to be needed. Since each path must
belong to only one relation, this involves using
create_projection_path() for the non-partial pathlist, as we already
do for the partial pathlist, rather than apply_projection_to_path().
This is probably marginally more expensive, but I'm hoping it doesn't
matter. (However, I have not tested.) Each non-partial path in the
topmost scan/join rel produces a corresponding path in the new upper
rel. The same is done for partial paths if the scan/join target is
parallel-safe; otherwise we can't.
0004 causes the main part of the planner to skip calling
generate_gather_paths() for the topmost scan/join rel. This logic is
mostly stolen from your patch. If the scan/join target is NOT
parallel-safe, we perform generate_gather_paths() on the topmost
scan/join rel. If the scan/join target IS parallel-safe, we do it on
the post-projection rel introduced by 0003 instead. This is the patch
that actually fixes Jeff's original complaint.

0005 goes a bit further and removes a bunch of logic from
create_ordered_paths(). The removed logic tries to satisfy the query's
ordering requirement via cheapest_partial_path + Sort + Gather Merge.
Instead, it adds an additional path to the new upper rel added in
0003. This again avoids a call to apply_projection_to_path() which
could cause projection to be pushed down after costing has already
been fixed. Therefore, it gains the same advantages as 0004 for
queries that would use this sort of plan. Currently, this loses the
ability to set limit_tuples for the Sort path; that doesn't look too
hard to fix, but I haven't done it. If we decide to proceed with this
overall approach, I'll see about getting it sorted out.

Unfortunately, when I initially tried this approach, a number of
things broke due to the fact that create_projection_path() was not
exactly equivalent to apply_projection_to_path(). This initially
surprised me, because create_projection_plan() contains logic to
eliminate the Result node that is very similar to the logic in
apply_projection_to_path(). If apply_projection_to_path() sees that
the subordinate node is projection-capable, it applies the revised
target list to the child; if not, it adds a Result node.
create_projection_plan() does the same thing. However,
create_projection_plan() can lose the physical-tlist optimization for
the subordinate node; it forces an exact projection even if the parent
node doesn't require one.
0001 fixes this, so that we get the same plans with
create_projection_path() that we would with
apply_projection_to_path(). I think this patch is a good idea
regardless of what we decide to do about the rest of this, because it
can potentially avoid losing the physical-tlist optimization in any
place where create_projection_path() is used.

It turns out that 0001 doesn't manage to preserve the physical-tlist
optimization when the projection path is attached to an upper
relation. 0002 fixes this.

If we decide to go forward with this approach, it may make sense to
merge some of these when actually committing, but I found it useful to
separate them for development and testing purposes, and for clarity
about what was getting changed at each stage.

Please let me know your thoughts.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0005-Remove-explicit-path-construction-logic-in-create_or.patch
Description: Binary data

0004-Postpone-generate_gather_paths-for-topmost-scan-join.patch
Description: Binary data

0003-Add-new-upper-rel-to-represent-projecting-toplevel-
Re: JIT compiling with LLVM v11
On 2018-03-13 10:25:49 -0400, Peter Eisentraut wrote:
> On 3/12/18 17:04, Andres Freund wrote:
> > │ JIT: │
> > │   Functions: 4 │
> > │   Inlining: false │
> > │   Inlining Time: 0.000 │
> > │   Optimization: false │
> > │   Optimization Time: 5.023 │
> > │   Emission Time: 34.987 │
>
> The time quantities need some units.
>
> > │ Execution time: 46.277 ms │
>
> like this :)

Yea, I know. I was planning to start a thread about that. explain.c is
littered with code like

    if (es->format == EXPLAIN_FORMAT_TEXT)
        appendStringInfo(es->str, "Planning time: %.3f ms\n",
                         1000.0 * plantime);
    else
        ExplainPropertyFloat("Planning Time", 1000.0 * plantime, 3, es);

which, to me, is bonkers. I think we should add a 'const char *unit'
parameter to at least ExplainProperty{Float,Integer,Long}. Or a *Unit
version of them doing so, allowing a bit more gradual change?

Greetings,

Andres Freund
Re: TupleTableSlot abstraction
On 2018-03-13 15:18:34 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> >
> > I've recently been discussing with Robert how to abstract
> > TupleTableSlots in the light of upcoming developments around
> > abstracting storage away. Besides that aspect I think we also need
> > to make them more abstract to later deal with vectorized execution,
> > but I'm more fuzzy on the details there.
>
> Hi, is this patch proposed for pg11?

I wish, but I don't see it happening unless I get a time-compression
device from somewhere :(

Greetings,

Andres Freund
Re: PATCH: Configurable file mode mask
On 03/13/2018 01:50 PM, Stephen Frost wrote:
> Yes, that's true, but PG has never paid attention to POSIX ACLs or
> Linux capabilities. Changing it to do so is quite a bit beyond the
> scope...

I think we're largely in agreement here, as my aim was not to advocate
that PG should work harder to understand the subtleties of every
system it could be installed on, but rather that it should work less
hard at pretending to understand them when it doesn't, and thus avoid
obstructing the admin, who presumably does.

> I'll point out that PG does run on quite a few other OS's beyond Linux.

I'll second that, as I think it is making my argument. When I can
supply three or four examples of semantic subtleties that undermine
PG's assumptions under Linux alone, the picture when broadened to
those quite-a-few other platforms as well certainly doesn't become
simpler!

>> but then sooner or later it will still end up making assumptions
>> that aren't true under, say, SELinux, so there's another #ifdef,
>> and where does it end?
>
> I agree with this general concern. :)

That's probably where it became clear that I'm not advocating an
add-#ifdefs-for-everything approach.

> PG will stat PGDATA and conclude that the system is saying that group
> access has been granted on PGDATA and will do the same for subsequent
> files created later on. ... The only issue that remains is that PG
> doesn't understand or work with POSIX ACLs or Linux capabilities,
> but that's not anything new.

What's new is that it is now pretending even more extravagantly to
understand what it doesn't understand. Where it would previously draw
an incorrect conclusion and refuse to start, which is annoying but not
very difficult to work around if need be, it would now draw the same
incorrect conclusion and then actively go about making the real world
embody the incorrect conclusion, contrary to the admin's efforts.

>> umask inherited by the postmaster. A --permission-transparency
>> option would simply use open and mkdir in the traditional ways,
>> allowing the chosen umask, ACL defaults, SELinux contexts, etc., to
>> do their thing, and would avoid then stepping on the results with
>> explicit chmods (and of course skip the
>> I-refuse-to-start-because-I-misunderstand-your-setup check).
>> ...
>
> I'm a fan of this idea in general, but it's unclear how that
> --permission-transparency option would work in practice. Are you
> suggesting that it be a compile-time flag, which would certainly work
> but also then have to be debated among packagers as to the right
> setting and if there's any need to be concerned about users
> misunderstanding it, or a flag for each program,

I was thinking of a command-line option ...

> which hardly seems ideal as some invocations of programs might forget
> to specify that, leading to inconsistent permissions, or something
> else..?

... but I see how that gets complicated with the various other
command-line utilities included.

> .. we'd have to build in complete support for POSIX ACLs and Linux
> capabilities if we go down a route

I'm wary of an argument that we can't do better except by building in
complete support for POSIX ACLs, and capabilities (and NFSv4 ACLs, and
SELinux, and AppArmor, and #ifdef and #ifdef and #ifdef).

It seems to me that, in most cases, the designers of these sorts of
extensions to old traditional Unix behavior take great pains to design
them such that they can still usefully function in the presence of
programs that "don't pay attention to or understand or use" them, as
long as those programs are in some sense well-behaved, and not going
out of their way with active steps that insist on or impose
permissions that aren't appropriate under the non-understood
circumstances.

So my suggestion boils down to PG having at least an option, somehow,
to be well-behaved in that sense.

-Chap
Re: INOUT parameters in procedures
2018-03-13 14:14 GMT+01:00 Peter Eisentraut < peter.eisentr...@2ndquadrant.com>: > On 3/8/18 02:25, Pavel Stehule wrote: > > It looks like some error in this concept. The rules for enabling > > overwriting procedures should modified, so this collision should not be > > done. > > > > When I using procedure from PL/pgSQL, then it is clear, so I place on > > *OUT position variables. But when I call procedure from top, then I'll > > pass fake parameters to get some result. > > What we'll probably want to do here is to make the OUT parameters part > of the identity signature of procedures, unlike in functions. This > should be a straightforward change, but it will require some legwork in > many parts of the code. > yes > > >if (argmodes && (argmodes[i] == PROARGMODE_INOUT || > > argmodes[i] == PROARGMODE_OUT)) > > + { > > + Param *param; > > > > Because PROARGMODE_OUT are disallowed, then this check is little bit > > messy. Please, add some comment. > > Fixed. > > I discovered another issue, in LANGUAGE SQL procedures. Currently, if > you make a CALL with an INOUT parameter in an SQL procedure, the output > is thrown away (unless it's the last command). I would like to keep > open the option of assigning the results by name, like we do in > PL/pgSQL. So in this patch I have made a change to prohibit calling > procedures with INOUT parameters in LANGUAGE SQL routines (see > check_sql_fn_statements()). What do you think? > Disabling it is probably the best that is possible now. The variables in SQL are more like named parameters than variables; there is no need to complicate it. Regards Pavel > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: TupleTableSlot abstraction
Andres Freund wrote: > Hi, > > I've recently been discussing with Robert how to abstract > TupleTableSlots in the light of upcoming developments around abstracting > storage away. Besides that aspect I think we also need to make them > more abstract to later deal with vectorized execution, but I'm more fuzzy > on the details there. Hi, is this patch proposed for pg11? -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Inquiry regarding the projects under GSOC
Hello Postgres Team, I am Ankit, a 3rd-year B.Tech student from India. I am really interested in working with your organisation as an intern to fix issues and bugs, as I love challenges, and since the docs state that some features might not be possible to implement, I would love to try. My question is: what current bug fixes or feature updates are available for interns to work on? Thanks!!
[submit code] I develop a tool for pgsql, how can I submit it
Hello! I have developed a tool for PostgreSQL and would like to contribute it to the project. How can I submit it? Address: https://github.com/leapking/pgcheck
Re: [HACKERS] path toward faster partition pruning
Hi Amit, On 03/13/2018 07:37 AM, Amit Langote wrote: > I will continue working on improving the comments / cleaning things up and > post a revised version soon, but until then please look at the attached. Passes check-world. Some minor comments:

0001: Ok
0002: Ok
0003:
* Trailing white space
* pruning.c
  - partkey_datum_from_expr
    * "Add more expression types..." -- Are you planning to add more of these? Otherwise change the comment
  - get_partitions_for_null_keys
    * Documentation for method
    * 'break;' missing for _HASH and default case
  - get_partitions_for_keys
    * 'break;'s are outside of the 'case' blocks
    * The 'switch(opstrategy)'s could use some {} blocks
    * 'break;' missing from default
  - perform_pruning_combine_step
    * Documentation for method
* nodeFuncs.c
  - Missing 'break;'s to follow style
0004: Ok

Best regards, Jesper
Re: PATCH: Configurable file mode mask
Greetings Chapman, * Chapman Flack (c...@anastigmatix.net) wrote: > On 03/13/2018 10:40 AM, Stephen Frost wrote: > > ... Ultimately, the default which makes sense here isn't a > > one-size-fits-all but is system dependent and the administrator should > > be able to choose what permissions they want to have. > > Hear, hear. Returning for a moment again to > > https://www.postgresql.org/message-id/8b16cc30-0187-311e-04b5-1f32446d89dd%40anastigmatix.net > > we see that a stat() returning mode 0750 on a modern system may not > even mean there is any group access at all. In that example, the > datadir had these permissions: Yes, that's true, but PG has never paid attention to POSIX ACLs or Linux capabilities. Changing it to do so is quite a bit beyond the scope of this particular patch and I don't see anything in what this patch is doing which would preclude someone from putting in that effort in the future. > While PostgreSQL does its stat() and interprets the mode as if it is > still on a Unix box from the '80s, two things have changed underneath > it: POSIX ACLs and Linux capabilities. Capabilities take the place of > the former specialness of root, who now needs to be granted r-x > explicitly in the ACL to be able to read stuff there at all, and there > is clearly no access to group and no access to other. It would be hard > for anybody to call this an insecure configuration. But when you stat() > an object with a POSIX ACL, you get the 'mask' value where the 'group' > bits used to go, so postgres sees this as 0750, thinks the 5 represents > group access, and refuses to start. Purely a mistake. I'll point out that PG does run on quite a few other OS's beyond Linux. > It's the kind of mistake that is inherent in this sort of check, > which tries to draw conclusions from the semantics it assumes, while > systems evolve and semantics march along. One hypothetical fix would > be to add: > > #ifdef HAS_POSIX_ACLS > ... 
check if there's really an ACL here, and the S_IRWXG bits are > really just the mask, or even try to pass judgment on whether the > admin's chosen ACL is adequately secure ... > #endif > > but then sooner or later it will still end up making assumptions > that aren't true under, say, SELinux, so there's another #ifdef, > and where does it end? I agree with this general concern. > On 03/13/2018 11:00 AM, Tom Lane wrote: > > In a larger sense, this fails to explain the operating principle, > > namely what I stated above, that we allow the existing permissions > > on PGDATA to decide what we allow as group permissions. > > I admit I've only skimmed the patches, trying to determine what > that will mean in practice. In a case like the ACL example above, > does this mean that postgres will stat PGDATA, conclude incorrectly > that group access is granted, and then, based on that, actually go > granting unwanted group access for real on newly-created files > and directories? PG will stat PGDATA and conclude that the system is saying that group access has been granted on PGDATA and will do the same for subsequent files created later on. This is new in PG, so there isn't any concern about this causing problems in an existing environment- you couldn't have had those ACLs on an existing PGDATA dir in the first place, as you note above. The only issue that remains is that PG doesn't understand or work with POSIX ACLs or Linux capabilities, but that's not anything new. > On 03/13/2018 10:45 AM, David Steele wrote: > > As Stephen notes, this can be enforced by the user if they want to, > > and without much effort (and with better tools). > > To me, that seems really the key. 
An --allow-group-access option is > nice (but, as we see, misleading if its assumptions are outdated > regarding how the filesystem works), but I would plug even harder for > a --permission-transparency option, which would just assume that the > admin is making security arrangements, through mechanisms that > postgres may or may not even understand. The admin can create ACLs > with default entries that propagate to newly created objects. > SELinux contexts can work in similar ways. The admin controls the > umask inherited by the postmaster. A --permission-transparency option > would simply use open and mkdir in the traditional ways, allowing > the chosen umask, ACL defaults, SELinux contexts, etc., to do their > thing, and would avoid then stepping on the results with explicit > chmods (and of course skip the I-refuse-to-start-because-I- > misunderstand-your-setup check). > > It wouldn't be for every casual install, but it would be the best > way to stay out of the way of an admin securing a system with modern > facilities. > > A lot of the design effort put into debating what postgres itself > should or shouldn't insist on could then, perhaps, go into writing > postgres-specific configuration rule packages for some of those > better configuration-checking tools, and there it might even be > possible to write tests that incorporate knowledge of ACLs, SELinux, etc.
Re: Additional Statistics Hooks
Mat Arye writes: > An example, of a case a protransform type system would not be able to > optimize is mathematical operator expressions like bucketing integers by > decile --- (integer / 10) * 10. Uh, why not? An estimation function that is specific to integer divide shouldn't have much trouble figuring out that x/10 has one-tenth as many distinct values as x does. I'd certainly rather have that knowledge associated directly with int4div, and the corresponding knowledge about date_trunc associated with that function, and similar knowledge about extension-provided operators provided by the extensions, than try to maintain a hook function that embeds all such knowledge. > I also think that the point with extended statistics is a good one and > points to the need for more experimentation/experience which I think > a C hook is better suited for. Putting in a hook will allow extension > writers like us to experiment and figure out the kinds of transform on > statistics that are useful while having > a small footprint on the core. If you're experimenting you might as well just change the source code. A hook is only useful if you're trying to ship something for production, and I doubt that factorizing things this way is a credible production solution. regards, tom lane
Re: PATCH: Configurable file mode mask
On 03/13/2018 10:40 AM, Stephen Frost wrote: > * Michael Paquier (mich...@paquier.xyz) wrote: >> Well, one thing is that the current checks in the postmaster make sure >> that a data folder is never using anything else than 0700. From a >> security point of view, making it possible to allow a postmaster to >> start with 0750 is a step backwards ... > Lastly, the user *is* able to enforce the privileges on the data > directory if they wish to, using tools such as tripwire which are built > specifically to provide such checks and to do so regularly instead of > the extremely ad-hoc check provided by PG. > > ... Ultimately, the default which makes sense here isn't a > one-size-fits-all but is system dependent and the administrator should > be able to choose what permissions they want to have. Hear, hear. Returning for a moment again to https://www.postgresql.org/message-id/8b16cc30-0187-311e-04b5-1f32446d89dd%40anastigmatix.net we see that a stat() returning mode 0750 on a modern system may not even mean there is any group access at all. In that example, the datadir had these permissions: # getfacl . # file: . # owner: postgres # group: postgres user::rwx user:root:r-x group::--- mask::r-x other::--- While PostgreSQL does its stat() and interprets the mode as if it is still on a Unix box from the '80s, two things have changed underneath it: POSIX ACLs and Linux capabilities. Capabilities take the place of the former specialness of root, who now needs to be granted r-x explicitly in the ACL to be able to read stuff there at all, and there is clearly no access to group and no access to other. It would be hard for anybody to call this an insecure configuration. But when you stat() an object with a POSIX ACL, you get the 'mask' value where the 'group' bits used to go, so postgres sees this as 0750, thinks the 5 represents group access, and refuses to start. Purely a mistake. 
It's the kind of mistake that is inherent in this sort of check, which tries to draw conclusions from the semantics it assumes, while systems evolve and semantics march along. One hypothetical fix would be to add: #ifdef HAS_POSIX_ACLS ... check if there's really an ACL here, and the S_IRWXG bits are really just the mask, or even try to pass judgment on whether the admin's chosen ACL is adequately secure ... #endif but then sooner or later it will still end up making assumptions that aren't true under, say, SELinux, so there's another #ifdef, and where does it end? On 03/13/2018 11:00 AM, Tom Lane wrote: > In a larger sense, this fails to explain the operating principle, > namely what I stated above, that we allow the existing permissions > on PGDATA to decide what we allow as group permissions. I admit I've only skimmed the patches, trying to determine what that will mean in practice. In a case like the ACL example above, does this mean that postgres will stat PGDATA, conclude incorrectly that group access is granted, and then, based on that, actually go granting unwanted group access for real on newly-created files and directories? That doesn't seem ideal. On 03/13/2018 10:45 AM, David Steele wrote: > As Stephen notes, this can be enforced by the user if they want to, > and without much effort (and with better tools). To me, that seems really the key. An --allow-group-access option is nice (but, as we see, misleading if its assumptions are outdated regarding how the filesystem works), but I would plug even harder for a --permission-transparency option, which would just assume that the admin is making security arrangements, through mechanisms that postgres may or may not even understand. The admin can create ACLs with default entries that propagate to newly created objects. SELinux contexts can work in similar ways. The admin controls the umask inherited by the postmaster. 
A --permission-transparency option would simply use open and mkdir in the traditional ways, allowing the chosen umask, ACL defaults, SELinux contexts, etc., to do their thing, and would avoid then stepping on the results with explicit chmods (and of course skip the I-refuse-to-start-because-I-misunderstand-your-setup check). It wouldn't be for every casual install, but it would be the best way to stay out of the way of an admin securing a system with modern facilities. A lot of the design effort put into debating what postgres itself should or shouldn't insist on could then, perhaps, go into writing postgres-specific configuration rule packages for some of those better configuration-checking tools, and there it might even be possible to write tests that incorporate knowledge of ACLs, SELinux, etc. -Chap
Re: PATCH: Configurable file mode mask
David Steele writes: > On 3/12/18 3:28 AM, Michael Paquier wrote: >> In pg_rewind and pg_resetwal, isn't that also a portion which is not >> necessary without the group access feature? > These seem like a good idea to me with or without patch 03. Some of our > front-end tools (initdb, pg_upgrade) were setting umask and others > weren't. I think it's more consistent (and safer) if they all do, at > least if they are writing into PGDATA. +1 ... see a926eb84e for an example of how easy it is to screw up if the process's overall umask is permissive. regards, tom lane
Re: [WIP PATCH] Index scan offset optimisation using visibility map
Hello. Tom, thanks a lot for your thorough review. > What you've done to > IndexNext() is a complete disaster from a modularity standpoint: it now > knows all about the interactions between index_getnext, index_getnext_tid, > and index_fetch_heap. I was looking into the current IndexOnlyNext implementation as a starting point - and it knows about index_getnext_tid and index_fetch_heap already. At the same time I was trying to keep the patch non-invasive. Patched IndexNext now only knows about index_getnext_tid and index_fetch_heap - the same as IndexOnlyNext. But yes, it probably could be done better. > I'm not sure about a nicer way to refactor that, but I'd suggest that > maybe you need an additional function in indexam.c that hides all this > knowledge about the internal behavior of an IndexScanDesc. I'll try to split index_getnext into two functions. A new one will do everything index_getnext does except index_fetch_heap. > Or that is, it knows too much and still not enough, > because it's flat out wrong for the case that xs_continue_hot is set. > You can't call index_getnext_tid when that's still true from last time. Oh.. Yes, clear error here. > The PredicateLockPage call also troubles me quite a bit, not only from > a modularity standpoint but because that implies a somewhat-user-visible > behavioral change when this optimization activates. People who are using > serializable mode are not going to find it to be an improvement if their > queries fail and need retries a lot more often than they did before. > I don't know if that problem is bad enough that we should disable skipping > when serializable mode is active, but it's something to think about. Current IndexOnlyScan already does that. And I think a user should expect such a change in serializable mode.
> You haven't documented the behavior required for tuple-skipping in any > meaningful fashion, particularly not the expectation that the child plan > node will still return tuples that just need not contain any valid > content. Only nodeLimit could receive such tuples and they are immediately discarded. I'll add a comment about it. > is a gross hack and probably wrong. You could use ExecStoreAllNullTuple, > perhaps. Oh, nice, missed it. > I'm disturbed by the fact that you've not taught the planner about the > potential cost saving from this, so that it won't have any particular > reason to pick a regular indexscan over some other plan type when this > optimization applies. Maybe there's no practical way to do that, or maybe > it wouldn't really matter in practice; I've not looked into it. But not > doing anything feels like a hack. I was trying to do it. But the current planner architecture does not provide a nice way to achieve it, due to the way it handles limit and offset. So I think it is better to avoid this for now. > Setting this back to Waiting on Author. I'll try to make the required changes in a few days. Thanks.
Re: PATCH: Unlogged tables re-initialization tests
Dagfinn Ilmari Mannsåker wrote: > $SIG{__DIE__} gets called even for exceptions that would be caught by a > surrounding eval block, so this should at the very least be: > > $SIG{__DIE__} = sub { BAIL_OUT(@_) unless $^S }; > > However, that is still wrong, because die() and BAIL_OUT() mean > different things: die() aborts the current test script, while BAIL_OUT() > aborts the entire 'prove' run, i.e. all subsequent scripts in the same > test directory. Sounds like 'die' is the correct thing, then, and that BAIL_OUT should be called sparingly ... for example this one in PostgresNode::start seems like an overreaction: BAIL_OUT("node \"$name\" is already running") if defined $self->{_pid}; Yes? -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: Exclude temp relations from base backup
Hi, On 2/28/18 10:55 AM, David Steele wrote: > This is a follow-up patch from the exclude unlogged relations discussion > [1]. > > The patch excludes temporary relations during a base backup using the > existing looks_like_temp_rel_name() function for identification. > > It shares code to identify database directories from [1], so for now > that has been duplicated in this patch to make it independent. I'll > rebase depending on what gets committed first. Updated the patch to change die() to BAIL_OUT() and use append_to_file() as suggested for another test patch [1]. Regards, -- -David da...@pgmasters.net [1] https://www.postgresql.org/message-id/6bc5d931-5b00-279f-f65a-26e32de400a6%40pgmasters.net From c83f3ae09bacb961c511ebe9cb1f9f79bf1e3d29 Mon Sep 17 00:00:00 2001 From: David Steele Date: Tue, 13 Mar 2018 12:22:24 -0400 Subject: [PATCH 1/1] Exclude temporary relations from base backup. Exclude temporary relations during a base backup using the existing looks_like_temp_rel_name() function for identification. It shares code to identify database directories with [1], so for now that has been duplicated in this patch to make it independent. I'll rebase depending on what gets committed first. [1] https://www.postgresql.org/message-id/04791bab-cb04-ba43-e9c0-664a4c1ffb2c%40pgmasters.net --- doc/src/sgml/protocol.sgml | 2 +- src/backend/replication/basebackup.c | 41 +++ src/backend/storage/file/fd.c| 3 +- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 49 +++- src/include/storage/fd.h | 1 + 5 files changed, 92 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 4fd61d7c2d..629a462574 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2565,7 +2565,7 @@ The commands accepted in replication mode are: Various temporary files and directories created during the operation of the PostgreSQL server, such as any file or directory beginning - with pgsql_tmp. + with pgsql_tmp and temporary relations. 
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 185f32a5f9..f0c3d13b2b 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -26,6 +26,7 @@ #include "nodes/pg_list.h" #include "pgtar.h" #include "pgstat.h" +#include "port.h" #include "postmaster/syslogger.h" #include "replication/basebackup.h" #include "replication/walsender.h" @@ -958,6 +959,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, char pathbuf[MAXPGPATH * 2]; struct stat statbuf; int64 size = 0; + const char *lastDir; /* Split last dir from parent path. */ + bool isDbDir = false; /* Does this directory contain relations? */ + + /* +* Determine if the current path is a database directory that can +* contain relations. +* +* Start by finding the location of the delimiter between the parent +* path and the current path. +*/ + lastDir = last_dir_separator(path); + + /* Does this path look like a database path (i.e. all digits)? */ + if (lastDir != NULL && + strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1)) + { + /* Part of path that contains the parent directory. */ + int parentPathLen = lastDir - path; + + /* +* Mark path as a database directory if the parent path is either +* $PGDATA/base or a tablespace version path.
+*/ + if (strncmp(path, "./base", parentPathLen) == 0 || + (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) && +strncmp(lastDir - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1), +TABLESPACE_VERSION_DIRECTORY, +sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0)) + isDbDir = true; + } dir = AllocateDir(path); while ((de = ReadDir(dir, path)) != NULL) @@ -1007,6 +1038,16 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, if (excludeFound) continue; + /* Exclude temporary relations */ + if (isDbDir && looks_like_temp_rel_name(de->d_name)) + { + elog(DEBUG2, +"temporary relation file \"%s\" excluded from backup", +de->d_name); + + continue; + } + snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
On Mon, Mar 12, 2018 at 02:11:42PM +0100, Julian Markwort wrote: > > In same manner pg_stat_statements_good_plan_reset() and > > pg_stat_statements_bad_plan_reset() functions can be combined in > > function: > > > pg_stat_statements_plan_reset(in queryid bigint, in good boolean, in bad > > boolean) > > This is also sensible, however if any more kinds of plans would be added in > the future, there would be a lot of flags in this function. I think having > varying amounts of flags (between extension versions) as arguments to the > function also makes any upgrade paths difficult. Thinking more about this, we > could also use function overloading to avoid this. > An alternative would be to have the caller pass the type of plan he wants to > reset. Omitting the type would result in the deletion of all plans, maybe? > pg_stat_statements_plan_reset(in queryid bigint, in plan_type text) Yes, it is a good idea. But maybe instead of passing an empty string as the plan type we should overload the function with only a queryid argument. I think it is a common practice in PostgreSQL. Otherwise, allowing an empty string as the plan type may confuse users. So the function declarations may be the following: pg_stat_statements_plan_reset(in queryid bigint, in plan_type text) pg_stat_statements_plan_reset(in queryid bigint) + interquartile_dist = 2.0*(0.6745 * sqrt(e->counters.sum_var_time / e->counters.calls)); I think it would be better to have defines for the 2.0 and 0.6745 values for the sake of readability. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: Additional Statistics Hooks
On Tue, Mar 13, 2018 at 6:31 AM, David Rowley wrote: > On 13 March 2018 at 11:44, Tom Lane wrote: > > While it would certainly be nice to have better behavior for that, > > "add a hook so users who can write C can fix it by hand" doesn't seem > > like a great solution. On top of the sheer difficulty of writing a > > hook function, you'd have the problem that no pre-written hook could > > know about all available functions. I think somehow we'd need a way > > to add per-function knowledge, perhaps roughly like the protransform > > feature. > I think this isn't either-or. I think a general hook can be useful for extensions that want to optimize particular data distributions/workloads using domain-knowledge about functions common for those workloads. That way users working with that data can use extensions to optimize workloads without writing C themselves. I also think a protransform like feature would add a lot of power to the native planner but this could take a while to get into core properly and may not handle all kinds of data distributions/cases. An example, of a case a protransform type system would not be able to optimize is mathematical operator expressions like bucketing integers by decile --- (integer / 10) * 10. This is somewhat analogous to date_trunc in the integer space and would also change the number of resulting distinct rows. > > I always imagined that extended statistics could be used for this. > Right now the estimates are much better when you create an index on > the function, but there's no real reason to limit the stats that are > gathered to just plain columns + expression indexes. > > I believe I'm not the only person to have considered this. Originally > extended statistics were named multivariate statistics. I think it was > Dean and I (maybe others too) that suggested to Tomas to give the > feature a more generic name so that it can be used for a more general > purpose later. 
> I also think that the point with extended statistics is a good one and points to the need for more experimentation/experience which I think a C hook is better suited for. Putting in a hook will allow extension writers like us to experiment and figure out the kinds of transform on statistics that are useful while having a small footprint on the core. I think designing a protransform-like system would benefit from more experience with the kinds of transformations that are useful. For example, can anything be done if the interval passed to date_trunc is not constant, or is it not even worth bothering with that case? Maybe extended statistics is a better approach, etc. > > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: PATCH: Configurable file mode mask
Hi Michael, On 3/12/18 3:28 AM, Michael Paquier wrote: > On Fri, Mar 09, 2018 at 01:51:14PM -0500, David Steele wrote: >> How about a GUC that enforces one mode or the other on startup? Default >> would be 700. The GUC can be set automatically by initdb based on the >> -g option. We had this GUC originally, but since the front-end tools >> can't read it we abandoned it. Seems like it would be good as an >> enforcing mechanism, though. > > Hm. OK. I can see the whole set of points about that. Please let me > think a bit more about that bit. Do you think that there could be a > pool of users willing to switch from one mode to another? Compared to > your v1, we could indeed have a GUC which enforces a restriction to not > allow group access, and enabled by default. As the commit fest is > running and we don't have a clear picture yet, I am afraid that it may > be better to move that to v12, and focus on getting patches 1 and 2 > committed. This will provide a good base for the next move. > > There are three places where things are still not correct: > > - if (chmod(location, S_IRWXU) != 0) > + current_umask = umask(0); > + umask(current_umask); > + > + if (chmod(location, PG_DIR_MODE_DEFAULT & ~current_umask) != 0) > This is in tablespace.c. I have moved this hunk to 03 and used only PG_DIR_MODE_DEFAULT instead. > @@ -185,6 +186,9 @@ main(int argc, char **argv) > exit(1); > } > > + /* Set dir/file mode mask */ > + umask(PG_MODE_MASK_DEFAULT); > + > In pg_rewind and pg_resetwal, isn't that also a portion which is not > necessary without the group access feature? These seem like a good idea to me with or without patch 03. Some of our front-end tools (initdb, pg_upgrade) were setting umask and others weren't. I think it's more consistent (and safer) if they all do, at least if they are writing into PGDATA. > This is all I have basically for patch 2, which would be good for > shipping. Thanks! 
I'll attach new patches in a reply to [1] once I have made the changes Tom requested. -- -David da...@pgmasters.net [1] https://www.postgresql.org/message-id/22928.1520953220%40sss.pgh.pa.us signature.asc Description: OpenPGP digital signature
Re: PATCH: Configurable file mode mask
Greetings, * David Steele (da...@pgmasters.net) wrote: > On 3/13/18 11:00 AM, Tom Lane wrote: > > Stephen Frost writes: > >> * Michael Paquier (mich...@paquier.xyz) wrote: > >>> If the problem is parsing, it could as well be more portable to put that > >>> in the control file, no? > > > >> Then we'd need a tool to allow changing it for people who do want to > >> change it. There's no reason we should have to have an extra tool for > >> this- an administrator who chooses to change the privileges on the data > >> folder should be able to do so and I don't think anyone is going to > >> thank us for requiring them to use some special tool to do so for > >> PostgreSQL. > > > > FWIW, I took a quick look through this patch and don't have any problem > > with the approach, which appears to be "use the data directory's > > startup-time permissions as the status indicator". I am kinda wondering > > about this though: > > > > +(These files can confuse pg_ctl.) If group > > read > > +access is enabled on the data directory and an unprivileged user in the > > +PostgreSQL group is performing the backup, > > then > > +postmaster.pid will not be readable and must be > > +excluded. > > > > If we're allowing group read on the data directory, why should that not > > include postmaster.pid? There's nothing terribly security-relevant in > > there, and being able to find out the postmaster PID seems useful. I can > > see the point of restricting server key files, as documented elsewhere, > > but it's possible to keep those somewhere else so they don't cause > > problems for simple backup software. > > I'm OK with that. I had a look at the warnings regarding the required > mode of postmaster.pid in miscinit.c (889-911) and it seems to me they > still hold true if the mode is 640 instead of 600. > > Do you agree, Tom? Stephen? > > If so, I'll make those changes. 
I agree that we can still consider an EPERM-error case as being ok even with the changes proposed, and with the same-user check happening earlier in checkDataDir(), we won't even get to the point of looking at the pid file if the userid's don't match. The historical comment about the old datadir permissions can likely just be removed, perhaps replaced with a bit more commentary above that check in checkDataDir(). The open() call should also fail if we only have group-read privileges on the file (0640), but surely the kill() will in any case. Thanks! Stephen signature.asc Description: PGP signature
Re: Google Summer of Code: Potential Applicant
Greetings, * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote: > Craig, I believe you probably did something wrong if you had to work > with some library directly. Actually you generate classes from text > description and just use them. I worked with Thrift some time ago, in > 2015 [1]. I wouldn't call it awkward. Protobuf is fine too, but > unfortunately we don't have any Protobuf-related projects this time. Just to be clear, the list on the wiki is just a set of suggestions; students are welcome to propose their own projects as well. > Also it's probably worth noticing that the GSoC project doesn't imply > using any existing libraries, only the binary format which is quite > stable. A student proposal should really include information about what other libraries, if any, are being considered for the project, as that will play into the consideration of whether it's something we would be interested in including in PG or not. Thanks! Stephen
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
David Gould writes: > I have attached the patch we are currently using. It applies to 9.6.8. I > have versions for older releases in 9.4, 9.5, 9.6. It fails to apply to 10, > and presumably head, but I can update it if there is any interest. > The patch has three main features: > - Impose an ordering on the autovacuum workers worklist to avoid > the need for rechecking statistics to skip already vacuumed tables. > - Reduce the frequency of statistics refreshes > - Remove the AutovacuumScheduleLock As per the earlier thread, the first aspect of that needs more work to not get stuck when the worklist has long tasks near the end. I don't think you're going to get away with ignoring that concern. Perhaps we could sort the worklist by decreasing table size? That's not an infallible guide to the amount of time that a worker will need to spend, but it's sure safer than sorting by OID. Alternatively, if we decrease the frequency of stats refreshes, how much do we even need to worry about reordering the worklist? In any case, I doubt anyone will have any appetite for back-patching such a change. I'd recommend that you clean up your patch and rebase to HEAD, then submit it into the September commitfest (either on a new thread or a continuation of the old #13750 thread, not this one). regards, tom lane
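The ordering Tom suggests (largest tables first, rather than OID order) can be sketched with a plain comparator; the struct and field names below are hypothetical stand-ins for illustration, not the actual autovacuum data structures:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch: sort the autovacuum worklist by decreasing table size so that
 * large tables, which likely need the most work, are picked up first.
 * "WorklistItem" and "relpages" are illustrative names only.
 */
typedef struct WorklistItem
{
	unsigned int reloid;
	long		relpages;		/* stand-in for table size */
} WorklistItem;

static int
worklist_cmp(const void *a, const void *b)
{
	const WorklistItem *wa = (const WorklistItem *) a;
	const WorklistItem *wb = (const WorklistItem *) b;

	/* larger tables sort first */
	if (wa->relpages > wb->relpages)
		return -1;
	if (wa->relpages < wb->relpages)
		return 1;
	return 0;
}
```

The worklist would then be ordered with `qsort(list, n, sizeof(WorklistItem), worklist_cmp)` before workers start consuming it.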
Re: Additional Statistics Hooks
On Tue, Mar 13, 2018 at 6:56 AM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > On Tue, Mar 13, 2018 at 4:14 AM, Tom Lane wrote: > > Mat Arye writes: > >> So the use-case is an analytical query like > > > >> SELECT date_trunc('hour', time) AS MetricMinuteTs, AVG(value) as avg > >> FROM hyper > >> WHERE time >= '2001-01-04T00:00:00' AND time <= '2001-01-05T01:00:00' > >> GROUP BY MetricMinuteTs > >> ORDER BY MetricMinuteTs DESC; > > > >> Right now this query will choose a much-less-efficient GroupAggregate > plan > >> instead of a HashAggregate. It will choose this because it thinks the > >> number of groups > >> produced here is 9,000,000 because that's the number of distinct time > >> values there are. > >> But, because date_trunc "buckets" the values there will be about 24 > groups > >> (1 for each hour). > > > > While it would certainly be nice to have better behavior for that, > > "add a hook so users who can write C can fix it by hand" doesn't seem > > like a great solution. On top of the sheer difficulty of writing a > > hook function, you'd have the problem that no pre-written hook could > > know about all available functions. I think somehow we'd need a way > > to add per-function knowledge, perhaps roughly like the protransform > > feature. > > Like cost associated with a function, we may associate mapping > cardinality with a function. It tells how many distinct input values > map to 1 output value. By input value, I mean input argument tuple. In > Mat's case the mapping cardinality will be 12. The number of distinct > values that function may output is estimated as number of estimated > rows / mapping cardinality of that function. > I think this is complicated by the fact that the mapping cardinality is not a constant per function but depends on the constant given as the first argument to the function and the granularity of the underlying data (do you have a second-granularity or microsecond granularity). 
I actually think the logic for the estimate here should be (max(time) - min(time)) / interval. I think that to be fully general you need to allow functions on statistics to determine the estimate. > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company >
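A minimal sketch of the range-based group estimate proposed above, with illustrative names and timestamps expressed in seconds; this is an assumption about how such an estimator could be shaped, not planner code:

```c
#include <assert.h>

/*
 * Sketch: instead of assuming one group per distinct input value, derive
 * the expected number of buckets for a bucketing function (such as
 * date_trunc) from the column's observed range.  The result is clamped to
 * [1, ndistinct] since bucketing can never produce more groups than there
 * are distinct inputs.
 */
static double
estimate_bucket_groups(double min_ts, double max_ts,
					   double bucket_width, double ndistinct)
{
	double		groups;

	if (bucket_width <= 0)
		return ndistinct;		/* cannot bucket; fall back */

	groups = (max_ts - min_ts) / bucket_width + 1.0;

	/* clamp: at least one group, never more than the distinct inputs */
	if (groups < 1.0)
		groups = 1.0;
	if (groups > ndistinct)
		groups = ndistinct;
	return groups;
}
```

For the query in the thread (a roughly one-day range bucketed by hour), this yields about 25 groups instead of 9,000,000.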
Re: PATCH: Configurable file mode mask
On 3/13/18 11:00 AM, Tom Lane wrote: > Stephen Frost writes: >> * Michael Paquier (mich...@paquier.xyz) wrote: >>> If the problem is parsing, it could as well be more portable to put that >>> in the control file, no? > >> Then we'd need a tool to allow changing it for people who do want to >> change it. There's no reason we should have to have an extra tool for >> this- an administrator who chooses to change the privileges on the data >> folder should be able to do so and I don't think anyone is going to >> thank us for requiring them to use some special tool to do so for >> PostgreSQL. > > FWIW, I took a quick look through this patch and don't have any problem > with the approach, which appears to be "use the data directory's > startup-time permissions as the status indicator". I am kinda wondering > about this though: > > +(These files can confuse pg_ctl.) If group > read > +access is enabled on the data directory and an unprivileged user in the > +PostgreSQL group is performing the backup, > then > +postmaster.pid will not be readable and must be > +excluded. > > If we're allowing group read on the data directory, why should that not > include postmaster.pid? There's nothing terribly security-relevant in > there, and being able to find out the postmaster PID seems useful. I can > see the point of restricting server key files, as documented elsewhere, > but it's possible to keep those somewhere else so they don't cause > problems for simple backup software. I'm OK with that. I had a look at the warnings regarding the required mode of postmaster.pid in miscinit.c (889-911) and it seems to me they still hold true if the mode is 640 instead of 600. Do you agree, Tom? Stephen? If so, I'll make those changes. > Also, in general, this patch desperately needs a round of copy-editing, > particularly with attention to the comments. 
For instance, there are a > minimum of three grammatical errors in this one comment: > > + * Group read/execute may optional be enabled on PGDATA so any frontend tools > + * That write into PGDATA must know what mask to set and the permissions to > + * use for creating files and directories. > In a larger sense, this fails to explain the operating principle, > namely what I stated above, that we allow the existing permissions > on PGDATA to decide what we allow as group permissions. If you > don't already understand that, you're going to have a hard time > extracting it from either file_perm.h or file_perm.c. (The latter > fails to even explain what the argument of DataDirectoryMask is.) I'll do comment/doc review for the next round of patches. Thanks, -- -David da...@pgmasters.net
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
David Gould writes: > I also thought about the theory and am confident that there really is no way > to trick it. Basically if there are enough pages that are different to affect > the overall density, say 10% empty or so, there is no way a random sample > larger than a few hundred probes can miss them no matter how big the table is. > If there are few enough pages to "hide" from the sample, then they are so few > they don't matter anyway. > After all this my vote is for back patching too. I don't see any case where > the patched analyze is or could be worse than what we are doing. I'm happy to > provide my test cases if anyone is interested. Yeah, you have a point. I'm still worried about unexpected side-effects, but it seems like overall this is very unlikely to hurt anyone. I'll back-patch (minus the removal of the unneeded vac_estimate_reltuples argument). regards, tom lane
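The density extrapolation being defended here can be sketched as follows; this is an illustrative simplification of the idea, not the actual ANALYZE or vac_estimate_reltuples code:

```c
#include <assert.h>

/*
 * Sketch: treat the tuple density observed in ANALYZE's random page
 * sample as representative of the whole table and extrapolate it across
 * all pages, rather than blending it with the previous reltuples value.
 * Per the argument above, a sample of a few hundred pages cannot miss a
 * density skew large enough to matter.
 */
static double
analyze_estimate_reltuples(double total_pages,
						   double scanned_pages,
						   double scanned_tuples)
{
	if (scanned_pages <= 0)
		return 0;
	/* tuples per sampled page, extrapolated over every page */
	return (scanned_tuples / scanned_pages) * total_pages;
}
```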
Re: committing inside cursor loop
On 3/6/18 07:48, Ildus Kurbangaliev wrote: > Although as was discussed before it seems inconsistent without ROLLBACK > support. There was a little discussion about it, but no replies. Maybe > the status of the patch should be changed to 'Waiting on author' until > the end of discussion. I'm wondering what the semantics of it should be. For example, consider this: drop table test1; create table test1 (a int, b int); insert into test1 values (1, 11), (2, 22), (3, 33); do language plpgsql $$ declare x int; begin for x in update test1 set b = b + 1 where b > 20 returning a loop raise info 'x = %', x; if x = 2 then rollback; end if; end loop; end; $$; The ROLLBACK call in the first loop iteration undoes the UPDATE command that drives the loop. Is it then sensible to continue the loop? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw
Greetings, * Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote: > On Mon, Mar 12, 2018 at 1:25 PM, Etsuro Fujita > wrote: > > (2018/03/09 20:55), Etsuro Fujita wrote: > >> (2018/03/08 14:24), Ashutosh Bapat wrote: > >>> For local constraints to be enforced, we use remote > >>> constraints. For local WCO we need to use remote WCO. That means we > >>> create many foreign tables pointing to same local table on the foreign > >>> server through many views, but it's not impossible. > >> > >> Maybe I don't understand this correctly, but I guess that it would be > >> the user's responsibility to not create foreign tables in such a way. > > > > I think I misunderstood your words. Sorry for that. I think what you > > proposed would be a solution for this issue, but I'm not sure that's a good > > one because that wouldn't work for the data sources that don't support views > > with WCO options. > > Our solution for the constraints doesn't work with the data sources > (like flat files) which don't support constraints. So, that argument > doesn't help. It would really help to have some examples of exactly what is being proposed here wrt solutions. WCO is defined at a view level, so I'm not following the notion that we're going to depend on something remote to enforce the WCO when the remote object is just a regular table that you can't define a WCO on top of. That's not the case when we're talking about foreign tables vs. local tables, so it's not the same. I certainly don't think we should require a remote view to exist to perform the WCO check. If the remote WCO is a view itself then I would expect it to operate in the same manner as WCO on local views does- you can have them defined as being cascaded or not. In other words, there is no case where we have a "foreign view." Views are always local. 
A "foreign table" could actually be a view on the remote side, in which case we still treat it as a table in the local database, but WCO doesn't come up in that case at all; there's no way to define WCO on a table, foreign or not. If WCO is defined on the view on the remote server, then it should operate properly and not require anything from the local side. Thanks! Stephen
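For reference, the point being made above is that WITH CHECK OPTION attaches only to views, never to tables; a minimal illustration:

```sql
-- WITH CHECK OPTION can only be attached to a view, never a table
-- (foreign or local).
CREATE TABLE accounts (id int, balance numeric);

CREATE VIEW positive_accounts AS
    SELECT id, balance FROM accounts WHERE balance > 0
    WITH CASCADED CHECK OPTION;

-- Allowed: the new row remains visible through the view.
INSERT INTO positive_accounts VALUES (1, 100);

-- Rejected with a check-option violation: the new row would not be
-- visible through the view.
INSERT INTO positive_accounts VALUES (2, -5);
```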
Re: Google Summer of Code: Potential Applicant
Thanks, Aleksander! > On 13 March 2018, at 19:03, Aleksander Alekseev > wrote: > >> Do you have any project related insights as to what I should put in >> there? > Christos, as far as I remember, a good proposal must have a schedule, implementation details, and deliverables. It is also good to show that you are capable of executing the project, for example by mentioning your previous projects (commercial, educational, or pet projects), achievements, etc. GSoC typically has 3 milestones, and usually each milestone must produce some viable result. There are exact dates, but here I'll put a sketch. Algorithms: June - implement introsort and timsort; July - design benchmarks, implement some other hashtables; August - if benchmarks are successful, then propose a patch to the commitfest and review other patches from the commitfest, else implement more algorithms. amcheck: June - implement checks for GIN (like b-tree in b-tree, resembles existing amcheck); July - checks for Hash, BRIN and SP-GiST; August - RUM, patch, commitfest, reviews. Best regards, Andrey Borodin.
Re: PATCH: Configurable file mode mask
Stephen Frost writes: > * Michael Paquier (mich...@paquier.xyz) wrote: >> If the problem is parsing, it could as well be more portable to put that >> in the control file, no? > Then we'd need a tool to allow changing it for people who do want to > change it. There's no reason we should have to have an extra tool for > this- an administrator who chooses to change the privileges on the data > folder should be able to do so and I don't think anyone is going to > thank us for requiring them to use some special tool to do so for > PostgreSQL. FWIW, I took a quick look through this patch and don't have any problem with the approach, which appears to be "use the data directory's startup-time permissions as the status indicator". I am kinda wondering about this though: +(These files can confuse pg_ctl.) If group read +access is enabled on the data directory and an unprivileged user in the +PostgreSQL group is performing the backup, then +postmaster.pid will not be readable and must be +excluded. If we're allowing group read on the data directory, why should that not include postmaster.pid? There's nothing terribly security-relevant in there, and being able to find out the postmaster PID seems useful. I can see the point of restricting server key files, as documented elsewhere, but it's possible to keep those somewhere else so they don't cause problems for simple backup software. Also, in general, this patch desperately needs a round of copy-editing, particularly with attention to the comments. For instance, there are a minimum of three grammatical errors in this one comment: + * Group read/execute may optional be enabled on PGDATA so any frontend tools + * That write into PGDATA must know what mask to set and the permissions to + * use for creating files and directories. In a larger sense, this fails to explain the operating principle, namely what I stated above, that we allow the existing permissions on PGDATA to decide what we allow as group permissions. 
If you don't already understand that, you're going to have a hard time extracting it from either file_perm.h or file_perm.c. (The latter fails to even explain what the argument of DataDirectoryMask is.) regards, tom lane