Re: [HACKERS] Declarative partitioning - another take
> > With this patch, the mapping is created *only once* during
> > RelationBuildPartitionDesc() to assign canonical indexes to individual
> > list values. The partition OID array will also be rearranged such that
> > using the new (canonical) index instead of the old
> > catalog-scan-order-based index will retrieve the correct partition for
> > that value.
> >
> > By the way, I fixed one thinko in your patch as follows:
> >
> > -result->oids[i] = oids[mapping[i]];
> > +result->oids[mapping[i]] = oids[i];

While I cannot spot any problem with this logic, when I make that change
and run the partition_join testcase in my patch, it fails because the
wrong partitions are matched for a partition-wise join of list partitions.
In that patch, the RelOptInfos of the partitions are saved in the
RelOptInfo of the parent by matching their OIDs, in the same order as the
corresponding OIDs. Partition-wise join simply joins the RelOptInfos at
the same positions from both parent RelOptInfos. I cannot spot an error in
this logic either.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
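To see why the direction of the copy matters, here is a minimal standalone sketch (the helper name and array sizes are illustrative, not the actual patch code). `mapping[i]` gives the canonical index for the partition found at catalog-scan position i; the corrected line copies each OID *to* its mapped slot rather than *from* it.

```c
#include <assert.h>

#define NPARTS 3

/*
 * Rearrange an OID array into canonical order.  mapping[i] is the
 * canonical index of the partition at catalog-scan position i.
 */
static void
canonicalize_oids(const unsigned oids[NPARTS], const int mapping[NPARTS],
                  unsigned result[NPARTS])
{
    for (int i = 0; i < NPARTS; i++)
        result[mapping[i]] = oids[i];   /* the fixed direction */
}
```

With `oids = {1001, 1002, 1003}` and `mapping = {2, 0, 1}`, the partition scanned first lands in canonical slot 2; reversing the subscripts (the original thinko) would instead scatter OIDs by reading from the mapped position.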
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
On Thu, May 19, 2016 at 6:57 AM, Michael Paquier wrote:
> I am adding that to the commit fest of September.

And a lot of activity has happened here since. Attached are refreshed
patches based on da6c4f6. v11 still applies correctly, but it's always
better to avoid hunks when applying them.
--
Michael

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 289d240..0fd2e2b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8448,8 +8448,12 @@ CreateCheckPoint(int flags)
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
 				  CHECKPOINT_FORCE)) == 0)
 	{
+		elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt %X/%X",
+			 (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
+			 (uint32) (ControlFile->checkPoint >> 32), (uint32) ControlFile->checkPoint);
 		if (progress_lsn == ControlFile->checkPoint)
 		{
+			elog(LOG, "Checkpoint is skipped");
 			WALInsertLockRelease();
 			LWLockRelease(CheckpointLock);
 			END_CRIT_SECTION();
@@ -8616,7 +8620,11 @@ CreateCheckPoint(int flags)
 	 * recovery we don't need to write running xact data.
 	 */
 	if (!shutdown && XLogStandbyInfoActive())
-		LogStandbySnapshot();
+	{
+		XLogRecPtr	lsn = LogStandbySnapshot();
+		elog(LOG, "snapshot taken by checkpoint %X/%X",
+			 (uint32) (lsn >> 32), (uint32) lsn);
+	}
 	START_CRIT_SECTION();
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 3a791eb..7637a1d 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -331,7 +331,9 @@ BackgroundWriterMain(void)
 			GetLastCheckpointRecPtr() < current_progress_lsn &&
 			last_progress_lsn < current_progress_lsn)
 		{
-			(void) LogStandbySnapshot();
+			XLogRecPtr	lsn = LogStandbySnapshot();
+			elog(LOG, "snapshot taken by bgwriter %X/%X",
+				 (uint32) (lsn >> 32), (uint32) lsn);
 			last_snapshot_ts = now;
 			last_progress_lsn = current_progress_lsn;
 		}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b019bc1..ac40731 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2507,7 +2507,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 							heaptup->t_len - SizeofHeapTupleHeader);
 		/* filtering by origin on a row level is much more efficient */
-		XLogIncludeOrigin();
+		XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 		recptr = XLogInsert(RM_HEAP_ID, info);
@@ -2846,7 +2846,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		XLogRegisterBufData(0, tupledata, totaldatalen);
 		/* filtering by origin on a row level is much more efficient */
-		XLogIncludeOrigin();
+		XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 		recptr = XLogInsert(RM_HEAP2_ID, info);
@@ -3308,7 +3308,7 @@ l1:
 	}
 	/* filtering by origin on a row level is much more efficient */
-	XLogIncludeOrigin();
+	XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);
@@ -6035,7 +6035,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 	XLogBeginInsert();
 	/* We want the same filtering on this as on a plain insert */
-	XLogIncludeOrigin();
+	XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 	XLogRegisterData((char *) &xlrec, SizeOfHeapConfirm);
 	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
@@ -7703,7 +7703,7 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	}
 	/* filtering by origin on a row level is much more efficient */
-	XLogIncludeOrigin();
+	XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 	recptr = XLogInsert(RM_HEAP_ID, info);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e11b229..9130816 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5232,7 +5232,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 		XLogRegisterData((char *) (&xl_origin), sizeof(xl_xact_origin));
 	/* we allow filtering by xacts */
-	XLogIncludeOrigin();
+	XLogSetFlags(XLOG_INCLUDE_ORIGIN);
 	return XLogInsert(RM_XACT_ID, info);
 }
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c1b9a97..289d240 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -442,11 +442,30 @@ typedef struct XLogwrtResult
  * the WAL record is just copied to the page and the lock is released. But
  * to avoid the deadlock-scenario explained above, the indicator is always
  * updated before sleeping while holding an insertion lock.
+ *
+ * The progressAt values indicate the insertion progress used to determine
+ * WAL insertion activity since a previous checkpoint, which is aimed at
+ * finding out if a checkpoint should be skipped or not or if standby
+ * activity should be logged. Progress position is basically updated
+ * for all types of records, for the time being only snapshot logging
+ * is out of this scope to properly skip their logging o
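The elog() calls added by the patch all print LSNs with the conventional %X/%X split. As a standalone illustration (the typedef here is a stand-in for PostgreSQL's own), a 64-bit LSN is rendered as its high and low 32-bit halves:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the PostgreSQL typedef */

/* Render an LSN in the %X/%X style used by the elog() calls above. */
static void
format_lsn(XLogRecPtr lsn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%X/%X",
             (uint32_t) (lsn >> 32),    /* high half */
             (uint32_t) lsn);           /* low half */
}
```

So an LSN of 0x123456789 prints as "1/23456789", matching what the debug messages in the patch would emit.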
Re: [HACKERS] Requesting some information about the small portion of source code of postgreSQL
On 27 September 2016 at 14:13, Srinivas Karthik V wrote:
> Dear PostgresSQL Hackers,
> I am working in optimizer module of postgreSQL 9.4.1.

Why would you target a severely obsolete patch release?

> Also I would like to know for what targetlist stands for.

Can't really comment on the rest, but the targetlist is what gets
projected. It is the query or subquery output. It also contains a bunch of
internal-use values that get discarded and are not returned to the user;
these are labeled "resjunk".

e.g. in

    SELECT a, b, f(b,c) / 100, 'x' FROM some_table;

the targetlist has four entries: two simple column references to "a" and
"b"; an expression for f(b,c)/100 containing the column references for "b"
and "c" and various surrounding expression nodes; and finally a literal
entry for 'x'.

You'll find turning on the various debug print options for parse trees,
rewriter output and plan trees helpful in understanding what's going on.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Remove superuser() checks from pgstattuple
On Tue, Sep 27, 2016 at 4:43 AM, Stephen Frost wrote:
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>> This is now being obsoleted by the later idea of allowing base installs
>> from a chain of upgrade scripts. But if your upgrade scripts contain
>> ALTER TABLE commands, you will probably still want to write base install
>> scripts that do a fresh CREATE TABLE instead.
>
> I've updated the patch to remove the new base version script and to rely
> on the changes made by Tom to install the 1.4 version and then upgrade
> to 1.5.
>
> Based on my testing, it appears to all work correctly.

Same conclusion from here.

> Updated (much smaller) patch attached.

I had a look at it, and it is doing the work it claims to do. So I am
marking it as "Ready for Committer".
--
Michael
[HACKERS] Requesting some information about the small portion of source code of postgreSQL
Dear PostgresSQL Hackers,

I am working in the optimizer module of postgreSQL 9.4.1. I am trying to
return a subplan for a query instead of the full plan. For this I need to
return an intermediate plan (or path) from the DP lattice (i.e. a
RelOptInfo from standard_join_search() in allpaths.c) instead of the full
optimal plan (which comes from the last level of root->join_rel_level).

While doing so I am getting the error "variable not found in subplan
target lists" in the function fix_join_expr_mutator() in setrefs.c. It
would be great if you could give me some idea about why this error is
happening. Please give me more information about this portion of code.
Also, I would like to know what the targetlist stands for.

Thanks,
Srinivas Karthik
Re: [HACKERS] patch: function xmltable
2016-09-27 5:53 GMT+02:00 Craig Ringer:
> On 24 September 2016 at 14:01, Pavel Stehule wrote:
>
> >> Did some docs copy-editing and integrated some examples. Explained how
> >> nested elements work, that multiple top level elements is an error,
> >> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> >> refer to prior output columns in PATH and DEFAULT, since that's weird
> >> and unusual compared to normal SQL. Documented handling of multiple
> >> node matches, including the surprising results of somepath/text() on
> >> xy. Documented handling of nested
> >> elements. Documented that xmltable works only on XML documents, not
> >> fragments/forests.
> >
> > I don't understand to this sentence: "It is possible for a PATH expression
> > to reference output columns that appear before it in the column-list, so
> > paths may be dynamically constructed based on other parts of the XML
> > document:"
>
> This was based on a misunderstanding of something you said earlier. I
> thought the idea was to allow this to work:
>
>     SELECT * FROM xmltable('/x' PASSING
>     'avalue' COLUMNS elemName text,
>     extractedValue text PATH elemName);
>
> ... but it doesn't:
>
>     SELECT * FROM xmltable('/x' PASSING
>     'avalue' COLUMNS elemName text,
>     extractedValue text PATH elemName);
>     ERROR: column "elemname" does not exist
>     LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName);
>
> ... so please delete that text. I thought I'd tested it but the state
> of my tests dir says I just got distracted by another task at the
> wrong time.

deleted

Regards

Pavel

> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

xmltable-11.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] Declarative partitioning - another take
On 2016/09/22 19:10, Ashutosh Bapat wrote:
> On Thu, Sep 22, 2016 at 1:02 PM, Ashutosh Bapat wrote:
>> For list partitions, the ListInfo stores the index maps for values,
>> i.e. the index of the partition to which the value belongs. Those
>> indexes are the same as the indexes in the partition OIDs array and come
>> from the catalogs. In case a user creates two partitioned tables with
>> exactly the same lists for partitions but specifies them in a different
>> order, the OIDs are stored in the order specified. This means that the
>> index arrays for these tables come out different. equal_list_info()
>> works around that by creating an array of mappings and checks whether
>> that mapping is consistent for all values. This means we will create
>> the mapping as many times as equal_list_info() is called, which is
>> expected to be more than the number of times
>> RelationBuildPartitionDescriptor() is called. Instead, if we
>> "canonicalise" the indexes so that they come out exactly the same for
>> similarly partitioned tables, we build the mapping only once and
>> arrange the OIDs accordingly.
>>
>> Here's a patch to do that. I have run make check with this and it didn't
>> show any failure. Please consider this to be included in your next set
>> of patches.

Thanks. It seems like this will save quite a few cycles for future users
of equal_list_info(). I will incorporate it into my patch.

With this patch, the mapping is created *only once* during
RelationBuildPartitionDesc() to assign canonical indexes to individual
list values. The partition OID array will also be rearranged such that
using the new (canonical) index instead of the old
catalog-scan-order-based index will retrieve the correct partition for
that value.

By the way, I fixed one thinko in your patch as follows:

-result->oids[i] = oids[mapping[i]];
+result->oids[mapping[i]] = oids[i];

That is, copy an OID at a given position in the original array to a
*mapped position* in the result array (instead of the other way around).

> The patch has an if condition as a statement by itself:
>
> +if (l1->null_index != l2->null_index);
> return false;
>
> There shouldn't be a ';' at the end. It looks like in the tests you have
> added, the function always bails out before it reaches this statement.

There is no user of equal_list_info() such that the above bug would have
caused a regression test failure. It could perhaps cause a segfault (due
to a dangling pointer into the partition descriptor's listinfo after the
above function unintentionally returns false to equalPartitionDescs(),
which is in turn called by RelationClearRelation()).

Thanks,
Amit
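For readers less familiar with this C pitfall, here is a toy reproduction of the stray-semicolon bug quoted above (the function names are hypothetical, not the actual patch code). The ';' terminates the if with an empty statement, so the return executes unconditionally:

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy form: the ';' ends the if, so "return false" always runs. */
static bool
null_index_equal_buggy(int idx1, int idx2)
{
    if (idx1 != idx2);      /* BUG: empty statement */
        return false;       /* executes unconditionally */
    return true;            /* unreachable */
}

/* Fixed form: the return is actually guarded by the condition. */
static bool
null_index_equal_fixed(int idx1, int idx2)
{
    if (idx1 != idx2)
        return false;
    return true;
}
```

This matches the observed behavior: the buggy function reports "not equal" even for identical inputs, which in equal_list_info() would make equalPartitionDescs() conclude the descriptors differ.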
Re: [HACKERS] Push down more full joins in postgres_fdw
On Tue, Sep 27, 2016 at 8:48 AM, Etsuro Fujita wrote:
> On 2016/09/26 20:20, Ashutosh Bapat wrote:
>> On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita wrote:
>>> On 2016/09/26 18:06, Ashutosh Bapat wrote:
>>>> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita wrote:
>>>>> ISTM that the use of the same RTI for subqueries in multi-levels in
>>>>> a remote SQL makes the SQL a bit difficult to read. How about using
>>>>> the position of the join rel in join_rel_list (more precisely, the
>>>>> position plus list_length(root->parse->rtable)) instead?
>>>>
>>>> We switch to a hash table to maintain the join RelOptInfos when the
>>>> number of joins grows larger, where the position won't make much
>>>> sense.
>>>
>>> That's right, but we still store the joinrel into join_rel_list after
>>> creating that hash table.
>>
>> As the list grows, it will take longer to locate the RelOptInfo of the
>> given join. Doing that for creating an alias seems an overkill.
>
> The join rel is appended to the end of the list, so I was thinking to get
> the position info by list_length during postgresGetForeignJoinPaths.

That's true only when the paths are being added to a newly created
joinrel. But that's not always true. We may add paths with a different
join order to an existing joinrel, in which case list_length would not
give its position. Am I missing something?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] pageinspect: Hash index support
Peter Eisentraut wrote:
> On 9/26/16 1:39 PM, Jesper Pedersen wrote:
>>> - hash_metap result fields spares and mapp should be arrays of integer.
>>
>> B-tree and BRIN use a 'text' field as output, so left as is.
>
> These fields are specific to hash, so the precedent doesn't necessarily
> apply.
>
>>> - The data field could be of type bytea.
>>
>> Left as is, for the same reasons as 'spares' and 'mapp'.
>
> Comments from others here? Why not use bytea instead of text?

The BRIN pageinspect functions aren't a particularly well thought-out
precedent IMO. There was practically no review of that part of the BRIN
patch, and I just threw it together in the way that seemed to make the
most sense at the time. If there's some reason why some other way is
better for hash indexes, that should be followed rather than mimicking
BRIN.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] patch: function xmltable
On 24 September 2016 at 14:01, Pavel Stehule wrote:
>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> xy. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"

This was based on a misunderstanding of something you said earlier. I
thought the idea was to allow this to work:

    SELECT * FROM xmltable('/x' PASSING
    'avalue' COLUMNS elemName text,
    extractedValue text PATH elemName);

... but it doesn't:

    SELECT * FROM xmltable('/x' PASSING
    'avalue' COLUMNS elemName text,
    extractedValue text PATH elemName);
    ERROR: column "elemname" does not exist
    LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName);

... so please delete that text. I thought I'd tested it but the state
of my tests dir says I just got distracted by another task at the
wrong time.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] patch: function xmltable
2016-09-27 3:34 GMT+02:00 Craig Ringer:
> On 24 September 2016 at 14:01, Pavel Stehule wrote:
>
> >> Did some docs copy-editing and integrated some examples. Explained how
> >> nested elements work, that multiple top level elements is an error,
> >> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> >> refer to prior output columns in PATH and DEFAULT, since that's weird
> >> and unusual compared to normal SQL. Documented handling of multiple
> >> node matches, including the surprising results of somepath/text() on
> >> xy. Documented handling of nested
> >> elements. Documented that xmltable works only on XML documents, not
> >> fragments/forests.
> >
> > I don't understand to this sentence: "It is possible for a PATH expression
> > to reference output columns that appear before it in the column-list, so
> > paths may be dynamically constructed based on other parts of the XML
> > document:"
>
> >> The docs and tests don't seem to cover XML entities. What's the
> >> behaviour there? Core XML only defines one entity, but if a schema
> >> defines more how are they processed? The tests need to cover the
> >> predefined entities " & ' < and > at least.
> >
> > I don't understand, what you are propose here. ?? Please, can you send
> > some examples.
>
> Per below - handling of DTD declarations, and the builtin
> entity tests I already added tests for.
>
> >> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
> >>
> >> SELECT * FROM xmltable('/' PASSING $XML$
> >> standalone="yes" ?>
> >> ]>
> >> Hello &pg;.
> >> $XML$ COLUMNS foo text);
> >>
> >> + ERROR: invalid XML content
> >> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$ ...
> >> +    ^
> >> + DETAIL: line 2: StartTag: invalid element name
> >> +    ^
> >> + line 3: StartTag: invalid element name
> >> +    ^
> >> + line 4: StartTag: invalid element name
> >> +    ^
> >> + line 6: Entity 'pg' not defined
> >> + Hello &pg;.
> >> +    ^
>
> > It is rejected before XMLTABLE function call
> >
> > postgres=# select $XML$
> > postgres$# ]>
> > postgres$# Hello &pg;.
> > postgres$# $XML$::xml;
> > ERROR: invalid XML content
> > LINE 1: select $XML$
> >    ^
> > DETAIL: line 2: StartTag: invalid element name
> [snip]
> > It is disabled by default in libxml2. I found a function
> > xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html
> > http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault
> >
> > The default behave should be common for all PostgreSQL's libxml2 based
> > function - and then it is different topic - maybe part for PostgreSQL ToDo?
> > But I don't remember any user requests related to this issue.
>
> OK, so it's not xmltable specific. Fine by me.
>
> Somebody who cares can deal with it. There's clearly nobody breaking
> down the walls wanting the feature.
>
> > I removed this tests - it is not related to XMLTABLE function, but to
> > generic XML processing/validation.
>
> Good plan.
>
> >> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
> >> instead of a PG_TRY() / PG_CATCH() block?
> >
> > If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when
> > you want to catch FATAL errors (and when you want to clean shared memory).
> > XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.
>
> Ok, makes sense.
>
> >> I don't have the libxml knowledge or remaining brain to usefully
> >> evaluate the xpath and xml specifics in xpath.c today. It does strike
> >> me that the new xpath parser should probably live in its own file,
> >> though.
>
> > moved
>
> Thanks.
>
> > new version is attached
>
> Great.
>
> I'm marking this ready for committer at this point.

Thank you very much

Regards

Pavel

> I think the XML parser likely needs a more close reading, so I'll ping
> Peter E to see if he'll have a chance to check that bit out. But by
> and large I think the issues have been ironed out - in terms of
> functionality, structure and clarity I think it's looking solid.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/09/26 20:20, Ashutosh Bapat wrote:
> On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita wrote:
>> On 2016/09/26 18:06, Ashutosh Bapat wrote:
>>> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita wrote:
>>>> ISTM that the use of the same RTI for subqueries in multi-levels in a
>>>> remote SQL makes the SQL a bit difficult to read. How about using the
>>>> position of the join rel in join_rel_list (more precisely, the
>>>> position plus list_length(root->parse->rtable)) instead?
>>>
>>> We switch to hash table to maintain the join RelOptInfos when the
>>> number of joins grows larger, where the position won't make much sense.
>>
>> That's right, but we still store the joinrel into join_rel_list after
>> creating that hash table.
>
> As the list grows, it will take longer to locate the RelOptInfo of the
> given join. Doing that for creating an alias seems an overkill.

The join rel is appended to the end of the list, so I was thinking to get
the position info by list_length during postgresGetForeignJoinPaths.

Best regards,
Etsuro Fujita
Re: [HACKERS] pageinspect: Hash index support
On 9/26/16 1:39 PM, Jesper Pedersen wrote:
> Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally
> readable in this case. But, I can change the patch if needed.

The point is that to use BuildTupleFromCStrings() you need to convert
numbers to strings, and then they are converted back. This is not a
typical way to write row-returning functions.

>> - hash_metap result fields spares and mapp should be arrays of integer.
>
> B-tree and BRIN use a 'text' field as output, so left as is.

These fields are specific to hash, so the precedent doesn't necessarily
apply.

>> - The data field could be of type bytea.
>
> Left as is, for the same reasons as 'spares' and 'mapp'.

Comments from others here? Why not use bytea instead of text?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On Tue, Sep 27, 2016 at 11:16 AM, Peter Eisentraut wrote:
> On 9/23/16 11:15 AM, Michael Paquier wrote:
>>> 0002:
>>>
>>> durable_rename needs close(fd) in non-error path (compare backend code).
>>
>> Oops, leak.
>
> Why did you remove the errno save and restore?

That's this bit in durable_rename, patch 0002:

+    if (fsync(fd) != 0)
+    {
+        fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+                progname, newfile, strerror(errno));
+        close(fd);
+        return -1;
+    }
+    close(fd);

I thought that as long as the error string is shown to the user, it does
not matter much whether errno is still saved or not. None of the callers
of durable_rename() on the frontend side check strerror(errno) afterwards.
Do you think it matters? Changing that back is trivial. Sorry for not
mentioning directly in the thread that I modified that when dropping the
last patch set.
--
Michael
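The save/restore pattern under discussion looks like this as a standalone sketch (the function is a hypothetical stand-in, not the actual pg_basebackup code): close() may clobber errno, so the original fsync() failure code is captured before cleanup and restored for the caller.

```c
#include <errno.h>
#include <unistd.h>
#include <assert.h>

/* Report fsync failure to the caller with errno preserved. */
static int
fsync_then_close(int fd)
{
    if (fsync(fd) != 0)
    {
        int save_errno = errno; /* close() below may overwrite errno */

        close(fd);
        errno = save_errno;     /* callers see the fsync failure code */
        return -1;
    }
    return close(fd);
}
```

Whether the extra bookkeeping is needed depends, as the mail says, on whether any frontend caller inspects errno after the failure is already printed.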
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On Mon, Sep 26, 2016 at 9:22 PM, David Steele wrote:
> On 9/26/16 4:54 AM, Heikki Linnakangas wrote:
>> Hmm. The server could send a SCRAM challenge first, and if the client
>> gives an incorrect response, or the username doesn't exist, or the
>> user's password is actually MD5-encrypted, the server could then send an
>> MD5 challenge. It would add one round-trip to the authentication of MD5
>> passwords, but that seems acceptable.

I don't think that this applies just to md5 or scram. Could we for example
use a connection parameter, like expected_auth_methods, to do that? We
include that in the startup packet if the caller has defined it, then the
backend checks for matching entries in pg_hba.conf using the username,
database and the expected auth method if specified.
--
Michael
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On 9/23/16 11:15 AM, Michael Paquier wrote:
>> 0002:
>>
>> durable_rename needs close(fd) in non-error path (compare backend code).
>
> Oops, leak.

Why did you remove the errno save and restore?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi,

Attached is v2 of the patch, updated based on the review. That means:

- Better comment explaining how free chunks are tracked in the Slab
  context.
- Removed the unused SlabPointerIsValid macro.
- Modified the comment before SlabChunkData, explaining how it relates to
  StandardChunkHeader.
- Replaced the two Assert() calls with elog().
- Implemented SlabCheck(). I've ended up with quite a few checks there,
  checking pointers between the context, block and chunks, checks due to
  MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the number
  of free chunks (bitmap, freelist vs. chunk header).
- I've also modified SlabContextCreate() to compute chunksPerBlock a bit
  more efficiently (use a simple formula instead of the loop, which might
  be a bit too expensive for large blocks / small chunks).

I haven't done any changes to GenSlab, but I do have a few notes:

Firstly, I've realized there's an issue when chunkSize gets too large -
once it exceeds blockSize, SlabContextCreate() fails as it's impossible
to place a single chunk into the block. In reorderbuffer, this may happen
when the tuples (allocated in tup_context) get larger than 8MB, as the
context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).

For Slab the elog(ERROR) is fine, as both parameters are controlled by
the developer directly, but GenSlab computes the chunkSize on the fly, so
we must not let it fail like that - that'd result in unpredictable
failures, which is not very nice.

I see two ways to fix this. We may increase the block size automatically
- e.g. instead of specifying chunkSize and blockSize when creating the
Slab, specify chunkSize and chunksPerBlock (and then choose the smallest
2^k block large enough). For example with chunkSize=96 and
chunksPerBlock=1000, we'd get 128kB blocks, as that's the closest 2^k
block larger than 96000 bytes.

But maybe there's a simpler solution - we may simply cap the chunkSize
(in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles
those requests in a special way - for example instead of tracking them in
a freelist, those chunks get freed immediately.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0001-simple-slab-allocator-fixed-size-allocations.patch
Description: binary/octet-stream

0002-generational-slab-auto-tuning-allocator.patch
Description: binary/octet-stream
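The first option above can be sketched as follows (a hypothetical helper, not the actual patch code): given chunkSize and chunksPerBlock, pick the smallest power-of-two block size that can hold them, starting from one 4kB page.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest 2^k block (>= 4kB) that fits chunksPerBlock chunks. */
static size_t
choose_block_size(size_t chunkSize, size_t chunksPerBlock)
{
    size_t needed = chunkSize * chunksPerBlock;
    size_t blockSize = 4096;        /* start at one page */

    while (blockSize < needed)
        blockSize *= 2;
    return blockSize;
}
```

With chunkSize=96 and chunksPerBlock=1000 this yields 131072 bytes (128kB), matching the example in the mail: 96000 bytes needed, and 128kB is the closest power of two above it.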
Re: [HACKERS] Stopping logical replication protocol
On 26 September 2016 at 21:52, Vladimir Gordiychuk wrote:
>> You should rely on time in tests as little as possible. Some of the test
>> hosts are VERY slow. If possible something deterministic should be used.
>
> That's why this changes difficult to verify. Maybe in that case we should
> write benchmark but not integration test?
> In that case we can say, before this changes stopping logical replication
> gets N ms but after apply changes it gets NN ms where NN ms less than N ms.
> Is it available add this kind of benchmark to postgresql? I will be
> grateful for links.

It's for that reason that I added a message, printed only in verbose mode,
that pg_recvlogical emits when it's exiting after a client-initiated
CopyDone. You can use the TAP tests, print diag messages, etc. But we
generally want them to run fairly quickly, so big benchmark runs aren't
desirable.

You'll notice that I left the diag messages in to report the timing for
the results in your tests; I just changed the tests so they didn't depend
on very tight timing for success/failure anymore.

We don't currently have any automated benchmarking infrastructure.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Detect supported SET parameters when pg_restore is run
Hackers, At work we use several major versions of PostgreSQL, and developers use non-local clusters for developing and debugging. We do dump/restore of schemas/data via the custom/dir formats, and we have to keep client binaries for the 9.2, 9.4 and 9.5 versions on local workstations, because after pg_restore95 connects to 9.2 it fails when it sets run-time parameters via SET: vitaly@work 01:21:26 ~ $ pg_restore95 --host DEV_HOST_9_2 -d DBNAME --data-only -e -1 -Fc arhcivefile Password: pg_restore95: [archiver (db)] Error while INITIALIZING: pg_restore95: [archiver (db)] could not execute query: ERROR: unrecognized configuration parameter "lock_timeout" Command was: SET lock_timeout = 0; Of course, it can be worked around by avoiding "--single-transaction", but if there is an inconsistent schema (or stricter constraints), part of the schema/data has already been changed/inserted and a lot of errors are generated on the next pg_restore run. pg_dump has checks in the "setup_connection" function to detect what to send after the connection is established for dumping, but there are no such checks in _doSetFixedOutputState for restoring. With such checks it would be possible to use a single version of pg_dump96/pg_restore96 to dump/restore, for example, 9.2->9.2 as well as 9.4->9.4 and so on. The only trouble we have is in the "SET" block, and after some research I discovered it is possible not to send unsupported SET options to the database. Please find attached a simple patch. For restoring to stdout (or dumping to a plain SQL file) I kept the current behavior: all options in the SET block are written. Also I left "SET row_security = on;" when "enable_row_security" is set, so that restoring to a server version that does not support it fails. -- Best regards, Vitaly Burovoy detect_supported_set_parameters_for_pgrestore.001.patch Description: Binary data
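The idea of the proposal - emit a SET only when the target server's version is known to support that parameter - can be sketched as follows. This is a hedged Python illustration, not the C patch itself; the function name is invented, though the version cutoffs shown (lock_timeout appearing in 9.3, row_security in 9.5) match when those GUCs were introduced:

```python
# Minimum server version in which each run-time parameter first appeared,
# in the PQserverVersion encoding (major * 10000 + minor * 100).
MIN_VERSION = {
    "statement_timeout": 70300,
    "lock_timeout": 90300,        # introduced in 9.3
    "row_security": 90500,        # introduced in 9.5
}

def set_statements(params, remote_version):
    """Return only the SET statements the target server will accept.

    remote_version is None when writing to stdout / a plain SQL file,
    in which case every option is kept (the current behavior)."""
    out = []
    for name, value in params:
        if remote_version is None or MIN_VERSION.get(name, 0) <= remote_version:
            out.append(f"SET {name} = {value};")
    return out

# A 9.2 server (90200) must not receive "SET lock_timeout = 0;":
stmts = set_statements([("statement_timeout", 0), ("lock_timeout", 0)], 90200)
assert stmts == ["SET statement_timeout = 0;"]
```

With such a filter, the single-transaction restore in the report above would no longer abort on the very first SET.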
Re: [HACKERS] pgbench - allow to store select results into variables
On 2016/09/26 20:27, Fabien COELHO wrote: > > Hello Amit, > >>> I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I >>> have no further comments at the moment. >> >> Wait... Heikki's latest commit now requires this patch to be rebased. > > Indeed. Here is the rebased version, which still get through my various > tests. Thanks, Fabien. It seems to work here too. Regards, Amit
Re: [HACKERS] patch: function xmltable
On 24 September 2016 at 14:01, Pavel Stehule wrote: >> Did some docs copy-editing and integrated some examples. Explained how >> nested elements work, that multiple top level elements is an error, >> etc. Explained the time-of-evaluation stuff. Pointed out that you can >> refer to prior output columns in PATH and DEFAULT, since that's weird >> and unusual compared to normal SQL. Documented handling of multiple >> node matches, including the surprising results of somepath/text() on >> xy. Documented handling of nested >> elements. Documented that xmltable works only on XML documents, not >> fragments/forests. > > > I don't understand to this sentence: "It is possible for a PATH expression > to reference output columns that appear before it in the column-list, so > paths may be dynamically constructed based on other parts of the XML > document:" >> The docs and tests don't seem to cover XML entities. What's the >> behaviour there? Core XML only defines one entity, but if a schema >> defines more how are they processed? The tests need to cover the >> predefined entities " & ' < and > at least. > > > I don't understand, what you are propose here. ?? Please, can you send some > examples. Per below - handling of DTD declarations, and the builtin entity tests I already added tests for. >> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?): >> >> SELECT * FROM xmltable('/' PASSING $XML$> standalone="yes" ?> >> > >> >> ]> >> Hello &pg;. >> $XML$ COLUMNS foo text); >> >> + ERROR: invalid XML content >> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$> +^ >> + DETAIL: line 2: StartTag: invalid element name >> + > + ^ >> + line 3: StartTag: invalid element name >> + >> +^ >> + line 4: StartTag: invalid element name >> + >> +^ >> + line 6: Entity 'pg' not defined >> + Hello &pg;. >> +^ >> > > It is rejected before XMLTABLE function call > > postgres=# select $XML$ > postgres$# postgres$# > postgres$# > postgres$# ]> > postgres$# Hello &pg;. 
> postgres$# $XML$::xml; > ERROR: invalid XML content > LINE 1: select $XML$ >^ > DETAIL: line 2: StartTag: invalid element name > It is disabled by default in libxml2. I found a function > xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html > http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault > > The default behave should be common for all PostgreSQL's libxml2 based > function - and then it is different topic - maybe part for PostgreSQL ToDo? > But I don't remember any user requests related to this issue. OK, so it's not xmltable specific. Fine by me. Somebody who cares can deal with it. There's clearly nobody breaking down the walls wanting the feature. > I removed this tests - it is not related to XMLTABLE function, but to > generic XML processing/validation. Good plan. >> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP >> instead of a PG_TRY() / PG_CATCH() block? > > > If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you > want to catch FATAL errors (and when you want to clean shared memory). > XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors. Ok, makes sense. >> I don't have the libxml knowledge or remaining brain to usefully >> evaluate the xpath and xml specifics in xpath.c today. It does strike >> me that the new xpath parser should probably live in its own file, >> though. > > moved Thanks. > new version is attached Great. I'm marking this ready for committer at this point. I think the XML parser likely needs a more close reading, so I'll ping Peter E to see if he'll have a chance to check that bit out. But by and large I think the issues have been ironed out - in terms of functionality, structure and clarity I think it's looking solid. 
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
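The entity behavior discussed in the xmltable thread above is easy to reproduce outside PostgreSQL with any conforming XML parser: the predefined entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;, &amp;apos;) are always substituted, while an undeclared entity such as &amp;pg; is a well-formedness error. A hedged demonstration using Python's expat-based xml.etree (not libxml2, but the predefined-entity rules come from the XML specification itself):

```python
import xml.etree.ElementTree as ET

# The five predefined entities are part of core XML and always work:
row = ET.fromstring("<row>&lt;b&gt; &amp; &quot;c&quot;</row>")
assert row.text == '<b> & "c"'

# An entity with no declaration is rejected at parse time,
# mirroring the "Entity 'pg' not defined" error from libxml2:
failed = False
try:
    ET.fromstring("<row>Hello &pg;.</row>")
except ET.ParseError:
    failed = True
assert failed
```

This matches the conclusion above: rejection of undeclared entities happens during generic XML parsing/validation, before anything xmltable-specific runs.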
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On 21 September 2016 at 22:16, Robert Haas wrote: > On Wed, Sep 21, 2016 at 3:40 AM, Craig Ringer wrote: >> The only non-horrible one of those is IMO to just let the caller see >> an error if they lose the race. It's a function that's intended for >> use when you're dealing with indeterminate transaction state after a >> server or application error anyway, so it's part of an error path >> already. So I say we just document the behaviour. > > I am not keen on that idea. The errors we're likely to be exposing > are going to look like low-level internal failures, which might scare > some DBA. Even if they don't, I think it's playing with fire. The > system is designed on the assumption that nobody will try to look up > an XID that's too old, and if we start to violate that assumption I > think we're undermining the design integrity of the system in a way > we'll likely come to regret. To put that more plainly, when the code > is written with the assumption that X will never happen, it's usually > a bad idea to casually add code that does X. Fair point. > [snip] > > It might not be too hard to add a second copy of oldestXid in shared > memory that is updated before truncation rather than afterward... but > yeah, like you, I'm not finding this nearly as straightforward as > might have been hoped. Yeah. I suspect that'll be the way to go, to add another copy that's updated before clog truncation. It just seems ... unclean. Like it shouldn't be necessary, something else isn't right. But it's probably the lowest pain option. I'm going to take a step back on this and see if I can spot an alternative. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
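The "second copy of oldestXid updated before truncation" idea can be modeled in a few lines. In this toy (invented names, not the real clog code), readers check a bound that is always advanced *before* pages are removed, so any xid that passes the check still has its page; reversing the two steps would open a window where a lookup hits a missing page and surfaces a scary low-level error:

```python
class ToyClog:
    """Toy commit log: a page per xid, truncated from the old end."""

    def __init__(self, n):
        self.pages = {xid: "committed" for xid in range(n)}
        self.oldest_xid = 0              # bound advanced BEFORE truncation

    def truncate(self, new_oldest):
        self.oldest_xid = new_oldest     # step 1: publish the new bound
        for xid in list(self.pages):     # step 2: actually remove old pages
            if xid < new_oldest:
                del self.pages[xid]

    def status(self, xid):
        if xid < self.oldest_xid:
            raise ValueError("transaction is too old")  # clean, expected error
        return self.pages[xid]           # page guaranteed to still exist


clog = ToyClog(10)
clog.truncate(5)
assert clog.status(7) == "committed"
failed = False
try:
    clog.status(3)                       # truncated away: clean error, not a crash
except ValueError:
    failed = True
assert failed
```

A concurrent reader that samples the bound between step 1 and step 2 gets the clean "too old" error rather than a missing-page failure, which is the invariant the extra shared-memory copy would provide.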
Re: [HACKERS] [COMMITTERS] pgsql: pg_ctl: Detect current standby state from pg_control
On Tue, Sep 27, 2016 at 9:45 AM, Peter Eisentraut wrote: > On 9/26/16 7:56 PM, Peter Eisentraut wrote: >> On 9/26/16 8:53 AM, Tom Lane wrote: >>> I think that it's 100% pointless for get_control_dbstate >>> to be worried about transient CRC failures. If writes to pg_control >>> aren't atomic then we have problems enormously larger than whether >>> "pg_ctl promote" throws an error or not. >> >> The new code was designed to handle that, but if we think it's not an >> issue, then we can simplify it. I'm on it. > > How about this? + if (crc_ok_p) + *crc_ok_p = false; + else + { + pfree(ControlFile); + return NULL; + } Seems overcomplicated to me. How about returning the control file all the time and let the caller pfree the result? You could then use crc_ok in pg_ctl.c's get_control_dbstate() to do the decision-making. Also, if the sleep() loop is removed for promote -w, we assume that we'd fail immediately in case of a corrupted control file. Could it be possible that after a couple of seconds the backend gets it right and overwrites a correct control file on top of a corrupted one? I am just curious about such possibilities, still I am +1 for removing the pg_usleep loop and failing right away as you propose. If we do the renaming of get_controlfile(), it may be also a good idea to do the same for get_configdata() in config_info.c for consistency. That's not really critical IMHO though. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
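The suggested interface - always return the parsed file together with a crc_ok flag, and let the caller decide what to do - looks roughly like this as a hedged Python sketch. The file layout here (payload followed by a trailing CRC-32) is invented for illustration and is not pg_control's actual layout:

```python
import os
import tempfile
import zlib

def write_toy_controlfile(path, payload):
    crc = zlib.crc32(payload).to_bytes(4, "little")
    with open(path, "wb") as f:
        f.write(payload + crc)

def read_toy_controlfile(path):
    """Always return (payload, crc_ok); never raise on a bad checksum."""
    with open(path, "rb") as f:
        raw = f.read()
    payload, stored = raw[:-4], raw[-4:]
    crc_ok = zlib.crc32(payload).to_bytes(4, "little") == stored
    return payload, crc_ok


path = os.path.join(tempfile.mkdtemp(), "toy_control")
write_toy_controlfile(path, b"DB_IN_PRODUCTION")
data, ok = read_toy_controlfile(path)
assert ok and data == b"DB_IN_PRODUCTION"

# Corrupt one byte: the caller still gets the data, plus crc_ok == False,
# and can choose to fail immediately instead of sleeping and retrying.
with open(path, "r+b") as f:
    f.write(b"X")
data, ok = read_toy_controlfile(path)
assert not ok
```

Pushing the decision to the caller is what lets pg_ctl's get_control_dbstate() fail right away on a corrupted file instead of looping, as proposed above.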
[HACKERS] Re: [COMMITTERS] pgsql: pg_ctl: Detect current standby state from pg_control
On 9/26/16 7:56 PM, Peter Eisentraut wrote: > On 9/26/16 8:53 AM, Tom Lane wrote: >> I think that it's 100% pointless for get_control_dbstate >> to be worried about transient CRC failures. If writes to pg_control >> aren't atomic then we have problems enormously larger than whether >> "pg_ctl promote" throws an error or not. > > The new code was designed to handle that, but if we think it's not an > issue, then we can simplify it. I'm on it. How about this? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 0001-Fix-CRC-check-handling-in-get_controlfile.patch Description: invalid/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal_segment size vs max_wal_size
On Mon, Sep 26, 2016 at 9:30 PM, Kuntal Ghosh wrote: > On Mon, Sep 26, 2016 at 5:04 PM, Amit Kapila wrote: >> >> IIRC, there is already a patch to update the minRecoveryPoint >> correctly, can you check if that solves the problem for you? >> >> [1] - >> https://www.postgresql.org/message-id/20160609.215558.118976703.horiguchi.kyotaro%40lab.ntt.co.jp >> > +1. I've tested after applying the patch. This clearly solves the problem. Even though many things have been discussed on this thread, Horiguchi-san's first patch is still the best approach found after several lookups and attempts at messing with the recovery code. -- Michael
Re: [HACKERS] Speedup twophase transactions
On Thu, Sep 22, 2016 at 12:30 AM, Stas Kelvich wrote: > I’m not giving up yet, i’ll write them) I still have in mind several other > patches to 2pc handling in > postgres during this release cycle — logical decoding and partitioned hash > instead of > TwoPhaseState list. > > My bet that relative speed of that patches will depend on used filesystem. > Like it was with the > first patch in that mail thread it is totally possible sometimes to hit > filesystem limits on file > creation speed. Otherwise both approaches should be more or less equal, i > suppose. OK. I am marking this patch as returned with feedback then. Looking forward to seeing the next investigations. At least this review has taught us a thing or two. -- Michael
Re: [HACKERS] VACUUM's ancillary tasks
On Tue, Sep 27, 2016 at 2:33 AM, Tom Lane wrote: > Thomas Munro writes: >> I noticed that ATExecAlterColumnType does this: >> * Drop any pg_statistic entry for the column, since it's now wrong type > >> What if there is no rewrite, because the type is binary compatible >> (ATColumnChangeRequiresRewrite returns false)? Then I think your patch >> won't attract an autovacuum daemon to ANALYZE the table and produce new >> statistics, because it won't touch changes_since_analyze. I wonder if we >> should simply keep the stats if no rewrite is required. Aren't the >> statistical properties of binary-compatible types also compatible? > > Not necessarily: the type semantics might be different --- in fact, > probably are different, else why would there be distinct types in the > first place? It would be unwise to keep the old stats IMO. > > If you need a concrete example, consider OID vs int4. They're > binary-compatible, but since int4 is signed while OID is unsigned, > stats for one would be wrong for the other. This is the same reason > why ALTER COLUMN TYPE has to rebuild indexes even in binary-compatible > cases. Ah, right. Then I think this patch should somehow bump changes_since_analyze in the no-rewrite case if it's going to do it in the rewrite case. It would be surprising and weird if altering a column's type *sometimes* resulted in new statistics being automatically generated to replace those that were dropped, depending on the technical detail of whether a rewrite was necessary. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
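Tom's OID-vs-int4 example is easy to demonstrate: the same 32-bit pattern sorts differently depending on signedness, so histogram bounds computed under one type's ordering are wrong for the other. A small illustration (plain Python, just modeling the two interpretations):

```python
def as_signed32(u):
    """Reinterpret an unsigned 32-bit value as a signed int4."""
    return u - (1 << 32) if u >= (1 << 31) else u

oids = [10, 3_000_000_000, 20]   # valid OIDs; the middle one has the sign bit set

unsigned_order = sorted(oids)                     # OID comparison semantics
signed_order = sorted(oids, key=as_signed32)      # int4 comparison semantics

assert unsigned_order == [10, 20, 3_000_000_000]
assert signed_order == [3_000_000_000, 10, 20]    # 3000000000 reads as -1294967296
```

Any histogram built under the unsigned ordering would mislead the planner once the column is int4, which is why the stats are dropped on ALTER COLUMN TYPE even in binary-compatible cases - and why bumping changes_since_analyze in the no-rewrite path, as suggested above, would make the replacement stats arrive consistently.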
Re: [HACKERS] Add support for restrictive RLS policies
Stephen Frost wrote: > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > Stephen Frost wrote: > > > + > > > + If the policy is a "permissive" or "restrictive" policy. > > > Permissive > > > + policies are the default and always add visibillity- they ar "OR"d > > > + together to allow the user access to all rows through any of the > > > + permissive policies they have access to. Alternativly, a policy > > > can > > > + instead by "restrictive", meaning that the policy will be "AND"d > > > + with other restrictive policies and with the result of all of the > > > + permissive policies on the table. > > > + > > > + > > > + Stephen, > > I dislike the "AND"d and "OR"d spelling of those terms. Currently they > > only appear in comments within rowsecurity.c (of your authorship too, I > > imagine). I think it'd be better to find actual words for those > > actions. > > I'm certainly open to suggestions, should you, or anyone else, have > them. I'll try and come up with something else, but that really is what > we're doing- literally using either AND or OR to join the expressions > together. I understand, but the words "and" and "or" are not verbs. I don't know what are good verbs to use for this but I suppose there must be verbs related to "conjunction" and "disjunction" ("to conjoin" and "to disjoin" appear in the Merriam-Webster dictionary but I don't think they represent the operation very well). Maybe some native speaker would have a better suggestion. > > I don't understand this part. Didn't you say previously that we already > > had this capability in 9.5 and you were only exposing it over SQL? If > > that is true, how come you need to add a new column to this catalog? > > The capability exists in 9.5 through hooks which are available to > extensions, see the test_rls_hooks extension. Those hooks are called > every time and therefore don't require the information about restrictive > policies to be tracked in pg_policy, and so they weren't. 
Since this is > adding support for users to define restrictive policies, we need to save > those policies and therefore we need to distinguish which policies are > restrictive and which are permissive, hence the need to modify pg_policy > to track that information. Ah, okay. I thought you meant that it was already possible to create a policy to behave this way, just not using SQL-based DDL. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
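The combining rule under discussion - permissive policies joined disjunctively, restrictive policies joined conjunctively against that result - can be written out as a toy row filter. This is a hedged Python model of the described semantics, not the PostgreSQL implementation:

```python
def row_visible(row, permissive, restrictive):
    """A row is visible iff ANY permissive policy passes
    AND ALL restrictive policies pass."""
    if not any(p(row) for p in permissive):
        return False
    return all(r(row) for r in restrictive)


# Two permissive policies (either one grants access) and one restrictive one:
permissive = [lambda r: r["owner"] == "alice", lambda r: r["public"]]
restrictive = [lambda r: not r["archived"]]

assert row_visible({"owner": "alice", "public": False, "archived": False},
                   permissive, restrictive)       # via the first permissive policy
assert not row_visible({"owner": "bob", "public": False, "archived": False},
                       permissive, restrictive)   # no permissive policy passes
assert not row_visible({"owner": "alice", "public": True, "archived": True},
                       permissive, restrictive)   # the restrictive policy vetoes
```

The model also shows why the two kinds must be distinguished in pg_policy: the same predicate widens access as a permissive policy but narrows it as a restrictive one.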
Re: [HACKERS] Cache Hash Index meta page.
On Tue, Sep 13, 2016 at 12:55 PM, Mithun Cy wrote: > On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen < > jesper.peder...@redhat.com> wrote: >> >> > For the archives, this patch conflicts with the WAL patch [1]. >> >> > [1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFp nsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com >> > > Updated the patch it applies over Amit's concurrent hash index[1] and > Amit's wal for hash index patch[2] together. > I think that this needs to be updated again for v8 of the concurrent hash index patch and v5 of the WAL patch. Thanks, Jeff
Re: [HACKERS] pageinspect: Hash index support
Jeff Janes wrote: > On Tue, Sep 20, 2016 at 12:19 AM, Michael Paquier > wrote: > > > > > Note: the patch checks if a superuser is calling the new functions, > > which is a good thing. > > > > If we only have the bytea functions and the user needs to supply the raw > pages themselves, rather than having the function go get the raw page for > you, is there any reason to restrict the interpretation function to super > users? I guess if we don't trust the C coded functions not to dereference > bogus data in harmful ways? Yeah, it'd likely be simple to manufacture a fake page that causes the server to misbehave resulting in a security leak or at least DoS. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On Tue, Sep 20, 2016 at 12:19 AM, Michael Paquier wrote: > > Note: the patch checks if a superuser is calling the new functions, > which is a good thing. > If we only have the bytea functions and the user needs to supply the raw pages themselves, rather than having the function go get the raw page for you, is there any reason to restrict the interpretation function to super users? I guess if we don't trust the C coded functions not to dereference bogus data in harmful ways? Cheers, Jeff
Re: [HACKERS] pgbench more operators & functions
Hello Jeevan, I did the review of your patch and here are my views on your patch. Thanks for this detailed review and debugging! Documentation: [...] it be a good idea to have a table of operators similar to that of functions. We need not have several columns here like description, example etc., but a short table just categorizing the operators would be sufficient. Ok, done. Further testing and review: === 1. Postgres treats '^' as exponentiation rather than XOR, and '#' as XOR. Personally, I think it can cause confusion, so it will be better if we can stick to the behavior of Postgres mathematical operators. Ok. I agree to avoid '^'. 2. I could not see any tests for bitwise operators in the functions.sql file that you have attached. Indeed. Included in attached version. 3. Precedence: [...] Hmm. I got them all wrong, shame on me! I've followed C rules in the updated version. 5. Sorry, I was not able to understand the "should exist" comment in following snippet. +"xor" { return XOR_OP; } /* should exist */ +"^^" { return XOR_OP; } /* should exist */ There is no "logical exclusive or" operator in C nor in SQL. I do not see why not, so I put one... 7. You may want to reword following comment: [...] there -> here Ok, fixed twice. 8. [...] if (coerceToBool(&vargs[0])) *retval = vargs[1]; else *retval = vargs[2]; Ok. Attached is an updated patch & test script. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 285608d..c958c1c 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -821,9 +821,8 @@ pgbench options dbname The expression may contain integer constants such as 5432, double constants such as 3.14159, references to variables :variablename, - unary operators (+, -) and binary operators - (+, -, *, /, - %) with their usual precedence and associativity, + operators + with their usual precedence and associativity, function calls, and parentheses. 
@@ -909,6 +908,84 @@ pgbench options dbname + + Built-In Operators + + + The arithmetic, bitwise, comparison and logical operators listed in +are built into pgbench + and may be used in expressions appearing in + \set. + + + + pgbench Operators + + + + Operator Category + Result Type + List of Operators + + + + + unary arithmetic + integer or double + +, - + + + binary arithmetic + integer or double + +, -, *, / + + + binary arithmetic + integer only + % + + + unary bitwise + integer + ~ + + + binary bitwise + integer + &, |, # + + + shifts + integer + <<, >> + + + comparison + boolean (0/1) + + =/==, <>/!=, + <, <=, >, >= + + + + unary logical + boolean (0/1) + not/! + + + binary logical + boolean (0/1) + + and/&&, + or/||, + xor/^^, + + + + + + + Built-In Functions @@ -955,6 +1032,13 @@ pgbench options dbname 5432.0 + exp(x) + double + exponential + exp(1.0) + 2.718281828459045 + + greatest(a [, ... ] ) double if any a is double, else integer largest value among arguments @@ -962,6 +1046,13 @@ pgbench options dbname 5 + if(c,e1,e2) + same as e* + if c is not zero then e1 else e2 + if(0,1.0,2.0) + 2.0 + + int(x) integer cast to int @@ -976,6 +1067,13 @@ pgbench options dbname 2.1 + ln(x) + double + natural logarithm + ln(2.718281828459045) + 1.0 + + pi() double value of the constant PI diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index 0cc665b..f8dbbaf 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -52,11 +52,21 @@ static PgBenchExpr *make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList * %type VARIABLE FUNCTION %token INTEGER_CONST DOUBLE_CONST VARIABLE FUNCTION - -/* Precedence: lowest to highest */ +%token AND_OP OR_OP XOR_OP NE_OP LE_OP GE_OP LS_OP RS_OP + +/* Precedence: lowest to highest, taken from C */ +%left OR_OP +%left XOR_OP +%left AND_OP +%left '|' +%left '#' +%left '&' +%left '=' NE_OP +%left '<' '>' LE_OP GE_OP +%left LS_OP RS_OP %left '+' '-' %left '*' '/' '%' -%right UMINUS +%right 
UNARY %% @@ -68,14 +78,32 @@ elist: { $$ = NULL; } ; expr: '(' expr ')' { $$ = $2; } - | '+' expr %prec UMINUS { $$ = $2; } - | '-' expr %prec UMINUS { $$ = make_op(yyscanner, "-", + | '+' expr %prec UNARY { $$ = $2; } + | '-' expr %prec UNAR
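For reference, the semantics the pgbench patch above gives the new logical operators and if() can be stated compactly. This is a hedged Python rendering of the behavior described in the diff, not the C evaluator; pgbench-style booleans are modeled as 0/1 integers:

```python
def pg_not(a):
    return int(a == 0)

def pg_and(a, b):
    return int(a != 0 and b != 0)

def pg_or(a, b):
    return int(a != 0 or b != 0)

def pg_xor(a, b):
    return int((a != 0) != (b != 0))     # true iff exactly one side is true

def pg_if(c, e1, e2):
    return e1 if c != 0 else e2          # coerceToBool applied to the condition


assert pg_xor(1, 0) == 1 and pg_xor(1, 1) == 0
assert pg_if(0, 1.0, 2.0) == 2.0         # matches the documented example
# Bitwise # is plain integer XOR, distinct from the logical xor/^^ above:
assert (6 ^ 3) == 5                      # 110 # 011 = 101
```

Keeping the C precedence table (bitwise below comparisons, shifts above arithmetic, as the %left declarations in the diff lay out lowest-to-highest) means a pgbench expression parses the same way the equivalent C expression would.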
Re: [HACKERS] Add support for restrictive RLS policies
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > Stephen, the typo "awseome" in the tests is a bit distracting. Can you > please fix it? Will fix. > > Updated patch changes to using IDENT and an strcmp() (similar to > > AlterOptRoleElem and vacuum_option_elem) to check the results at parse-time, > > and then throwing a more specific error if an unexpected value is found > > (instead of just 'syntax error'). This avoids adding any keywords. > > Thanks. No problem. > > diff --git a/doc/src/sgml/ref/create_policy.sgml > > b/doc/src/sgml/ref/create_policy.sgml > > index 89d2787..d930052 100644 > > --- a/doc/src/sgml/ref/create_policy.sgml > > +++ b/doc/src/sgml/ref/create_policy.sgml > > @@ -22,6 +22,7 @@ PostgreSQL documentation > > > > > > CREATE POLICY name ON > > table_name > > +[ AS ( PERMISSIVE | RESTRICTIVE ) ] > > I think you should use braces here, not parens: > > [ AS { PERMISSIVE | RESTRICTIVE } ] Will fix. > > > > +permissive > > + > > + > > + If the policy is a "permissive" or "restrictive" policy. Permissive > > + policies are the default and always add visibillity- they ar "OR"d > > + together to allow the user access to all rows through any of the > > + permissive policies they have access to. Alternativly, a policy can > > + instead by "restrictive", meaning that the policy will be "AND"d > > + with other restrictive policies and with the result of all of the > > + permissive policies on the table. > > + > > + > > + > > I don't think this paragraph is right -- you should call out each of the > values PERMISSIVE and RESTRICTIVE (in upper case) instead. Also note > typos "Alternativly" and "visibillity". Will fix, along with the 'ar' typo. > I dislike the "AND"d and "OR"d spelling of those terms. Currently they > only appear in comments within rowsecurity.c (of your authorship too, I > imagine). I think it'd be better to find actual words for those > actions. 
I'm certainly open to suggestions, should you, or anyone else, have them. I'll try and come up with something else, but that really is what we're doing- literally using either AND or OR to join the expressions together. > > diff --git a/src/include/catalog/pg_policy.h > > b/src/include/catalog/pg_policy.h > > index d73e9c2..30dc367 100644 > > --- a/src/include/catalog/pg_policy.h > > +++ b/src/include/catalog/pg_policy.h > > @@ -23,6 +23,7 @@ CATALOG(pg_policy,3256) > > NameDatapolname;/* Policy name. */ > > Oid polrelid; /* Oid of the relation > > with policy. */ > > charpolcmd; /* One of ACL_*_CHR, or '*' for > > all */ > > + boolpolpermissive; /* restrictive or permissive policy */ > > > > #ifdef CATALOG_VARLEN > > Oid polroles[1];/* Roles associated with > > policy, not-NULL */ > > @@ -42,12 +43,13 @@ typedef FormData_pg_policy *Form_pg_policy; > > * compiler constants for pg_policy > > * > > */ > > -#define Natts_pg_policy6 > > -#define Anum_pg_policy_polname 1 > > -#define Anum_pg_policy_polrelid2 > > -#define Anum_pg_policy_polcmd 3 > > -#define Anum_pg_policy_polroles4 > > -#define Anum_pg_policy_polqual 5 > > -#define Anum_pg_policy_polwithcheck 6 > > +#define Natts_pg_policy6 > > +#define Anum_pg_policy_polname 1 > > +#define Anum_pg_policy_polrelid2 > > +#define Anum_pg_policy_polcmd 3 > > +#define Anum_pg_policy_polpermissive 4 > > +#define Anum_pg_policy_polroles5 > > +#define Anum_pg_policy_polqual 6 > > +#define Anum_pg_policy_polwithcheck7 > > I don't understand this part. Didn't you say previously that we already > had this capability in 9.5 and you were only exposing it over SQL? If > that is true, how come you need to add a new column to this catalog? The capability exists in 9.5 through hooks which are available to extensions, see the test_rls_hooks extension. Those hooks are called every time and therefore don't require the information about restrictive policies to be tracked in pg_policy, and so they weren't. 
Since this is adding support for users to define restrictive policies, we need to save those policies and therefore we need to distinguish which policies are restrictive and which are permissive, hence the need to modify pg_policy to track that information. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Add support for restrictive RLS policies
Stephen Frost wrote: Stephen, the typo "awseome" in the tests is a bit distracting. Can you please fix it? > Updated patch changes to using IDENT and an strcmp() (similar to > AlterOptRoleElem and vacuum_option_elem) to check the results at parse-time, > and then throwing a more specific error if an unexpected value is found > (instead of just 'syntax error'). This avoids adding any keywords. Thanks. > diff --git a/doc/src/sgml/ref/create_policy.sgml > b/doc/src/sgml/ref/create_policy.sgml > index 89d2787..d930052 100644 > --- a/doc/src/sgml/ref/create_policy.sgml > +++ b/doc/src/sgml/ref/create_policy.sgml > @@ -22,6 +22,7 @@ PostgreSQL documentation > > > CREATE POLICY name ON > table_name > +[ AS ( PERMISSIVE | RESTRICTIVE ) ] I think you should use braces here, not parens: [ AS { PERMISSIVE | RESTRICTIVE } ] > > +permissive > + > + > + If the policy is a "permissive" or "restrictive" policy. Permissive > + policies are the default and always add visibillity- they ar "OR"d > + together to allow the user access to all rows through any of the > + permissive policies they have access to. Alternativly, a policy can > + instead by "restrictive", meaning that the policy will be "AND"d > + with other restrictive policies and with the result of all of the > + permissive policies on the table. > + > + > + I don't think this paragraph is right -- you should call out each of the values PERMISSIVE and RESTRICTIVE (in upper case) instead. Also note typos "Alternativly" and "visibillity". I dislike the "AND"d and "OR"d spelling of those terms. Currently they only appear in comments within rowsecurity.c (of your authorship too, I imagine). I think it'd be better to find actual words for those actions. > diff --git a/src/include/catalog/pg_policy.h b/src/include/catalog/pg_policy.h > index d73e9c2..30dc367 100644 > --- a/src/include/catalog/pg_policy.h > +++ b/src/include/catalog/pg_policy.h > @@ -23,6 +23,7 @@ CATALOG(pg_policy,3256) > NameDatapolname;/* Policy name. 
*/ > Oid polrelid; /* Oid of the relation > with policy. */ > charpolcmd; /* One of ACL_*_CHR, or '*' for > all */ > + boolpolpermissive; /* restrictive or permissive policy */ > > #ifdef CATALOG_VARLEN > Oid polroles[1];/* Roles associated with > policy, not-NULL */ > @@ -42,12 +43,13 @@ typedef FormData_pg_policy *Form_pg_policy; > * compiler constants for pg_policy > * > */ > -#define Natts_pg_policy 6 > -#define Anum_pg_policy_polname 1 > -#define Anum_pg_policy_polrelid 2 > -#define Anum_pg_policy_polcmd3 > -#define Anum_pg_policy_polroles 4 > -#define Anum_pg_policy_polqual 5 > -#define Anum_pg_policy_polwithcheck 6 > +#define Natts_pg_policy 6 > +#define Anum_pg_policy_polname 1 > +#define Anum_pg_policy_polrelid 2 > +#define Anum_pg_policy_polcmd3 > +#define Anum_pg_policy_polpermissive 4 > +#define Anum_pg_policy_polroles 5 > +#define Anum_pg_policy_polqual 6 > +#define Anum_pg_policy_polwithcheck 7 I don't understand this part. Didn't you say previously that we already had this capability in 9.5 and you were only exposing it over SQL? If that is true, how come you need to add a new column to this catalog? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove superuser() checks from pgstattuple
Peter, all, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > This is now being obsoleted by the later idea of allowing base installs > from a chain of upgrade scripts. But if your upgrade scripts contain > ALTER TABLE commands, you will probably still want to write base install > scripts that do a fresh CREATE TABLE instead. I've updated the patch to remove the new base version script and to rely on the changes made by Tom to install the 1.4 version and then upgrade to 1.5. Based on my testing, it appears to all work correctly. Updated (much smaller) patch attached. Thanks! Stephen From 508a4db0111607d8d59eef4a8e172f8a7aa8fdfa Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Tue, 23 Aug 2016 17:13:09 -0400 Subject: [PATCH] Remove superuser checks in pgstattuple Now that we track initial privileges on extension objects and changes to those permissions, we can drop the superuser() checks from the various functions which are part of the pgstattuple extension. Since a pg_upgrade will preserve the version of the extension which existed prior to the upgrade, we can't simply modify the existing functions but instead need to create new functions which remove the checks and update the SQL-level functions to use the new functions (and to REVOKE EXECUTE rights on those functions from PUBLIC). Approach suggested by Noah. 
--- contrib/pgstattuple/Makefile | 7 +- contrib/pgstattuple/pgstatapprox.c| 35 ++-- contrib/pgstattuple/pgstatindex.c | 108 +++-- contrib/pgstattuple/pgstattuple--1.4--1.5.sql | 111 ++ contrib/pgstattuple/pgstattuple.c | 36 + contrib/pgstattuple/pgstattuple.control | 2 +- 6 files changed, 284 insertions(+), 15 deletions(-) create mode 100644 contrib/pgstattuple/pgstattuple--1.4--1.5.sql diff --git a/contrib/pgstattuple/Makefile b/contrib/pgstattuple/Makefile index e732680..294077d 100644 --- a/contrib/pgstattuple/Makefile +++ b/contrib/pgstattuple/Makefile @@ -4,9 +4,10 @@ MODULE_big = pgstattuple OBJS = pgstattuple.o pgstatindex.o pgstatapprox.o $(WIN32RES) EXTENSION = pgstattuple -DATA = pgstattuple--1.4.sql pgstattuple--1.3--1.4.sql \ - pgstattuple--1.2--1.3.sql pgstattuple--1.1--1.2.sql \ - pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql +DATA = pgstattuple--1.4.sql pgstattuple--1.4--1.5.sql \ + pgstattuple--1.3--1.4.sql pgstattuple--1.2--1.3.sql \ + pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql \ + pgstattuple--unpackaged--1.0.sql PGFILEDESC = "pgstattuple - tuple-level statistics" REGRESS = pgstattuple diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c index a49ff54..11bae14 100644 --- a/contrib/pgstattuple/pgstatapprox.c +++ b/contrib/pgstattuple/pgstatapprox.c @@ -29,6 +29,9 @@ #include "commands/vacuum.h" PG_FUNCTION_INFO_V1(pgstattuple_approx); +PG_FUNCTION_INFO_V1(pgstattuple_approx_v1_5); + +Datum pgstattuple_approx_internal(Oid relid, FunctionCallInfo fcinfo); typedef struct output_type { @@ -209,6 +212,33 @@ Datum pgstattuple_approx(PG_FUNCTION_ARGS) { Oid relid = PG_GETARG_OID(0); + + if (!superuser()) + ereport(ERROR, +(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + (errmsg("must be superuser to use pgstattuple functions"; + + PG_RETURN_DATUM(pgstattuple_approx_internal(relid, fcinfo)); +} + +/* + * As of pgstattuple version 1.5, we no longer need to check if the user + * is a superuser because we 
REVOKE EXECUTE on the function from PUBLIC. + * Users can then grant access to it based on their policies. + * + * Otherwise identical to pgstattuple_approx (above). + */ +Datum +pgstattuple_approx_v1_5(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + + PG_RETURN_DATUM(pgstattuple_approx_internal(relid, fcinfo)); +} + +Datum +pgstattuple_approx_internal(Oid relid, FunctionCallInfo fcinfo) +{ Relation rel; output_type stat = {0}; TupleDesc tupdesc; @@ -217,11 +247,6 @@ pgstattuple_approx(PG_FUNCTION_ARGS) HeapTuple ret; int i = 0; - if (!superuser()) - ereport(ERROR, -(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), - (errmsg("must be superuser to use pgstattuple functions"; - if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) elog(ERROR, "return type must be a row type"); diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c index 6084589..753e52d 100644 --- a/contrib/pgstattuple/pgstatindex.c +++ b/contrib/pgstattuple/pgstatindex.c @@ -54,6 +54,14 @@ PG_FUNCTION_INFO_V1(pg_relpages); PG_FUNCTION_INFO_V1(pg_relpagesbyid); PG_FUNCTION_INFO_V1(pgstatginindex); +PG_FUNCTION_INFO_V1(pgstatindex_v1_5); +PG_FUNCTION_INFO_V1(pgstatindexbyid_v1_5); +PG_FUNCTION_INFO_V1(pg_relpages_v1_5); +PG_FUNCTION_INFO_V1(pg_relpagesbyid_v1_5); +PG_FUNCTION_INFO_V1(pgstatginindex_v1_5); + +Datum pgstatginindex_internal(Oid relid, FunctionCallInfo fcinfo); + #define IS_INDEX(r) ((r)->rd_rel-
Re: [HACKERS] proposal: psql \setfileref
2016-09-26 21:47 GMT+02:00 Ryan Murphy : > > >> please, can you check attached patch? It worked in my laptop. >> >> Regards >> >> Pavel >> >> > Yes, that one applied for me without any problems. > Great, Thank you Pavel > > Thanks, > Ryan >
Re: [HACKERS] proposal: psql \setfileref
> > please, can you check attached patch? It worked in my laptop. > > Regards > > Pavel > > Yes, that one applied for me without any problems. Thanks, Ryan
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Mon, Sep 26, 2016 at 6:58 PM, Robert Haas wrote: >> That requires some kind of mutual exclusion mechanism, like an LWLock. > > No, it doesn't. Shared memory queues are single-reader, single-writer. The point is that there is a natural dependency when merging is performed eagerly within the leader. One thing needs to be in lockstep with the others. That's all. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
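The reason a single-reader, single-writer queue needs no lock is that the writer is the only side that ever advances the tail, and the reader the only side that ever advances the head, so no index is concurrently written by both. Below is a schematic, purely illustrative Python model of that invariant (PostgreSQL's shm_mq lives in shared memory and relies on memory barriers, which this sketch does not model):

```python
class SpscQueue:
    """Bounded ring buffer: one writer advances `tail`, one reader
    advances `head`.  Because no index is written from both sides,
    a single-producer/single-consumer queue needs no mutual exclusion."""

    def __init__(self, capacity):
        # one slot is left empty so a full buffer is distinguishable from an empty one
        self.buf = [None] * (capacity + 1)
        self.head = 0   # only the reader writes this
        self.tail = 0   # only the writer writes this

    def put(self, item):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:        # full: the writer would have to wait
            return False
        self.buf[self.tail] = item
        self.tail = nxt             # publish only after the slot is filled
        return True

    def get(self):
        if self.head == self.tail:  # empty
            return None
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item
```

In real shared memory the index updates also need ordering guarantees (barriers); this sketch only models the control flow that makes the no-lock property possible.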
Re: [HACKERS] [GENERAL] inconsistent behaviour of set-returning functions in sub-query with random()
Tom van Tilburg writes: > I'm often using the WHERE clause random() > 0.5 to pick a random subset of > my data. Now I noticed that when using a set-returning function in a > sub-query, I either get the whole set or none (meaning that the WHERE > random() > 0.5 clause is interpreted *before* the set is being generated). > e.g.: > > SELECT num FROM ( > SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num) AS foo WHERE random() > > 0.5; Hmm, I think this is an optimizer bug. There are two legitimate behaviors here: SELECT * FROM unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5; should (and does) re-evaluate the WHERE for every row output by unnest(). SELECT unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5; should evaluate WHERE only once, since that happens before expansion of the set-returning function in the targetlist. (If you're an Oracle user and you imagine this query as having an implicit "FROM dual", the WHERE should be evaluated for the single row coming out of the FROM clause.) In the case you've got here, given the placement of the WHERE in the outer query, you'd certainly expect it to be evaluated for each row coming out of the inner query. But the optimizer is deciding it can push the WHERE clause down to become a WHERE of the sub-select. That is legitimate in a lot of cases, but not when there are SRF(s) in the sub-select's targetlist, because that pushes the WHERE to occur before the SRF(s), analogously to the change between the two queries I wrote. I'm a bit hesitant to change this in existing releases. Given the lack of previous complaints, it seems more likely to break queries that were behaving as-expected than to make people happy. But we could change it in v10 and up, especially since some other corner-case changes in SRF-in-tlist behavior are afoot. 
In the meantime, you could force it to work as you wish by inserting the all-purpose optimization fence "OFFSET 0" in the sub-select:

=# SELECT num FROM (
     SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num OFFSET 0) AS foo
   WHERE random() > 0.5;
 num
-----
   1
   4
   7
   9
(4 rows)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
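The all-or-nothing behavior falls out of where the qual is evaluated relative to the SRF expansion. A schematic Python model of the two legitimate behaviors described above (purely illustrative, not how the executor is implemented):

```python
def srf_in_from(arr, cond):
    """Model of: SELECT * FROM unnest(arr) WHERE cond
    The qual is re-evaluated for every row the SRF emits."""
    return [x for x in arr if cond()]

def srf_in_targetlist(arr, cond):
    """Model of: SELECT unnest(arr) WHERE cond
    The qual is evaluated once, for the single implicit input row,
    before the SRF in the targetlist expands, so the result is the
    whole set or nothing."""
    return list(arr) if cond() else []
```

With a condition like `lambda: random.random() > 0.5`, the first form returns a random subset while the second returns all ten elements or none, which is exactly the symptom reported.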
Re: [HACKERS] Add support for restrictive RLS policies
Jeevan, all, * Jeevan Chalke (jeevan.cha...@enterprisedb.com) wrote: > On Mon, Sep 12, 2016 at 7:27 AM, Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > > > On Thu, Sep 8, 2016 at 5:21 PM, Tom Lane wrote: > > > > Stephen Frost writes: > > > >> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > > >>> Can't you keep those words as Sconst or something (DefElems?) until > > the > > > >>> execution phase, so that they don't need to be keywords at all? > > > > > > > >> Seems like we could do that, though I'm not convinced that it really > > > >> gains us all that much. These are only unreserved keywords, of > > course, > > > >> so they don't impact users the way reserved keywords (of any kind) > > can. > > > >> While there may be some places where we use a string to represent a > > set > > > >> of defined options, I don't believe that's typical > > > > > > > > -1 for having to write them as string literals; but I think what Alvaro > > > > really means is to arrange for the words to just be identifiers in the > > > > grammar, which you strcmp against at execution. See for example > > > > reloption_list. (Whether you use DefElem as the internal > > representation > > > > is a minor detail, though it might help for making the parsetree > > > > copyObject-friendly.) > > > > > > > > vacuum_option_elem shows another way to avoid making a word into a > > > > keyword, although to me that one is more of an antipattern; it'd be > > better > > > > to leave the strcmp to execution, since there's so much other code that > > > > does things that way. > > > > > > There are other cases like that, too, e.g. AlterOptRoleElem; I don't > > > think it's a bad pattern. Regardless of the specifics, I do think > > > that it would be better not to bloat the keyword table with things > > > that don't really need to be keywords. 
> > > > The AlterOptRoleElem case is, I believe, what Tom was just suggesting as > > an antipattern, since the strcmp() is being done at parse time instead > > of at execution time. > > > > If we are concerned about having too many unreserved keywords, then I > > agree that AlterOptRoleElem is a good candidate to look at for reducing > > the number we have, as it appears to contain 3 keywords which are not > > used anywhere else (and 1 other which is only used in one other place). > > > > I do think that using IDENT for the various role attributes does make > > sense, to be clear, as there are quite a few of them, they change > > depending on major version, and those keywords are very unlikely to ever > > have utilization elsewhere. > > > > For this case, there's just 2 keywords which seem like they may be used > > again (perhaps for ALTER or DROP POLICY, or default policies if we grow > > those in the future), and we're unlikely to grow more in the future for > > that particular case (as we only have two binary boolean operators and > > that seems unlikely to change), though, should that happen, we could > > certainly revisit this. > > > > To me, adding two new keywords for two new options does not look good as it > will bloat the keywords list. Per my understanding we should add a keyword if and > only if we have no option other than adding it, maybe to avoid grammar conflicts. > > I am also inclined to think that using an identifier will be a good choice > here. > However I would prefer to do the string comparison right in the grammar > itself, so that if we have a wrong option as input there, then we will not > proceed further with it. We are anyway going to throw an error later, then > why not at an early stage? The updated patch changes to using IDENT and a strcmp() (similar to AlterOptRoleElem and vacuum_option_elem) to check the results at parse-time, and then throwing a more specific error if an unexpected value is found (instead of just 'syntax error'). 
This avoids adding any keywords. Thanks! Stephen From 11471c7921271e3c03078f3d31148dd4afd9d6e0 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Thu, 1 Sep 2016 02:11:30 -0400 Subject: [PATCH] Add support for restrictive RLS policies We have had support for restrictive RLS policies since 9.5, but they were only available through extensions which use the appropriate hooks. This adds support into the grammar, catalog, psql and pg_dump for restrictive RLS policies, thus reducing the cases where an extension is necessary. --- doc/src/sgml/ref/create_policy.sgml | 16 +++ src/backend/commands/policy.c | 9 ++ src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c | 1 + src/backend/parser/gram.y | 34 - src/backend/rewrite/rowsecurity.c | 7 +- src/bin/pg_dump/pg_dump.c | 38 -- src/bin/pg_dump/pg_dump.h | 1 + src/bin/pg_dump/t/002_pg_dump.pl | 39 +- src/bin/psql/describe.c | 109 src/bin/psql/tab-complete.c | 29 - src/include/catalog/pg_
Re: [HACKERS] pageinspect: Hash index support
On Mon, Sep 26, 2016 at 10:39 AM, Jesper Pedersen < jesper.peder...@redhat.com> wrote: > Hi, > > On 09/23/2016 12:10 AM, Peter Eisentraut wrote: > >> >> > - As of very recently, we don't need to move pageinspect--1.5.sql to >> pageinspect--1.6.sql anymore. Just add pageinspect--1.5--1.6.sql. >> >> > We need to rename the pageinspect--1.5.sql file and add the new methods, > right ? Or ? You just need to add pageinspect--1.5--1.6.sql. With recent changes to the extension infrastructure, when you create the extension as version 1.6, the infrastructure automatically figures out that it should install 1.5 and then upgrade to 1.6. Or that is my understanding, anyway. Cheers, Jeff
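The resolution Jeff describes amounts to a shortest-path search over the available script files: pick a base install script, then chain `old--new` upgrade scripts until the requested version is reached. A rough Python sketch of that idea (the helper name is hypothetical; the real logic lives in PostgreSQL's extension.c and differs in detail):

```python
from collections import deque

def resolve_install_path(scripts, target):
    """scripts: filenames like 'pageinspect--1.5.sql' (base install)
    or 'pageinspect--1.5--1.6.sql' (upgrade script).  Returns the list
    of scripts to run to reach `target`, or None if unreachable."""
    bases, upgrades = [], {}
    for s in scripts:
        parts = s[:-len('.sql')].split('--')
        if len(parts) == 2:                      # name--ver.sql
            bases.append((parts[1], s))
        elif len(parts) == 3:                    # name--old--new.sql
            upgrades.setdefault(parts[1], []).append((parts[2], s))

    # breadth-first search from every base version toward the target,
    # so the first chain found is a shortest one
    queue = deque((ver, [script]) for ver, script in bases)
    while queue:
        ver, path = queue.popleft()
        if ver == target:
            return path
        for nxt, script in upgrades.get(ver, []):
            queue.append((nxt, path + [script]))
    return None
```

For pageinspect this resolves `CREATE EXTENSION pageinspect` at version 1.6 to installing `pageinspect--1.5.sql` and then running `pageinspect--1.5--1.6.sql`, matching the behavior described above.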
Re: [HACKERS] Add support for restrictive RLS policies
Jeevan, * Jeevan Chalke (jeevan.cha...@enterprisedb.com) wrote: > I have started reviewing this patch and here are couple of points I have > observed so far: > > 1. Patch applies cleanly > 2. make / make install / initdb all good. > 3. make check (regression) FAILED. (Attached diff file for reference). I've re-based my patch on top of current head and still don't see the failures which you are getting during the regression tests. Is it possible you were doing the tests without a full rebuild of the source tree..? Can you provide details of your build/test environment and the full regression before and after output? Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] proposal: psql \setfileref
Hi 2016-09-26 14:57 GMT+02:00 Ryan Murphy : > Hi Pavel, > > I just tried to apply your patch psql-setfileref-initial.patch (using git > apply) to the newest revision of postgres at the time (da6c4f6ca88) and it > failed to patch startup.c. Thinking that the patch was for some previous > revision, I then checked out d062245b5, which was from 2016-08-31, and > tried applying the patch from there. I had the same problem: > > $ git apply psql-setfileref-initial.patch > error: patch failed: src/bin/psql/startup.c:106 > error: src/bin/psql/startup.c: patch does not apply > > However, when I applied the changes to startup.c manually and removed them > from the patch, the rest of the patch applied without a problem. I don't > know if I may have done something wrong in trying to apply the patch, but > you may want to double check if you need to regenerate your patch from the > latest revision so it will apply smoothly for reviewers. > please, can you check attached patch? It worked in my laptop. Regards Pavel > > In the meantime, I haven't had a chance to try out the fileref feature yet > but I'll give it a go when I get a chance! > > Thanks! 
> Ryan > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index a9a2fdb..af38ff9 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1363,6 +1363,32 @@ exec_command(const char *cmd, free(envval); } + /* \setfileref - set variable by reference on file */ + else if (strcmp(cmd, "setfileref") == 0) + { + char *name = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + char *ref = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + success = false; + + if (!name || !ref) + { + psql_error("\\%s: missing required argument\n", cmd); + success = false; + } + else + { + if (!SetFileRef(pset.vars, name, ref)) + { +psql_error("\\%s: error while setting variable\n", cmd); +success = false; + } + } + } + /* \sf -- show a function's source code */ else if (strcmp(cmd, "sf") == 0 || strcmp(cmd, "sf+") == 0) { diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index a7789df..b160228 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -25,6 +25,7 @@ #include "settings.h" #include "command.h" #include "copy.h" +#include "catalog/pg_type.h" #include "crosstabview.h" #include "fe_utils/mbprint.h" @@ -33,7 +34,6 @@ static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec); static bool command_no_begin(const char *query); static bool is_select_command(const char *query); - /* * openQueryOutputFile --- attempt to open a query output file * @@ -109,6 +109,120 @@ setQFout(const char *fname) return true; } +void +psql_reset_query_params(void) +{ + int i; + + for (i = 0; i < pset.nparams; i++) + if (pset.params[i] != NULL) + { + PQfreemem(pset.params[i]); + pset.params[i] = NULL; + } + + pset.nparams = 0; +} + +/* + * Load a content of the file_ref related file to query params buffer. 
+ * When escaping is requested, then the text content is expected. + * Without escaping the bytea content is expected and related bytea + * escaping is processed. + */ +static char * +get_file_ref_content(const char *value, bool escape, bool as_ident) +{ + PQExpBufferData buffer; + FILE *fd = NULL; + char *fname; + char *escaped_value; + charline[1024]; + size_tsize; + + fname = pstrdup(value); + + expand_tilde(&fname); + if (!fname) + { + psql_error("missing valid path to file\n"); + return NULL; + } + + canonicalize_path(fname); + + fd = fopen(fname, PG_BINARY_R); + if (!fd) + { + psql_error("%s: %s\n", fname, strerror(errno)); + PQfreemem(fname); + return NULL; + } + + /* can append another parameter */ + if (pset.nparams >= MAX_BINARY_PARAMS) + { + psql_error("too much binary parameters"); + PQfreemem(fname); + return NULL; + } + + if (!pset.db) + { + psql_error("cannot escape without active connection\n"); + PQfreemem(fname); + return NULL; + } + + initPQExpBuffer(&buffer); + + while ((size = fread(line, 1, sizeof(line), fd)) > 0) + appendBinaryPQExpBuffer(&buffer, line, size); + + if (ferror(fd)) + { + psql_error("%s: %s\n", fname, strerror(errno)); + PQfreemem(fname); + termPQExpBuffer(&buffer); + return NULL; + } + + if (escape) + { + if (as_ident) + escaped_value = +PQescapeIdentifier(pset.db, buffer.data, buffer.len); + else + escaped_value = +PQescapeLiteral(pset.db, buffer.data, buffer.len); + pset.paramTypes[pset.nparams] = UNKNOWNOID; + } + else + { + escaped_value = (char *) +PQescapeByteaConn(pset.db, + (const unsigned char *) buffer.data, buffer.len, &size); + pset.paramTypes[pset.nparams] = BYTEAOID; + } + + /* fname, buffer is not necessary longer */ + PQfreemem(fna
Re: [HACKERS] Showing parallel status in \df+
* Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 26, 2016 at 10:48 AM, Stephen Frost wrote: > > I agree that "do nothing" isn't a good option. I'm not terribly > > thrilled with just putting the source code at the bottom of the \df+ > > output either, though it's at least slightly less ridiculous than trying > > to put the source code into a column in a table. > > Several people have spoken against that option, I feel like we're getting wrapped around the axle as it regards who is perceived to be voting for what. Based on my review of this thread, we seem to have one person (Peter) who dislikes removing prosrc from \df+, though he caveated his comments with being concerned that it was happening after beta2 (which is no longer the case, of course), see: f16571cc-bf6f-53a1-6809-f09f48f0a...@2ndquadrant.com. Subsequently, in 5aacd611-94b7-3b98-de8e-cae34e18c...@2ndquadrant.com, he seems to suggest that he might support it if \sf was changed to support wildcards and multiple results. Michael had voiced concern that removing prosrc from \df+ would make debugging more difficult, but subsequent discussion indicated that he agreed with removing prosrc for non-internal/C functions (which was later described as 'Internal name'), see: cab7npqtikt-e7e5cx6wm89m9fr0-za+bkb1k4_xvfgpcr7d...@mail.gmail.com Rushabh Lathia had an issue with keeping 'source code' for internal/C language functions, but not for other languages, see: cagpqqf2jz3q+bvkxraqxe+ztuzhrayfkp+syc74nc+fu5pn...@mail.gmail.com Pavel didn't feel having the footer be used for the source code was appropriate, see: cafj8prbh-m7cmez2jt49fkgh8qwfglxxbzfdbbi3gtybbxd...@mail.gmail.com I don't particularly care for it either, primarily because \sf could be improved upon, as suggested by Peter, to avoid the need to have the same information displayed by both \df+ and \sf. > but I think it's > actually pretty clearly an improvement over the status quo. 
If you > don't want to see all of that output, fine; look at the first > screenful and then quit your pager (which probably means "press q"). I agree that it's an improvement over the current \df+ output, but I'm not convinced that it's better than improving \sf to do that and then removing prosrc from \df+ and adding 'Internal name', and I'm not sure that there are actual dissenting votes for that as the end-goal. Peter's comments seem to be brought up as rejecting the removal of prosrc from \df+, full-stop, but that's not how I read his actual comments on this thread. > If you do want to see all of the output, you'll appreciate not having > it indented by 60 or 80 columns any more. There's really no > circumstance under which it's worse than what we're doing today. That doesn't mean, at least to me, that we should forgo considering better alternatives. > It's fairly likely that there's no option here that will please > everyone completely, but that's not a reason to reject a patch that is > clearly better than what we've got now. We often reject patches which only improve a bit on the status quo because we wish for a better overall solution, particularly when we're talking about user interfaces that we don't want to change between every release. That's part of the reason that we have a role system today; Tom, correctly, pointed out that we don't just want a system where a given object can have multiple owners (one of my first proposals to the lists) but wished to have role-based access controls, which would provide that capability and more. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Allowing GIN array_ops to work on anyarray
Enrique Meneses writes: > Great, given it does not apply to this patch, then all the other tests > passed and the change looks good. Pushed, thanks for the review! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
> > This crashes with arrays with non-default lower bounds: > > postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}'); > INFO: ([1, 2, ], ) > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > Attached patch fixes this bug, and adds a test for it. > > I'd like to see some updates to the docs for this. The manual doesn't > currently say anything about multi-dimensional arrays in pl/python, but it > should've mentioned that they're not supported. Now that it is supported, > should mention that, and explain briefly that a multi-dimensional array is > mapped to a python list of lists. > > If the code passes I'll fix the docs > It seems we don't have any mention in the docs about arrays with > non-default lower-bounds ATM. That's not this patch's fault, but it would > be good to point out that the lower bounds are discarded when an array is > passed to python. > > I find the loop in PLyList_FromArray() quite difficult to understand. Are > the comments there mixing up the "inner" and "outer" dimensions? I wonder > if that would be easier to read, if it was written in a recursive-style, > rather than iterative with stacks for the dimensions. > > Yes, it is fairly convoluted. Dave Cramer da...@postgresintl.com www.postgresintl.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/26/2016 07:16 PM, Tomas Vondra wrote:

The averages (over the 10 runs, 5 minutes each) look like this:

3.2.80                1      8     16     32     64    128    192
granular-locking   1567  12146  26341  44188  43263  49590  15042
no-content-lock    1567  12180  25549  43787  43675  51800  16831
group-update       1550  12018  26121  44451  42734  51455  15504
master             1566  12057  25457  42299  42513  42562  10462

4.5.5                 1      8     16     32     64    128    192
granular-locking   3018  19031  27394  29222  32032  34249  36191
no-content-lock    2988  18871  27384  29260  32120  34456  36216
group-update       2960  18848  26870  29025  32078  34259  35900
master             2984  18917  26430  29065  32119  33924  35897

That is:

(1) The 3.2.80 performs a bit better than before, particularly for 128 and 256 clients - I'm not sure if it's thanks to the reboots or so.

(2) 4.5.5 performs measurably worse for >= 32 clients (by ~30%). That's a pretty significant regression, on a fairly common workload.

FWIW, now that I think about this, the regression is roughly in line with my findings presented in my recent blog post: http://blog.2ndquadrant.com/postgresql-vs-kernel-versions/

Those numbers were collected on a much smaller machine (2/4 cores only), which might be why the difference observed on the 32-core machine is much more significant.

regards

--
Tomas Vondra    http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
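As a quick sanity check on the "~30%" figure, the arithmetic can be done directly from the granular-locking rows of the two tables, at the client counts where 3.2.80 is ahead (32, 64 and 128 clients):

```python
# tps for the granular-locking patch at 32, 64 and 128 clients,
# taken from the kernel 3.2.80 and 4.5.5 tables in this message
kernel_3_2_80 = [44188, 43263, 49590]
kernel_4_5_5  = [29222, 32032, 34249]

# relative slowdown of 4.5.5 versus 3.2.80, in percent
drops = [100.0 * (old - new) / old
         for old, new in zip(kernel_3_2_80, kernel_4_5_5)]
# roughly 33.9%, 26.0% and 30.9%, i.e. on the order of 30%
```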
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Sat, Sep 24, 2016 at 9:07 AM, Peter Geoghegan wrote: > On Thu, Sep 22, 2016 at 8:57 PM, Robert Haas wrote: >> On Thu, Sep 22, 2016 at 3:51 PM, Heikki Linnakangas wrote: >>> It'd be good if you could overlap the final merges in the workers with the >>> merge in the leader. ISTM it would be quite straightforward to replace the >>> final tape of each worker with a shared memory queue, so that the leader >>> could start merging and returning tuples as soon as it gets the first tuple >>> from each worker. Instead of having to wait for all the workers to complete >>> first. >> >> If you do that, make sure to have the leader read multiple tuples at a >> time from each worker whenever possible. It makes a huge difference >> to performance. See bc7fcab5e36b9597857fa7e3fa6d9ba54aaea167. > > That requires some kind of mutual exclusion mechanism, like an LWLock. No, it doesn't. Shared memory queues are single-reader, single-writer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgpassfile connection option
On Thu, Sep 22, 2016 at 11:34 AM, Julian Markwort wrote: > I haven't really thought about this as I had been asked to make this work as > an additional option to the connection parameters... > Now that I've looked at it - there is really only the benefit of saving the > step of setting the PGPASSFILE environment variable. > However, there might be cases in which setting an environment variable might > not be the easiest option. So, there are some security concerns here in my mind. If a program running under a particular user ID accepts a connection string from a source that isn't fully trusted, the user has to accept the risk that their .pgpass file will be used for authentication to whatever database the program might try to connect. However, they don't have to accept the possibility that arbitrary local files readable by the user ID will be used for authentication and/or disclosed; this patch would force them to accept that risk. That doesn't seem particularly good. If an adversary has enough control over my account that they can set environment variables, it's game over: they win. But if I merely accept connection strings from them, I shouldn't have to worry about anything worse than that I might be induced to connect to the wrong thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 09/23/2016 01:56 AM, Amit Kapila wrote: While looking at patch, I noticed below code which seems somewhat problematic: + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); + + /* page type (flags) */ + if (opaque->hasho_flag & LH_META_PAGE) + stat->type = 'm'; + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) + stat->type = 'v'; + else if (opaque->hasho_flag & LH_BUCKET_PAGE) + stat->type = 'b'; + else if (opaque->hasho_flag & LH_BITMAP_PAGE) + stat->type = 'i'; In the above code, it appears that you are trying to calculate max_avail space for all pages in same way. Don't you need to calculate it differently for bitmap page or meta page as they don't share the same format as other type of pages? Correct, however the max_avail calculation was removed in v6, since we don't display average item size anymore. Thanks for the feedback ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, On 09/23/2016 12:10 AM, Peter Eisentraut wrote: On 9/21/16 9:30 AM, Jesper Pedersen wrote: Attached is v5, which add basic page verification. There are still some things that I found that appear to follow the old style (btree) conventions instead the new style (brin, gin) conventions. - Rename hash_metap to hash_metapage_info. Done. - Don't use BuildTupleFromCStrings(). (There is nothing inherently wrong with it, but why convert numbers to strings and back again when it can be done more directly.) Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally readable in this case. But, I can change the patch if needed. - Documentation for hash_page_stats still has blkno argument. Fixed. - In hash_metap, the magic field could be type bytea, so that it displays in hex. Or text like the brin functions. Changed to 'text'. Some other comments: - hash_metap result fields spares and mapp should be arrays of integer. B-tree and BRIN uses a 'text' field as output, so left as is. (Incidentally, the comment in hash.h talks about bitmaps[] but I think it means hashm_mapp[].) - As of very recently, we don't need to move pageinspect--1.5.sql to pageinspect--1.6.sql anymore. Just add pageinspect--1.5--1.6.sql. We need to rename the pageinspect--1.5.sql file and add the new methods, right ? Or ? - In the documentation for hash_page_stats(), the "+" sign is misaligned. Fixed. - In hash_page_items, the fields itemlen, nulls, vars are always 16, false, false. So perhaps there is no need for them. Similarly, the hash_page_stats in hash_page_stats is always 16. Fixed, by removing them. We can add them back later if needed. - The data field could be of type bytea. Left as is, for same reasons as 'spares' and 'mapp'. - Add a pointer in the documentation to the relevant header file. Done. Bonus: - Add subsections in the documentation (general functions, B-tree functions, etc.) Done. - Add tests. Thanks for the review ! 
Best regards, Jesper >From 24c3ff3cb48ca2076d589eda9027a0abe73b5aad Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 400 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 346 ++ contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 151 +- 7 files changed, 952 insertions(+), 295 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..96e72a9 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,400 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright 
(c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "funcapi.h" +#include "miscadmin.h" + +PG_FUNCTION_INFO_V1(hash_metapage_info); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + char type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + +
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 9:20 AM, Amit Kapila wrote: > On Fri, Sep 23, 2016 at 6:50 AM, Robert Haas wrote: >> On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra >> wrote: >>> I don't dare to suggest rejecting the patch, but I don't see how we could >>> commit any of the patches at this point. So perhaps "returned with feedback" >>> and resubmitting in the next CF (along with analysis of improved workloads) >>> would be appropriate. >> >> I think it would be useful to have some kind of theoretical analysis >> of how much time we're spending waiting for various locks. So, for >> example, suppose we one run of these tests with various client counts >> - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select >> wait_event from pg_stat_activity" once per second throughout the test. >> Then we see how many times we get each wait event, including NULL (no >> wait event). Now, from this, we can compute the approximate >> percentage of time we're spending waiting on CLogControlLock and every >> other lock, too, as well as the percentage of time we're not waiting >> for lock. That, it seems to me, would give us a pretty clear idea >> what the maximum benefit we could hope for from reducing contention on >> any given lock might be. >> > As mentioned earlier, such an activity makes sense, however today, > again reading this thread, I noticed that Dilip has already posted > some analysis of lock contention upthread [1]. It is clear that patch > has reduced LWLock contention from ~28% to ~4% (where the major > contributor was TransactionIdSetPageStatus which has reduced from ~53% > to ~3%). Isn't it inline with what you are looking for? Hmm, yes. But it's a little hard to interpret what that means; I think the test I proposed in the quoted material above would provide clearer data. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
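For illustration, the sampling analysis Robert describes reduces to counting samples per wait event and dividing by the total number of samples. A minimal sketch in C (the event names and sample data here are hypothetical, not measured):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Given once-per-second samples of pg_stat_activity.wait_event
 * (NULL meaning "not waiting"), compute the fraction of samples
 * spent waiting on a given event.
 */
static double
wait_event_fraction(const char *samples[], int nsamples, const char *event)
{
    int hits = 0;
    int i;

    for (i = 0; i < nsamples; i++)
        if (samples[i] != NULL && strcmp(samples[i], event) == 0)
            hits++;

    return (double) hits / nsamples;
}
```

Summing these fractions per lock gives the upper bound on the benefit of reducing contention on that lock, which is the point of the proposed test.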
Re: [HACKERS] Showing parallel status in \df+
On Mon, Sep 26, 2016 at 10:48 AM, Stephen Frost wrote: > I agree that "do nothing" isn't a good option. I'm not terribly > thrilled with just putting the source code at the bottom of the \df+ > output either, though it's at least slightly less ridiculous than trying > to put the source code into a column in a table. Several people have spoken against that option, but I think it's actually pretty clearly an improvement over the status quo. If you don't want to see all of that output, fine; look at the first screenful and then quit your pager (which probably means "press q"). If you do want to see all of the output, you'll appreciate not having it indented by 60 or 80 columns any more. There's really no circumstance under which it's worse than what we're doing today. It's fairly likely that there's no option here that will please everyone completely, but that's not a reason to reject a patch that is clearly better than what we've got now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: psql \setfileref
2016-09-26 14:57 GMT+02:00 Ryan Murphy : > Hi Pavel, > > I just tried to apply your patch psql-setfileref-initial.patch (using git > apply) to the newest revision of postgres at the time (da6c4f6ca88) and it > failed to patch startup.c. Thinking that the patch was for some previous > revision, I then checked out d062245b5, which was from 2016-08-31, and > tried applying the patch from there. I had the same problem: > > $ git apply psql-setfileref-initial.patch > error: patch failed: src/bin/psql/startup.c:106 > error: src/bin/psql/startup.c: patch does not apply > > However, when I applied the changes to startup.c manually and removed them > from the patch, the rest of the patch applied without a problem. I don't > know if I may have done something wrong in trying to apply the patch, but > you may want to double check if you need to regenerate your patch from the > latest revision so it will apply smoothly for reviewers. > Thank you for info. I'll do it immediately. Regards Pavel > > In the meantime, I haven't had a chance to try out the fileref feature yet > but I'll give it a go when I get a chance! > > Thanks! > Ryan > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] temporary table vs array performance
2016-09-26 17:39 GMT+02:00 dby...@163.com : > test: > create type h3 as (id int,name char(10)); > > CREATE or replace FUNCTION proc17() > RETURNS SETOF h3 AS $$ > DECLARE > v_rec h3; > BEGIN > create temp table abc(id int,name varchar) on commit drop; > insert into abc select 1,'lw'; > insert into abc select 2,'lw2'; > for v_rec in > select * from abc loop > return next v_rec; > end loop; > END; > $$ > LANGUAGE plpgsql; > > > CREATE or replace FUNCTION proc16() > RETURNS SETOF h3 AS $$ > DECLARE > id_array int[]; > name_arr varchar[]; > v_rec h3; > BEGIN > id_array =array[1,2]; > name_arr=array['lw','lw2']; > for v_rec in > select unnest(id_array) ,unnest(name_arr) loop > return next v_rec; > end loop; > END; > $$ > LANGUAGE plpgsql; > postgres=# select * from proc17(); > id |name > + > 1 | lw > 2 | lw2 > (2 rows) > > Time: 68.372 ms > postgres=# select * from proc16(); > id |name > + > 1 | lw > 2 | lw2 > (2 rows) > > Time: 1.357 ms > > temp talbe result: > [postgres@pg95 test_sql]$ pgbench -M prepared -n -r -c > 2 -j 2 -T 10 -f temporary_test_1.sql > transaction type: Custom query > scaling factor: 1 > query mode: prepared > number of clients: 2 > number of threads: 2 > duration: 10 s > number of transactions actually processed: 5173 > latency average: 3.866 ms > tps = 517.229191 (including connections establishing) > tps = 517.367956 (excluding connections establishing) > statement latencies in milliseconds: > 3.863798 select * from proc17(); > > array result: > [postgres@pg95 test_sql]$ pgbench -M prepared -n -r -c > 2 -j 2 -T 10 -f arrary_test_1.sql > transaction type: Custom query > scaling factor: 1 > query mode: prepared > number of clients: 2 > number of threads: 2 > duration: 10 s > number of transactions actually processed: 149381 > latency average: 0.134 ms > tps = 14936.875176 (including connections establishing) > tps = 14940.234960 (excluding connections establishing) > statement latencies in milliseconds: > 0.132983 select * from proc16(); > > Array 
is not convenient to use in function, whether > there are other methods can be replaced temp table in function > > Temporary tables are pretty expensive, for several reasons, and they are horrible when you use a fresh table for only two rows, more so if you recreate it every transaction. The more common pattern is to create the table once and delete from it repeatedly. Better not to use temp tables unless it is necessary; that is one reason why PostgreSQL supports arrays. Partially, PostgreSQL arrays are an analogy to T-SQL memory tables. Regards Pavel > > -- > dby...@163.com >
Re: [HACKERS] Allowing GIN array_ops to work on anyarray
Great. Given that it does not apply to this patch, all the other tests passed and the change looks good. Thank you, Enrique On Mon, Sep 26, 2016 at 6:27 AM Tom Lane wrote: > Enrique Meneses writes: > > I was not sure what "Spec compliant means"... so I did not select as > tested or passed. What should I do to validate that this change is "Spec > compliant"? > > It's irrelevant to this patch, AFAICS. The SQL standard doesn't discuss > indexes at all, much less legislate on which operator classes ought to > exist for GIN indexes. > > regards, tom lane >
Re: [HACKERS] temporary table vs array performance
It's considered bad form to post to multiple lists. Please pick the most relevant one - in this case I'd suggest -general. On Mon, Sep 26, 2016 at 8:39 AM, dby...@163.com wrote: > > Array is not convenient to use in function, whether > there are other methods can be replaced temp table in function > > I have no difficulty using arrays in functions. As for "other methods" - you can use CTE (WITH) to create a truly local table - updating the catalogs by using a temp table is indeed quite expensive. WITH vals AS ( VALUES (1, 'lw'), (2, 'lw2') ) SELECT * FROM vals; David J.
[HACKERS] temporary table vs array performance
test: create type h3 as (id int,name char(10)); CREATE or replace FUNCTION proc17() RETURNS SETOF h3 AS $$ DECLARE v_rec h3; BEGIN create temp table abc(id int,name varchar) on commit drop; insert into abc select 1,'lw'; insert into abc select 2,'lw2'; for v_rec in select * from abc loop return next v_rec; end loop; END; $$ LANGUAGE plpgsql; CREATE or replace FUNCTION proc16() RETURNS SETOF h3 AS $$ DECLARE id_array int[]; name_arr varchar[]; v_rec h3; BEGIN id_array =array[1,2]; name_arr=array['lw','lw2']; for v_rec in select unnest(id_array) ,unnest(name_arr) loop return next v_rec; end loop; END; $$ LANGUAGE plpgsql; postgres=# select * from proc17(); id |name + 1 | lw 2 | lw2 (2 rows) Time: 68.372 ms postgres=# select * from proc16(); id |name + 1 | lw 2 | lw2 (2 rows) Time: 1.357 ms temp table result: [postgres@pg95 test_sql]$ pgbench -M prepared -n -r -c 2 -j 2 -T 10 -f temporary_test_1.sql transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 2 number of threads: 2 duration: 10 s number of transactions actually processed: 5173 latency average: 3.866 ms tps = 517.229191 (including connections establishing) tps = 517.367956 (excluding connections establishing) statement latencies in milliseconds: 3.863798 select * from proc17(); array result: [postgres@pg95 test_sql]$ pgbench -M prepared -n -r -c 2 -j 2 -T 10 -f arrary_test_1.sql transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 2 number of threads: 2 duration: 10 s number of transactions actually processed: 149381 latency average: 0.134 ms tps = 14936.875176 (including connections establishing) tps = 14940.234960 (excluding connections establishing) statement latencies in milliseconds: 0.132983 select * from proc16(); Arrays are not convenient to use in functions; are there other methods that could replace temp tables in functions? dby...@163.com
Re: [HACKERS] WIP: Covering + unique indexes.
24.09.2016 15:36, Amit Kapila: On Wed, Sep 21, 2016 at 6:51 PM, Anastasia Lubennikova wrote: 20.09.2016 08:21, Amit Kapila: On Tue, Sep 6, 2016 at 10:18 PM, Anastasia Lubennikova wrote: 28.08.2016 09:13, Amit Kapila: The problem seems really tricky, but the answer is simple. We store included columns unordered. It was mentioned somewhere in this thread. Is there any fundamental problem in storing them in ordered way? I mean to say, you need to anyway store all the column values on leaf page, so why can't we find the exact location for the complete key. Basically use truncated key to reach to leaf level and then use the complete key to find the exact location to store the key. I might be missing something here, but if we can store them in ordered fashion, we can use them even for queries containing ORDER BY (where ORDER BY contains included columns). I'd say that the reason for not using included columns in any operations which require comparisons, is that they don't have opclass. Let's go back to the example of points. This data type doesn't have any opclass for B-tree, because of fundamental reasons. And we can not apply _bt_compare() and others to this attribute, so we don't include it to scan key. create table t (i int, i2 int, p point); create index idx1 on t (i) including (i2); create index idx2 on t (i) including (p); create index idx3 on t (i) including (i2, p); create index idx4 on t (i) including (p, i2); You can keep tuples ordered in idx1, but not for idx2, partially ordered for idx3, but not for idx4. At the very beginning of this thread [1], I suggested to use opclass, where possible. Exactly the same idea, you're thinking about. But after short discussion, we came to conclusion that it would require many additional checks and will be too complicated, at least for the initial patch. Let me give you an example: create table t (i int, p point); create index on t (i) including (p);
Should we insert (0, '(0,2)') before (0, '(1,1)') or after? We have no idea what is the "correct order" for this attribute. So the answer is "it doesn't matter". When searching in index, we know that only key attrs are ordered, so only them can be used in scankey. Other columns are filtered after retrieving data. explain select i,p from t where i =0 and p <@ circle '((0,0),2)'; QUERY PLAN --- Index Only Scan using idx on t (cost=0.14..4.20 rows=1 width=20) Index Cond: (i = 0) Filter: (p <@ '<(0,0),2>'::circle) I think here reason for using Filter is that because we don't keep included columns in scan keys, can't we think of having them in scan keys, but use only key columns in scan key to reach till leaf level and then use complete scan key at leaf level. What should I add to README (or to documentation), to make it more understandable? May be add the data representation like only leaf pages contains all the columns and how the scan works. I think you can see if you can extend "Notes About Data Representation" and or "Other Things That Are Handy to Know" sections in existing README. Ok, I'll write it in a few days. [1] https://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
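The lookup strategy described above (locate matching tuples by the ordered key columns, then filter on the unordered included columns) can be sketched in miniature. This is an illustrative in-memory model, not the actual nbtree code:

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    int key;   /* indexed (ordered) attribute, like "i" above */
    int inc;   /* included (unordered) attribute, like "p" above */
} IndexTuple;

/* example predicate, standing in for "p <@ circle" above */
static int
inc_is_even(int x)
{
    return x % 2 == 0;
}

/*
 * Find the first tuple matching 'key' whose included column satisfies
 * 'pred'.  The array is sorted by 'key' only, so we locate the run of
 * equal keys by comparison, then filter on the included column; the
 * included values within that run are in arbitrary order.
 */
static const IndexTuple *
find_with_filter(const IndexTuple *tuples, size_t n, int key,
                 int (*pred) (int))
{
    size_t lo = 0, hi = n;

    /* binary search for the first tuple with tuples[i].key >= key */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (tuples[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* scan the run of equal keys, filtering on the included column */
    for (; lo < n && tuples[lo].key == key; lo++)
        if (pred(tuples[lo].inc))
            return &tuples[lo];

    return NULL;
}
```

This mirrors the EXPLAIN output quoted above: "Index Cond" uses only the key column, while the included column is applied as a "Filter" after the tuple is fetched.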
Re: [HACKERS] Showing parallel status in \df+
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Pavel Stehule writes: > > 2016-09-23 7:22 GMT+02:00 Rushabh Lathia : > >> On Thu, Sep 22, 2016 at 10:04 PM, Tom Lane wrote: > >>> If it's unreadable in \df+, how would \df++ make that any better? > > >> Eventhough source code as part of \df+ is bit annoying (specifically > >> for PL functions), I noticed the argument in this thread that it's > >> useful information for some of. So \df++ is just alternate option for > >> the those who want the source code. > > > ++ is little bit obscure. So better to remove src everywhere. > > Well, that was suggested upthread (which is where the idea of relying > on \sf came from) and Peter objected on the quite reasonable grounds > that people expect \df+ to provide this info and won't know to go > use \sf instead. So I'm afraid that suggestion is going nowhere. For my 2c, I disagree that "just because it's always been there and that's where people know to go look" is a reason to not remove it. Moving src out of \df+ will mean that people looking for it will need to use \? to see where it went (or use \ef, which is what I'd argue most already do today..), but I hardly see that as a huge issue and the improvement in readability of \df+ is well worth that cost. > I think the options that have a chance of happening are to rearrange > \df+ output more or less as in my patch, or to do nothing. I'm not very > happy about "do nothing", but that seems to be where we're ending up. I agree that "do nothing" isn't a good option. I'm not terribly thrilled with just putting the source code at the bottom of the \df+ output either, though it's at least slightly less ridiculous than trying to put the source code into a column in a table. 
If we really are worried that people who know how to use \df+ and how to write plpgsql (or other PL) code can't figure out how to view the src with \sf or \ef, then we could include at the bottom of the \df+ output a hint which essentially says "use \sf to view function source". Alternatively, and I kind of hate suggesting this, but it's not like most people don't already have a .psqlrc to deal with our silly defaults, we could add a variable to control if src is included in \df+ or not. Thanks! Stephen
Re: [HACKERS] Showing parallel status in \df+
Pavel Stehule writes: > 2016-09-23 7:22 GMT+02:00 Rushabh Lathia : >> On Thu, Sep 22, 2016 at 10:04 PM, Tom Lane wrote: >>> If it's unreadable in \df+, how would \df++ make that any better? >> Eventhough source code as part of \df+ is bit annoying (specifically >> for PL functions), I noticed the argument in this thread that it's >> useful information for some of. So \df++ is just alternate option for >> the those who want the source code. > ++ is little bit obscure. So better to remove src everywhere. Well, that was suggested upthread (which is where the idea of relying on \sf came from) and Peter objected on the quite reasonable grounds that people expect \df+ to provide this info and won't know to go use \sf instead. So I'm afraid that suggestion is going nowhere. I think the options that have a chance of happening are to rearrange \df+ output more or less as in my patch, or to do nothing. I'm not very happy about "do nothing", but that seems to be where we're ending up. regards, tom lane
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
"Tsunakawa, Takayuki" writes: >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas >> I think that we shouldn't start changing things based on guesses about what >> the problem is, even if they're fairly smart guesses. The thing to do would >> be to construct a test rig, crash the server repeatedly, and add debugging >> instrumentation to figure out where the time is actually going. > We have tried to reproduce the problem in the past several days with much > more stress on our environment than on the customer's one -- 1,000 tables > aiming for a dozens of times larger stats file and repeated reconnection > requests from hundreds of clients -- but we could not succeed. >> I do think your theory about the stats collector might be worth pursuing. >> It seems that the stats collector only responds to SIGQUIT, ignoring SIGTERM. >> Making it do a clean shutdown on SIGTERM and a fast exit on SIGQUIT seems >> possibly worthwhile. > Thank you for giving confidence for proceeding. And I also believe that > postmaster should close the listening ports earlier. Regardless of whether > this problem will be solved (I'm not confident these will solve it), I think it'd > be better to fix these two points so that postmaster doesn't take longer than > necessary. I think I'll create a patch after giving it a bit more thought. FWIW, I'm pretty much -1 on messing with the timing of the socket close actions. I broke that once within recent memory, so maybe I'm gun-shy, but I think that the odds of unpleasant side effects greatly outweigh any likely benefit there. Allowing SIGQUIT to prompt fast shutdown of the stats collector seems sane, though. Try to make sure it doesn't leave partly-written stats files behind. regards, tom lane
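One way to guarantee that no partly-written stats file survives a fast exit is the write-temp-then-rename pattern. A minimal sketch, with illustrative file names rather than the real pgstat paths:

```c
#include <stdio.h>
#include <string.h>

/*
 * Write contents to a temp file, then rename() it into place.
 * rename() within one filesystem is atomic on POSIX, so dying
 * partway through the write leaves at most a stale temp file,
 * never a truncated final file.
 */
static int
write_stats_file(const char *final_path, const char *tmp_path,
                 const char *contents)
{
    FILE   *fp = fopen(tmp_path, "w");

    if (fp == NULL)
        return -1;
    if (fputs(contents, fp) == EOF)
    {
        fclose(fp);
        remove(tmp_path);
        return -1;
    }
    if (fclose(fp) == EOF)
    {
        remove(tmp_path);
        return -1;
    }
    return rename(tmp_path, final_path);
}
```

A SIGQUIT fast exit between the fopen and the rename then leaves only the temp file behind, which the next writer simply overwrites.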
Re: [HACKERS] Small race in pg_xlogdump --follow
Magnus Hagander writes: > Oh, right, at the very last loop. I've never seen it need more than 1 loop > so I didn't manage to hit that codepath :) But yeah, saving errno and > restoring it on the other side of pg_usleep is probably a good idea. Right. On a larger scale --- I'm too uncaffeinated to figure out whether this is the code path taken for the initially user-supplied file name. But it would be unfriendly if when you fat-finger the command line arguments, it takes 5 seconds for pg_xlogdump to tell you so. Maybe the looping behavior should only happen on non-first file access. regards, tom lane
Re: [HACKERS] Stopping logical replication protocol
>You should rely on timing in tests as little as possible. Some of the test hosts are VERY slow. If possible something deterministic should be used. That's why these changes are difficult to verify. Maybe in that case we should write a benchmark rather than an integration test? Then we could say: before these changes, stopping logical replication takes N ms, and after applying them it takes NN ms, where NN is less than N. Is it possible to add this kind of benchmark to PostgreSQL? I would be grateful for links. 2016-09-26 5:32 GMT+03:00 Craig Ringer : > On 26 Sep. 2016 02:16, "Vladimir Gordiychuk" wrote: > > > > >It looks like you didn't import my updated patches, so I've > rebased your new patches on top of them. > > Yes, i forgot it, sorry. Thanks for you fixes. > > > > >I did't see you explain why this was removed: > > > > -/* fast path */ > > -/* Try to flush pending output to the client */ > > -if (pq_flush_if_writable() != 0) > > -WalSndShutdown(); > > - > > -if (!pq_is_send_pending()) > > -return; > > > > I remove it, because during decode long transaction code, we always > exit from function by fast path and not receive messages from client. Now > we always include in 'for' loop and executes > > /* Check for input from the client */ ProcessRepliesIfAny(); > > Makes sense. > I > > >Some of the changes to pg_recvlogical look to be copied from > > >receivelog.c, most notably the functions ProcessKeepalive and > > >ProcessXLogData . > > > > During refactoring huge function I also notice It, but decide not include additional changes in already huge patch. I only split method that do everything to few small functions. > > Good decision. This improves things already and makes future refactoring easier. > > >I was evaluating the tests and I don't think they actually demonstrate > > >that the patch works. There's nothing done to check that > > >pg_recvlogical exited because of client-initiated CopyDone.
While > > >looking into that I found that it also never actually hits > > >ProcessCopyDone(...) because libpq handles the CopyDone reply from the > > >server its self and treats it as end-of-stream. So the loop in > > >StreamLogicalLog will just end and ProcessCopyDone() is dead code. > > > > The main idea was about fast stop logical replication as it available, > because if stop replication gets more them 1 seconds it takes more pain. > > You should rely on time I tests as little as possible. Some of the test > hosts are VERY slow. If possible something deterministic should be used. > > > Increase this timeout to 3 or 5 second hide problems that this PR try > fix, that's why I think this lines should be restore to state from previous > patch. > > Per above I don't agree. > > > > > ``` > > ok($spendTime <= 5, # allow extra time for slow machines > > "pg_recvlogical exited promptly on signal when decoding"); > > > > ok((time() - $cancelTime) <= 3, # allow extra time for slow machines > > "pg_recvlogical exited promptly on sigint when idle" > > ); > > > > ``` > > > > Also I do not understand why we do > > > > $proc->pump while $proc->pumpable; > > > > after send signal to process, why we can not just wait stop replication > slot? > > Because verbose output can produce a lot of writes. If we don't pump the > buffer pg_recvlogical could block writing on its socket. Also it lets us > capture stderr which we need to observe that pg_recvlogical actually ended > copy on user command rather than just receiving all the input available. >
Re: [HACKERS] Small race in pg_xlogdump --follow
On Mon, Sep 26, 2016 at 3:44 PM, Tom Lane wrote: > Magnus Hagander writes: > > Attached patch puts a retry loop around opening the file that retries > for 5 > > seconds (which is excessive, but should be safe) in case the file is > > missing (and still fails out for other error messages of course). > > > Comments? > > The patch assumes that pg_usleep won't change errno, an assumption > I have little faith in. > > Oh, right, at the very last loop. I've never seen it need more than 1 loop so I didn't manage to hit that codepath :) But yeah, saving errno and restoring it on the other side of pg_usleep is probably a good idea. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ diff --git a/src/bin/pg_xlogdump/pg_xlogdump.c b/src/bin/pg_xlogdump/pg_xlogdump.c index 02575eb..6947c0c 100644 --- a/src/bin/pg_xlogdump/pg_xlogdump.c +++ b/src/bin/pg_xlogdump/pg_xlogdump.c @@ -249,6 +249,7 @@ XLogDumpXLogRead(const char *directory, TimeLineID timeline_id, if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo)) { char fname[MAXFNAMELEN]; + int tries; /* Switch to another logfile segment */ if (sendFile >= 0) @@ -258,7 +259,24 @@ XLogDumpXLogRead(const char *directory, TimeLineID timeline_id, XLogFileName(fname, timeline_id, sendSegNo); - sendFile = fuzzy_open_file(directory, fname); + for (tries = 0; tries < 10; tries++) + { +sendFile = fuzzy_open_file(directory, fname); +if (sendFile >= 0) + break; +if (errno == ENOENT) +{ + int save_errno = errno; + + /* File not there yet, try again */ + pg_usleep(500 * 1000); + + errno = save_errno; + continue; +} +/* Any other error, fall through and fail */ +break; + } if (sendFile < 0) fatal_error("could not find file \"%s\": %s", -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench more operators & functions
On Sat, Jul 9, 2016 at 12:14 PM, Fabien COELHO wrote: > >> Here is a simple patch which adds a bunch of operators (bitwise: & | ^ ~, >> comparisons: =/== <>/!= < <= > >=, logical: and/&& or/|| xor/^^ not/!) and >> functions (exp ln if) to pgbench. I've tried to be pg's SQL compatible where >> appropriate. >> >> Also attached is a simple test script. > > > Here is a sightly updated version. > Hi Fabien, I did the review of your patch and here are my views on your patch. Purpose of the patch: = This patch introduces extensive list of new operators and functions that can be used in expressions in pgbench transaction scripts(given with -f option). Here is the list of operators and functions introduced by this patch: New operators: -- bitwise: <<, >>, &, |, ^/#, ~ comparisons: =/==, <>/!=, <, <=, >, >= logical: and/&&, or/||, xor/^^, not, ! New functions: -- exp, ln, if I see there had been a discussion about timing for submission of patch, but not much about what the patch is doing, so I believe the purpose of patch is quite acceptable. Currently there are limited number of operators available in pgbench. So, I think these new operators definitely make a value addition towards custom scripts. Documentation: = I could build the documentation without any errors. New operators and functions are well categorized and added in proper sections of existing pgbench documentation. I have a small suggestion here: Earlier we had only few number of operators, so it was OK to have the operators embedded in the description of '\set' command itself, but now as we have much more number of operators will it be a good idea to have a table of operators similar to that of functions. We need not have several columns here like description, example etc., but a short table just categorizing the operators would be sufficient. Initial Run: I was able to apply patch with 'patch -p1'. The testcase file(functions.sql) given along the patch gives an expected output. 
Further testing and review: === 1. Postgres treats '^' as exponentiation rather than XOR, and '#' as XOR. Personally, I think it can cause confusion, so it will be better if we can stick to the behavior of Postgres mathematical operators. 2. I could not see any tests for bitwise operators in the functions.sql file that you have attached. 3. Precedence: a. Unary operators have more precedence than binary operators. So, UNOT and UINV should have precedence next to UMINUS. I tried a couple of test expressions in C vs. your patch (pgbench):

expression   result_in_C   result_in_pgbench
(~14-14+2)   -27           -3
(!14-14+2)   -12           0

b. Similarly, shift operators should take precedence over the other bitwise operators:

expression   result_in_C   result_in_pgbench
(4|1<<1)     6             10
(4^5&3)      5             1

c. Also, comparison would take precedence over the bitwise operators (&, |, ^) but not the shift operators:

expression   result_in_C   result_in_pgbench
(2&1<3)      1             0

In backend/parser/gram.y, I see that unary operators are given higher precedence than other operators, but it too does not have explicit precedence defined for bitwise operators. I tried to check Postgres documentation for operator precedence, but in 'Lexical Structure'[1] the documentation does not mention anything about bitwise operators. Also, SQL standards 99 does not talk much about operator precedence. I may be wrong while trying to understand the precedence you defined here and you might have picked this per some standard, but do you have any reference which you considered for this? 4. If we are going to stick to current precedence, I think it will be good idea to document it. 5. Sorry, I was not able to understand the "should exist" comment in following snippet. +"xor" { return XOR_OP; } /* should exist */ +"^^" { return XOR_OP; } /* should exist */ 7. You may want to reword following comment: + else /* cannot get there */ To + else /* cannot get here */ 8. + case PGBENCH_IF: + /* should it do a lazy evaluation of the branch?
*/ + Assert(nargs == 3); + *retval = coerceToBool(&vargs[0]) ? vargs[1] : vargs[2]; I believe the ternary operator does the lazy evaluation internally, but to be sure you may consider rewriting this as follows: if (coerceToBool(&vargs[0])) *retval = vargs[1]; else *retval = vargs[2]; Conclusion: === I have tested the patch and each of the operators is implemented correctly. The only concern I have is precedence, otherwise the patch seems to be doing what it is supposed to do. [1]https://www.postgresql.org/docs/9.6/static/sql-syntax-lexical.html Regards, Jeevan Ladhe.
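The C-side results in the precedence tables above can be spot-checked directly. In C, unary ~ and ! bind tightest, then shifts, then relational <, then & before ^ before |:

```c
#include <assert.h>

/*
 * Spot-checks of the C results quoted in the review's precedence
 * tables.
 */
static void
check_c_precedence(void)
{
    assert((~14 - 14 + 2) == -27);   /* ~14 == -15 */
    assert((!14 - 14 + 2) == -12);   /* !14 == 0 */
    assert((4 | 1 << 1) == 6);       /* << binds tighter than | */
    assert((4 ^ 5 & 3) == 5);        /* &  binds tighter than ^ */
}
```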
Re: [HACKERS] Small race in pg_xlogdump --follow
Magnus Hagander writes: > Attached patch puts a retry loop around opening the file that retries for 5 > seconds (which is excessive, but should be safe) in case the file is > missing (and still fails out for other error messages of course). > Comments? The patch assumes that pg_usleep won't change errno, an assumption I have little faith in. regards, tom lane
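The defensive pattern for Tom's concern is to save errno before sleeping and restore it afterwards. A minimal standalone sketch, using nanosleep in place of pg_usleep:

```c
#include <errno.h>
#include <time.h>

/*
 * Sleep for the given number of milliseconds without disturbing the
 * caller's errno, which a retry loop may still need after a failed
 * open().  The sleep call itself may set errno (e.g. EINTR).
 */
static void
sleep_ms_preserving_errno(long ms)
{
    int save_errno = errno;
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);

    errno = save_errno;
}
```

This is the shape the updated pg_xlogdump patch adopts: save errno before pg_usleep, restore it before falling through to the error report.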
[HACKERS] Small race in pg_xlogdump --follow
When using pg_xlogdump in --follow mode, there is what I believe is a small race condition when files are switched. If pg_xlogdump detects then end of one file (by looking at the record) it will immediately try to open the following file. If that file has not yet been created, it will fail with an error message saying it cannot open the file. But there's a race condition where the server has not had time to put the file in place. Attached patch puts a retry loop around opening the file that retries for 5 seconds (which is excessive, but should be safe) in case the file is missing (and still fails out for other error messages of course). Comments? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ diff --git a/src/bin/pg_xlogdump/pg_xlogdump.c b/src/bin/pg_xlogdump/pg_xlogdump.c index 02575eb..a0e0a0c 100644 --- a/src/bin/pg_xlogdump/pg_xlogdump.c +++ b/src/bin/pg_xlogdump/pg_xlogdump.c @@ -249,6 +249,7 @@ XLogDumpXLogRead(const char *directory, TimeLineID timeline_id, if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo)) { char fname[MAXFNAMELEN]; + int tries; /* Switch to another logfile segment */ if (sendFile >= 0) @@ -258,7 +259,20 @@ XLogDumpXLogRead(const char *directory, TimeLineID timeline_id, XLogFileName(fname, timeline_id, sendSegNo); - sendFile = fuzzy_open_file(directory, fname); + for (tries = 0; tries < 10; tries++) + { +sendFile = fuzzy_open_file(directory, fname); +if (sendFile >= 0) + break; +if (errno == ENOENT) +{ + /* File not there yet, try again */ + pg_usleep(500*1000); + continue; +} +/* Any other error, fall through and fail */ +break; + } if (sendFile < 0) fatal_error("could not find file \"%s\": %s", -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM's ancillary tasks
Thomas Munro writes: > I noticed that ATExecAlterColumnType does this: > * Drop any pg_statistic entry for the column, since it's now wrong type > What if there is no rewrite, because the type is binary compatible > (ATColumnChangeRequiresRewrite returns false)? Then I think your patch > won't attract an autovacuum daemon to ANALYZE the table and produce new > statistics, because it won't touch changes_since_analyze. I wonder if we > should simply keep the stats if no rewrite is required. Aren't the > statistical properties of binary-compatible types also compatible? Not necessarily: the type semantics might be different --- in fact, probably are different, else why would there be distinct types in the first place? It would be unwise to keep the old stats IMO. If you need a concrete example, consider OID vs int4. They're binary-compatible, but since int4 is signed while OID is unsigned, stats for one would be wrong for the other. This is the same reason why ALTER COLUMN TYPE has to rebuild indexes even in binary-compatible cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
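The point can be checked mechanically: the same 32-bit bit pattern compares differently as unsigned OID vs signed int4, so histogram bounds computed under one ordering are invalid under the other. A quick illustration (standalone C, not PostgreSQL code):

```c
#include <stdint.h>

/* Compare two 32-bit values under OID (unsigned) semantics. */
static int
cmp_as_oid(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}

/* Compare the same bit patterns under int4 (signed) semantics. */
static int
cmp_as_int4(uint32_t a, uint32_t b)
{
	int32_t		sa = (int32_t) a;	/* reinterpret the bits as signed */
	int32_t		sb = (int32_t) b;

	return (sa > sb) - (sa < sb);
}
```

For the bit pattern 0x80000000: as an OID it is 2147483648 and sorts above 1, but as an int4 it is -2147483648 and sorts below 1 — so stats (histogram bounds, min/max) gathered for one type are simply wrong for the other.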
Re: [HACKERS] Allowing GIN array_ops to work on anyarray
Enrique Meneses writes: > I was not sure what "Spec compliant means"... so I did not select as tested > or passed. What should I do to validate that this change is "Spec compliant"? It's irrelevant to this patch, AFAICS. The SQL standard doesn't discuss indexes at all, much less legislate on which operator classes ought to exist for GIN indexes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing GIN array_ops to work on anyarray
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

I built and installed (make world / make install-world) github branch REL9_6_STABLE and applied the patch (patch -p1 < patch/gin-true-anyarray-opclass-1.patch). I then upgraded my development database (postgres 9.5) using pg_upgrade. This database has one table with a UUID array gin index. The database was upgraded correctly to postgresql 9.6 and I was able to successfully connect to it from a web application which uses the database. There were no conflicts, so I expect others to be able to upgrade without issues. I then dropped the database and re-created it, making sure that I no longer used my prior operator class definition. I re-started my web application and confirmed it works. This verifies the feature works as designed. The following is no longer required:

CREATE OPERATOR CLASS _uuid_ops DEFAULT
  FOR TYPE _uuid USING gin AS
  OPERATOR 1 && (anyarray, anyarray),
  OPERATOR 2 @> (anyarray, anyarray),
  OPERATOR 3 <@ (anyarray, anyarray),
  OPERATOR 4 = (anyarray, anyarray),
  FUNCTION 1 uuid_cmp(uuid, uuid),
  FUNCTION 2 ginarrayextract(anyarray, internal, internal),
  FUNCTION 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal),
  FUNCTION 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal),
  STORAGE uuid;

I also ran "make installcheck-world" and all the tests passed. The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing GIN array_ops to work on anyarray
I was not sure what "Spec compliant" means... so I did not select it as tested or passed. What should I do to validate that this change is "Spec compliant"? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: psql \setfileref
Hi Pavel, I just tried to apply your patch psql-setfileref-initial.patch (using git apply) to the newest revision of postgres at the time (da6c4f6ca88) and it failed to patch startup.c. Thinking that the patch was for some previous revision, I then checked out d062245b5, which was from 2016-08-31, and tried applying the patch from there. I had the same problem: $ git apply psql-setfileref-initial.patch error: patch failed: src/bin/psql/startup.c:106 error: src/bin/psql/startup.c: patch does not apply However, when I applied the changes to startup.c manually and removed them from the patch, the rest of the patch applied without a problem. I don't know if I may have done something wrong in trying to apply the patch, but you may want to double check if you need to regenerate your patch from the latest revision so it will apply smoothly for reviewers. In the meantime, I haven't had a chance to try out the fileref feature yet but I'll give it a go when I get a chance! Thanks! Ryan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server
This patch will need some changes to conversion_error_callback(). That function reports an error in case there was an error converting the result obtained from the foreign server into an internal datum, e.g. when the string returned by the foreign server is not acceptable to the local input function for the expected datatype. In such cases, the input function will throw an error and conversion_error_callback() will provide appropriate context for that error. postgres_fdw.sql has tests to check that the proper context is reported:

-- ===
-- conversion error
-- ===
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
SELECT ft1.c1, ft2.c2, ft1.c8 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;

Right now we report the column name in the error context. This needs to change for aggregate pushdown, which is pushing down expressions. Right now, in case the aggregate on the foreign server produces a string unacceptable locally, it would crash at

4982         Assert(IsA(var, Var));
4983
4984         rte = rt_fetch(var->varno, estate->es_range_table);

since it's not a Var node as expected. We need to fix the error context to provide meaningful information, or at least not crash. This has been discussed briefly in [1].

[1] https://www.postgresql.org/message-id/flat/CAFjFpRdcC7Ykb1SkARBYikx9ubKniBiAgHqMD9e47TxzY2EYFw%40mail.gmail.com#cafjfprdcc7ykb1skarbyikx9ubknibiaghqmd9e47txzy2e...@mail.gmail.com

On Fri, Sep 16, 2016 at 7:15 PM, Jeevan Chalke wrote: > Hi, > > On Mon, Sep 12, 2016 at 5:17 PM, Jeevan Chalke > wrote: >> >> Thanks Ashutosh for the detailed review comments. >> >> I am working on it and will post updated patch once I fix all your >> concerns. >> >> > > Attached new patch fixing the review comments. > > Here are few comments on the review points: > > 1.
Renamed deparseFromClause() to deparseFromExpr() and > deparseAggOrderBy() to appendAggOrderBy() > > 2. Done > > 3. classifyConditions() assumes list expressions of type RestrictInfo. But > HAVING clause expressions (and JOIN conditions) are plain expressions. Do > you mean we should modify the classifyConditions()? If yes, then I think it > should be done as a separate (cleanup) patch. > > 4, 5. Both done. > > 6. Per my understanding, I think checking for just aggregate function is > enough as we are interested in whole aggregate result. Meanwhile I will > check whether we need to find and check shippability of transition, > combination and finalization functions or not. > > 7, 8, 9, 10, 11, 12. All done. > > 13. Fixed. However instead of adding new function made those changes inline. > Adding support for deparsing List expressions separating list by comma can > be > considered as cleanup patch as it will touch other code area not specific to > aggregate push down. > > 14, 15, 16, 17. All done. > > 18. I think re-factoring estimate_path_cost_size() should be done separately > as a cleanup patch too. > > 19, 20, 21, 22, 23, 24, 25, 26, 27, 28. All done. > > 29. I have tried passing input rel's relids to fetch_upper_rel() call in > create_grouping_paths(). It solved this patch's issue, but ended up with > few regression failures which is mostly plan changes. I am not sure whether > we get good plan now or not as I have not analyzed them. > So for this patch, I am setting relids in add_foreign_grouping_path() > itself. > However as suggested, used bms_copy(). I still kept the FIXME over there as > I think it should have done by the core itself. > > 30, 31, 32, 33. All done. > > Let me know your views. 
> > Thanks > -- > Jeevan B Chalke > Principal Software Engineer, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal_segment size vs max_wal_size
On Mon, Sep 26, 2016 at 5:04 PM, Amit Kapila wrote: > > IIRC, there is already a patch to update the minRecoveryPoint > correctly, can you check if that solves the problem for you? > > [1] - > https://www.postgresql.org/message-id/20160609.215558.118976703.horiguchi.kyotaro%40lab.ntt.co.jp > +1. I've tested after applying the patch. This clearly solves the problem. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On 9/26/16 4:54 AM, Heikki Linnakangas wrote: > On 09/26/2016 09:02 AM, Michael Paquier wrote: >> On Mon, Sep 26, 2016 at 2:15 AM, David Steele >> wrote: >>> However, it doesn't look like they can be used in conjunction since the >>> pg_hba.conf entry must specify either m5 or scram (though the database >>> can easily contain a mixture). This would probably make a migration >>> very unpleasant. >> >> Yep, it uses a given auth-method once user and database match. This is >> partially related to the problem to support multiple password >> verifiers per users, which was submitted last CF but got rejected >> because of a lack of interest, and removed to simplify this patch. You >> need as well to think about other things like password and protocol >> aging. But well, it is a problem that we don't have to tackle with >> this patch... >> >>> Is there any chance of a mixed mode that will allow new passwords to be >>> set as scram while still honoring the old md5 passwords? Or does that >>> cause too many complications with the protocol? >> >> Hm. That looks complicated to me. This sounds to me like a retry logic >> if for multiple authentication methods, and a different feature. What >> you'd be looking for here is a connection parameter to specify a list >> of protocols and try them all, no? > > It would be possible to have a "md5-or-scram" authentication method in > pg_hba.conf, such that the server would look up the pg_authid row of the > user when it receives startup message, and send an MD5 or SCRAM > challenge depending on which one the user's password is encrypted with. > It has one drawback though: it allows an unauthenticated user to probe > if there is a role with a given name in the system, because if a user > doesn't exist, we'd have to still send an MD5 or SCRAM challenge, or a > "user does not exist" error without a challenge. 
If we send a SCRAM > challenge for a non-existent user, and the attacker knows that most > users still have an MD5 password, that reveals that the username most > likely doesn't exist. > > Hmm. The server could send a SCRAM challenge first, and if the client > gives an incorrect response, or the username doesn't exist, or the > user's password is actually MD5-encrypted, the server could then send an > MD5 challenge. It would add one round-trip to the authentication of MD5 > passwords, but that seems acceptable. > > We can do this as a follow-up patch though. Let's try to keep this patch > series small. Fair enough. I'm not even 100% sure we should do it, but wanted to raise it as a possible issue. -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
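The information leak Heikki describes can be sketched in a few lines. Suppose a hypothetical "md5-or-scram" method chose the challenge type from the stored verifier (the helper name and the verifier-prefix check here are illustrative, not the actual catalog representation or server code):

```c
#include <string.h>
#include <stddef.h>

/* Sketch: choose a challenge type for an "md5-or-scram" HBA line.
 * 'verifier' is the role's stored password verifier, or NULL if no
 * such role exists. */
static const char *
pick_challenge(const char *verifier)
{
	if (verifier == NULL)
	{
		/* Must still answer something for unknown roles.  If the
		 * server answers SCRAM while most real roles are known to be
		 * on MD5, the challenge type itself hints the role is absent. */
		return "scram";
	}
	return strncmp(verifier, "SCRAM-SHA-256$", 14) == 0 ? "scram" : "md5";
}
```

This is why the follow-up idea above — always send a SCRAM challenge first and fall back to MD5 after a failed exchange — closes the probe at the cost of one extra round-trip for MD5 users: the first challenge no longer depends on whether the role exists.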
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada wrote: > On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat > wrote: >> My original patch added code to manage the files for 2 phase >> transactions opened by the local server on the remote servers. This >> code was mostly inspired from the code in twophase.c which manages the >> file for prepared transactions. The logic to manage 2PC files has >> changed since [1] and has been optimized. One of the things I wanted >> to do is see, if those optimizations are applicable here as well. Have >> you considered that? >> >> > > Yeah, we're considering it. > After these changes are committed, we will post the patch incorporated > these changes. > > But what we need to do first is the discussion in order to get consensus. > Since current design of this patch is to transparently execute DCL of > 2PC on foreign server, this code changes lot of code and is > complicated. Can you please elaborate? I am not able to understand what DCL is involved here. According to [1], examples of DCL are the GRANT and REVOKE commands. > Another approach I have is to push down DCL to only foreign servers > that support 2PC protocol, which is similar to DML push down. > This approach would be more simpler than current idea and is easy to > use by distributed transaction manager. Again, can you please elaborate how that would be different from the current approach and how it simplifies the code. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add support for restrictive RLS policies
Hi, I have started reviewing this patch and here are a couple of points I have observed so far:

1. Patch applies cleanly.
2. make / make install / initdb all good.
3. make check (regression) FAILED. (Attached diff file for reference.)

Please have a look at the failures. Meanwhile I will go ahead and review the code changes. Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company regression.diffs Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat wrote: > My original patch added code to manage the files for 2 phase > transactions opened by the local server on the remote servers. This > code was mostly inspired from the code in twophase.c which manages the > file for prepared transactions. The logic to manage 2PC files has > changed since [1] and has been optimized. One of the things I wanted > to do is see, if those optimizations are applicable here as well. Have > you considered that? > > Yeah, we're considering it. After these changes are committed, we will post the patch incorporating these changes. But what we need to do first is discuss the design in order to get consensus. Since the current design of this patch is to transparently execute the DCL of 2PC on the foreign server, it changes a lot of code and is complicated. Another approach I have is to push down the DCL to only those foreign servers that support the 2PC protocol, which is similar to DML push down. This approach would be simpler than the current idea and easy to use by a distributed transaction manager. I think that it would be a good place to start. I'd like to discuss what the best approach is for transactions involving foreign servers. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
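For reference, the overall flow being discussed — prepare on every foreign server first, and only commit the prepared transactions once every prepare succeeded — looks roughly like this. The types and helper names below are hypothetical stand-ins, not the patch's actual API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical foreign-server handle recording what happened to it. */
typedef struct FakeServer
{
	bool		fail_prepare;	/* simulate a failing PREPARE TRANSACTION */
	int			prepared;
	int			committed;
	int			rolled_back;
} FakeServer;

static bool
server_prepare(FakeServer *s)
{
	if (s->fail_prepare)
		return false;			/* PREPARE TRANSACTION 'gid' failed */
	s->prepared = 1;
	return true;
}

static bool
two_phase_commit(FakeServer *servers, size_t n)
{
	size_t		i, done;

	/* Phase 1: PREPARE TRANSACTION on every server. */
	for (done = 0; done < n; done++)
		if (!server_prepare(&servers[done]))
			break;

	if (done < n)
	{
		/* Some prepare failed: ROLLBACK PREPARED what did succeed. */
		for (i = 0; i < done; i++)
			servers[i].rolled_back = 1;
		return false;
	}

	/* Phase 2: COMMIT PREPARED everywhere. */
	for (i = 0; i < n; i++)
		servers[i].committed = 1;
	return true;
}

/* Exercise both the all-good and the prepare-failure scenario;
 * returns 0 if both behaved as expected. */
static int
demo(void)
{
	FakeServer	ok[2] = {{false, 0, 0, 0}, {false, 0, 0, 0}};
	FakeServer	bad[2] = {{false, 0, 0, 0}, {true, 0, 0, 0}};

	if (!two_phase_commit(ok, 2) || !ok[0].committed || !ok[1].committed)
		return 1;
	if (two_phase_commit(bad, 2))
		return 2;
	if (!bad[0].rolled_back || bad[1].committed)
		return 3;
	return 0;
}
```

The "leftover prepared transaction" problem raised elsewhere in this thread is exactly the gap this sketch glosses over: if the coordinator crashes between the two phases, somebody (e.g. a resolver process) must later commit or roll back the transactions already prepared on the remote side.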
Re: [HACKERS] Add support for restrictive RLS policies
On Mon, Sep 12, 2016 at 7:27 AM, Stephen Frost wrote: > Robert, > > * Robert Haas (robertmh...@gmail.com) wrote: > > On Thu, Sep 8, 2016 at 5:21 PM, Tom Lane wrote: > > > Stephen Frost writes: > > >> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > >>> Can't you keep those words as Sconst or something (DefElems?) until > the > > >>> execution phase, so that they don't need to be keywords at all? > > > > > >> Seems like we could do that, though I'm not convinced that it really > > >> gains us all that much. These are only unreserved keywords, of > course, > > >> so they don't impact users the way reserved keywords (of any kind) > can. > > >> While there may be some places where we use a string to represent a > set > > >> of defined options, I don't believe that's typical > > > > > > -1 for having to write them as string literals; but I think what Alvaro > > > really means is to arrange for the words to just be identifiers in the > > > grammar, which you strcmp against at execution. See for example > > > reloption_list. (Whether you use DefElem as the internal > representation > > > is a minor detail, though it might help for making the parsetree > > > copyObject-friendly.) > > > > > > vacuum_option_elem shows another way to avoid making a word into a > > > keyword, although to me that one is more of an antipattern; it'd be > better > > > to leave the strcmp to execution, since there's so much other code that > > > does things that way. > > > > There are other cases like that, too, e.g. AlterOptRoleElem; I don't > > think it's a bad pattern. Regardless of the specifics, I do think > > that it would be better not to bloat the keyword table with things > > that don't really need to be keywords. > > The AlterOptRoleElem case is, I believe, what Tom was just suggesting as > an antipattern, since the strcmp() is being done at parse time instead > of at execution time. 
> > If we are concerned about having too many unreserved keywords, then I > agree that AlterOptRoleElem is a good candidate to look at for reducing > the number we have, as it appears to contain 3 keywords which are not > used anywhere else (and 1 other which is only used in one other place). > > I do think that using IDENT for the various role attributes does make > sense, to be clear, as there are quite a few of them, they change > depending on major version, and those keywords are very unlikely to ever > have utilization elsewhere. > > For this case, there's just 2 keywords which seem like they may be used > again (perhaps for ALTER or DROP POLICY, or default policies if we grow > those in the future), and we're unlikely to grow more in the future for > that particular case (as we only have two binary boolean operators and > that seems unlikely to change), though, should that happen, we could > certainly revisit this. > To me, adding two new keywords for two new options does not look good, as it will bloat the keyword list. Per my understanding, we should add a keyword only when we have no alternative, e.g. to avoid grammar conflicts. I am also inclined to think that using an identifier is a good choice here. However, I would prefer to do the string comparison right in the grammar itself, so that if we get a wrong option as input there, we do not proceed further with it. We are going to throw an error later anyway, so why not at an early stage. Thanks > > Thanks! > > Stephen > -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [HACKERS] wal_segment size vs max_wal_size
On Mon, Sep 26, 2016 at 4:00 PM, Kuntal Ghosh wrote: > On Wed, Sep 21, 2016 at 8:33 PM, Peter Eisentraut > wrote: >> There is apparently some misbehavior if max_wal_size is less than 5 * >> wal_segment_size. >> > >> This should probably be made friendlier in some way. But it also shows >> that bigger WAL segment sizes are apparently not well-chartered >> territory lately. >> > Well, there can be multiple solutions to this problem. > 1. If somebody intends to increase wal segment size, he should > increase max_wal_size accordingly. > 2. In recovery test, we can add some delay before taking backup so > that the pending logs in the buffer > gets flushed. (Not a good solution) > 3. In CreateRestartPoint() method, we can force a XLogFlush to update > minRecoveryPoint. > IIRC, there is already a patch to update the minRecoveryPoint correctly, can you check if that solves the problem for you? [1] - https://www.postgresql.org/message-id/20160609.215558.118976703.horiguchi.kyotaro%40lab.ntt.co.jp -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Amit, I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I have no further comments at the moment. Wait... Heikki's latest commit now requires this patch to be rebased. Indeed. Here is the rebased version, which still get through my various tests. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 285608d..0a474e1 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -809,6 +809,30 @@ pgbench options dbname + + + \into var1 [var2 ...] + + + + + Stores the first fields of the resulting row from the current or preceding + SQL command into these variables. + The queries must yield exactly one row and the number of provided + variables must be less than the total number of columns of the results. + This meta command does not end the current SQL command. + + + + Example: + +SELECT abalance \into abalance + FROM pgbench_accounts WHERE aid=5432; + + + + + \set varname expression diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 1fb4ae4..9c13a18 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -373,11 +373,14 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"}; typedef struct { - char *line; /* text of command line */ + char *line; /* first line for short display */ + char *lines; /* full multi-line text of command */ int command_num; /* unique index of this Command struct */ int type; /* command type (SQL_COMMAND or META_COMMAND) */ int argc; /* number of command words */ char *argv[MAX_ARGS]; /* command word list */ + int compound; /* last compound command (number of \;) */ + char ***intovars; /* per-compound command \into variables */ PgBenchExpr *expr; /* parsed expression, if needed */ SimpleStats stats; /* time spent in this command */ } Command; @@ -1221,6 +1224,110 @@ getQueryParams(CState *st, const Command *command, const char **params) params[i] = getVariable(st, command->argv[i + 1]); } +/* read all 
responses from backend */ +static bool +read_response(CState *st, char ***intovars) +{ + PGresult *res; + int compound = -1; + + while ((res = PQgetResult(st->con)) != NULL) + { + compound += 1; + + switch (PQresultStatus(res)) + { + case PGRES_COMMAND_OK: /* non-SELECT commands */ + case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */ + if (intovars[compound] != NULL) + { +fprintf(stderr, + "client %d file %d command %d compound %d: " + "\\into expects a row\n", + st->id, st->use_file, st->command, compound); +st->ecnt++; +return false; + } + break; /* OK */ + + case PGRES_TUPLES_OK: + if (intovars[compound] != NULL) + { +/* store result into variables */ +int ntuples = PQntuples(res), + nfields = PQnfields(res), + f = 0; + +if (ntuples != 1) +{ + fprintf(stderr, + "client %d file %d command %d compound %d: " + "expecting one row, got %d\n", + st->id, st->use_file, st->command, compound, ntuples); + st->ecnt++; + PQclear(res); + discard_response(st); + return false; +} + +while (intovars[compound][f] != NULL && f < nfields) +{ + /* store result as a string */ + if (!putVariable(st, "into", intovars[compound][f], + PQgetvalue(res, 0, f))) + { + /* internal error, should it rather abort? 
*/ + fprintf(stderr, +"client %d file %d command %d compound %d: " +"error storing into var %s\n", +st->id, st->use_file, st->command, compound, +intovars[compound][f]); + st->ecnt++; + PQclear(res); + discard_response(st); + return false; + } + + f++; +} + +if (intovars[compound][f] != NULL) +{ + fprintf(stderr, + "client %d file %d command %d compound %d: missing results" + " to fill into variable %s\n", + st->id, st->use_file, st->command, compound, + intovars[compound][f]); + st->ecnt++; + return false; +} + } + break; /* OK */ + + default: + /* everything else is unexpected, so probably an error */ + fprintf(stderr, "client %d file %d aborted in command %d compound %d: %s", + st->id, st->use_file, st->command, compound, + PQerrorMessage(st->con)); + st->ecnt++; + PQclear(res); + discard_response(st); + return false; + } + + PQclear(res); + } + + if (compound == -1) + { + fprintf(stderr, "client %d command %d: no results\n", st->id, st->command); + st->ecnt++; + return false; + } + + return true; +} + /* get a value as an int, tell if there is a problem */ static bool coerceToInt(PgBenchValue *pval, int64 *ival) @@ -1953,7 +2060,6 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs) static void doCustom(TState *thread, CState *st, StatsData *agg) { - PGresul
Re: [HACKERS] Push down more full joins in postgres_fdw
On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita wrote: > On 2016/09/26 18:06, Ashutosh Bapat wrote: >> >> On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita >> wrote: > > >>> ISTM that the use of the same RTI for subqueries in multi-levels in a >>> remote >>> SQL makes the SQL a bit difficult to read. How about using the position >>> of >>> the join rel in join_rel_list, (more precisely, the position plus >>> list_length(root->parse->rtable)), instead? > > >> We switch to hash table to maintain the join RelOptInfos when the >> number of joins grows larger, where the position won't make much >> sense. > > > That's right, but we still store the joinrel into join_rel_list after > creating that hash table. That hash table is just for fast lookup. See > build_join_rel. Sorry, I wasn't clear in my reply. As the list grows, it will take longer to locate the RelOptInfo of a given join, since finding its position requires a linear scan of join_rel_list. Doing that just to create an alias seems like overkill. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/09/26 18:06, Ashutosh Bapat wrote: On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita wrote: ISTM that the use of the same RTI for subqueries in multi-levels in a remote SQL makes the SQL a bit difficult to read. How about using the position of the join rel in join_rel_list, (more precisely, the position plus list_length(root->parse->rtable)), instead? We switch to hash table to maintain the join RelOptInfos when the number of joins grows larger, where the position won't make much sense. That's right, but we still store the joinrel into join_rel_list after creating that hash table. That hash table is just for fast lookup. See build_join_rel. We might differentiate between a base relation alias and a subquery alias by using different prefixes like "r" and "s" respectively. Hmm. My concern about that would be the recursive use of "s" with the same RTI as subquery aliases for different-level subqueries in a single remote SQL. For example, if those subqueries include a base rel with RTI=1, then "s1" would be used again and again within that SQL. That would be logically correct, but seems confusing to me. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal_segment size vs max_wal_size
On Wed, Sep 21, 2016 at 8:33 PM, Peter Eisentraut wrote: > There is apparently some misbehavior if max_wal_size is less than 5 * > wal_segment_size. > > For example, if you build with --with-wal-segsize=64, then the recovery > test fails unless you set max_wal_size to at least 320MB in > PostgresNode.pm. The issue is that pg_basebackup fails with: > In recovery tests, max_wal_size is set to 128MB. Now, when you build with --with-wal-segsize=64, max_wal_size (in segments) is calculated as follows: max_wal_size = (128 * 1024 * 1024) / (64 * 1024 * 1024) = 2, and CheckPointSegments is calculated as follows: CheckPointSegments = 2 / (2 + 0.5) = 0.8, rounded up to 1. (The default is 3.) Hence, checkpoints occur very frequently on the master. > pg_basebackup: could not get transaction log end position from server: > ERROR: could not find any WAL files This error occurs when the recovery test tries to take a backup from the standby using the above settings. pg_basebackup scans pg_xlog and includes all WAL files in the range between 'startptr' and 'endptr', regardless of the timeline the file is stamped with. 'startptr' is initialized to ControlFile->checkPointCopy.redo and 'endptr' is initialized to ControlFile->minRecoveryPoint. Now, whenever we redo a CHECKPOINT_ONLINE record, we update checkPointCopy.redo, and whenever we flush logs, we update minRecoveryPoint. In this case, we are having frequent checkpoints at the master, which in turn update checkPointCopy.redo on the standby frequently. Sometimes it even goes ahead of minRecoveryPoint. At this point, if you call pg_basebackup, it will throw the aforesaid error. > This should probably be made friendlier in some way. But it also shows > that bigger WAL segment sizes are apparently not well-chartered > territory lately. > Well, there can be multiple solutions to this problem. 1. If somebody intends to increase the WAL segment size, he should increase max_wal_size accordingly. 2.
In the recovery test, we can add some delay before taking the backup so that the pending logs in the buffer get flushed. (Not a good solution.) 3. In the CreateRestartPoint() method, we can force an XLogFlush to update minRecoveryPoint. Thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
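The arithmetic in the message above can be modeled in a few lines: max_wal_size is kept as a number of WAL segments, and CheckPointSegments is derived from it using checkpoint_completion_target, with a floor of 1. This is a sketch following the formula quoted in this thread, not the server's actual code:

```c
/* Model of CheckPointSegments = max_wal_size / (2 + target), min 1,
 * where max_wal_size is first converted from MB to whole segments. */
static int
checkpoint_segments(int max_wal_size_mb, int wal_segment_mb,
					double checkpoint_completion_target)
{
	int			max_wal_size_segs = max_wal_size_mb / wal_segment_mb;
	int			segs = (int) (max_wal_size_segs /
							  (2.0 + checkpoint_completion_target));

	return segs < 1 ? 1 : segs;
}
```

With the default 16 MB segments, 128 MB of max_wal_size is 8 segments and CheckPointSegments comes out at 3; with --with-wal-segsize=64 the same 128 MB is only 2 segments, and 2 / 2.5 = 0.8 is clamped to 1 — which is why checkpoints fire so often in the failing recovery test.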
Re: [HACKERS] Transactions involving multiple postgres foreign servers
My original patch added code to manage the files for 2 phase transactions opened by the local server on the remote servers. This code was mostly inspired from the code in twophase.c which manages the file for prepared transactions. The logic to manage 2PC files has changed since [1] and has been optimized. One of the things I wanted to do is see, if those optimizations are applicable here as well. Have you considered that? [1]. https://www.postgresql.org/message-id/74355FCF-AADC-4E51-850B-47AF59E0B215%40postgrespro.ru On Fri, Aug 26, 2016 at 11:43 AM, Ashutosh Bapat wrote: > > > On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada > wrote: >> >> On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat >> wrote: >> > >> > >> > On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada >> > >> > wrote: >> >> >> >> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale >> >> wrote: >> >> > Hi All, >> >> > >> >> > Ashutosh proposed the feature 2PC for FDW for achieving atomic >> >> > commits >> >> > across multiple foreign servers. >> >> > If a transaction make changes to more than two foreign servers the >> >> > current >> >> > implementation in postgres_fdw doesn't make sure that either all of >> >> > them >> >> > commit or all of them rollback their changes. >> >> > >> >> > We (Masahiko Sawada and me) reopen this thread and trying to >> >> > contribute >> >> > in >> >> > it. >> >> > >> >> > 2PC for FDW >> >> > >> >> > The patch provides support for atomic commit for transactions >> >> > involving >> >> > foreign servers. when the transaction makes changes to foreign >> >> > servers, >> >> > either all the changes to all the foreign servers commit or rollback. >> >> > >> >> > The new patch 2PC for FDW include the following things: >> >> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that >> >> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can >> >> > involve >> >> > in >> >> > the transaction. 
>> >> > >> >> > Currently we can push some conditions down to shard nodes; especially >> >> > in >> >> > 9.6, >> >> > the direct-modify feature has >> >> > been introduced. But such a transaction modifying data on a shard node >> >> > is >> >> > not >> >> > guaranteed to execute atomically. >> >> > Using the 0002 patch, that modification is executed with 2PC. It means that we >> >> > can almost >> >> > provide a sharding solution using >> >> > multiple PostgreSQL servers (one parent node and several shard nodes). >> >> > >> >> > For multi-master, we definitely need a transaction manager, but the >> >> > transaction >> >> > manager can probably use this 2PC for FDW feature to manage >> >> > distributed >> >> > transactions. >> >> > >> >> > 2. The 0002 patch makes it possible for postgres_fdw to use 2PC. >> >> > >> >> > The 0002 patch makes postgres_fdw use the below APIs. These APIs are >> >> > generic >> >> > features which can be used by all kinds of FDWs. >> >> > >> >> > a. Execute PREPARE TRANSACTION and COMMIT/ABORT PREPARED instead >> >> > of >> >> > COMMIT/ABORT on foreign servers which support 2PC. >> >> > b. Manage information of foreign prepared transactions for the resolver >> >> > >> >> > Masahiko Sawada will post the patch. >> >> > >> >> > >> >> >> > >> > Thanks Vinayak and Sawada-san for taking this forward and basing your >> > work >> > on my patch. >> > >> >> >> >> Still a lot of work to do, but attached are the latest patches. >> >> These are based on the patch Ashutosh posted before; I revised it and >> >> divided it into two patches. >> >> Compared with the original patch, the pg_fdw_xact_resolver patch and >> >> documentation are missing. >> > >> > >> > I am not able to understand the last statement. >> >> Sorry to confuse you. >> >> > Do you mean to say that your patches do not have pg_fdw_xact_resolver() >> > and >> > documentation that my patches had? >> >> Yes. >> I'm confirming them against what your patches had. > > > Thanks for the clarification.
I had added pg_fdw_xact_resolver() to resolve > any transactions which can not be resolved immediately after they were > prepared. There was a comment from Kevin (IIRC) that leaving transactions > unresolved on the foreign server keeps the resources locked on those > servers. That's not a very good situation. And nobody but the initiating > server can resolve those. That functionality is important to make it a > complete 2PC solution. So, please consider it to be included in your first > set of patches. > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
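The two-phase commit sequence discussed above (PREPARE TRANSACTION on every participant, then COMMIT PREPARED, with a resolver retrying anything left behind) can be sketched as a stand-alone simulation. This is illustrative only — the struct and function names are hypothetical, not postgres_fdw's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-server handle; in postgres_fdw this would be a
 * connection plus the prepared transaction identifier (GID). */
typedef struct ForeignServer
{
    bool prepared;      /* PREPARE TRANSACTION succeeded */
    bool committed;     /* COMMIT PREPARED succeeded */
    bool fail_prepare;  /* simulate a failure during phase one */
} ForeignServer;

/* Phase one: issue PREPARE TRANSACTION on every participant.
 * If any server fails to prepare, roll back the ones already prepared. */
static bool
prepare_all(ForeignServer *servers, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (servers[i].fail_prepare)
        {
            /* ROLLBACK PREPARED on everything prepared so far */
            for (size_t j = 0; j < i; j++)
                servers[j].prepared = false;
            return false;
        }
        servers[i].prepared = true;
    }
    return true;
}

/* Phase two: issue COMMIT PREPARED on every participant.  A server that
 * cannot be reached at this point must be retried later by a resolver
 * process such as the pg_fdw_xact_resolver mentioned in the thread. */
static void
commit_all(ForeignServer *servers, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (servers[i].prepared)
            servers[i].committed = true;
}
```

The key property Kevin's comment alludes to: once phase one succeeds everywhere, only the initiating server knows the outcome, so unresolved prepared transactions (and their locks) persist on the remote side until a resolver finishes phase two.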
Re: [HACKERS] Push down more full joins in postgres_fdw
On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita wrote: > On 2016/09/15 15:29, Ashutosh Bapat wrote: >> >> On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas >> wrote: > > >>> I'm not sure why it wouldn't work >>> to just use the lowest RTI involved in the join, though; the others >>> won't appear anywhere else at that query level. > > >> So +1 for >> using the smallest RTI to indicate a subquery. > > > +1 for the general idea. > > ISTM that the use of the same RTI for subqueries in multi-levels in a remote > SQL makes the SQL a bit difficult to read. How about using the position of > the join rel in join_rel_list, (more precisely, the position plus > list_length(root->parse->rtable)), instead? > We switch to a hash table to maintain the join RelOptInfos when the number of joins grows larger, where the position won't make much sense. We might differentiate between a base relation alias and a subquery alias by using different prefixes like "r" and "s" respectively. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
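Picking the smallest RTI involved in a join, as proposed above, amounts to finding the lowest member of the join's relids set. A minimal stand-alone sketch, using a plain bit mask in place of PostgreSQL's Bitmapset and a hypothetical function name (real code would use bms_next_member(relids, -1)):

```c
#include <assert.h>

/* Return the lowest member (1-based RTI) of a relids set represented
 * as a bit mask where bit k corresponds to RTI k.  Returns 0 for an
 * empty set. */
static int
lowest_rti(unsigned int relids)
{
    for (int rti = 1; rti < 32; rti++)
        if (relids & (1u << rti))
            return rti;
    return 0;
}
```

Because the other RTIs of the join do not appear anywhere else at that query level (Robert's point), the lowest one is unambiguous as a subquery alias number.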
Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server
Thanks, Jeevan, for working on the comments. > > 3. classifyConditions() assumes list expressions of type RestrictInfo. But > HAVING clause expressions (and JOIN conditions) are plain expressions. Do > you mean we should modify the classifyConditions()? If yes, then I think it > should be done as a separate (cleanup) patch. Ok. Yes, we should also handle bare conditions in classifyConditions(). I am fine doing it as a separate patch. > > 6. Per my understanding, I think checking for just aggregate function is > enough as we are interested in whole aggregate result. Meanwhile I will > check whether we need to find and check shippability of transition, > combination and finalization functions or not. Ok. I agree with this analysis. > 13. Fixed. However, instead of adding a new function, I made those changes inline. > Adding support for deparsing List expressions as a comma-separated list can > be > considered a cleanup patch, as it will touch other code areas not specific to > aggregate push down. Ok. > > 18. I think re-factoring estimate_path_cost_size() should be done separately > as a cleanup patch too. Ok. > > 29. I have tried passing input rel's relids to fetch_upper_rel() call in > create_grouping_paths(). It solved this patch's issue, but ended up with > a few regression failures, which are mostly plan changes. I am not sure whether > we get a good plan now or not as I have not analyzed them. > So for this patch, I am setting relids in add_foreign_grouping_path() > itself. > However, as suggested, I used bms_copy(). I still kept the FIXME over there as > I think it should have been done by the core itself. I don't think add_foreign_grouping_path() is the right place to change a structure managed by the core, especially when we are half-way through adding paths. An FDW should not meddle with an upper RelOptInfo.
Since create_foreignscan_plan() picks those up from RelOptInfo and both of those are part of core, we need a place in core to set the RelOptInfo::relids for an upper relation OR we have to stop using fs_relids for upper relation foreign scans. Here are some more comments on the patch, mostly focused on the tests. 1. If we dig down each usage of deparse_expr_cxt::scanrel, it boils down to checking whether a given Var comes from the given scan relation (deparseVar()) or checking whether a given Var needs relation qualification while deparsing (again deparseVar()). I think both of those purposes can be served by looking at scanrel::relids, multiple relids implying relation qualification. So, instead of having a separate member scan_rel, we should instead fetch the relids from deparse_expr_cxt::foreignrel. I am assuming that the proposed change to set relids in upper relations is acceptable. If it's not, we should pass scanrel->relids through the context instead of scanrel itself. 2. SortGroupClause is a parser node, so we can name appendSortGroupClause() as deparseSortGroupClause(), to be consistent with the naming convention. If we do that change, maybe it's better to pass SortGroupClause as is and handle other members of that structure as well. 3. The following block may be reworded +/* + * aggorder too present into args so no need to check its + * shippability explicitly. However, if any expression has + * USING clause with sort operator, we need to make sure the + * shippability of that operator. + */ as "For aggorder elements, check whether the sort operator, if specified, is shippable or not." and mention aggorder expression along with aggdirectargs in the comment before this one. 4. The following line is correct as long as there is only one upper relation. +context.scanrel = (rel->reloptkind == RELOPT_UPPER_REL) ? fpinfo->outerrel : rel; But as we push down more and more upper operations, there will be a chain of upper relations on top of the scan relation.
So, it's better to assert that the scanrel is a join or a base relation to be future proof. 5. In deparseConst(), showtype is compared with hardcoded values. The callers of this function pass hardcoded values. Instead, we should define an enum and use it. I understand that this logic has been borrowed from get_const_expr(), which also uses hardcoded values. Any reason why not to adopt a better style here? In fact, it only uses two states for showtype, 0 and "> 0". Why don't we use a boolean then OR why isn't the third state in get_const_expr() applicable here? 6. "will" or "should" is missing from the following sentence. "Plain var nodes either be same as some GROUP BY expression or part of some GROUP BY expression. 7. The changes in block else { /* * If we have sortgroupref set, then it means that we have an * ORDER BY entry pointing to this expression. Since we are * not pushing ORDER BY with GROUP BY, clear it. */ if (sgref) gr
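Point 1 above — that both uses of deparse_expr_cxt::scanrel reduce to inspecting its relids — can be illustrated with a tiny stand-alone sketch. The bit mask stands in for PostgreSQL's Bitmapset, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Count members of a relids set represented as a bit mask. */
static int
relids_count(unsigned int relids)
{
    int n = 0;
    while (relids)
    {
        relids &= relids - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* A deparsed Var needs a relation qualifier (e.g. "r1.a" rather than
 * just "a") only when the scan covers more than one base relation,
 * i.e. when the relids set has more than one member. */
static bool
var_needs_qualification(unsigned int scanrel_relids)
{
    return relids_count(scanrel_relids) > 1;
}
```

This is why passing just the relids through the context (instead of the whole scan relation) would be sufficient, as the review suggests.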
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On 09/26/2016 09:02 AM, Michael Paquier wrote: On Mon, Sep 26, 2016 at 2:15 AM, David Steele wrote: However, it doesn't look like they can be used in conjunction since the pg_hba.conf entry must specify either md5 or scram (though the database can easily contain a mixture). This would probably make a migration very unpleasant. Yep, it uses a given auth-method once user and database match. This is partially related to the problem of supporting multiple password verifiers per user, which was submitted last CF but got rejected because of a lack of interest, and removed to simplify this patch. You need as well to think about other things like password and protocol aging. But well, it is a problem that we don't have to tackle with this patch... Is there any chance of a mixed mode that will allow new passwords to be set as scram while still honoring the old md5 passwords? Or does that cause too many complications with the protocol? Hm. That looks complicated to me. This sounds to me like retry logic for multiple authentication methods, and a different feature. What you'd be looking for here is a connection parameter to specify a list of protocols and try them all, no? It would be possible to have an "md5-or-scram" authentication method in pg_hba.conf, such that the server would look up the pg_authid row of the user when it receives the startup message, and send an MD5 or SCRAM challenge depending on which one the user's password is encrypted with. It has one drawback though: it allows an unauthenticated user to probe if there is a role with a given name in the system, because if a user doesn't exist, we'd have to still send an MD5 or SCRAM challenge, or a "user does not exist" error without a challenge. If we send a SCRAM challenge for a non-existent user, and the attacker knows that most users still have an MD5 password, that reveals that the username most likely doesn't exist. Hmm.
The server could send a SCRAM challenge first, and if the client gives an incorrect response, or the username doesn't exist, or the user's password is actually MD5-encrypted, the server could then send an MD5 challenge. It would add one round-trip to the authentication of MD5 passwords, but that seems acceptable. We can do this as a follow-up patch though. Let's try to keep this patch series small. - Heikki
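Heikki's SCRAM-first-then-MD5 fallback can be sketched as a stand-alone decision function. This is an illustrative simulation of the control flow only — the enum values and function are hypothetical, not PostgreSQL's actual authentication code:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PW_MD5, PW_SCRAM } VerifierType;
typedef enum { AUTH_OK, AUTH_FAILED } AuthResult;

/* Simulate the "md5-or-scram" flow: always send a SCRAM challenge
 * first, so an unauthenticated attacker cannot tell from the challenge
 * type whether the role exists or which verifier it has.  Fall back to
 * an MD5 challenge (one extra round trip) when the SCRAM exchange
 * fails and the stored verifier is MD5. */
static AuthResult
authenticate(bool user_exists, VerifierType stored,
             bool scram_response_ok, bool md5_response_ok)
{
    /* Phase 1: SCRAM challenge, sent unconditionally. */
    if (user_exists && stored == PW_SCRAM && scram_response_ok)
        return AUTH_OK;

    /* Phase 2: MD5 challenge, also sent unconditionally on failure,
     * so a non-existent user is indistinguishable from a bad password. */
    if (user_exists && stored == PW_MD5 && md5_response_ok)
        return AUTH_OK;

    return AUTH_FAILED;
}
```

The point of sending both challenges unconditionally is that the failure path looks identical for "wrong password" and "no such role", closing the probing channel described earlier in the thread.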
Re: [HACKERS] pgbench - allow to store select results into variables
On 2016/09/26 16:12, Amit Langote wrote: > I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I > have no further comments at the moment. Wait... Heikki's latest commit now requires this patch to be rebased. commit 12788ae49e1933f463bc59a6efe46c4a01701b76 Author: Heikki Linnakangas Date: Mon Sep 26 10:56:02 2016 +0300 Refactor script execution state machine in pgbench. So, will change the status to "Waiting on Author". Thanks, Amit
Re: [HACKERS] WAL logging problem in 9.4.3?
Hello, I return to this before my other things :) Though I haven't played with the patch yet... At Fri, 29 Jul 2016 16:54:42 +0900, Michael Paquier wrote in > > Well, not that soon at the end, but I am back on that... I have not > > completely reviewed all the code yet, and the case of index relation > > referring to a relation optimized with truncate is still broken, but > > for now here is a rebased patch if people are interested. I am going > > to get as well a TAP test out of my pocket to ease testing. > > The patch I sent yesterday was based on an incorrect version. Attached > is a slightly-modified version of the last one I found here > (https://www.postgresql.org/message-id/56b342f5.1050...@iki.fi), which > is rebased on HEAD at ed0b228. I have also converted the test case > script of upthread into a TAP test in src/test/recovery that covers 3 > cases and I included that in the patch: > 1) CREATE + INSERT + COPY => crash > 2) CREATE + trigger + COPY => crash > 3) CREATE + TRUNCATE + COPY => incorrect number of rows. > The first two tests make the system crash, the third one reports an > incorrect number of rows. At first glance, managing sync_above and truncate_to is workable for these cases, but seems too complicated for the problem being resolved. This provides smgr with a capability to manage pending page syncs. But the postpone-page-syncs-or-not issue rather seems to be a matter of the users of that, who are responsible for WAL issuing. Anyway heap_register_sync doesn't use any secret of smgr. So I think this approach binds smgr with Relation too tightly. By this patch, many RelationNeedsWALs, which just accesses a local struct, are replaced with HeapNeedsWAL, which eventually accesses a hash added by this patch. Especially in log_heap_update, it is called for every update of a single tuple (on a relation that needs WAL).
Though I don't know how it actually impacts the performance, it seems to me that we can live with truncated_to and sync_above in RelationData and BufferNeedsWAL(rel, buf) instead of HeapNeedsWAL(rel, buf). Anyway, at most one entry per relation seems to exist in the hash at any time. What do you think? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
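Horiguchi's suggestion — keep sync_above in RelationData and test each buffer against it instead of consulting a hash — might look like the following minimal stand-alone sketch. The stub types and the function are hypothetical illustrations of the idea, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Stub for the fields the proposal would add to RelationData. */
typedef struct RelationData
{
    bool rel_needs_wal;       /* stand-in for RelationNeedsWAL() */
    BlockNumber sync_above;   /* blocks >= this are fsync'd at commit */
} RelationData;

/* Hypothetical BufferNeedsWAL(rel, blkno): WAL can be skipped for
 * blocks at or above sync_above, because that part of the file will be
 * synced to disk directly when the transaction commits, so crash
 * recovery never needs to replay it. */
static bool
buffer_needs_wal(const RelationData *rel, BlockNumber blkno)
{
    if (!rel->rel_needs_wal)
        return false;
    if (rel->sync_above == InvalidBlockNumber)
        return true;                  /* no pending sync registered */
    return blkno < rel->sync_above;   /* below the synced range: log it */
}
```

The attraction of this shape is exactly what the email argues: it is a cheap field test on a struct already in hand, so hot paths like log_heap_update avoid a hash lookup per tuple.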
Re: [HACKERS] pgbench - minor fix for meta command only scripts
On 09/24/2016 12:45 PM, Fabien COELHO wrote: Although I cannot be absolutely sure that the refactoring does not introduce any new bug, I'm convinced that it will be much easier to find them :-) :-) Attached are some small changes to your version: I have added the sleep_until fix. I have fixed a bug introduced in the patch by changing && to || in the (min_sec > 0 && maxsock != -1) condition which was inducing errors with multi-threads & clients... I have factored out several error messages in "commandFailed", in place of the "metaCommandFailed", and added the script number as well in the error messages. All messages are now specific to the failed command. I have added two states to the machine: - CSTATE_CHOOSE_SCRIPT which simplifies threadRun, there is now one call to chooseScript instead of two before. - CSTATE_END_COMMAND which manages is_latencies and proceeding to the next command, thus merging the three instances of updating the stats that were in the first version. The latter state means that processing query results is included in the per-statement latency, which is an improvement because before I was getting some transaction latency significantly larger than the apparent sum of the per-statement latencies, which did not make much sense... Ok. I agree that makes more sense. I have added & updated a few comments. Thanks! Committed. There are some places where the break could be a pass through instead, not sure how desirable it is, I'm fine with break. I left them as "break". Pass-throughs are error-prone, and make it more difficult to read, IMHO. The compiler will optimize it into a pass-through anyway, if possible and worthwhile, so there should be no performance difference. - Heikki
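The shape of the refactored state machine — explicit transitions with `break` rather than fall-throughs, and stats updated in a single END_COMMAND state — can be sketched like this. The state names mirror the ones mentioned above; everything else is an illustrative simplification, not pgbench's real code:

```c
#include <assert.h>

typedef enum
{
    CSTATE_CHOOSE_SCRIPT,   /* pick the next script to run */
    CSTATE_START_COMMAND,   /* issue a command, wait for result */
    CSTATE_END_COMMAND,     /* single place for per-statement stats */
    CSTATE_FINISHED
} ClientState;

/* Compute the next state.  Each case returns explicitly (the moral
 * equivalent of "break"), so a reader never has to reason about which
 * cases silently fall into the next one. */
static ClientState
advance(ClientState st, int command, int ncommands)
{
    switch (st)
    {
        case CSTATE_CHOOSE_SCRIPT:
            return CSTATE_START_COMMAND;
        case CSTATE_START_COMMAND:
            /* result processing happens before this transition, so it
             * is counted inside the per-statement latency */
            return CSTATE_END_COMMAND;
        case CSTATE_END_COMMAND:
            /* explicit transition instead of a fall-through */
            return (command + 1 < ncommands) ? CSTATE_START_COMMAND
                                             : CSTATE_FINISHED;
        default:
            return CSTATE_FINISHED;
    }
}
```

As Heikki notes, the compiler collapses trivial transitions anyway, so the explicit form costs nothing at runtime while keeping every transition visible.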
Re: [HACKERS] Declarative partitioning - another take
Hi Ashutosh, On 2016/09/22 14:42, Ashutosh Bapat wrote: > Hi Amit, > The following sequence of DDLs gets an error > -- > -- multi-leveled partitions > -- > CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a); > CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END > (250) PARTITION BY RANGE (b); > CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END > (100); > CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START > (100) END (250); > CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END > (500) PARTITION BY RANGE (c); > CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START > ('0250') END ('0400'); > CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START > ('0400') END ('0500'); > CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END > (600) PARTITION BY RANGE ((b + a)); > ERROR: cannot use column or expression from ancestor partition key > > The last statement is trying to create subpartitions by range (b + a), > which contains a partition key from the ancestor partition key but is not > exactly the same as it. In fact it contains some extra columns other > than the ancestor partition key columns. Why do we want to prohibit > such cases? Per discussion [1], I am going to remove this ill-considered restriction. Thanks, Amit [1] https://www.postgresql.org/message-id/CA%2BHiwqEXAU_m%2BV%3Db-VGmsDNjoqc-Z_9KQdyPuOGbiQGzNObmVg%40mail.gmail.com
Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch
On 2016/09/21 0:40, Robert Haas wrote: On Fri, Jul 1, 2016 at 3:10 AM, Etsuro Fujita wrote: On 2016/04/14 4:57, Robert Haas wrote: 1. For a regular FDW scan, zero the xmin, xmax, and cid of the tuple before returning it from postgres_fdw, so that we don't expose the datum-tuple fields. I can't see any reason this isn't safe, but I might be missing something. I noticed odd behavior of this in EvalPlanQual. Consider: -- session 1 postgres=# create extension postgres_fdw; CREATE EXTENSION postgres=# create server fs foreign data wrapper postgres_fdw options (dbname 'postgres'); CREATE SERVER postgres=# create user mapping for public server fs; CREATE USER MAPPING postgres=# create table t1 (a int, b int); CREATE TABLE postgres=# create table t2 (a int, b int); CREATE TABLE postgres=# insert into t1 values (1, 1); INSERT 0 1 postgres=# insert into t2 values (1, 1); INSERT 0 1 postgres=# create foreign table ft1 (a int, b int) server fs options (table_name 't1'); CREATE FOREIGN TABLE postgres=# select xmin, xmax, cmin, * from ft1; xmin | xmax | cmin | a | b --+--+--+---+--- 0 |0 |0 | 1 | 1 (1 row) -- session 2 postgres=# begin; BEGIN postgres=# update t2 set a = a; UPDATE 1 -- session 1 postgres=# select ft1.xmin, ft1.xmax, ft1.cmin, ft1.* from ft1, t2 for update; -- session 2 postgres=# commit; COMMIT -- session 1 (will show the following) xmin |xmax| cmin | a | b --++---+---+--- 128 | 4294967295 | 16398 | 1 | 1 (1 row) The values of xmin, xmax, and cmin are not 0! The reason for that is that we don't zero these values in a test tuple stored by EvalPlanQualFetchRowMarks for EvalPlanQual re-evaluations. That cleanup applies to the file_fdw foreign table case as well, so I think xmin, xmax, and cid in tuples from such tables should be set to 0, too. And ISTM that it's better that the core (ie, ForeignNext) supports doing that, rather than each FDW does that work. That would also minimize the overhead because ForeignNext does that if needed. Please find attached a patch. 
If you want this to be considered, you'll need to rebase it and submit it to the next CommitFest. Will do. Best regards, Etsuro Fujita
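The fix proposed above — have ForeignNext zero xmin, xmax, and cid centrally for every tuple an FDW returns, including the copies EvalPlanQualFetchRowMarks stores for recheck — could be sketched as follows. The struct is a stub; PostgreSQL's real code manipulates HeapTupleHeaderData through macros like HeapTupleHeaderSetXmin, so this is only an illustration of the intent:

```c
#include <assert.h>

typedef unsigned int TransactionId;
typedef unsigned int CommandId;

/* Stub for the system-column fields of a heap tuple header. */
typedef struct TupleHeader
{
    TransactionId xmin;
    TransactionId xmax;
    CommandId     cmin_cmax;  /* cmin/cmax share storage in PostgreSQL */
} TupleHeader;

/* What a centralized fix in ForeignNext would do for every tuple
 * coming from an FDW, so the datum-tuple fields are never exposed --
 * including the test tuples used for EvalPlanQual re-evaluation, which
 * the session transcript above shows were being missed. */
static void
zero_system_columns(TupleHeader *t)
{
    t->xmin = 0;
    t->xmax = 0;
    t->cmin_cmax = 0;
}
```

Doing this in core rather than in each FDW also covers cases like file_fdw automatically, as the email argues.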
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/09/15 15:29, Ashutosh Bapat wrote: On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas wrote: I'm not sure why it wouldn't work to just use the lowest RTI involved in the join, though; the others won't appear anywhere else at that query level. So +1 for using the smallest RTI to indicate a subquery. +1 for the general idea. ISTM that the use of the same RTI for subqueries in multi-levels in a remote SQL makes the SQL a bit difficult to read. How about using the position of the join rel in join_rel_list, (more precisely, the position plus list_length(root->parse->rtable)), instead? Best regards, Etsuro Fujita
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/09/13 14:17, Ashutosh Bapat wrote: But another concern about the approach of generating a subselect alias for a path, if needed, at the path creation time would be that the path might be rejected by add_path, which would result in totally wasting cycles for generating the subselect alias for the path. A path may get rejected but the relation is not rejected. The alias applies to a relation and its targetlist, which will remain the same for all paths created for that relation, especially when it's a base relation or join. So, AFAIU that work never gets wasted. I think there is no guarantee that add_path won't reject foreign join paths. The possibility would be very low, though. However minimal the overhead might be, we will deparse the query every time we create a path involving that kind of relation, i.e. for different pathkeys, different parameterization and different joins in which that relation participates. This number can be significant. We want to avoid this overhead if we can. Exactly. As I said above, I think the overhead would be negligible compared to the overhead in explaining each remote query for costing or the overhead in sending the final remote query for execution, though. It won't remain minimal as the number of paths created increases, increasing the number of times a query is deparsed. We deparse the query every time we cost a path for a relation with use_remote_estimates true. As we try to push down more and more stuff, we will create more paths and deparse the query more times. Also, that makes the interface better. Right now, in your patch, you have changed the order of deparsing in the existing code, so that the aliases are registered while deparsing the FROM clause and before any Var nodes are deparsed. If we create aliases at the time of path creation, only once in GetForeignJoinPaths or GetForeignPaths as appropriate, that would require less code churn and would save some CPU cycles as well. Agreed. Will fix.
Best regards, Etsuro Fujita
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On 2016/09/07 10:54, vinayak wrote: Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve any transactions which cannot be resolved immediately after they were prepared. There was a comment from Kevin (IIRC) that leaving transactions unresolved on the foreign server keeps the resources locked on those servers. That's not a very good situation. And nobody but the initiating server can resolve those. That functionality is important to make it a complete 2PC solution. So, please consider it to be included in your first set of patches. The attached patch includes pg_fdw_xact_resolver and the documentation. Regards, Vinayak Pokale NTT Open Source Software Center 0001-Support-transaction-with-foreign-servers.patch
Re: [HACKERS] pgbench - allow to store select results into variables
Hi Fabien, I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I have no further comments at the moment. Thanks, Amit [1] https://www.postgresql.org/message-id/alpine.DEB.2.20.1609130730380.10870%40lancre