Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On Fri, Feb 17, 2017 at 2:17 PM, Prabakaran, Vaishnavi wrote:
> On 22 November 2016 at 18:32, Craig Ringer wrote:
> I am interested in this patch and addressed various below comments from
> reviewers. And, I have separated out code and test module into 2 patches.
> So, if needed, test patch can be enhanced more, meanwhile code patch can
> be committed.

Cool. Nice to see a new version of this patch.

(You may want to let your replies breathe a bit, for example by adding an
extra newline after the paragraph you are answering to.)

>> Renaming and refactoring new APIs
>> +PQisInBatchMode           172
>> +PQqueriesInBatch          173
>> +PQbeginBatchMode          174
>> +PQendBatchMode            175
>> +PQsendEndBatch            176
>> +PQgetNextQuery            177
>> +PQbatchIsAborted          178
>> This set of routines is a bit inconsistent. Why not just prefixing them
>> with PQbatch? Like that for example:
>> PQbatchStatus(): consists of disabled/inactive/none, active, error. This
>> covers both PQbatchIsAborted() and PQisInBatchMode().
>> PQbatchBegin()
>> PQbatchEnd()
>> PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to add
>> and process a sync message into the queue.
>
> Renamed and modified batch status APIs as below:
> PQisInBatchMode & PQbatchIsAborted ==> PQbatchStatus
> PQqueriesInBatch ==> PQbatchQueueCount
> PQbeginBatchMode ==> PQbatchBegin
> PQendBatchMode ==> PQbatchEnd
> PQsendEndBatch ==> PQbatchQueueSync
> PQgetNextQuery ==> PQbatchQueueProcess

Thanks. This seems a way cleaner interface to me. Others may have a
different opinion, so let's see if there are some.

>> PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,
>> -1 on failure
>> PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on
>> failure (OOM)
>
> I think it is still ok to keep the current behaviour like other ones
> present in the same file. E.g: "PQsendPrepare", "PQsendQueryGuts"
>
>> PQqueriesInBatch() (new name (NN): PQbatchQueueCount) doesn't work as
>> documented.
>> It says:
>> "Returns the number of queries still in the queue for this batch"
>> but in fact it's implemented as a Boolean.
>
> Modified the logic to count number of entries in pending queue and return
> the count.
>
>> The changes in src/test/examples/ are not necessary anymore. You moved
>> all the tests to test_libpq (for the best actually).
>
> Removed these unnecessary changes from src/test/examples folder and
> corrected the path mentioned in comments section of testlibpqbatch.c.
>
>> But with the libpq batch API, maybe this could be modernized
>> with meta-commands like this:
>> \startbatch
>> ...
>> \endbatch
>
> I think it is a separate patch candidate.

Definitely agreed on that. Let's not complicate things further more.

>> It may be a good idea to check for PG_PROTOCOL_MAJOR < 3 and issue an
>> error for the new routines.
>
> All the APIs which support asynchronous command processing can be
> executed only if PG_PROTOCOL_MAJOR >= 3. So, adding it to new routines
> is not really needed.

OK. Let's not complicate the patch more than necessary. After an extra
lookup at the patch, here are some comments.

In the docs, when listing any of the PQBATCH_MODE_* variables, you should
have a markup around them. It would be cleaner to make a list of the
potential values that can be returned by PQbatchStatus() using .

+	while (queue != NULL)
+	{
+		PGcommandQueueEntry *prev = queue;
+
+		queue = queue->next;
+		if (prev->query)
+			free(prev->query);
+		free(prev);
+	}
+	conn->cmd_queue_recycle = NULL

This could be useful as a separate routine.

+/* PQmakePipelinedCommand
+ * Get a new command queue entry, allocating it if required. Doesn't add it to
+ * the tail of the queue yet, use PQappendPipelinedCommand once the command has
+ * been written for that. If a command fails once it's called this, it should
+ * use PQrecyclePipelinedCommand to put it on the freelist or release it.
+ *

This comment block is not project-like. Please use an empty line as first
line, with only "/*".
The same comment applies to a bunch of the routines introduced by this
patch.

Not sure there is much point in having PQstartProcessingNewQuery. It only
does two things in two places, so that's not worth the loss in
readability.

+	if (conn->batch_status != PQBATCH_MODE_OFF)
+	{
+		pipeCmd = PQmakePipelinedCommand(conn);
+
+		if (pipeCmd == NULL)
+			return 0;			/* error msg already set */
+
+		last_query = &pipeCmd->query;
+		queryclass = &pipeCmd->queryclass;
+	}
+	else
+	{
+		last_query = &conn->last_query;
+		queryclass = &conn->queryclass;
+	}

This setup should happen further down.

conn->in_batch is undefined, causing the patch to fail to compile. And
actually you should not need it.

-	conn->asyncStatus = PGASYNC_READY;
+	conn->asyncStatus = PGASYNC_READY_MORE;
 	return;
Re: [HACKERS] dropping partitioned tables without CASCADE
On 2017/02/21 15:35, Amit Langote wrote:
>> drop table list_parted cascade;
>> -NOTICE: drop cascades to 3 other objects
>> -DETAIL: drop cascades to table part_ab_cd
>> probably we should remove cascade from there, unless you are testing
>> CASCADE functionality. Similarly for other blocks like
>> drop table range_parted cascade;

Oops, failed to address this in the last email. Updated patches attached.

Thanks,
Amit

From 0dc550d501805bcea81da9b2a06a39430427927c Mon Sep 17 00:00:00 2001
From: amit
Date: Thu, 16 Feb 2017 15:56:44 +0900
Subject: [PATCH 1/2] Allow dropping partitioned table without CASCADE

Currently, a normal dependency is created between an inheritance parent
and child when creating the child. That means one must specify CASCADE to
drop the parent table if a child table exists. When creating partitions as
inheritance children, create an auto dependency instead, so that
partitions are dropped automatically when the parent is dropped, i.e.,
without specifying CASCADE.
---
 src/backend/commands/tablecmds.c           | 26 ++
 src/test/regress/expected/alter_table.out  | 10 --
 src/test/regress/expected/create_table.out |  9 ++---
 src/test/regress/expected/inherit.out      | 22 ++
 src/test/regress/expected/insert.out       |  7 ++-
 src/test/regress/expected/update.out       |  7 +--
 src/test/regress/sql/alter_table.sql       | 10 --
 src/test/regress/sql/create_table.sql      |  9 ++---
 src/test/regress/sql/inherit.sql           |  4 ++--
 src/test/regress/sql/insert.sql            |  7 ++-
 src/test/regress/sql/update.sql            |  2 +-
 11 files changed, 40 insertions(+), 73 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3cea220421..31b50ad77f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -289,9 +289,11 @@ static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static
void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
-static void StoreCatalogInheritance(Oid relationId, List *supers);
+static void StoreCatalogInheritance(Oid relationId, List *supers,
+						bool child_is_partition);
 static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-						 int16 seqNumber, Relation inhRelation);
+						 int16 seqNumber, Relation inhRelation,
+						 bool child_is_partition);
 static int findAttrByName(const char *attributeName, List *schema);
 static void AlterIndexNamespaces(Relation classRel, Relation rel,
 				   Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -725,7 +727,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 					  typaddress);

 	/* Store inheritance information for new rel. */
-	StoreCatalogInheritance(relationId, inheritOids);
+	StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);

 	/*
 	 * We must bump the command counter to make the newly-created relation
@@ -2240,7 +2242,8 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
 * supers is a list of the OIDs of the new relation's direct ancestors.
 */
 static void
-StoreCatalogInheritance(Oid relationId, List *supers)
+StoreCatalogInheritance(Oid relationId, List *supers,
+						bool child_is_partition)
 {
 	Relation	relation;
 	int16		seqNumber;
@@ -2270,7 +2273,8 @@
 	{
 		Oid			parentOid = lfirst_oid(entry);

-		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation);
+		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation,
+								 child_is_partition);
 		seqNumber++;
 	}

@@ -2283,7 +2287,8 @@ StoreCatalogInheritance(Oid relationId, List *supers)
 */
 static void
 StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-						 int16 seqNumber, Relation inhRelation)
+						 int16 seqNumber, Relation inhRelation,
+						 bool child_is_partition)
 {
 	TupleDesc	desc = RelationGetDescr(inhRelation);
 	Datum		values[Natts_pg_inherits];
@@ -2317,7 +2322,10 @@ StoreCatalogInheritance1(Oid relationId, Oid parentOid,
 	childobject.objectId = relationId;
 	childobject.objectSubId = 0;

-	recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
+	if (!child_is_partition)
+		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
+	else
+		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);

 	/*
 	 * Post creation hook of this inheritance. Since object_access_hook
@@ -10744,7 +10752,9 @@ CreateInheritance(Relation child_rel, Relation parent_rel)
 	StoreCatalogInheritance1(RelationGetRelid(child_rel),
 							 RelationGetRelid(parent_rel),
 							 inhseqno + 1,
-							 catalogRelation);
+							 catalogRelation,
+							 parent_rel->rd_rel->relkind ==
+
Re: [HACKERS] Provide list of subscriptions and publications in psql's completion
On Tue, Feb 21, 2017 at 12:48 AM, Stephen Frost wrote:
> * Fujii Masao (masao.fu...@gmail.com) wrote:
>> On Fri, Feb 17, 2017 at 11:17 PM, Peter Eisentraut
>> 1) Expose all the columns except subconninfo in pg_subscription to
>> non-superusers. In this idea, the tab-completion and psql meta-command
>> for subscription still sees pg_subscription. One good point of this
>> idea is that even non-superusers can see all the useful information
>> about subscriptions other than sensitive information like conninfo.
>>
>> 2) Change pg_stat_subscription so that it also shows all the columns
>> except subconninfo in pg_subscription. Also change the tab-completion
>> and psql meta-command for subscription so that they see
>> pg_stat_subscription instead of pg_subscription. One good point is that
>> pg_stat_subscription exposes all the useful information about
>> subscription to even non-superusers. OTOH, pg_subscription and
>> pg_stat_subscription have to have several same columns. This would be
>> redundant and a bit confusing.
>>
>> 3) Expose subdbid in pg_stat_subscription. Also change the
>> tab-completion and psql meta-command for subscription so that they see
>> pg_stat_subscription. This change is very simple. But non-superusers
>> cannot see useful information like subslotname because of privilege
>> problem.
>>
>> I like #1, but any better approach?
>
> #1 seems alright to me, at least. We could start using column-level
> privs in other places also, as I mentioned up-thread.

Among the three, this looks like a good one.

> I don't particularly like the suggestions to have psql use pg_stat_X
> views or to put things into pg_stat_X views which are object definitions
> for non-superusers. If we really don't want to use column-level
> privileges then we should have an appropriate view created instead which
> provides non-superusers with the non-sensitive object information.

Neither do I, those are by definition tables for statistics.
Having tab completion depend on them is not right. So what do you think
about the patch attached? This does the following:
- complete subscription list for CREATE/ALTER SUBSCRIPTION
- complete publication list for CREATE/ALTER PUBLICATION
- complete to nothing for PUBLICATION in CREATE/ALTER SUBSCRIPTION, as
  this refers to remote objects defined in subconninfo
- authorize read access to public for all columns of pg_subscription
  except subconninfo

Thoughts?
--
Michael

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 38be9cf1a0..3e88a2a595 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -900,7 +900,10 @@ CREATE VIEW pg_replication_origin_status AS

 REVOKE ALL ON pg_replication_origin_status FROM public;

+-- All columns of pg_subscription, except subconninfo, are readable.
 REVOKE ALL ON pg_subscription FROM public;
+GRANT SELECT (subdbid, subname, subowner, subenabled, subslotname, subpublications)
+ON pg_subscription TO public;

 --
 -- We have a few function definitions in here, too.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 94814c20d0..fa0ce5c184 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -830,6 +830,18 @@ static const SchemaQuery Query_for_list_of_matviews = {
 "   FROM pg_catalog.pg_am "\
 "  WHERE substring(pg_catalog.quote_ident(amname),1,%d)='%s'"

+#define Query_for_list_of_subscriptions \
+" SELECT pg_catalog.quote_ident(c1.subname) "\
+"   FROM pg_catalog.pg_subscription c1, pg_catalog.pg_database c2"\
+"  WHERE substring(pg_catalog.quote_ident(c1.subname),1,%d)='%s'"\
+"    AND c2.datname = pg_catalog.current_database()"\
+"    AND c1.subdbid = c2.oid"
+
+#define Query_for_list_of_publications \
+" SELECT pg_catalog.quote_ident(pubname) "\
+"   FROM pg_catalog.pg_publication "\
+"  WHERE substring(pg_catalog.quote_ident(pubname),1,%d)='%s'"
+
 /* the silly-looking length condition is just to eat up the current word */
 #define Query_for_list_of_arguments \
 "SELECT pg_catalog.oidvectortypes(proargtypes)||')' "\
@@ -960,13 +972,13 @@ static const pgsql_thing_t words_after_create[] = {
 	{"OWNED", NULL, NULL, THING_NO_CREATE},		/* for DROP OWNED BY ... */
 	{"PARSER", Query_for_list_of_ts_parsers, NULL, THING_NO_SHOW},
 	{"POLICY", NULL, NULL},
-	{"PUBLICATION", NULL, NULL},
+	{"PUBLICATION", Query_for_list_of_publications},
 	{"ROLE", Query_for_list_of_roles},
 	{"RULE", "SELECT pg_catalog.quote_ident(rulename) FROM pg_catalog.pg_rules WHERE substring(pg_catalog.quote_ident(rulename),1,%d)='%s'"},
 	{"SCHEMA", Query_for_list_of_schemas},
 	{"SEQUENCE", NULL, &Query_for_list_of_sequences},
 	{"SERVER", Query_for_list_of_servers},
-	{"SUBSCRIPTION", NULL, NULL},
+	{"SUBSCRIPTION", Query_for_list_of_subscriptions},
 	{"TABLE", NULL, &Query_for_list_of_tables},
 	{"TABLESPACE", Query_for_list_of_tablespaces},
 	{"TEMP", NULL, NULL,
Re: [HACKERS] dropping partitioned tables without CASCADE
Hi Ashutosh,

Thanks for taking a look at the patch.

On 2017/02/20 21:49, Ashutosh Bapat wrote:
> Thanks for working on all the follow-on work for the partitioning
> feature.
>
> May be you should add all those patches in the next commitfest, so
> that we don't forget those.

I think adding these as one of the PostgreSQL 10 Open Items [0] might be
better. I've done that.

> On Mon, Feb 20, 2017 at 7:46 AM, Amit Langote wrote:
>> So I count more than a few votes saying that we should be able to DROP
>> partitioned tables without specifying CASCADE.
>>
>> I tried to implement that using the attached patch by having
>> StoreCatalogInheritance1() create a DEPENDENCY_AUTO dependency between
>> parent and child if the child is a partition, instead of the
>> DEPENDENCY_NORMAL that would otherwise be created. Now it seems that
>> that is one way of making sure that partitions are dropped when the
>> root partitioned table is dropped, not sure if the best; why create the
>> pg_depend entries at all, one might ask. I chose it for now because
>> that's the one with fewest lines of change. Adjusted regression tests
>> as well, since we recently tweaked tests [1] to work around the
>> irregularities of test output when using CASCADE.
>
> The patch applies cleanly and regression does not show any failures.
>
> Here are some comments.
>
> For the sake of readability you may want to reverse the if and else
> order.
> -	recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
> +	if (!child_is_partition)
> +		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);
> +	else
> +		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
> like
> +	if (child_is_partition)
> +		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
> +	else
> +		recordDependencyOn(&childobject, &parentobject, DEPENDENCY_NORMAL);

Sure, done.

> It's weird that somebody can perform DROP TABLE on the partition without
> referring to its parent. That may be a useful feature as it allows one
> to detach the partition as well as remove the table in one command.
> But it looks weird for someone familiar with partitioning features of
> other DBMSes. But then our partition creation command is CREATE TABLE.
> So maybe this is an expected difference.

There is a line on the CREATE TABLE page in the description of the
PARTITION OF clause: "Note that dropping a partition with DROP TABLE
requires taking an ACCESS EXCLUSIVE lock on the parent table."

In earlier proposals I had included the ALTER TABLE parent ADD/DROP
PARTITION commands, but CREATE TABLE PARTITION OF / DROP TABLE prevailed.

> --- cleanup: avoid using CASCADE
> -DROP TABLE list_parted, part_1;
> -DROP TABLE list_parted2, part_2, part_5, part_5_a;
> -DROP TABLE range_parted, part1, part2;
> +-- cleanup
> +DROP TABLE list_parted, list_parted2, range_parted;
> Testcases usually drop one table at a time, I guess, to reduce the
> differences when we add or remove tables from testcases. All such blocks
> should probably follow the same policy.

Hmm, I see this in src/test/regress/sql/inherit.sql:141

DROP TABLE firstparent, secondparent, jointchild, thirdparent, otherchild;

> drop table list_parted cascade;
> -NOTICE: drop cascades to 3 other objects
> -DETAIL: drop cascades to table part_ab_cd
> probably we should remove cascade from there, unless you are testing
> CASCADE functionality. Similarly for other blocks like
> drop table range_parted cascade;
>
> BTW, I noticed that although we are allowing foreign tables to be
> partitions, there are no tests in foreign_data.sql for testing it. If
> there were, we would test DROP TABLE on a partitioned table with foreign
> partitions as well. That file has testcases for testing foreign table
> inheritance, and should have tests for foreign table partitions.

That makes sense. Patch 0002 is for that (I'm afraid this should be posted
separately though).
I didn't add/repeat all the tests that were added by the foreign table
inheritance patch again for foreign partitions (common inheritance rules
apply to both cases), only added those for the new partitioning commands
and certain new rules.

Thanks,
Amit

[0] https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items

From c7cff17ab7861dbd0a4fb329115b3f99fd800325 Mon Sep 17 00:00:00 2001
From: amit
Date: Thu, 16 Feb 2017 15:56:44 +0900
Subject: [PATCH 1/2] Allow dropping partitioned table without CASCADE

Currently, a normal dependency is created between an inheritance parent
and child when creating the child. That means one must specify CASCADE to
drop the parent table if a child table exists. When creating partitions as
inheritance children, create an auto dependency instead, so that
partitions are dropped automatically when the parent is dropped, i.e.,
without specifying CASCADE.
---
 src/backend/commands/tablecmds.c           | 26 ++
 src/test/regress/expected/alter_table.out  | 10 --
 src/test/regress/expected/create_table.out |  9 ++---
 src/test/regress/expected/inherit.out
[HACKERS] [doc fix] Really trivial fix for BRIN documentation
Hello,

This is just a correction from "index" to "table". I was a bit confused
when I first read this part.

Regards,
Takayuki Tsunakawa

brin_trivial_doc_fix.patch
Description: brin_trivial_doc_fix.patch

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel bitmap heap scan
On Mon, Feb 20, 2017 at 11:18 PM, Robert Haas wrote:
> On Mon, Feb 20, 2017 at 5:55 AM, Thomas Munro wrote:
>> On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar wrote:
>>> So basically, what I want to propose is that only during
>>> ExecReScanBitmapHeapScan we can free all the DSA pointers, because at
>>> that time we can be sure that all the workers have completed their
>>> task and we are safe to free. (And we don't free any DSA memory at
>>> ExecEndBitmapHeapScan.)
>>
>> I think this works.
>
> OK.

In my latest version of the patch, I have fixed it as described above,
i.e. cleanup only in ExecReScanBitmapHeapScan. For getting this, there is
some change in both the patches.

0001:
1. I have got a new interface, "tbm_free_shared_area(dsa_area *dsa,
dsa_pointer dp)", which will be responsible for freeing all the shared
members (pagetable, ptpage and ptchunk). Actually, we cannot do this in
tbm_free itself because only the leader will have a local tbm with
references to all these pointers, and our parallel bitmap leader may not
necessarily be the actual backend. If we want to get this done using the
tbm, then we need to create a local tbm in each worker and get the shared
member references copied into the tbm using tbm_attach_shared_iterate,
like we were doing in the earlier version.
2. Now tbm_free does not free any of the shared members which can be
accessed by other workers, so tbm_free is safe to call from
ExecEndBitmapHeapScan without any safety check or ref count.

0002:
1. We don't need ExecShutdownBitmapHeapScan anymore because now we are
not freeing any shared member in ExecEndBitmapHeapScan.
2. In ExecReScanBitmapHeapScan we will call tbm_free_shared_area to free
the shared members of the TBM.
3. After that, we will free the TBMSharedIteratorState that we allocated
using tbm_prepare_shared_iterate.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

0001-tidbitmap-support-shared-v5.patch
Description: Binary data

0002-parallel-bitmap-heapscan-v5.patch
Description: Binary data
Re: [HACKERS] Documentation improvements for partitioning
On Tue, Feb 21, 2017 at 9:57 AM, Amit Langote wrote:
> On 2017/02/21 12:10, Ashutosh Bapat wrote:
>> I think that's a limitation till we implement global indexes. But
>> nothing in the current implementation stops us from implementing it.
>> In fact, I remember a reply from Robert to another thread on
>> partitioning, started in parallel to Amit's thread, had explained some
>> of these design decisions. I am unable to find a link to that exact
>> reply though.
>
> Are you perhaps thinking of the message titled "design for a
> partitioning feature (was: inheritance)" a few months back:
>
> https://www.postgresql.org/message-id/ca+tgmozev0nryvw9cnb81xdojg4xtpjmkdif0zto-gdlcoc...@mail.gmail.com

Thanks a lot, that's the one I was searching for. And now that I confirm
it, there's NO reply to Robert's mail.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Documentation improvements for partitioning
On 2017/02/21 12:10, Ashutosh Bapat wrote:
> I think, that's a limitation till we implement global indexes. But
> nothing in the current implementation stops us from implementing it.
> In fact, I remember, a reply from Robert to another thread on
> partitioning started in parallel to Amit's thread had explained some
> of these design decisions. I am unable to find link to that exact
> reply though.

Are you perhaps thinking of the message titled "design for a partitioning
feature (was: inheritance)" a few months back:

https://www.postgresql.org/message-id/ca+tgmozev0nryvw9cnb81xdojg4xtpjmkdif0zto-gdlcoc...@mail.gmail.com

Thanks,
Amit
Re: [HACKERS] Passing query string to workers
On Mon, Feb 20, 2017 at 8:35 PM, Kuntal Ghosh wrote:
>
> +	char *query_data;
> +	query_data = estate->es_sourceText;
> estate->es_sourceText is a const char * variable. Assigning this const
> pointer to a non-const pointer violates the rules of const-correctness.
> So, either you should change query_data to const char *, or as Robert
> suggested, you can directly use estate->es_sourceText.

Done.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

pass_queryText_to_workers_v7.patch
Description: Binary data
Re: [HACKERS] delta relations in AFTER triggers
On Tue, Feb 21, 2017 at 7:14 AM, Thomas Munro wrote:
> On Fri, Jan 20, 2017 at 2:49 AM, Kevin Grittner wrote:
>> Attached is v9 which fixes bitrot from v8. No other changes.
>>
>> Still needs review.

Based on a suggestion from Robert off-list, I tried inserting into a delta
relation from a trigger function and discovered that it segfaults:

* frame #0: 0x0001057705a6 postgres`addRangeTableEntryForRelation(pstate=0x7fa58186a4d0, rel=0x, alias=0x, inh='\0', inFromCl='\0') + 70 at parse_relation.c:1280 [opt]
  frame #1: 0x00010575bbda postgres`setTargetTable(pstate=0x7fa58186a4d0, relation=0x7fa58186a098, inh=, alsoSource='\0', requiredPerms=1) + 90 at parse_clause.c:199 [opt]
  frame #2: 0x000105738530 postgres`transformStmt [inlined] transformInsertStmt(pstate=) + 69 at analyze.c:540 [opt]
  frame #3: 0x0001057384eb postgres`transformStmt(pstate=, parseTree=) + 2411 at analyze.c:279 [opt]
  frame #4: 0x000105737a42 postgres`transformTopLevelStmt(pstate=, parseTree=0x7fa58186a438) + 18 at analyze.c:192 [opt]
  frame #5: 0x0001059408d0 postgres`pg_analyze_and_rewrite_params(parsetree=0x7fa58186a438, query_string="insert into d values (100, 100, 'x')", parserSetup=(plpgsql.so`plpgsql_parser_setup at pl_comp.c:1017), parserSetupArg=0x7fa58185c2a0, queryEnv=0x7fa581857798) + 128 at postgres.c:706 [opt]

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Documentation improvements for partitioning
On Tue, Feb 21, 2017 at 2:03 AM, Bruce Momjian wrote:
> On Mon, Feb 20, 2017 at 02:37:44PM +0530, Robert Haas wrote:
>> On Mon, Feb 20, 2017 at 2:09 AM, Simon Riggs wrote:
>> > On 15 February 2017 at 15:46, Robert Haas wrote:
>> >>> It leaves me asking what else is missing.
>> >>
>> >> There is certainly a lot of room for improvement here but I don't
>> >> understand your persistent negativity about what's been done thus far.
>> >> I think it's pretty clearly a huge step forward, and I think Amit
>> >> deserves a ton of credit for making it happen. The improvements in
>> >> bulk loading performance alone are stupendous. You apparently have
>> >> the idea that somebody could have written an even larger patch that
>> >> solved even more problems at once, but this was already a really big
>> >> patch, and IMHO quite a good one.
>> >
>> > Please explain these personal comments against me.
>>
>> Several of your emails, including your first post to this thread,
>> seemed to me to be quite negative about the state of this feature. I
>> don't think that's warranted, though perhaps I am misreading your
>> tone, as I have been known to do. I also don't think that expressing
>> the opinion that the feature is better than you're giving it credit
>> for is a personal comment against you. Where exactly do you see a
>> personal comment against you in what I wrote?
>
> I have to admit my reaction was similar to Simon's, meaning that the
> lack of docs is a problem, and that the limitations are kind of a
> surprise, and I wonder what other surprises there are.
>
> I am thinking this is a result of small teams, often from the same
> company, working on features in isolation and then making them public.

I agree that this is a result of small teams. The partitioning feature
encompasses features like global indexes, which are large in themselves.
Usually, in a company, many teams would be working on different
sub-features in the same release schedule. But that's not the case here.
We have to add sub-features in multiple releases. That might be the reason
behind some of the current limitations.

> It is often not clear what decisions were made and why.

Amit Langote submitted the patch sometime in August 2015, which certainly
didn't look like a well-thought-out design, certainly not a product of
'long cooking' within his company. It was more experimental. (Obviously he
had the background of many earlier discussions on partitioning, all of
which happened on hackers.) Since then all the discussion is on hackers;
all decisions were made during the discussion. While what you are saying
may be true of other patches, I am not sure whether it's true of this
work.

> The idea that unique indexes on a parent table can't guarantee
> uniqueness across child tables is both a surprise, and obvious once
> stated.

I think that's a limitation till we implement global indexes. But nothing
in the current implementation stops us from implementing them. In fact, I
remember a reply from Robert to another thread on partitioning, started in
parallel to Amit's thread, that had explained some of these design
decisions. I am unable to find a link to that exact reply though.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Documentation improvements for partitioning
On 2017/02/16 10:45, Amit Langote wrote:
> Also attaching 0002 (unchanged) for tab-completion support for the new
> partitioning syntax.

Robert already spawned a new thread titled "tab completion for
partitioning" for this [0].

> 0003 changes how ExecFindPartition() shows the row for which
> get_partition_for_tuple() failed to find a partition. As Simon commented
> upthread, we should show just the partition key, not the whole row in
> the error DETAIL. So the DETAIL now looks like what's shown by
> _bt_check_unique() upon uniqueness violation:
>
> DETAIL: Partition key of the failing row contains (key1, key2, ...)=(val1, val2, ...)
>
> The rules about which columns to show or whether to show the DETAIL at
> all are similar to those in BuildIndexValueDescription():
>
> - if user has SELECT privilege on the whole table, simply go ahead
>
> - if user doesn't have SELECT privilege on the table, check that they
>   can see all the columns in the key (no point in showing partial key);
>   however abort on finding an expression for which we don't try finding
>   out privilege situation of whatever columns may be in the expression

I posted this patch in a new thread titled "error detail when partition
not found" [1].

Thanks,
Amit

[0] https://www.postgresql.org/message-id/CA+TgmobYOj=a8gesies_v2wq46-_w0+7mowpinwc+iuzj-u...@mail.gmail.com
[1] https://www.postgresql.org/message-id/9f9dc7ae-14f0-4a25-5485-964d9bfc19bd%40lab.ntt.co.jp
[HACKERS] error detail when partition not found
Simon pointed out in a nearby thread [0] that the detail part of the
partition-not-found error should show just the partition keys. I posted a
patch on that thread [1], but to avoid confusion caused by the multitude
of patches over there, I'm re-posting it here.

* What the patch does:

Currently we show the whole row in the detail part of the error.

CREATE TABLE measurement_year_month (
    logdate     date not null,
    peaktemp    int,
    unitsales   int
) PARTITION BY RANGE (EXTRACT(YEAR FROM logdate), EXTRACT(MONTH FROM logdate));

# INSERT INTO measurement_year_month VALUES ('2016-12-02', 1, 1);
ERROR: no partition of relation "measurement_year_month" found for row
DETAIL: Failing row contains (2016-12-02, 1, 1).

The patch changes it to look like the following:

# INSERT INTO measurement_year_month VALUES ('2016-12-02', 1, 1);
ERROR: no partition of relation "measurement_year_month" found for row
DETAIL: Partition key of the failing row contains (date_part('year'::text, logdate), date_part('month'::text, logdate))=(2016, 12).

It's similar to the error detail shown when a btree unique violation
occurs:

-- just to be clear, using LIKE won't make measurement partitioned too
CREATE TABLE measurement (LIKE measurement_year_month);
CREATE UNIQUE INDEX ON measurement (EXTRACT(YEAR FROM logdate), EXTRACT(MONTH FROM logdate));

# INSERT INTO measurement VALUES ('2016-12-02', 1, 1);
INSERT 0 1
# INSERT INTO measurement VALUES ('2016-12-02', 1, 1);
ERROR: duplicate key value violates unique constraint "measurement_date_part_date_part1_idx"
DETAIL: Key (date_part('year'::text, logdate), date_part('month'::text, logdate))=(2016, 12) already exists.
* Some of the implementation details of the patch:

The rules about which columns to show or whether to show the DETAIL at all are similar to those in BuildIndexValueDescription():

- if user has SELECT privilege on the whole table, simply go ahead

- if user doesn't have SELECT privilege on the table, check that they can see all the columns in the key (no point in showing a partial key); however abort on finding an expression, for which we don't try finding out the privilege situation of whatever columns may be in the expression

Thanks,
Amit

[0] https://www.postgresql.org/message-id/CANP8%2BjJBpWocfKrbJcaf3iBt9E3U%3DWPE_NC8YE6rye%2BYJ1sYnQ%40mail.gmail.com
[1] https://www.postgresql.org/message-id/2f8df068-9a49-d74a-30af-7cd17bdee181%40lab.ntt.co.jp

From b382d1ebb25ab3d577667d50e86430b34e57ab9a Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 13 Feb 2017 19:52:19 +0900
Subject: [PATCH] Show only the partition key upon failing to find a partition

Currently, the whole row is shown without column names. Instead, adopt a style similar to _bt_check_unique() in ExecFindPartition() and show the failing key in the format (key1, ...)=(val1, ...).

The key description shown in the error message is now built by the new ExecBuildSlotPartitionKeyDescription(), which works along the lines of BuildIndexValueDescription(), using similar rules about what columns of the partition key to include depending on the user's privileges to view the same.

A bunch of relevant tests are added.
--- src/backend/catalog/partition.c | 29 +++ src/backend/executor/execMain.c | 147 ++- src/backend/utils/adt/ruleutils.c| 37 ++--- src/include/catalog/partition.h | 13 +++- src/include/utils/ruleutils.h| 2 + src/test/regress/expected/insert.out | 38 - src/test/regress/sql/insert.sql | 30 +++ 7 files changed, 246 insertions(+), 50 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 4bcef58763..710ce07a6f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -140,13 +140,6 @@ static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, void *probe, bool probe_is_bound, bool *is_equal); -/* Support get_partition_for_tuple() */ -static void FormPartitionKeyDatum(PartitionDispatch pd, - TupleTableSlot *slot, - EState *estate, - Datum *values, - bool *isnull); - /* * RelationBuildPartitionDesc * Form rel's partition descriptor @@ -1608,7 +1601,7 @@ generate_partition_qual(Relation rel) * the heap tuple passed in. * */ -static void +void FormPartitionKeyDatum(PartitionDispatch pd, TupleTableSlot *slot, EState *estate, @@ -1672,7 +1665,7 @@ int get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *slot, EState *estate, - Oid *failed_at) + GetPartitionFailureData *gpfd) { PartitionDispatch parent; Datum values[PARTITION_MAX_KEYS]; @@ -1693,13 +1686,6 @@ get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *myslot = parent->tupslot; TupleConversionMap *map = parent->tupmap; - /* Quick exit */ - if (partdesc->nparts == 0) - { -
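The privilege rules described in the commit message can be sketched in isolation. The following is a hypothetical, self-contained model — the real patch consults pg_attribute_aclcheck() and the relation's partition expressions, and none of the names below are the patch's actual ones — but it captures the decision of whether the DETAIL may show the partition key:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for one column of the partition key. */
typedef struct KeyColumn
{
	int		attnum;			/* 0 means the key column is an expression */
	bool	column_visible;	/* user holds SELECT on this plain column */
} KeyColumn;

/*
 * Return true only if the error DETAIL may show the partition key: either
 * the user can read the whole table, or every key column is a plain column
 * the user can read. Showing a partial key is pointless, and an expression
 * hides which columns it touches, so any expression forces the check to
 * fail unless the whole table is readable.
 */
static bool
key_description_allowed(bool table_select_priv,
						const KeyColumn *key, int nkeys)
{
	if (table_select_priv)
		return true;
	for (int i = 0; i < nkeys; i++)
	{
		if (key[i].attnum == 0)			/* expression: give up */
			return false;
		if (!key[i].column_visible)		/* partial key is useless */
			return false;
	}
	return true;
}
```

The same skip-the-DETAIL outcome is what BuildIndexValueDescription() produces for unique-violation messages when the user lacks the needed privileges.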
Re: [HACKERS] delta relations in AFTER triggers
On Fri, Jan 20, 2017 at 2:49 AM, Kevin Grittner wrote:
> Attached is v9 which fixes bitrot from v8. No other changes.
>
> Still needs review.

This patch still applies, builds cleanly after a small modification, includes regression tests, and the tests pass. The modification I needed to make was due to this compile error:

nodeNamedtuplestorescan.c:154:19: error: no member named 'ps_TupFromTlist' in 'struct PlanState'
        scanstate->ss.ps.ps_TupFromTlist = false;

Commit ea15e18677fc2eff3135023e27f69ed8821554ed got rid of that member of PlanState, and I assume based on other changes in that commit that the thing to do was simply to remove that line. Having done that, it builds cleanly.

+INSERT INTO level1_table(level1_no)
+  SELECT generate_series(1,200);
+INSERT INTO level2_table(level2_no, parent_no)
+  SELECT level2_no, level2_no / 50 + 1 AS parent_no
+    FROM generate_series(1,) level2_no;
+INSERT INTO all_level_status(level, node_no, status)
+  SELECT 1, level1_no, 0 FROM level1_table;
+INSERT INTO all_level_status(level, node_no, status)
+  SELECT 2, level2_no, 0 FROM level2_table;
+INSERT INTO level1_table(level1_no)
+  SELECT generate_series(201,1000);
+DELETE FROM level1_table WHERE level1_no BETWEEN 201 AND 1000;
+DELETE FROM level1_table WHERE level1_no BETWEEN 1 AND 10010;
+SELECT count(*) FROM level1_table;
+ count
+-------
+   200
+(1 row)

Was it intentional that this test doesn't include any statements that reach the case where the trigger does RAISE EXCEPTION 'RI error'? From the output generated there doesn't seem to be any evidence that the triggers run at all, though I know from playing around with this that they do:

postgres=# delete from level1_table where level1_no = 42;
ERROR:  RI error
CONTEXT:  PL/pgSQL function level1_table_ri_parent_del_func() line 6 at RAISE

+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group

These copyright messages are missing 3 years' worth of bugfixes.

+SPI_get_caller_relation(const char *name)

Do we need this function?
It's not used by the implementation. If it does have a good use for end-users, then perhaps it should be called something like SPI_get_registered_relation, to make it clear that it will return whatever you registered with SPI_register_relation, instead of introducing this 'caller' terminology?

+typedef struct NamedTuplestoreScan
+{
+	Scan		scan;
+	char	   *enrname;
+} NamedTuplestoreScan;

Nearly all plan node structs have a comment for the members below 'scan'; I think this one needs a comment too, because 'enrname' is not self-explanatory.

+	/*
+	 * Capture the NEW and OLD transition TABLE tuplestores (if specified for
+	 * this trigger).
+	 */
+	if (trigdata->tg_newtable || trigdata->tg_oldtable)
+	{
+		estate.queryEnv = create_queryEnv();
+		if (trigdata->tg_newtable)
+		{
+			Enr			enr = palloc(sizeof(EnrData));
+
+			enr->md.name = trigdata->tg_trigger->tgnewtable;
+			enr->md.tupdesc = trigdata->tg_relation->rd_att;
+			enr->md.enrtuples = tuplestore_tuple_count(trigdata->tg_newtable);
+			enr->reldata = trigdata->tg_newtable;
+			register_enr(estate.queryEnv, enr);
+			SPI_register_relation(enr);
+		}

Why do we have to call register_enr and also SPI_register_relation here?

On Mon, Dec 19, 2016 at 4:35 AM, Thomas Munro wrote:
> On Sun, Dec 18, 2016 at 3:15 PM, Kevin Grittner wrote:
>> There were a few changes Thomas included in the version he posted,
>> without really delving into an explanation for those changes. Some
>> or all of them are likely to be worthwhile, but I would rather
>> incorporate them based on explicit discussion, so this version
>> doesn't do much other than generalize the interface a little,
>> change some names, and add more regression tests for the new
>> feature. (The examples I worked up for the rough proof of concept
>> of enforcement of RI through set logic rather than row-at-a-time
>> navigation were the basis for the new tests, so the idea won't get
>> totally lost.)
>> Thomas, please discuss each suggested change (e.g.,
>> the inclusion of the query environment in the parameter list of a
>> few more functions).
>
> I was looking for omissions that would cause some kind of statements
> to miss out on ENRs arbitrarily. It seemed to me that
> parse_analyze_varparams should take a QueryEnvironment, mirroring
> parse_analyze, so that PrepareQuery could pass it in. Otherwise,
> PREPARE wouldn't see ENRs. Is there any reason why SPI_prepare should
> see them but PREPARE not?

Any thoughts about that? More soon.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] SCRAM authentication, take three
On Mon, Feb 20, 2017 at 9:41 PM, Aleksander Alekseev wrote:
>> Speaking about flaws, it looks like there is a memory leak in
>> array_to_utf procedure - result is allocated twice.

Pushed a fix for this one on my branch.

> And a few more things I've noticed after a closer look:
>
> * build_client_first_message does not free `state->client_nonce` if
>   second malloc (for `buf`) fails
> * same for `state->client_first_message_bare`
> * ... and most other procedures declared in fe-auth-scram.c file
>   (see malloc and strdup calls)

You are visibly missing pg_fe_scram_free().

> * scram_Normalize doesn't check malloc return value

Yes, I am aware of this one. This makes the interface utterly ugly though because an error log message needs to be handled across many API layers. We could just assume anything returning NULL is equivalent to an OOM and nothing else though.

--
Michael
Re: [HACKERS] Multivariate statistics and expression indexes
On Tue, Feb 21, 2017 at 01:27:53AM +0100, Tomas Vondra wrote:
> On 02/21/2017 12:13 AM, Bruce Momjian wrote:
> > At the risk of asking a stupid question, we already have optimizer
> > statistics on expression indexes. In what sense are we using this for
> > multi-variate statistics, and in what sense can't we.
>
> We're not using that at all, because those are really orthogonal features.
> Even with expression indexes, the statistics are per attribute, and the
> attributes are treated as independent.
>
> There was a proposal to also allow creating statistics on expressions
> (without having to create an index), but that's not supported yet.

OK, thanks. I had to ask.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                  Ancient Roman grave inscription    +
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi,

Attached is v9 of this patch series. This addresses most of the points raised in the review, namely:

1) change most 'debug' stuff to 'static inline' in memdebug.h

2) fixed and reworded a bunch of comments

3) get rid of the block-level bitmap tracking free chunks

Instead of the bitmap, I've used a simple singly-linked list, using int32 chunk indexes. Perhaps it could use the slist instead, but I'm not quite sure MAXALIGN is guaranteed to be greater than pointer. In any case, this seems to be working reasonably well - it saves a bit of code (but also made some code slightly more complex). Also, it seems to be a tad faster than v8 - after repeating the same benchmark as before, I get these numbers:

        master   slab-v8   slab-v9
 ---------------------------------
  1         50        28        25
  5      17500       180       160
 10         15       380       330
 20          ?       750       670

Although the results are quite noisy.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 7c70a7bef4029dd7f10c7dc9ff0dd92a7bd2f966 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Mon, 20 Feb 2017 20:16:16 +0100
Subject: [PATCH 1/3] move common bits to memdebug

---
 src/backend/utils/mmgr/Makefile   |   2 +-
 src/backend/utils/mmgr/aset.c     | 115 +-
 src/backend/utils/mmgr/memdebug.c |  93 ++
 src/include/utils/memdebug.h      |  48
 4 files changed, 144 insertions(+), 114 deletions(-)
 create mode 100644 src/backend/utils/mmgr/memdebug.c

diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 1842bae..fc5f793 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mmgr
 top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global -OBJS = aset.o dsa.o freepage.o mcxt.o portalmem.o +OBJS = aset.o dsa.o freepage.o mcxt.o memdebug.o portalmem.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 4dfc3ec..33b4d01 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -41,46 +41,6 @@ * chunks as chunks. Anything "large" is passed off to malloc(). Change * the number of freelists to change the small/large boundary. * - * - * About CLOBBER_FREED_MEMORY: - * - * If this symbol is defined, all freed memory is overwritten with 0x7F's. - * This is useful for catching places that reference already-freed memory. - * - * About MEMORY_CONTEXT_CHECKING: - * - * Since we usually round request sizes up to the next power of 2, there - * is often some unused space immediately after a requested data area. - * Thus, if someone makes the common error of writing past what they've - * requested, the problem is likely to go unnoticed ... until the day when - * there *isn't* any wasted space, perhaps because of different memory - * alignment on a new platform, or some other effect. To catch this sort - * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond - * the requested space whenever the request is less than the actual chunk - * size, and verifies that the byte is undamaged when the chunk is freed. - * - * - * About USE_VALGRIND and Valgrind client requests: - * - * Valgrind provides "client request" macros that exchange information with - * the host Valgrind (if any). Under !USE_VALGRIND, memdebug.h stubs out - * currently-used macros. - * - * When running under Valgrind, we want a NOACCESS memory region both before - * and after the allocation. The chunk header is tempting as the preceding - * region, but mcxt.c expects to able to examine the standard chunk header - * fields. 
Therefore, we use, when available, the requested_size field and - * any subsequent padding. requested_size is made NOACCESS before returning - * a chunk pointer to a caller. However, to reduce client request traffic, - * it is kept DEFINED in chunks on the free list. - * - * The rounded-up capacity of the chunk usually acts as a post-allocation - * NOACCESS region. If the request consumes precisely the entire chunk, - * there is no such region; another chunk header may immediately follow. In - * that case, Valgrind will not detect access beyond the end of the chunk. - * - * See also the cooperating Valgrind client requests in mcxt.c. - * *- */ @@ -296,10 +256,10 @@ static const unsigned char LogTable256[256] = */ #ifdef HAVE_ALLOCINFO #define AllocFreeInfo(_cxt, _chunk) \ - fprintf(stderr, "AllocFree: %s: %p, %d\n", \ + fprintf(stderr, "AllocFree: %s: %p, %lu\n", \ (_cxt)->header.name, (_chunk), (_chunk)->size) #define
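The "singly-linked list of int32 chunk indexes" replacing the per-block bitmap can be sketched in a few lines. This is an illustrative standalone model, not the patch's code: free chunks within a block thread a list through themselves, storing the next free chunk's index rather than a pointer (so no assumption that a pointer fits in a minimum-size chunk, which is the MAXALIGN worry mentioned above):

```c
#include <assert.h>
#include <stdint.h>

#define SLAB_CHUNKS 8			/* chunks per block, illustrative */

/* Hypothetical block header; names are not the patch's actual ones. */
typedef struct SlabBlock
{
	int32_t		firstFreeChunk;				/* -1 when block is full */
	int32_t		nextFree[SLAB_CHUNKS];		/* per-chunk "next free" index;
											 * in the real thing this lives
											 * inside the free chunks */
	int32_t		nfree;
} SlabBlock;

static void
slab_block_init(SlabBlock *block)
{
	/* chain all chunks onto the freelist: 0 -> 1 -> ... -> -1 */
	for (int32_t i = 0; i < SLAB_CHUNKS; i++)
		block->nextFree[i] = (i + 1 < SLAB_CHUNKS) ? i + 1 : -1;
	block->firstFreeChunk = 0;
	block->nfree = SLAB_CHUNKS;
}

/* Pop the head of the freelist; O(1), no bitmap scan. */
static int32_t
slab_alloc_chunk(SlabBlock *block)
{
	int32_t		idx = block->firstFreeChunk;

	if (idx < 0)
		return -1;				/* block full */
	block->firstFreeChunk = block->nextFree[idx];
	block->nfree--;
	return idx;
}

/* Push a freed chunk back onto the head of the freelist; also O(1). */
static void
slab_free_chunk(SlabBlock *block, int32_t idx)
{
	block->nextFree[idx] = block->firstFreeChunk;
	block->firstFreeChunk = idx;
	block->nfree++;
}
```

Both alloc and free are O(1) head operations, which is where the speedup over scanning a bitmap for a set bit would come from.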
Re: [HACKERS] Multivariate statistics and expression indexes
On 02/21/2017 12:13 AM, Bruce Momjian wrote:
> At the risk of asking a stupid question, we already have optimizer
> statistics on expression indexes. In what sense are we using this for
> multi-variate statistics, and in what sense can't we.

We're not using that at all, because those are really orthogonal features. Even with expression indexes, the statistics are per attribute, and the attributes are treated as independent.

There was a proposal to also allow creating statistics on expressions (without having to create an index), but that's not supported yet.

regards

--
Tomas Vondra
Re: [HACKERS] ICU integration
On Mon, Feb 20, 2017 at 3:51 PM, Bruce Momjian wrote:
> So we don't have any other cases where we warn about possible corruption
> except this?

I'm not sure that I understand the distinction you're making.

> Also, I will go back to my previous concern, that while I like the fact
> we can detect collation changes with ICU, we don't know if ICU
> collations change more often than OS collations.

We do know that ICU collations can never change behaviorally in a given stable release. Bug fixes are allowed in point releases, but these never change the user-visible behavior of collations. That's very clear, because an upstream Unicode UCA version is used by a given major release of ICU. This upstream data describes the behavior of a collation using a high-level declarative language that non-programmer experts in natural languages write.

ICU versions many different things, in fact. Importantly, it explicitly decouples behavioral issues (user-visible sort order -- UCA version) from technical issues (collator implementation details). So, my original point is that that could change, and if that happens we ought to have a plan. But it won't change unless it really has to.

--
Peter Geoghegan
Re: [HACKERS] ICU integration
On Mon, Feb 20, 2017 at 03:29:07PM -0800, Peter Geoghegan wrote:
> Marking all indexes as invalid would be an enormous overkill. We don't
> even know for sure that the values the user has indexed happens to be
> affected. In order for there to have been a bug in ICU in the first
> place, the likelihood is that it only occurs in what are edge cases
> for the collator.
>
> ICU is a very popular library, used in software that I personally
> interact with every day [1]. Any bugs like this should be exceptional.
> They very clearly appreciate how sensitive software like Postgres is
> to changes like this, which is why the versioning API exists.
>
> [1] http://site.icu-project.org/#TOC-Who-Uses-ICU-

So we don't have any other cases where we warn about possible corruption except this?

Also, I will go back to my previous concern, that while I like the fact we can detect collation changes with ICU, we don't know if ICU collations change more often than OS collations.

--
Bruce Momjian
Re: [HACKERS] ICU integration
On Mon, Feb 20, 2017 at 3:19 PM, Bruce Momjian wrote:
> Well, the release notes are a clear-enough communication for a need to
> reindex. I don't see a LOG message as similar. Don't we have other
> cases where we have to warn administrators? We can mark the indexes as
> invalid but how would we report that?

Marking all indexes as invalid would be an enormous overkill. We don't even know for sure that the values the user has indexed happen to be affected. In order for there to have been a bug in ICU in the first place, the likelihood is that it only occurs in what are edge cases for the collator.

ICU is a very popular library, used in software that I personally interact with every day [1]. Any bugs like this should be exceptional. They very clearly appreciate how sensitive software like Postgres is to changes like this, which is why the versioning API exists.

[1] http://site.icu-project.org/#TOC-Who-Uses-ICU-

--
Peter Geoghegan
Re: [HACKERS] ICU integration
On Mon, Feb 20, 2017 at 02:54:22PM -0800, Peter Geoghegan wrote:
> On Mon, Feb 20, 2017 at 2:38 PM, Bruce Momjian wrote:
> > I can't think of any cases where we warn of possible corruption only in
> > the server logs.
>
> It's just like any case where there was a bug in Postgres that could
> have subtly broken index builds. We don't make the next point release
> force users to REINDEX every possibly-affected index every time that
> happens. Presumably this is because they aren't particularly likely to
> be affected by any given bug, and because it's pretty infeasible for
> us to do so anyway. There aren't always easy ways to detect the
> problem, and users will never learn to deal with issues like this well
> when it is by definition something that should never happen.

Well, the release notes are a clear-enough communication for a need to reindex. I don't see a LOG message as similar. Don't we have other cases where we have to warn administrators? We can mark the indexes as invalid but how would we report that?

--
Bruce Momjian
[HACKERS] Multivariate statistics and expression indexes
At the risk of asking a stupid question, we already have optimizer statistics on expression indexes. In what sense are we using this for multi-variate statistics, and in what sense can't we.

FYI, I just wrote a blog post about expression index statistics:

http://momjian.us/main/blogs/pgblog/2017.html#February_20_2017

--
Bruce Momjian
Re: [HACKERS] ICU integration
On Mon, Feb 20, 2017 at 2:38 PM, Bruce Momjian wrote:
> I can't think of any cases where we warn of possible corruption only in
> the server logs.

It's just like any case where there was a bug in Postgres that could have subtly broken index builds. We don't make the next point release force users to REINDEX every possibly-affected index every time that happens. Presumably this is because they aren't particularly likely to be affected by any given bug, and because it's pretty infeasible for us to do so anyway. There aren't always easy ways to detect the problem, and users will never learn to deal with issues like this well when it is by definition something that should never happen.

--
Peter Geoghegan
Re: [HACKERS] ICU integration
On Wed, Feb 15, 2017 at 10:35:34PM -0800, Peter Geoghegan wrote:
> On Wed, Feb 15, 2017 at 9:17 PM, Peter Eisentraut wrote:
> > Stable operating systems shouldn't do major library upgrades, so I feel
> > pretty confident about this.
>
> There is one case where it *is* appropriate for a bugfix release of
> ICU to bump collator version: When there was a bug in the collator
> itself, leading to broken binary blobs [1]. We should make the user
> REINDEX when this happens.
>
> I think that ICU may well do this in point releases, which is actually
> a good thing. So, it seems like a good idea to test that there is no
> change, in a place like check_strxfrm_bug(). In my opinion, we should
> LOG a complaint in the event of a mismatch rather than refusing to
> start the server, since there probably isn't that much chance of the
> user being harmed by the bug. The cost of not starting the server
> properly until a REINDEX finishes or similar looks particularly
> unappealing when one considers that the app was probably affected by
> any corruption for weeks or months before the ICU update enabled its
> detection.

I can't think of any cases where we warn of possible corruption only in the server logs.

--
Bruce Momjian
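The LOG-on-mismatch behavior being proposed amounts to a simple string comparison between the collation version recorded at index build time and the one the currently loaded library reports. A minimal sketch, with hypothetical function names (the real implementation would obtain the current version from ICU, e.g. via its collator versioning API, and the recorded one from the catalogs):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero when the recorded and current versions differ. */
static int
collation_version_mismatch(const char *recorded, const char *current)
{
	return strcmp(recorded, current) != 0;
}

/*
 * On mismatch, emit a LOG-level complaint and keep running, rather than
 * refusing to start the server -- the policy argued for above.
 */
static void
check_collation_version(const char *collname,
						const char *recorded, const char *current)
{
	if (collation_version_mismatch(recorded, current))
		fprintf(stderr,
				"LOG:  collation \"%s\" version mismatch "
				"(recorded %s, actual %s); consider REINDEX of "
				"affected indexes\n",
				collname, recorded, current);
}
```

The point of the design is in the second function: detection is cheap and unconditional, while the remediation (REINDEX) is left to the administrator, since the corruption window predates the upgrade that made it detectable.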
[HACKERS] SUBSCRIPTION command hangs up, segfault occurs in the server process and to cancel the execution.
Hi.

While verifying the logical replication of PostgreSQL 10devel, I found the following problem.

* When you run the CREATE SUBSCRIPTION command against a table in the same PostgreSQL server, it hangs.
* Canceling the hung CREATE SUBSCRIPTION command with CTRL+C causes a segfault in a server process, and the PostgreSQL server restarts.

[Steps to Reproduce]
$ createdb db1 -U postgres
$ createdb db2 -U postgres
$ psql -U postgres db1 -c "CREATE TABLE test (id int primary key, data text)"
$ psql -U postgres db2 -c "CREATE TABLE test (id int primary key, data text)"
$ psql -U postgres db1 -c "CREATE PUBLICATION pub_db1_test FOR TABLE test"
$ psql -U postgres db2 -c "CREATE SUBSCRIPTION sub_db2_test CONNECTION 'dbname=db1 port=5432 user=postgres' PUBLICATION pub_db1_test"

The CREATE SUBSCRIPTION command does not end; it hangs. At this time, the following logs are output in the server log:

2017-02-21 06:58:05.082 JST [22060] LOG:  logical decoding found initial starting point at 0/1C06660
2017-02-21 06:58:05.082 JST [22060] DETAIL:  1 transaction needs to finish.

Interrupting psql (running CREATE SUBSCRIPTION) with CTRL+C causes a segfault in the server process. At this time, the following message is output to the server log.
2017-02-21 07:01:00.246 JST [22059] ERROR:  canceling statement due to user request
2017-02-21 07:01:00.246 JST [22059] STATEMENT:  CREATE SUBSCRIPTION sub_db2_test CONNECTION 'dbname=db1 port=5432 user=postgres' PUBLICATION pub_db1_test
2017-02-21 07:01:01.006 JST [21445] LOG:  server process (PID 22060) was terminated by signal 11: Segmentation fault
2017-02-21 07:01:01.007 JST [21445] LOG:  terminating any other active server processes

[Environment]
postgres=# SELECT version();
 PostgreSQL 10devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
(1 row)

postgres=# SHOW wal_level;
 wal_level
-----------
 logical
(1 row)

postgres=# SHOW max_wal_senders;
 max_wal_senders
-----------------
 10
(1 row)

postgres=# SHOW max_replication_slots;
 max_replication_slots
-----------------------
 10
(1 row)

postgres=# TABLE pg_hba_file_rules;
 line_number | type  |   database    | user_name  |  address  |     netmask     | auth_method | options | error
-------------+-------+---------------+------------+-----------+-----------------+-------------+---------+-------
           2 | local | {all}         | {all}      |           |                 | trust       |         |
           4 | host  | {all}         | {all}      | 127.0.0.1 | 255.255.255.255 | trust       |         |
           6 | host  | {all}         | {all}      | ::1       | ::::::: | trust       |         |
           9 | local | {replication} | {postgres} |           |                 | trust       |         |
(4 rows)

Regards.
Re: [HACKERS] Documentation improvements for partitioning
On Mon, Feb 20, 2017 at 02:37:44PM +0530, Robert Haas wrote:
> On Mon, Feb 20, 2017 at 2:09 AM, Simon Riggs wrote:
> > On 15 February 2017 at 15:46, Robert Haas wrote:
> >>> It leaves me asking what else is missing.
> >>
> >> There is certainly a lot of room for improvement here but I don't
> >> understand your persistent negativity about what's been done thus far.
> >> I think it's pretty clearly a huge step forward, and I think Amit
> >> deserves a ton of credit for making it happen. The improvements in
> >> bulk loading performance alone are stupendous. You apparently have
> >> the idea that somebody could have written an even larger patch that
> >> solved even more problems at once, but this was already a really big
> >> patch, and IMHO quite a good one.
> >
> > Please explain these personal comments against me.
>
> Several of your emails, including your first post to this thread,
> seemed to me to be quite negative about the state of this feature. I
> don't think that's warranted, though perhaps I am misreading your
> tone, as I have been known to do. I also don't think that expressing
> the opinion that the feature is better than you're giving it credit
> for is a personal comment against you. Where exactly do you see a
> personal comment against you in what I wrote?

I have to admit my reaction was similar to Simon's, meaning that the lack of docs is a problem, and that the limitations are kind of a surprise, and I wonder what other surprises there are.

I am thinking this is a result of small teams, often from the same company, working on features in isolation and then making them public. It is often not clear what decisions were made and why. The idea that unique indexes on a parent table can't guarantee uniqueness across child tables is both a surprise, and obvious once stated.

--
Bruce Momjian
Re: [HACKERS] Documentation improvements for partitioning
On Wed, Feb 15, 2017 at 12:08:19PM -0500, Robert Haas wrote:
> On Wed, Feb 15, 2017 at 11:34 AM, Alvaro Herrera wrote:
> > I think new-style partitioning is supposed to consider each partition as
> > an implementation detail of the table; the fact that you can manipulate
> > partitions separately does not really mean that they are their own
> > independent object. You don't stop to think "do I really want to drop
> > the TOAST table attached to this main table?" and attach a CASCADE
> > clause if so. You just drop the main table, and the toast one is
> > dropped automatically. I think new-style partitions should behave
> > equivalently.
>
> That's a reasonable point of view. I'd like to get some more opinions
> on this topic. I'm happy to have us do whatever most people want, but
> I'm worried that having table inheritance and table partitioning work
> differently will create confusion. I'm also suspicious that there
> may be some implementation difficulties. On the other hand, it does seem a
> little silly to say that DROP TABLE partitioned_table should always
> fail except in the degenerate case where there are no partitions, so
> maybe changing it is for the best.

I think we have a precedent for this, and that is the SERIAL data type, which is really just a macro on top of CREATE SEQUENCE and a DEFAULT nextval() using the sequence. You can manipulate the sequence and the DEFAULT separately, but if you drop the table, the sequence is dropped automatically. This seems like an instructive example of how we have bundled behavior together in the past in a logical and easy-to-understand way.

Of course, there might be some technical limitations that prevent us from using this approach, and I would be interested in hearing those details.

--
Bruce Momjian
Re: [HACKERS] GUC for cleanup indexes threshold.
On Thu, Feb 16, 2017 at 10:41 AM, Robert Haas wrote:
>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>> half-dead pages. So skipping an index vacuum might mean that second
>> index scan never happens at all, which would be bad.
>
> Maybe. If there are a tiny number of those half-dead pages in a huge
> index, it probably doesn't matter. Also, I don't think it would never
> happen, unless the table just never gets any more updates or deletes -
> but that case could also happen today. It's just a matter of
> happening less frequently.
>
> I guess the question is whether the accumulation of half-dead pages in
> the index could become a problem before the unsetting of all-visible
> bits in the heap becomes a problem. If the second one always happen
> first, then we don't have an issue here, but if it's possible for the
> first one to become a big problem before the second one gets to be a
> serious issue, then we need something more sophisticated.

Not getting to a second VACUUM where you might have otherwise can only be a problem to the extent that users are sensitive to not reclaiming disk space from indexes at the level of the FSM. It's not accurate to say that pages could be left "half dead" indefinitely by this patch, since that is something that lasts only as long as the first phase of B-Tree page deletion. In fact, the only possible problem is that pages are recyclable in principle, but that doesn't happen due to this new GUC.

That isn't analogous to heap bloat at all, because it's not as if there are any downlinks or right links or left links pointing to the recyclable (fully deleted) pages; the previous key space *has* in fact been *fully* reclaimed. These pages are fully dead, and as such are out of the critical path of index scans entirely once the second phase finishes. (They only need to continue to physically exist because old index scans might follow a stale pointer).
Note that there is an interlock against RecentGlobalXmin, respected by VACUUM, that prevents this sort of recycling. I suspect that the restrictions on page deletion, as opposed to page recycling, are vastly more likely to cause pain to users, and that's not made any worse by this. -- Peter Geoghegan
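The recycling interlock described above can be pictured as a tiny model: a fully deleted page records the XID that was current when deletion finished, and the page may be handed out again only once that XID precedes RecentGlobalXmin, so that no scan can still follow a stale pointer to it. This is only an illustration with hypothetical names, not the actual nbtree code (the real check, _bt_page_recyclable(), uses TransactionIdPrecedes() and so copes with XID wraparound, which this sketch ignores):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified stand-in for a B-Tree page after phase two of deletion:
 * btpo_xact is the next XID at the time the page became fully deleted. */
typedef struct
{
    bool          deleted;    /* has the second deletion phase finished? */
    TransactionId btpo_xact;  /* XID stamped when the page was deleted */
} ModelPage;

/* A deleted page is safe to recycle only when no transaction that might
 * hold a stale pointer to it can still be running, i.e. its deletion XID
 * precedes the oldest xmin anyone can see (RecentGlobalXmin).  Plain "<"
 * here stands in for the wraparound-aware comparison in the real code. */
static bool
model_page_recyclable(const ModelPage *page, TransactionId recent_global_xmin)
{
    return page->deleted && page->btpo_xact < recent_global_xmin;
}
```

A page that is merely deleted, or whose deletion XID is still visible to some snapshot, stays out of the FSM.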
Re: [HACKERS] GUC for cleanup indexes threshold.
On 20 February 2017 at 10:27, Amit Kapila wrote: > On Mon, Feb 20, 2017 at 3:01 PM, Simon Riggs wrote: >> On 20 February 2017 at 09:15, Amit Kapila wrote: >>> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada >>> wrote: On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas wrote: > On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs > wrote: >> On 15 February 2017 at 08:07, Masahiko Sawada >> wrote: >>> It's a bug. Attached latest version patch, which passed make check. >> >> 2. The current btree vacuum code requires 2 vacuums to fully reuse >> half-dead pages. So skipping an index vacuum might mean that second >> index scan never happens at all, which would be bad. > > Maybe. If there are a tiny number of those half-dead pages in a huge > index, it probably doesn't matter. Also, I don't think it would never > happen, unless the table just never gets any more updates or deletes - > but that case could also happen today. It's just a matter of > happening less frequently. >>> >>> Yeah, that's right and I am not sure if it is worth performing a >>> complete pass to reclaim dead/deleted pages unless we know someway >>> that there are many such pages. >> >> Agreed which is why >> On 16 February 2017 at 11:17, Simon Riggs wrote: >>> I suggest that we store the number of half-dead pages in the metapage >>> after each VACUUM, so we can decide whether to skip the scan or not. >> >> >>> Also, I think we do reclaim the >>> complete page while allocating a new page in btree. >> >> That's not how it works according to the README at least. >> > > I am referring to code (_bt_getbuf()->if (_bt_page_recyclable(page))), > won't that help us in reclaiming the space? Not unless the README is incorrect, no. That section of code is just a retest of pages retrieved from FSM; they aren't even added there until two scans have occurred and even then it may not be possible to recycle.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] SERIALIZABLE with parallel query
On Thu, Feb 16, 2017 at 6:19 PM, Thomas Munro wrote: > Specifically, DeleteChildTargetLocks() assumes it can walk > MySerializableXact->predicateLocks and throw away locks that are > covered by a new lock (ie throw away tuple locks because a covering > page lock has been acquired) without let or hindrance until it needs > to modify the locks themselves. That assumption doesn't hold up with > that last patch and will require a new kind of mutual exclusion. I > wonder if the solution is to introduce an LWLock into each > SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent > workers from stepping on each other's toes during lock cleanup. An > alternative would be to start taking SerializablePredicateLockListLock > in exclusive rather than shared mode, but that seems unnecessarily > coarse. Here is a patch to do that, for discussion. It adds an LWLock to each SERIALIZABLEXACT, and acquires it after SerializablePredicateListLock and before any predicate lock partition lock. It doesn't bother with that if not in parallel mode, or in the cases where SerializablePredicateListLock is held exclusively. This prevents parallel query workers and the leader from stepping on each other's toes when manipulating the predicate list. The case in CheckTargetForConflictsIn is theoretical for now since we don't support writing in parallel query yet. The case in CreatePredicateLock is reachable by running a simple parallel sequential scan. The case in DeleteChildTargetLocks is for when we've acquired a new predicate lock that covers finer-grained locks which can be dropped; that is reachable the same way again. I don't think it's required in ReleaseOneSerializableXact since it was already called in several places with an sxact other than the caller's, and deals with finished transactions.
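The acquisition order described above (SerializablePredicateListLock first, then the new per-SERIALIZABLEXACT lock, then a predicate lock partition lock) is a classic lock-ranking discipline: deadlock is impossible if every backend only ever acquires locks of strictly increasing rank. A minimal sketch of that invariant, with purely illustrative names rather than anything from predicate.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative lock ranks; lower ranks must be acquired first. */
typedef enum
{
    RANK_NONE = 0,
    RANK_PREDICATE_LIST_LOCK = 1,   /* SerializablePredicateListLock */
    RANK_SXACT_LOCK = 2,            /* the new per-SERIALIZABLEXACT LWLock */
    RANK_PARTITION_LOCK = 3         /* a predicate lock partition lock */
} LockRank;

/* Deadlock freedom follows if every backend acquires locks in strictly
 * increasing rank order; this checker models that rule. */
static bool
acquire_ok(LockRank highest_held, LockRank wanted)
{
    return wanted > highest_held;
}
```

Under this rule a worker holding the per-sxact lock may still take a partition lock, but may never go back for the list lock.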
-- Thomas Munro http://www.enterprisedb.com ssi-parallel-v3.patch Description: Binary data
Re: [HACKERS] Replication vs. float timestamps is a disaster
Robert Haas writes: > On Mon, Feb 20, 2017 at 7:37 PM, Tom Lane wrote: >> The question to be asked is whether there is still anybody out there >> using float timestamps. I'm starting to get dubious about it myself. > I'm wondering if it has any effect that pg_config.h.win32 says > /* Define to 1 if you want 64-bit integer timestamp and interval support. >(--enable-integer-datetimes) */ > /* #undef USE_INTEGER_DATETIMES */ > Whereas pg_config.h.win32 says: > /* Define to 1 if you want 64-bit integer timestamp and interval support. >(--enable-integer-datetimes) */ > #define USE_INTEGER_DATETIMES 1 Er, what? For me, grep finds src/include/pg_config.h.in: 836: #undef USE_INTEGER_DATETIMES src/include/pg_config.h.win32: 630: /* #undef USE_INTEGER_DATETIMES */ > It looks like it was commit 2169e42bef9db7e0bdd1bea00b81f44973ad83c8 > that enabled integer datetimes by default, but that commit seems > not to have touched the Windows build scripts. Commit > fcf053d7829f2d83829256153e856f9a36c83ffd changed MSVC over to use > integer datetimes by default, but I'm not clear if there's any build > environment where we rely on config.h.win32 but not Solution.pm? Any such build would find itself without a defined value of BLCKSZ, to mention just one of the settings that get appended by Solution.pm. It does look like we expect pg_config.h.win32 to be usable standalone for libpq-only Windows compiles, but it would never work for building the backend. (I dunno whether the libpq-only scripts still work at all or have bitrotted, but it's irrelevant for the question at hand.) > If not, what exactly is pg_config.h.win32 for and to what degree does it > need to be in sync with pg_config.h.in? The list of differences > appears to be far more extensive than the header comment at the top of > pg_config.h.win32 would lead one to believe. Yeah, I think the maintenance of pg_config.h.win32 has been a bit more haphazard than one could wish.
regards, tom lane
Re: [HACKERS] Parallel bitmap heap scan
Robert Haas writes: > On Mon, Feb 20, 2017 at 6:19 AM, Tom Lane wrote: >> The thing that you really have to worry about for this kind of proposal >> is "what if the query errors out and we never get to ExecEndNode"? >> It's particularly nasty if you're talking about parallel queries where >> maybe only one or some of the processes involved detect an error. > I think that's not actually a problem, because we've already got code > to make sure that all DSM resources associated with the query get > blown away in that case. Of course, that code might have bugs, but if > it does, I think it's better to try to fix those bugs than to insert > some belt-and-suspenders mechanism for reclaiming every possible chunk > of memory in retail fashion, just like we blow up es_query_cxt rather > than trying to pfree allocations individually. Actually, I think we're saying the same thing: rely on the general DSM cleanup mechanism, don't insert extra stuff that you expect will get done by executor shutdown. regards, tom lane
Re: [HACKERS] fd.c just Assert()s that lseek() succeeds
I wrote: > I noticed while researching bug #14555 that fd.c contains two separate > cases like > vfdP->seekPos = lseek(vfdP->fd, (off_t) 0, SEEK_CUR); > Assert(vfdP->seekPos != (off_t) -1); > This seems, um, unwise. It might somehow fail to fail in production > builds, because elsewhere it's assumed that -1 means FileUnknownPos, > but it seems like reporting the error would be a better plan. I looked at things more closely and decided that this was really pretty seriously broken. Attached is a patch that attempts to spruce things up. In LruDelete, it's more or less okay to not throw an error, mainly because that's likely to get called during error abort and so throwing an error would probably just lead to an infinite loop. But by the same token, throwing an error from the close() right after that is ill-advised, not to mention that it would've left the LRU state corrupted since we'd already unlinked the VFD. I also noticed that really, most of the time, we should know the current seek position and it shouldn't be necessary to do an lseek here at all. In the attached, if we don't have a seek position and an lseek attempt doesn't give us one, we'll close the file but then subsequent re-open attempts will fail (except in the somewhat-unlikely case that a FileSeek(SEEK_SET) call comes between and allows us to re-establish a known seek position). Meanwhile, having an Assert instead of an honest test in LruInsert is really dangerous: if that lseek failed, a subsequent read or write would read or write from the start of the file, not where the caller expected, leading to data corruption. I also fixed a number of other places that were being sloppy about behaving correctly when the seekPos is unknown. Also, I changed FileSeek to return -1 with EINVAL for the cases where it detects a bad offset, rather than throwing a hard elog(ERROR). It seemed pretty inconsistent that some bad-offset cases would get a failure return while others got elog(ERROR). 
It was missing an offset validity check for the SEEK_CUR case on a closed file, too. I'm inclined to treat this as a bug fix and back-patch it. lseek() failure on a valid FD is certainly unlikely, but if it did happen our behavior would be pretty bad. I'm also suspicious that some of these bugs could be exposed even without an lseek() failure, because of the code in FileRead and FileWrite that will reset seekPos to "unknown" after an error. regards, tom lane diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index ce4bd0f..1065bc1 100644 *** a/src/backend/storage/file/fd.c --- b/src/backend/storage/file/fd.c *** int max_safe_fds = 32; /* default if n *** 160,166 --- 160,173 #define FileIsNotOpen(file) (VfdCache[file].fd == VFD_CLOSED) + /* + * Note: a VFD's seekPos is normally always valid, but if for some reason + * an lseek() fails, it might become set to FileUnknownPos. We can struggle + * along without knowing the seek position in many cases, but in some places + * we have to fail if we don't have it. + */ #define FileUnknownPos ((off_t) -1) + #define FilePosIsUnknown(pos) ((pos) < 0) /* these are the assigned bits in fdstate below: */ #define FD_TEMPORARY (1 << 0) /* T = delete when closed */ *** typedef struct vfd *** 174,180 File nextFree; /* link to next free VFD, if in freelist */ File lruMoreRecently; /* doubly linked recency-of-use list */ File lruLessRecently; ! off_t seekPos; /* current logical file position */ off_t fileSize; /* current size of file (0 if not temporary) */ char *fileName; /* name of file, or NULL for unused VFD */ /* NB: fileName is malloc'd, and must be free'd when closing the VFD */ --- 181,187 File nextFree; /* link to next free VFD, if in freelist */ File lruMoreRecently; /* doubly linked recency-of-use list */ File lruLessRecently; ! 
off_t seekPos; /* current logical file position, or -1 */ off_t fileSize; /* current size of file (0 if not temporary) */ char *fileName; /* name of file, or NULL for unused VFD */ /* NB: fileName is malloc'd, and must be free'd when closing the VFD */ *** LruDelete(File file) *** 967,985 vfdP = [file]; ! /* delete the vfd record from the LRU ring */ ! Delete(file); ! ! /* save the seek position */ ! vfdP->seekPos = lseek(vfdP->fd, (off_t) 0, SEEK_CUR); ! Assert(vfdP->seekPos != (off_t) -1); /* close the file */ if (close(vfdP->fd)) ! elog(ERROR, "could not close file \"%s\": %m", vfdP->fileName); ! ! --nfile; vfdP->fd = VFD_CLOSED; } static void --- 974,1003 vfdP = [file]; ! /* ! * Normally we should know the seek position, but if for some reason we ! * have lost track of it, try again to get it. If we still can't get it, ! * we have a problem: we will be unable to restore the file seek position ! * when and if the file is
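The bookkeeping discipline the patch introduces — keep a cached logical position, advance it only on success, and poison it to FileUnknownPos whenever an operation leaves the kernel's file offset uncertain — can be sketched in miniature. The names below are hypothetical stand-ins, not the actual fd.c code:

```c
#include <assert.h>

typedef long model_off_t;
#define FileUnknownPos ((model_off_t) -1)
#define FilePosIsUnknown(pos) ((pos) < 0)

/* Minimal stand-in for a VFD entry: only the cached seek position. */
typedef struct
{
    model_off_t seekPos;    /* cached logical position, or -1 if unknown */
} ModelVfd;

/* After a successful read/write of nbytes, advance the cached position;
 * after a failed one the kernel's offset is uncertain, so forget it.
 * Once unknown, the position stays unknown until something (like an
 * absolute seek) re-establishes it. */
static void
model_update_pos(ModelVfd *vfd, long nbytes, int failed)
{
    if (failed || FilePosIsUnknown(vfd->seekPos))
        vfd->seekPos = FileUnknownPos;
    else
        vfd->seekPos += nbytes;
}
```

The key point is the last clause: an honest "unknown" state, tested before any operation that depends on the position, replaces the old Assert-and-hope approach.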
[HACKERS] possible encoding issues with libxml2 functions
Hi Today I played with the xml_recv function and with the xml processing functions. The xml_recv function ensures correct conversion from the document encoding to the server encoding. But the decl section still holds the original encoding info - which becomes obsolete after the conversion. Sometimes we solve this issue by removing the decl section - see the xml_out function. Sometimes we don't - a lot of functions use a direct conversion from xmltype to xmlChar. A wrong encoding in the decl section can break the libxml2 parser with the error ERROR: could not parse XML document DETAIL: input conversion failed due to input error, bytes 0x88 0x3C 0x2F 0x72 line 1: switching encoding: encoder error This error is not frequent - but it is hard to find - because there is a small but important difference between the printed XML and the used XML. There are two possible fixes a) clean the decl on input - the encoding info can be removed from the decl part b) use xml_out_internal everywhere before transformation to xmlChar. pg_xmlCharStrndup can be a good candidate. Comments? Regards Pavel
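Option (a) could be as simple as rewriting the declaration so that it no longer claims an encoding the bytes no longer have. A rough, hypothetical helper is sketched below; the real fix would live in xml.c and reuse its existing declaration parsing, so treat this purely as an illustration of the idea:

```c
#include <assert.h>
#include <string.h>

/* Remove an encoding="..." attribute from an XML declaration in place.
 * The buffer already holds server-encoded text, so the attribute lies
 * about the actual byte encoding and confuses the libxml2 parser. */
static void
strip_decl_encoding(char *xml)
{
    char *decl_end = strstr(xml, "?>");
    char *enc = strstr(xml, " encoding=");
    char *val_start;
    char *val_end;

    if (enc == NULL || decl_end == NULL || enc > decl_end)
        return;                 /* no declaration or no encoding attribute */

    val_start = enc + strlen(" encoding=");
    if (*val_start != '"' && *val_start != '\'')
        return;                 /* malformed: leave it alone */
    val_end = strchr(val_start + 1, *val_start);
    if (val_end == NULL)
        return;

    /* splice out " encoding=...", keeping the rest of the document */
    memmove(enc, val_end + 1, strlen(val_end + 1) + 1);
}
```

Applied to `<?xml version="1.0" encoding="LATIN2"?>...` after conversion to the server encoding, this leaves just `<?xml version="1.0"?>...`, which libxml2 then parses using the encoding of the bytes it is actually given.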
Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK
On Thu, Feb 16, 2017 at 11:40 AM, Masahiko Sawada wrote: > On Thu, Feb 16, 2017 at 7:52 AM, Petr Jelinek > wrote: >> On 15/02/17 06:43, Masahiko Sawada wrote: >>> On Tue, Feb 14, 2017 at 1:13 AM, Fujii Masao wrote: On Mon, Feb 13, 2017 at 4:05 PM, Masahiko Sawada wrote: > On Sat, Feb 11, 2017 at 8:18 PM, Petr Jelinek > wrote: >> On 10/02/17 19:55, Masahiko Sawada wrote: >>> On Thu, Feb 9, 2017 at 12:44 AM, Petr Jelinek >>> wrote: On 08/02/17 07:40, Masahiko Sawada wrote: > On Wed, Feb 8, 2017 at 9:01 AM, Michael Paquier > wrote: >> On Wed, Feb 8, 2017 at 1:30 AM, Fujii Masao >> wrote: >>> On Wed, Feb 8, 2017 at 12:26 AM, Petr Jelinek >>> wrote: For example what happens if apply crashes during the DROP SUBSCRIPTION/COMMIT and is not started because the delete from catalog is now visible so the subscription is no longer there? >>> >>> Another idea is to treat DROP SUBSCRIPTION in the same way as >>> VACUUM, i.e., >>> make it emit an error if it's executed within user's transaction >>> block. >> >> It seems to me that this is exactly Petr's point: using >> PreventTransactionChain() to prevent things to happen. > > Agreed. It's better to prevent it from being executed inside a user transaction > block. And I understood there are too many failure scenarios we need to > handle. > >>> Also DROP SUBSCRIPTION should call CommitTransactionCommand() just >>> after removing the entry from pg_subscription, then connect to the >>> publisher >>> and remove the replication slot. >> >> For consistency that may be important. > > Agreed. > > Attached patch, please give me feedback. > This looks good (and similar to what initial patch had btw). Works fine for me as well. Remaining issue is, what to do about CREATE SUBSCRIPTION then, there are similar failure scenarios there, should we prevent it from running inside transaction as well? >>> >>> Hmm, after some thought I have doubts about the currently discussed approach.
For >>> example, please imagine the case where CREATE SUBSCRIPTION creates the >>> subscription successfully but fails to create the replication slot for >>> whatever reason, and then DROP SUBSCRIPTION drops the subscription but >>> dropping the replication slot fails. In such a case, CREATE SUBSCRIPTION >>> and DROP SUBSCRIPTION return ERROR but the subscription is created and >>> dropped successfully. I think that this behaviour confuses the user. >>> >>> I think we should just prevent calling DROP SUBSCRIPTION in a user's >>> transaction block. Or I guess that it could be better to separate the >>> starting/stopping of logical replication from subscription management. >>> >> >> We need to stop the replication worker(s) in order to be able to drop >> the slot. There is no such issue with startup of the worker as that one >> is launched by the launcher after the transaction has committed. >> >> IMO the best option is to just disallow DROP/CREATE SUBSCRIPTION inside a >> transaction block and not do any commits inside of those (so that >> there are no rollbacks, which solves your initial issue I believe). That >> way failure to create/drop the slot will result in the subscription not being >> created/dropped, which is what we want. On second thought, +1. > I basically agree with this option, but why do we need to change CREATE > SUBSCRIPTION as well? Because the window between the creation of the replication slot and the transaction commit of CREATE SUBSCRIPTION should be short. Otherwise, if any error happens during that window, the replication slot unexpectedly remains while there is no corresponding subscription. Of course, even if we prevent CREATE SUBSCRIPTION from being executed within a user's transaction block, there is still such a window. But we can reduce the possibility of that problem. >>> >>> Thank you for the explanation. I understood and agree. >>> >>> I think we should disallow calling ALTER SUBSCRIPTION inside a user's >>> transaction block as well. >> >> Why?
ALTER SUBSCRIPTION does not create/drop anything on the remote server ever. >> > > Hmm you're right. ALTER SUBSCRIPTION can support transactions. Attached > fixed version patch. We should disallow CREATE/DROP SUBSCRIPTION inside a user transaction block only when CREATE/DROP SLOT option
Re: [HACKERS] GUC for cleanup indexes threshold.
On Mon, Feb 20, 2017 at 6:15 PM, Amit Kapila wrote: > On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada > wrote: >> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas wrote: >>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs wrote: On 15 February 2017 at 08:07, Masahiko Sawada wrote: > It's a bug. Attached latest version patch, which passed make check. In its current form, I'm not sure this is a good idea. Problems... 1. I'm pretty sure the world doesn't need another VACUUM parameter I suggest that we use the existing vacuum scale factor/4 to reflect that indexes are more sensitive to bloat. >>> >>> I do not think it's a good idea to control multiple behaviors with a >>> single GUC. We don't really know that dividing by 4 will be right for >>> everyone, or even for most people. It's better to have another >>> parameter with a sensible default than to hardcode a ratio that might >>> work out poorly for some people. >>> 2. The current btree vacuum code requires 2 vacuums to fully reuse half-dead pages. So skipping an index vacuum might mean that second index scan never happens at all, which would be bad. >>> >>> Maybe. If there are a tiny number of those half-dead pages in a huge >>> index, it probably doesn't matter. Also, I don't think it would never >>> happen, unless the table just never gets any more updates or deletes - >>> but that case could also happen today. It's just a matter of >>> happening less frequently. >> > > Yeah, that's right and I am not sure if it is worth performing a > complete pass to reclaim dead/deleted pages unless we know someway > that there are many such pages. Also, I think we do reclaim the > complete page while allocating a new page in btree. > >> The half-dead pages are never cleaned up if the ratio of pages >> containing garbage is always lower than the threshold. >> > > Which threshold are you referring to here? > I meant the new parameter in the current patch.
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Should we cacheline align PGXACT?
On 20 February 2017 at 17:32, Robert Haas wrote: >>> Have you checked whether this >>> patch makes any noticeable performance difference? >> >> No, but then we're reducing the number of calls to PgXact directly; >> there is no heuristic involved, it's just a pure saving. > > Well, it's adding a branch where there wasn't one. A branch that is avoided in almost all cases, so easy to predict. > Maybe that costs > essentially nothing and the saved write to shared memory saves > something noticeable, but for all I know it's the reverse. If I had > to guess, it would be that neither the costs nor the savings from this > are in the slightest way noticeable on a macrobenchmark, and therefore > there's not much point in changing it, but that could be 100% wrong. Given Andres' earlier measurements, it seems worth testing to me. Hopefully someone can recheck. Thanks in advance. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
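The trade-off being weighed here — an extra, well-predicted branch versus an avoided store that would dirty a hot shared cache line — is tiny in code terms. A miniature model of the test-before-write idiom (illustrative only, not the actual procarray.c/snapmgr.c code):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Stand-in for the shared PGXACT entry; in reality this lives in a
 * shared-memory array that every snapshot-taking backend scans. */
typedef struct
{
    TransactionId xmin;
} ModelPgXact;

/* Returns 1 if the shared field was actually written.  Testing before
 * storing avoids dirtying the cache line when xmin is already clear,
 * which is the common case at end of transaction; the branch is cheap
 * precisely because it almost always goes the same way. */
static int
clear_xmin_if_set(ModelPgXact *pgxact)
{
    if (pgxact->xmin != InvalidTransactionId)
    {
        pgxact->xmin = InvalidTransactionId;
        return 1;
    }
    return 0;
}
```

Whether the saved store outweighs the added branch on a macrobenchmark is exactly the open question in the thread.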
Re: [HACKERS] Parallel bitmap heap scan
On Mon, Feb 20, 2017 at 6:19 AM, Tom Lane wrote: > Thomas Munro writes: >> One practical problem that came up was the need for executor nodes to >> get a chance to do that kind of cleanup before the DSM segment is >> detached. In my patch series I introduced a new node API >> ExecNodeDetach to allow for that. Andres objected that the need for >> that is evidence that the existing protocol is broken and should be >> fixed instead. I'm looking into that. > > The thing that you really have to worry about for this kind of proposal > is "what if the query errors out and we never get to ExecEndNode"? > It's particularly nasty if you're talking about parallel queries where > maybe only one or some of the processes involved detect an error. I think that's not actually a problem, because we've already got code to make sure that all DSM resources associated with the query get blown away in that case. Of course, that code might have bugs, but if it does, I think it's better to try to fix those bugs than to insert some belt-and-suspenders mechanism for reclaiming every possible chunk of memory in retail fashion, just like we blow up es_query_cxt rather than trying to pfree allocations individually. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel bitmap heap scan
On Mon, Feb 20, 2017 at 5:55 AM, Thomas Munro wrote: > On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar wrote: >> So basically, what I want to propose is that only during >> ExecReScanBitmapHeapScan we can free all the DSA pointers because at >> that time we can be sure that all the workers have completed their >> task and we are safe to free. (And we don't free any DSA memory at >> ExecEndBitmapHeapScan). > > I think this works. OK. > Some hand-wavy thoughts on this topic in the context of hash joins: > > The argument for cleaning up sooner rather than later would be that it > could reduce the total peak memory usage of large execution plans. Is > that a reasonable goal and can we achieve it? I suspect the answer is > yes in theory but no in practice, and we don't even try to achieve it > in non-parallel queries as far as I know. We're pretty stupid about causing nodes to stop eating up resources as early as we could; for example, when a Limit is filled, we don't make any attempt to have scans underneath it release pins or memory or anything else. But we don't usually let the same node consume memory multiple times. ExecReScanBitmapHeapScan frees all of the memory used for the previous bitmap in the non-parallel case, so it should probably do that in the parallel case also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Should we cacheline align PGXACT?
On Mon, Feb 20, 2017 at 10:49 PM, Simon Riggs wrote: >> Regarding reduce_pgxact_access_AtEOXact.v1.patch, it took me a few >> minutes to figure out that the comment was referring to >> ProcArrayEndTransaction(), so it might be good to be more explicit >> about that if we go forward with this. > > Sure, attached. Looks better, thanks. >> Have you checked whether this >> patch makes any noticeable performance difference? > > No, but then we're reducing the number of calls to PgXact directly; > there is no heuristic involved, it's just a pure saving. Well, it's adding a branch where there wasn't one. Maybe that costs essentially nothing and the saved write to shared memory saves something noticeable, but for all I know it's the reverse. If I had to guess, it would be that neither the costs nor the savings from this are in the slightest way noticeable on a macrobenchmark, and therefore there's not much point in changing it, but that could be 100% wrong. >> It's sure >> surprising that we go to all of this trouble to clean things up in >> AtEOXact_Snapshot() when we've already nuked MyPgXact->xmin from >> orbit. (Instead of changing AtEOXact_Snapshot, should we think about >> removing the xid clear logic from ProcArrayEndTransaction and only >> doing it here, or would that be wrong-headed?) > > If anything, I'd move the call to PgXact->xmin = InvalidTransactionId > into a function inside procarray.c, so we only touch snapshots in > snapmgr.c and all procarray stuff is isolated. (Not done here, yet). I'm not convinced that really buys us anything except more function-call overhead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure
> On Feb 19, 2017, at 8:14 PM, Jim Nasby wrote: > > On 2/19/17 11:02 AM, David Christensen wrote: >> My design notes for the patch were submitted to the list with little >> comment; see: >> https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com >> >> I have since added the WAL logging of checksum states, however I’d be glad >> to take feedback on the other proposed approaches (particularly the system >> catalog changes + the concept of a checksum cycle). > > A couple notes: > > - AFAIK unlogged tables get checksummed today if checksums are enabled; the > same should hold true if someone enables checksums on the whole cluster. Agreed; AFAIK this should already be working if it’s using the PageIsVerified() API, since I just effectively modified the logic there, depending on state. > - Shared relations should be handled as well; you don't mention them. I agree that they should be handled as well; I thought I had mentioned it later in the design doc, though TBH I’m not sure if there is more involved than just visiting the global relations in pg_class. In addition we need to visit all forks of each relation. I’m not certain if loading those into shared_buffers would be sufficient; I think so, but I’d be glad to be informed otherwise. (As long as they’re already using the PageIsVerified() API call they get this logic for free.) > - If an entire cluster is going to be considered as checksummed, then even > databases that don't allow connections would need to get enabled. Yeah, the workaround for now would be to require "datallowconn" to be set to 't' for all databases before proceeding, unless there’s a way to connect to those databases internally that bypasses that check. Open to ideas, though for a first pass it seems like the "datallowconn" approach is the least amount of work. > I like the idea of revalidation, but I'd suggest leaving that off of the > first pass. Yeah, agreed.
> It might be easier on a first pass to look at supporting per-database > checksums (in this case, essentially treating shared catalogs as their own > database). All normal backends do per-database stuff (such as setting > current_database) during startup anyway. That doesn't really help for things > like recovery and replication though. :/ And there's still the question of > SLRUs (or are those not checksum'd today??). So you’re suggesting that the data_checksums GUC gets set in a per-database context, so once it’s fully enabled in the specific database it is treated as in the enforcing state, even if the rest of the cluster hasn’t completed? Hmm, I might think on that a bit, but it seems pretty straightforward. What issues do you see affecting replication and recovery specifically (other than the entire cluster not being complete)? Since the checksum changes are WAL logged, it seems you’d be no worse for wear on a standby if you had to change things.
That being said, what I could see being a general case is the piece which basically walks all pages in the cluster; as long as the page checksum/format validation happens at Page read time, we could do page version checks/conversions at the same time, and the only special code we’d need is to keep track of which files still need to be visited and how to minimize the impact on the cluster via some sort of throttling/leveling. (It’s also similar to vacuum in that way, however we have been going out of our way to make vacuum smart enough to *avoid* visiting every page, so I think it is a different enough use case that we can’t tie the two systems together, nor do I feel like taking that project on.) We could call the checksum_cycle something else (page_walk_cycle? bikeshed time!) and basically have the infrastructure to initiate online/gradual conversion/processing of all pages for free. Thoughts? David -- David Christensen End Point Corporation da...@endpoint.com 785-727-1171
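One way to picture the state machine being discussed: while checksums are being enabled, writers already stamp checksums but readers must not yet reject unconverted pages; only once the checksum cycle completes does verification become mandatory. This is a hypothetical sketch of that rule, not code from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cluster-wide (or per-database) checksum states during
 * an online enabling cycle. */
typedef enum
{
    CHECKSUMS_OFF,          /* never write or verify checksums */
    CHECKSUMS_IN_PROGRESS,  /* being enabled: write them, don't require them */
    CHECKSUMS_ENFORCING     /* fully enabled: every page must verify */
} ChecksumState;

/* Writers stamp checksums in both transitional and enforcing states,
 * so pages converge toward being checksummed as they are dirtied. */
static bool
checksum_written_on_write(ChecksumState state)
{
    return state != CHECKSUMS_OFF;
}

/* Should a page lacking a valid checksum be rejected at read time?
 * During the enabling cycle, unconverted pages are still legal. */
static bool
checksum_required_on_read(ChecksumState state)
{
    return state == CHECKSUMS_ENFORCING;
}
```

The asymmetry between the two predicates in the transitional state is what makes the online enabling (and the page-walk cycle above) safe.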
Re: [HACKERS] Should we cacheline align PGXACT?
On 20 February 2017 at 16:53, Robert Haas wrote: > On Mon, Feb 20, 2017 at 6:02 PM, Simon Riggs wrote: >> On 15 February 2017 at 19:15, Andres Freund wrote: >> >>> I think I previously >>> mentioned, even just removing the MyPgXact->xmin assignment in >>> SnapshotResetXmin() is measurable performance wise and cache-hit ratio >>> wise. >> >> Currently, we issue SnapshotResetXmin() pointlessly at end of xact, so >> patch attached to remove that call, plus some comments to explain >> that. This reduces the cause. >> >> Also, another patch to reduce the calls to SnapshotResetXmin() using a >> simple heuristic to reduce the effects. > > I think skip_SnapshotResetXmin_if_idle_timeout.v1.patch isn't a good > idea, because it could have the surprising result that setting > idle_in_transaction_timeout to a non-zero value makes bloat worse. I > don't think users will like that. > > Regarding reduce_pgxact_access_AtEOXact.v1.patch, it took me a few > minutes to figure out that the comment was referring to > ProcArrayEndTransaction(), so it might be good to be more explicit > about that if we go forward with this. Sure, attached. > Have you checked whether this > patch makes any noticeable performance difference? No, but then we're reducing the number of calls to PgXact directly; there is no heuristic involved, it's just a pure saving. > It's sure > surprising that we go to all of this trouble to clean things up in > AtEOXact_Snapshot() when we've already nuked MyPgXact->xmin from > orbit. (Instead of changing AtEOXact_Snapshot, should we think about > removing the xid clear logic from ProcArrayEndTransaction and only > doing it here, or would that be wrong-headed?) If anything, I'd move the call to PgXact->xmin = InvalidTransactionId into a function inside procarray.c, so we only touch snapshots in snapmgr.c and all procarray stuff is isolated. (Not done here, yet).
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services reduce_pgxact_access_AtEOXact.v2.patch Description: Binary data
Re: [HACKERS] Replication vs. float timestamps is a disaster
On Mon, Feb 20, 2017 at 7:37 PM, Tom Lanewrote: > The question to be asked is whether there is still anybody out there > using float timestamps. I'm starting to get dubious about it myself. > Certainly, no packager that I'm aware of has shipped a float-timestamp > build since we switched the default in 8.4. Maybe there is somebody > who's faithfully built a float-timestamp custom build every year so they > can pg_upgrade in place from their 8.3 installation, but there have got > to be darn few such people. I'm wondering if it has any effect that pg_config.h.win32 says /* Define to 1 if you want 64-bit integer timestamp and interval support. (--enable-integer-datetimes) */ /* #undef USE_INTEGER_DATETIMES */ Whereas pg_config.h.win32 says: /* Define to 1 if you want 64-bit integer timestamp and interval support. (--enable-integer-datetimes) */ #define USE_INTEGER_DATETIMES 1 It looks like it was commit 2169e42bef9db7e0bdd1bea00b81f44973ad83c8 that enabled integer datetimes by default, but that commit seems to not to have touched the Windows build scripts. Commit fcf053d7829f2d83829256153e856f9a36c83ffd changed MSVC over to use integer datetimes by default, but I'm not clear if there's any build environment where we rely on config.h.win32 but not Solution.pm? If not, what exactly is pg_config.h.win32 for and to what degree does it need to be in sync with pg_config.h.in? The list of differences appears to be far more extensive than the header comment at the top of pg_config.h.win32 would lead one to believe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we cacheline align PGXACT?
On Mon, Feb 20, 2017 at 6:02 PM, Simon Riggswrote: > On 15 February 2017 at 19:15, Andres Freund wrote: > >> I think I previously >> mentioned, even just removing the MyPgXact->xmin assignment in >> SnapshotResetXmin() is measurable performance wise and cache-hit ratio >> wise. > > Currently, we issue SnapshotResetXmin() pointlessly at end of xact, so > patch attached to remove that call, plus some comments to explain > that. This reduces the cause. > > Also, another patch to reduce the calls to SnapshotResetXmin() using a > simple heuristic to reduce the effects. I think skip_SnapshotResetXmin_if_idle_timeout.v1.patch isn't a good idea, because it could have the surprising result that setting idle_in_transaction_timeout to a non-zero value makes bloat worse. I don't think users will like that. Regarding reduce_pgxact_access_AtEOXact.v1.patch, it took me a few minutes to figure out that the comment was referring to ProcArrayEndTransaction(), so it might be good to be more explicit about that if we go forward with this. Have you checked whether this patch makes any noticeable performance difference? It's sure surprising that we go to all of this trouble to clean things up in AtEOXact_Snapshot() when we've already nuked MyPgXact->xmin from orbit. (Instead of changing AtEOXact_Snapshot, should we think about removing the xid clear logic from ProcArrayEndTransaction and only doing it here, or would that be wrong-headed?) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
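For illustration, the cache-line question in this thread boils down to whether the hot per-backend struct should be padded out to a full line so that concurrent updates by different backends never contend on the same line. A standalone C11 sketch, assuming 64-byte cache lines; the field list only loosely mirrors PGXACT and is not the actual definition:

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64		/* assumed; varies by CPU */

/*
 * A PGXACT-like struct: a handful of hot fields, well under one cache
 * line unpadded.  Aligning the first member to the line size forces
 * sizeof() up to a full line, so array elements never share a line.
 */
typedef struct
{
	alignas(CACHE_LINE_SIZE) uint32_t xid;
	uint32_t	xmin;
	uint8_t		vacuumFlags;
	bool		overflowed;
	bool		delayChkpt;
	uint8_t		nxids;
} PaddedXact;
```

The trade-off discussed in the thread applies directly: padding eliminates false sharing between backends, but multiplies the cache footprint that snapshot-taking must scan, which is why reducing writes (as in the patches here) can be the better lever.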
Re: [HACKERS] Checksums by default?
On 02/11/2017 01:38 AM, Tomas Vondra wrote: Incidentally, I've been dealing with a checksum failure reported by a customer last week, and based on the experience I tend to agree that we don't have the tools needed to deal with checksum failures. I think such tooling should be a 'must have' for enabling checksums by default. In this particular case the checksum failure is particularly annoying because it happens during recovery (on a standby, after a restart), during startup, so FATAL means shutdown. I've managed to inspect the page in different way (dd and pageinspect from another instance), and it looks fine - no obvious data corruption, the only thing that seems borked is the checksum itself, and only three consecutive bits are flipped in the checksum. So this doesn't seem like a "stale checksum" - hardware issue is a possibility (the machine has ECC RAM though), but it might just as easily be a bug in PostgreSQL, when something scribbles over the checksum due to a buffer overflow, just before we write the buffer to the OS. So 'false failures' are not entirely impossible thing. Not to leave this without any resolution, it seems the issue has been caused by a SAN. Some configuration changes or something was being done at the time of the issue, and the SAN somehow ended up writing a page into a different relfilenode, into a different block. The page was from a btree index and got written into a heap relfilenode, but otherwise it was correct - the only thing that changed seems to be the block number, which explains the minimal difference in the checksum. I don't think we'll learn much more, but it seems the checksums did their work in detecting the issue. > So I think we're not ready to enable checksums by default for everyone, not until we can provide tools to deal with failures like this (I don't think users will be amused if we tell them to use 'dd' and inspect the pages in a hex editor). 
ISTM the way forward is to keep the current default (disabled), but to allow enabling checksums on the fly. That will mostly fix the issue for people who actually want checksums but don't realize they need to enable them at initdb time (and starting from scratch is not an option for them), are running on good hardware and are capable of dealing with checksum errors if needed, even without more built-in tooling. Being able to disable checksums on the fly is nice, but it only really solves the issue of extra overhead - it does not really help with the failures (particularly when you can't even start the database, because of a checksum failure in the startup phase). So, shall we discuss what tooling would be useful / desirable? Although the checksums did detect the issue (we might never notice without them, or maybe the instance would mysteriously crash), I still think better tooling is needed. I've posted some minor pageinspect improvements I hacked together while investigating this, but I don't think pageinspect is a very good tool for investigating checksum / data corruption issues, for a number of reasons: 1) It does not work at all when the instance does not even start - you have to manually dump the pages and try inspecting them from another instance. 2) Even then it assumes the pages are not corrupted, and may easily cause segfaults or other issues if that's not the case. 3) Works on a manual page-by-page basis. 4) It does not even try to resolve the issue somehow. For example I think it'd be great to have a tool that works even on instances that are not running. For example, something that recursively walks through all files in a data directory, verifies checksums on everything, lists/dumps pages with broken checksums for further inspection. I have an alpha-alpha version of something along those lines, written before the root cause was identified. It'd be nice to have something that could help with fixing the issues (e.g.
by fetching the last FPI from the backup, or so). But that's clearly way more difficult. There are probably some other tools that might be useful when dealing with data corruption (e.g. scrubbing to detect it). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
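One building block for the kind of offline tool described above is decoding the fixed part of a page header from a raw 8kB image, which needs nothing from a running server. A hedged sketch: offsets follow PostgreSQL's PageHeaderData layout (stable since 9.3), and a little-endian on-disk format is assumed, i.e. a page written by a little-endian server.

```c
#include <stdint.h>

/* Decoded fields of the fixed page header (see PageHeaderData). */
typedef struct
{
	uint64_t	pd_lsn;			/* really {xlogid, xrecoff}, shown as one value */
	uint16_t	pd_checksum;
	uint16_t	pd_flags;
	uint16_t	pd_lower;
	uint16_t	pd_upper;
	uint16_t	pd_special;
	uint16_t	pd_pagesize_version;
	uint32_t	pd_prune_xid;
} PageHeaderFields;

static uint16_t
get_u16(const uint8_t *p)
{
	return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t
get_u32(const uint8_t *p)
{
	return get_u16(p) | ((uint32_t) get_u16(p + 2) << 16);
}

/* Decode the first 24 bytes of a raw page image. */
static void
decode_page_header(const uint8_t *page, PageHeaderFields *h)
{
	h->pd_lsn = get_u32(page) | ((uint64_t) get_u32(page + 4) << 32);
	h->pd_checksum = get_u16(page + 8);
	h->pd_flags = get_u16(page + 10);
	h->pd_lower = get_u16(page + 12);
	h->pd_upper = get_u16(page + 14);
	h->pd_special = get_u16(page + 16);
	h->pd_pagesize_version = get_u16(page + 18);
	h->pd_prune_xid = get_u32(page + 20);
}
```

A directory walker would read each 8192-byte block from the relation files, decode the header, and recompute the checksum for comparison with pd_checksum.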
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Wed, Feb 15, 2017 at 9:33 PM, David G. Johnstonwrote: > On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas wrote: >> >> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN, >> except apparently there's no join type and the optimizer can never >> reorder these operations with each other or with other joins. Is that >> right? The optimizer changes in this patch seem fairly minimal, so >> I'm guessing it can't be doing anything very complex here. >> >> + * INPUT: >> + * (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c >> + * where q can be any join qualifier, and r.ts, r.te, s.ts, >> and s.t >> e >> + * can be any column name. >> + * >> + * OUTPUT: >> + * ( >> + * SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2 >> + * FROM >> + * ( >> + * SELECT *, row_id() OVER () rn FROM r >> + * ) r >> + * LEFT OUTER JOIN >> + * s >> + * ON q AND r.ts < s.te AND r.te > s.ts >> + * ORDER BY rn, P1, P2 >> + * ) c >> >> It's hard to see what's going on here. What's ts? What's te? If you >> used longer names for these things, it might be a bit more >> self-documenting. > > > Just reasoning out loud here... > > ISTM ts and te are "temporal [range] start" and "temporal [range] end" (or > probably just the common "timestamp start/end") > > From what I can see it is affecting an intersection of the two ranges and, > furthermore, splitting the LEFT range into sub-ranges that match up with the > sub-ranges found on the right side. From the example above this seems like > it should be acting on self-normalized ranges - but I may be missing > something by evaluating this out of context. 
> > r1 [1, 6] [ts, te] [time period start, time period end] > s1 [2, 3] > s2 [3, 4] > s3 [5, 7] > > r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts) > > r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3] > r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4] > r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6] > > Thus the intersection is [2,6] but since s1 has three ranges that begin > between 2 and 6 (i.e., 2, 3, and 5) there are three output records that > correspond to those sub-ranges. Yes, this is what the internal rewriting produces for r1. Note that till now we only support half-open ranges, i.e., [), but for visibility I will continue this example using closed ranges []. The executor function (ExecTemporalAdjustment) gets this (the output above) as the input and will then produce: r1[1, 1] r1[2, 3] r1[3, 4] r1[5, 6] Which means also for the ALIGN the non-overlapping parts are retained. > > The description in the OP basically distinguishes between NORMALIZE and > ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two > ranges - discarding the non-overlapping data - while NORMALIZE performs the > alignment while also retaining the non-overlapping data. Also for ALIGN we retain the non-overlapping part. Intersections are symmetric/commutative, so a subsequent outer join can then use equality on the ranges to produce join matches (for overlapping) as well as null-extend the produced non-overlapping parts. The difference between ALIGN and NORMALIZE is how they split, while ALIGN produces intersections between pairs of tuples (used for joins) and the non-overlapping parts, NORMALIZE produces intersections between groups of tuples (used for aggregation, so that all tuples with the same group have equal ranges) and the non-overlapping parts. 
For instance, the NORMALIZE between r1, s1, s2, and s3 in your example above would give the following: r1[1, 1] r1[2, 2] r1[3, 3] r1[4, 4] r1[5, 6] > > The rest of the syntax seems to deal with selecting subsets of range records > based upon attribute data. Yes, exactly! Best regards, Anton, Johann, Michael, Peter -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
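The splitting NORMALIZE performs can be sketched over closed integer ranges, matching the worked example above (the patch itself supports half-open ranges, so this is only an illustration): split r at every group boundary that falls inside it.

```c
#include <stddef.h>

typedef struct
{
	int			lo;
	int			hi;
} Range;					/* closed interval [lo, hi] */

/*
 * Split the closed range r at each point in pts (sorted, unique, each
 * strictly inside (r.lo, r.hi]); a new sub-range starts at every split
 * point.  Writes the sub-ranges to out and returns how many were
 * produced (at most npts + 1).
 */
static size_t
split_range(Range r, const int *pts, size_t npts, Range *out)
{
	size_t		n = 0;
	int			cur = r.lo;

	for (size_t i = 0; i < npts; i++)
	{
		out[n].lo = cur;
		out[n].hi = pts[i] - 1;
		n++;
		cur = pts[i];
	}
	out[n].lo = cur;
	out[n].hi = r.hi;
	return n + 1;
}
```

For r1[1, 6] with split points {2, 3, 4, 5} (derived from s1, s2, s3) this reproduces the NORMALIZE output shown above: [1, 1], [2, 2], [3, 3], [4, 4], [5, 6].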
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Fri, Feb 17, 2017 at 7:21 PM, Mithun Cywrote: > > To implement such an incremental addition of bucket blocks I have to > increase the size of array hashm_spares in meta page by four times. > Also, mapping methods which map a total number of buckets to a > split-point position of hashm_spares array, need to be changed. These > changes create backward compatibility issues. > How will high and lowmask calculations work in this new strategy? Till now they always work on doubling strategy and I don't see you have changed anything related to that code. Check below places. _hash_metapinit() { .. /* * We initialize the index with N buckets, 0 .. N-1, occupying physical * blocks 1 to N. The first freespace bitmap page is in block N+1. Since * N is a power of 2, we can set the masks this way: */ metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1; metap->hashm_highmask = (num_buckets << 1) - 1; .. } _hash_expandtable() { .. if (new_bucket > metap->hashm_highmask) { /* Starting a new doubling */ metap->hashm_lowmask = metap->hashm_highmask; metap->hashm_highmask = new_bucket | metap->hashm_lowmask; } .. } Till now, we have worked hard for not changing the page format in a backward incompatible way, so it will be better if we could find some way of doing this without changing the meta page format in a backward incompatible way. Have you considered to store some information in shared memory based on which we can decide how much percentage of buckets are allocated in current table half? I think we might be able to construct this information after crash recovery as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
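For reference, the doubling-based mask logic quoted above reduces to two small helpers, mirroring the `_hash_metapinit()` snippet and the standard `_hash_hashkey2bucket()` mapping; any incremental-allocation scheme has to either preserve this mapping or replace it consistently, which is the concern being raised.

```c
#include <stdint.h>

/*
 * Compute the initial masks for a hash index with num_buckets buckets
 * (a power of 2), as in the _hash_metapinit() code quoted above.
 */
static void
init_masks(uint32_t num_buckets, uint32_t *maxbucket,
		   uint32_t *lowmask, uint32_t *highmask)
{
	*maxbucket = *lowmask = num_buckets - 1;
	*highmask = (num_buckets << 1) - 1;
}

/*
 * Map a hash value to a bucket: try the larger mask first; if that
 * bucket has not been created yet in the current doubling, fall back
 * to the smaller mask (the bucket it will eventually be split from).
 */
static uint32_t
hashkey2bucket(uint32_t hashkey, uint32_t maxbucket,
			   uint32_t highmask, uint32_t lowmask)
{
	uint32_t	bucket = hashkey & highmask;

	if (bucket > maxbucket)
		bucket = bucket & lowmask;
	return bucket;
}
```

For example, with 4 initial buckets (maxbucket = 3, lowmask = 3, highmask = 7), hash value 13 maps to 13 & 7 = 5, which does not exist yet, so it falls back to 5 & 3 = 1; once maxbucket reaches 4 after a split, hash value 12 maps directly to bucket 4.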
Re: [HACKERS] powerpc(32) point/polygon regression failures on Debian Jessie
Re: Tom Lane 2017-02-20 <13825.1487607...@sss.pgh.pa.us> > Still, it'd be worth comparing the assembly code for your test program. I was compiling the program on jessie and on sid, and running the jessie binary on sid made it output the same as the sid binary, so the difference isn't in the binary, but in some system library. Christoph -- Senior Berater, Tel.: +49 2166 9901 187 credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
[HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
Hi, while investigating some checksum-related issues, I needed to perform some forensics on a copy of a btree page (taken from another instance using 'dd'). But I've run into two pageinspect limitations, hopefully addressed by this patch: 1) bt_page_items(bytea) not defined We have heap_page_items(bytea) but not bt_page_items(bytea). I suspect this is partially historical API inconsistency, and partially due to the need to handle the btree metapage explicitly. The original function simply threw an error with blkno=0, the new function simply checks for BTP_META page. I believe this is sufficient, assuming an instance without data corruption (which pageinspect assumes anyway). With data corruption all bets are off anyway - for example the metapage might be written to a different block (essentially what I saw in the investigated issue). Actually, the flag check is better in this case - it detects the metapage, while the blkno=0 check fails to do that (leading to crash). 2) page_checksum() When everything is fine, you can do page_header() which also includes the checksum. When the checksum gets broken, you may still dump the page using 'dd+pg_read_binary_file' to see the header, but clearly that checksum is wrong - and it's interesting to see the correct one and compare it to the checksum in the header. This function makes it possible - it accepts the bytea image of the page, and blkno (so it's possible to see how the block would look if it were written somewhere else, for example). BTW I've noticed the pageinspect version is 1.6, but we only have pageinspect--1.5.sql (and upgrade script to 1.6). Not sure that's entirely intentional?
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 499a7bd9aea3032f03d787833c0501d9fa703271 Mon Sep 17 00:00:00 2001 From: Tomas VondraDate: Mon, 13 Feb 2017 21:20:12 +0100 Subject: [PATCH] pageinspect - page_checksum and bt_page_items(bytea) Adds two functions to the pageinspect extension: 1) page_checksum(bytea,int4) allows computing checksum for arbitrary page, even if data checksums are not enabled 2) bt_page_items(bytea) is similar to heap_page_items(bytea) --- contrib/pageinspect/btreefuncs.c | 209 +++--- contrib/pageinspect/expected/btree.out| 13 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 22 +++ contrib/pageinspect/rawpage.c | 37 + contrib/pageinspect/sql/btree.sql | 4 + doc/src/sgml/pageinspect.sgml | 58 +++ src/include/access/nbtree.h | 1 + 7 files changed, 320 insertions(+), 24 deletions(-) diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c index d50ec3a..93da844 100644 --- a/contrib/pageinspect/btreefuncs.c +++ b/contrib/pageinspect/btreefuncs.c @@ -39,8 +39,14 @@ #include "utils/varlena.h" +extern Datum bt_metap(PG_FUNCTION_ARGS); +extern Datum bt_page_items(PG_FUNCTION_ARGS); +extern Datum bt_page_items_bytea(PG_FUNCTION_ARGS); +extern Datum bt_page_stats(PG_FUNCTION_ARGS); + PG_FUNCTION_INFO_V1(bt_metap); PG_FUNCTION_INFO_V1(bt_page_items); +PG_FUNCTION_INFO_V1(bt_page_items_bytea); PG_FUNCTION_INFO_V1(bt_page_stats); #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX) @@ -215,17 +221,31 @@ bt_page_stats(PG_FUNCTION_ARGS) elog(ERROR, "return type must be a row type"); j = 0; - values[j++] = psprintf("%d", stat.blkno); - values[j++] = psprintf("%c", stat.type); - values[j++] = psprintf("%d", stat.live_items); - values[j++] = psprintf("%d", stat.dead_items); - values[j++] = psprintf("%d", stat.avg_item_size); - values[j++] = psprintf("%d", stat.page_size); - values[j++] = psprintf("%d", stat.free_size); - values[j++] = psprintf("%d", 
stat.btpo_prev); - values[j++] = psprintf("%d", stat.btpo_next); - values[j++] = psprintf("%d", (stat.type == 'd') ? stat.btpo.xact : stat.btpo.level); - values[j++] = psprintf("%d", stat.btpo_flags); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", stat.type); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.avg_item_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.page_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.free_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_prev); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_next); + values[j] = palloc(32); + if (stat.type == 'd') + snprintf(values[j++], 32, "%d", stat.btpo.xact); + else + snprintf(values[j++], 32, "%d", stat.btpo.level); + values[j] = palloc(32); + snprintf(values[j++], 32,
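The checksum comparison that page_checksum() enables (recomputing as if the page lived at a different block number) hinges on the final folding step of PostgreSQL's page checksum, which mixes in the block number. A sketch from memory of that last step only (see pg_checksum_page() in the backend for the authoritative version; the 32-bit FNV-style digest of the page image is computed separately and not shown here):

```c
#include <stdint.h>

/*
 * Fold a 32-bit page digest down to the 16-bit value stored in
 * pd_checksum: XOR in the block number (so a page written to the wrong
 * block fails verification, as in the case investigated here), then
 * reduce mod 65535 and add 1 so that 0 never appears as a valid
 * checksum.
 */
static uint16_t
fold_checksum(uint32_t digest, uint32_t blkno)
{
	return (uint16_t) (((digest ^ blkno) % 65535) + 1);
}
```

Note how the same digest yields different stored checksums for different block numbers, which is exactly why a transposed page produces only a small checksum difference.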
Re: [HACKERS] powerpc(32) point/polygon regression failures on Debian Jessie
Christoph Berg writes: > Re: Tom Lane 2017-02-20 <30737.1487598...@sss.pgh.pa.us> >> Hmph. We haven't touched that code in awhile, and certainly not in the >> 9.4.x branch. I'd have to agree that this must be a toolchain change. > FYI, in the meantime we could indeed trace it back to an libc issue on > Jessie: I wonder whether it's a compiler change, maybe along the lines of rearranging the computation so that it gives a slightly different result. Although you'd think that 10.0/10.0 would give exactly 1.0 no matter what. Still, it'd be worth comparing the assembly code for your test program. regards, tom lane
Re: [HACKERS] powerpc(32) point/polygon regression failures on Debian Jessie
Re: Tom Lane 2017-02-20 <30737.1487598...@sss.pgh.pa.us> > Hmph. We haven't touched that code in awhile, and certainly not in the > 9.4.x branch. I'd have to agree that this must be a toolchain change. FYI, in the meantime we could indeed trace it back to a libc issue on Jessie: $ cat sqrt.c #include <stdio.h> #include <math.h> #include <fenv.h> double pg_hypot(double x, double y) { double yx; /* Some PG-specific code deleted here */ /* Else, drop any minus signs */ x = fabs(x); y = fabs(y); /* Swap x and y if needed to make x the larger one */ if (x < y) { double temp = x; x = y; y = temp; } /* * If y is zero, the hypotenuse is x. This test saves a few cycles in * such cases, but more importantly it also protects against * divide-by-zero errors, since now x >= y. */ if (y == 0.0) return x; /* Determine the hypotenuse */ yx = y / x; return x * sqrt(1.0 + (yx * yx)); } int main () { //fesetround(FE_TONEAREST); printf("fegetround is %d\n", fegetround()); double r = pg_hypot(10.0, 10.0); printf("14 %.14g\n", r); printf("15 %.15g\n", r); printf("16 %.16g\n", r); printf("17 %.17g\n", r); return 0; } Jessie output: fegetround is 0 14 14.142135623731 15 14.1421356237309 16 14.14213562373095 17 14.142135623730949 Sid output: fegetround is 0 14 14.142135623731 15 14.142135623731 16 14.14213562373095 17 14.142135623730951 The Sid output is what the point and polygon tests are expecting. Possible culprit is this bug report from November: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=843904 (Though that doesn't explain why it affects 32bit powerpc only.) Christoph
Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings
Tomas Vondra writes: > On 02/20/2017 04:37 PM, Tom Lane wrote: >> But that's using gcc. Perhaps clang behaves differently? > AFAIK it happens because clang treats missing declarations as warnings, > which confuses configure: > https://bugs.llvm.org//show_bug.cgi?id=20820 Ah, right. Looks like the autoconf people still haven't made a release incorporating the fix Noah provided :-( regards, tom lane
Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings
On 02/20/2017 04:37 PM, Tom Lane wrote: Aleksander Alekseev writes: In theory - could we just always use our internal strl* implementations? Hmm, maybe configure's test to see if a declaration has been provided is going wrong? I notice that anchovy, which is supposedly current Arch Linux, doesn't think the platform has it: checking whether strlcat is declared... no checking whether strlcpy is declared... no ... checking for strlcat... no checking for strlcpy... no But that's using gcc. Perhaps clang behaves differently? AFAIK it happens because clang treats missing declarations as warnings, which confuses configure: https://bugs.llvm.org//show_bug.cgi?id=20820 regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
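For context, the "internal strl* implementations" under discussion follow the standard strlcpy() contract. A self-contained sketch of that contract (not the actual src/port code): copy at most dstsize - 1 bytes, always NUL-terminate when dstsize > 0, and return strlen(src) so the caller can detect truncation.

```c
#include <stddef.h>
#include <string.h>

/* Portable strlcpy()-style copy; name prefixed to avoid clashing with
 * a platform-provided declaration, which is the very issue configure
 * is probing for in this thread. */
static size_t
pg_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t		srclen = strlen(src);

	if (dstsize > 0)
	{
		size_t		copylen = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, copylen);
		dst[copylen] = '\0';
	}
	return srclen;
}
```

Truncation is detected by checking whether the return value is >= dstsize.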
Re: [HACKERS] Provide list of subscriptions and publications in psql's completion
Greetings, * Fujii Masao (masao.fu...@gmail.com) wrote: > On Fri, Feb 17, 2017 at 11:17 PM, Peter Eisentraut >wrote: > > On 2/13/17 12:07, Fujii Masao wrote: > >> Anyway IMO that we can expose all the > >> columns except the sensitive information (i.e., subconninfo field) > >> in pg_subscription to even non-superusers. > > > > You mean with column privileges? > > Yes. > > So there are several approaches... > > 1) Expose all the columns except subconninfo in pg_subscription to > non-superusers. In this idea, the tab-completion and psql meta-command > for subscription still sees pg_subscription. One good point of > idea is that even non-superusers can see all the useful information > about subscriptions other than sensitive information like conninfo. > > 2) Change pg_stat_subscription so that it also shows all the columns except > subconninfo in pg_subscription. Also change the tab-completion and > psql meta-command for subscription so that they see pg_stat_subscription > instead of pg_subscription. One good point is that pg_stat_subscription > exposes all the useful information about subscription to even > non-superusers. OTOH, pg_subscription and pg_stat_subscription have > to have several same columns. This would be redundant and a bit confusing. > > 3) Expose subdbid in pg_stat_subscription. Also change the tab-completion > and psql meta-command for subscription so that they see > pg_stat_subscription. This change is very simple. But non-superusers > cannot > see useful information like subslotname because of privilege problem. > > I like #1, but any better approach? #1 seems alright to me, at least. We could start using column-level privs in other places also, as I mentioned up-thread. I don't particularly like the suggestions to have psql use pg_stat_X views or to put things into pg_stat_X views which are object definitions for non-superusers. 
If we really don't want to use column-level privileges then we should have an appropriate view create instead which provides non-superusers with the non-sensitive object information. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Provide list of subscriptions and publications in psql's completion
On Fri, Feb 17, 2017 at 11:17 PM, Peter Eisentrautwrote: > On 2/13/17 12:07, Fujii Masao wrote: >> Anyway IMO that we can expose all the >> columns except the sensitive information (i.e., subconninfo field) >> in pg_subscription to even non-superusers. > > You mean with column privileges? Yes. So there are several approaches... 1) Expose all the columns except subconninfo in pg_subscription to non-superusers. In this idea, the tab-completion and psql meta-command for subscription still sees pg_subscription. One good point of idea is that even non-superusers can see all the useful information about subscriptions other than sensitive information like conninfo. 2) Change pg_stat_subscription so that it also shows all the columns except subconninfo in pg_subscription. Also change the tab-completion and psql meta-command for subscription so that they see pg_stat_subscription instead of pg_subscription. One good point is that pg_stat_subscription exposes all the useful information about subscription to even non-superusers. OTOH, pg_subscription and pg_stat_subscription have to have several same columns. This would be redundant and a bit confusing. 3) Expose subdbid in pg_stat_subscription. Also change the tab-completion and psql meta-command for subscription so that they see pg_stat_subscription. This change is very simple. But non-superusers cannot see useful information like subslotname because of privilege problem. I like #1, but any better approach? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings
Aleksander Alekseev writes: > In theory - could we just always use our internal strl* implementations? Hmm, maybe configure's test to see if a declaration has been provided is going wrong? I notice that anchovy, which is supposedly current Arch Linux, doesn't think the platform has it: checking whether strlcat is declared... no checking whether strlcpy is declared... no ... checking for strlcat... no checking for strlcpy... no But that's using gcc. Perhaps clang behaves differently? regards, tom lane
[HACKERS] mat views stats
I've come across a number of times where the statistics on materialized views become stale producing bad plans. It turns out that autovacuum only touches a materialized view when it is first created and ignores it on a refresh. When you have a materialized view like yesterdays_sales the data in the materialized view turns over every day. Attached is a patch to trigger autovacuum based on a matview refresh along with a system view pg_stat_all_matviews to show information more meaningful for materialized views. -- Jim diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index fad5cb0..ec27e2c 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -436,6 +436,21 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + pg_stat_all_matviewspg_stat_all_matviews + + One row for each materialized view in the current database, showing statistics + about accesses to that specific materialized view. + See for details. + + + + + pg_stat_user_matviewspg_stat_user_matviews + Same as pg_stat_all_matviews, except that only + user materialized views are shown. 
+ + + pg_statio_all_tablespg_statio_all_tables One row for each table in the current database, showing statistics @@ -2277,6 +2292,97 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + pg_stat_all_matviews View + + + + Column + Type + Description + + + + + + relid + oid + OID of this materialize view + + + schemaname + name + Name of the schema that this materialized view is in + + + relname + name + Name of this materialized view + + + seq_scan + bigint + Number of sequential scans initiated on this materialized view + + + seq_tup_read + bigint + Number of live rows fetched by sequential scans + + + idx_scan + bigint + Number of index scans initiated on this materialized ciew + + + idx_tup_fetch + bigint + Number of live rows fetched by index scans + + + last_refresh + timestamp with time zone + Last time at which this materialized view was refreshed + + + refresh_count + bigint + Number of times this materialized view has been refreshed + + + last_analyze + timestamp with time zone + Last time at which this materialized view was manually analyzed + + + last_autoanalyze + timestamp with time zone + Last time at which this materialized ciew was analyzed by the + autovacuum daemon + + + analyze_count + bigint + Number of times this materialized view has been manually analyzed + + + autoanalyze_count + bigint + Number of times this materialized view has been analyzed by the + autovacuum daemon + + + + + + + The pg_stat_all_matviews view will contain + one row for each materialized view in the current database, showing + statistics about accesses to that specific materialized view. The + pg_stat_user_matviews contain the same + information, but filtered to only show user materialized views. 
+ + pg_statio_all_tables View diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 816483e..8d60d48 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -164,6 +164,15 @@ static relopt_int intRelOpts[] = }, -1, 0, INT_MAX }, +{ +{ +"autovacuum_analyze_refresh_threshold", +"Minimum number of materialized view refreshes prior to analyze", +RELOPT_KIND_HEAP, +ShareUpdateExclusiveLock +}, +-1, 0, INT_MAX +}, { { "autovacuum_vacuum_cost_delay", @@ -1283,6 +1292,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind) offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_threshold)}, {"autovacuum_analyze_threshold", RELOPT_TYPE_INT, offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_threshold)}, +{"autovacuum_analyze_refresh_threshold", RELOPT_TYPE_INT, +offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_ref_threshold)}, {"autovacuum_vacuum_cost_delay", RELOPT_TYPE_INT, offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_cost_delay)}, {"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT, diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 38be9cf..07b9f1c 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -535,6 +535,28 @@ CREATE VIEW
Re: [HACKERS] Passing query string to workers
On Mon, Feb 20, 2017 at 10:11 AM, Rafia Sabih wrote: > On Sun, Feb 19, 2017 at 10:11 PM, Robert Haas wrote: >> + query_data = (char *) palloc0(strlen(estate->es_queryString) + 1); >> + strcpy(query_data, estate->es_queryString); >> >> It's unnecessary to copy the query string here; you're going to use it >> later on in the very same function. That code can just refer to >> estate->es_queryString directly. > > > Done. + char *query_data; + query_data = estate->es_sourceText; estate->es_sourceText is a const char* variable. Assigning this const pointer to a non-const pointer violates the rules of const-correctness. So, either you should declare query_data as const char *, or, as Robert suggested, you can use estate->es_sourceText directly. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] "may be unused" warnings for gcc
Hi, When building with a new-ish gcc (6.3.0 right now, but I've seen this for a while) with optimization I get a number of warnings: In file included from /home/andres/src/postgresql/src/include/postgres.h:48:0, from /home/andres/src/postgresql/src/backend/parser/parse_collate.c:41: /home/andres/src/postgresql/src/backend/parser/parse_collate.c: In function ‘select_common_collation’: /home/andres/src/postgresql/src/include/utils/elog.h:107:4: warning: ‘context.location2’ may be used uninitialized in this function [-Wmaybe-uninitialized] errfinish rest; \ ^ /home/andres/src/postgresql/src/backend/parser/parse_collate.c:210:28: note: ‘context.location2’ was declared here assign_collations_context context; ^~~ In file included from /home/andres/src/postgresql/src/include/postgres.h:48:0, from /home/andres/src/postgresql/src/backend/parser/parse_collate.c:41: /home/andres/src/postgresql/src/include/utils/elog.h:107:4: warning: ‘context.collation2’ may be used uninitialized in this function [-Wmaybe-uninitialized] errfinish rest; \ ^ /home/andres/src/postgresql/src/backend/parser/parse_collate.c:210:28: note: ‘context.collation2’ was declared here assign_collations_context context; ^~~ While I believe these are false positives, I am not surprised that the compiler can't see that. select_common_collation() initializes some fields of assign_collations_context, but not others. There are several branches out of assign_collations_walker that return without setting collation2/location2. I think that's currently harmless because it looks like select_common_collation() won't enter the context.strength == COLLATE_CONFLICT branch in that case - but it's certainly hard to see. 
In file included from /home/andres/src/postgresql/src/backend/commands/dbcommands.c:20:0: /home/andres/src/postgresql/src/backend/commands/dbcommands.c: In function ‘createdb’: /home/andres/src/postgresql/src/include/postgres.h:529:35: warning: ‘src_minmxid’ may be used uninitialized in this function [-Wmaybe-uninitialized] #define TransactionIdGetDatum(X) ((Datum) SET_4_BYTES((X))) ^ /home/andres/src/postgresql/src/backend/commands/dbcommands.c:113:14: note: ‘src_minmxid’ was declared here MultiXactId src_minmxid; ^~~ (and the same for src_frozenxid, src_lastsysoid, ...) It appears that the loop in get_db_info() is too complex for gcc. Replacing the !HeapTupleIsValid(tuple) break; with a heap_close() and direct return fixes those. /home/andres/src/postgresql/src/backend/utils/misc/guc.c: In function ‘RestoreGUCState’: /home/andres/src/postgresql/src/backend/utils/misc/guc.c:6619:21: warning: ‘varsourceline’ may be used uninitialized in this function [-Wmaybe-uninitialized] record->sourceline = sourceline; ~~~^~~~ /home/andres/src/postgresql/src/backend/utils/misc/guc.c:9279:8: note: ‘varsourceline’ was declared here int varsourceline; ^ Not sure what the problem is here - even if the varsourcefile[0] test in RestoreGUCState is stored in a local variable that's also checked before the set_config_sourcefile() branch, it warns. Initializing varsourceline to 0 works and seems reasonable. 
/home/andres/src/postgresql/src/backend/utils/adt/varlena.c: In function ‘text_position’: /home/andres/src/postgresql/src/backend/utils/adt/varlena.c:1358:36: warning: ‘state.skiptablemask’ may be used uninitialized in this function [-Wmaybe-uninitialized] hptr += state->skiptable[*hptr & skiptablemask]; ~~^~~ /home/andres/src/postgresql/src/backend/utils/adt/varlena.c:1099:20: note: ‘state.skiptablemask’ was declared here TextPositionState state; ^ /home/andres/src/postgresql/src/backend/utils/adt/varlena.c:1344:9: warning: ‘state.wstr2’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (nptr == needle) ^ /home/andres/src/postgresql/src/backend/utils/adt/varlena.c:1099:20: note: ‘state.wstr2’ was declared here TextPositionState state; ^ /home/andres/src/postgresql/src/backend/utils/adt/varlena.c:1099:20: warning: ‘state.wstr1’ may be used uninitialized in this function [-Wmaybe-uninitialized] /home/andres/src/postgresql/src/backend/utils/adt/varlena.c:1288:9: warning: ‘state.str2’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (nptr == needle) ^ /home/andres/src/postgresql/src/backend/utils/adt/varlena.c:1099:20: note: ‘state.str2’ was declared here TextPositionState state; ^ No idea what exactly triggers this, but zero-initializing TextPositionState helps. What confuses me is that doing so in text_position() is sufficient - the other uses don't trigger a warning?
Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings
In theory - could we just always use our internal strl* implementations? On Mon, Feb 20, 2017 at 09:26:44AM -0500, Tom Lane wrote: > Aleksander Alekseev writes: > > I've just tried to build PostgreSQL with Clang 3.9.1 (default version > > currently available in Arch Linux) and noticed that it outputs lots of > > warning messages. Most of them are result of a bug in Clang itself: > > > > postinit.c:846:3: note: include the header or explicitly > > provide a declaration for 'strlcpy' > > It might be an incompatibility with the platform-supplied string.h > rather than an outright bug, but yeah, that's pretty annoying. > > > The rest of warnings looks more like something we could easily deal with: > > It's hard to get excited about these if there are going to be hundreds > of the other ones ... > > regards, tom lane -- Best regards, Aleksander Alekseev
Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings
Aleksander Alekseev writes: > I've just tried to build PostgreSQL with Clang 3.9.1 (default version > currently available in Arch Linux) and noticed that it outputs lots of > warning messages. Most of them are result of a bug in Clang itself: > > postinit.c:846:3: note: include the header or explicitly > provide a declaration for 'strlcpy' It might be an incompatibility with the platform-supplied string.h rather than an outright bug, but yeah, that's pretty annoying. > The rest of warnings looks more like something we could easily deal with: It's hard to get excited about these if there are going to be hundreds of the other ones ... regards, tom lane
[HACKERS] [PATCH] Suppress Clang 3.9 warnings
Hello. I've just tried to build PostgreSQL with Clang 3.9.1 (default version currently available in Arch Linux) and noticed that it outputs lots of warning messages. Most of them are result of a bug in Clang itself: ``` postinit.c:846:3: note: include the header or explicitly provide a declaration for 'strlcpy' ``` I've reported it to Clang developers almost a year ago but apparently no one cares. You can find all the details in a corresponding thread [1]. Frankly I'm not sure what to do about it. The rest of warnings looks more like something we could easily deal with: ``` xloginsert.c:742:18: warning: implicit conversion from 'int' to 'char' changes value from 253 to -3 [-Wconstant-conversion] ``` Patch that fixes these warnings is attached to this email. [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/048126.html -- Best regards, Aleksander Alekseev diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f23e108628..0007010b25 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5015,7 +5015,7 @@ BootStrapXLOG(void) record->xl_rmid = RM_XLOG_ID; recptr += SizeOfXLogRecord; /* fill the XLogRecordDataHeaderShort struct */ - *(recptr++) = XLR_BLOCK_ID_DATA_SHORT; + *(recptr++) = (char)XLR_BLOCK_ID_DATA_SHORT; *(recptr++) = sizeof(checkPoint); memcpy(recptr, , sizeof(checkPoint)); recptr += sizeof(checkPoint); diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 03b05f937f..ea8e915029 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -739,7 +739,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, if ((curinsert_flags & XLOG_INCLUDE_ORIGIN) && replorigin_session_origin != InvalidRepOriginId) { - *(scratch++) = XLR_BLOCK_ID_ORIGIN; + *(scratch++) = (char)XLR_BLOCK_ID_ORIGIN; memcpy(scratch, _session_origin, sizeof(replorigin_session_origin)); scratch += sizeof(replorigin_session_origin); } @@ 
-749,13 +749,13 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, { if (mainrdata_len > 255) { - *(scratch++) = XLR_BLOCK_ID_DATA_LONG; + *(scratch++) = (char)XLR_BLOCK_ID_DATA_LONG; memcpy(scratch, _len, sizeof(uint32)); scratch += sizeof(uint32); } else { - *(scratch++) = XLR_BLOCK_ID_DATA_SHORT; + *(scratch++) = (char)XLR_BLOCK_ID_DATA_SHORT; *(scratch++) = (uint8) mainrdata_len; } rdt_datas_last->next = mainrdata_head; diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index 96b7097f8b..9165da1ee5 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -1100,7 +1100,7 @@ WriteEmptyXLOG(void) record->xl_rmid = RM_XLOG_ID; recptr += SizeOfXLogRecord; - *(recptr++) = XLR_BLOCK_ID_DATA_SHORT; + *(recptr++) = (char)XLR_BLOCK_ID_DATA_SHORT; *(recptr++) = sizeof(CheckPoint); memcpy(recptr, , sizeof(CheckPoint)); signature.asc Description: PGP signature
Re: [HACKERS] Replication vs. float timestamps is a disaster
Petr Jelinek writes: > It's definitely not hard, we already have > IntegerTimestampToTimestampTz() which does the opposite conversion anyway. It's not the functions that are hard, it's making sure that you have used them in the correct places, and declared relevant variables with the appropriate types. Evidence on the ground is that that is very hard; I have little faith that we'll get it right without compiler support. regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
Andres Freund writes: > On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote: >> That being said, I did wonder myself if we should just deprecate float >> timestamps as well. > I think we need a proper deprecation period for that, given that the > conversion away will be painful for pg_upgrade using people with big > clusters. So I think we should fix this regardless... :( The question to be asked is whether there is still anybody out there using float timestamps. I'm starting to get dubious about it myself. Certainly, no packager that I'm aware of has shipped a float-timestamp build since we switched the default in 8.4. Maybe there is somebody who's faithfully built a float-timestamp custom build every year so they can pg_upgrade in place from their 8.3 installation, but there have got to be darn few such people. As for "proper deprecation period", the documentation has called the option deprecated since 8.4: --disable-integer-datetimes Disable support for 64-bit integer storage for timestamps and intervals, and store datetime values as floating-point numbers instead. Floating-point datetime storage was the default in PostgreSQL releases prior to 8.4, but it is now deprecated, because it does not support microsecond precision for the full range of timestamp values. I think the strongest reason why we didn't move to kill it sooner was that we were not then assuming that every platform had 64-bit ints; but we've required that since 9.0. regards, tom lane
Re: [HACKERS] powerpc(32) point/polygon regression failures on Debian Jessie
Christoph Berg writes: > The point/polygon regression tests have started to fail on 32-bit > powerpc on Debian Jessie. So far I could reproduce the problem with > PostgreSQL 9.4.10+11 and 9.6.1, on several different machines. Debian > unstable is unaffected. Hmph. We haven't touched that code in awhile, and certainly not in the 9.4.x branch. I'd have to agree that this must be a toolchain change. regards, tom lane
[HACKERS] Error in XML recv function
Hi When I did tests of my own XML import functions I found a strange error. I successfully imported XML to PostgreSQL. This document is readable without any visual defects. But when I tested this document against any libxml2 function I found an error - ERROR: could not parse XML document DETAIL: input conversion failed due to input error, bytes 0x88 0x3C 0x2F 0x72 line 1: switching encoding: encoder error When I debugged this document I found an inconsistency between the encoding info and the data. The recv function correctly converts the data from cp1250 encoding to UTF8. But the encoding info stays unchanged and still says cp1250. Then the libxml2 functions fail. Regards Pavel
Re: [HACKERS] How to read a value when it is VARATT EXTERNAL ONDISK from logical replication decoder
Hi, On 2017-02-20 11:44:58 +0100, Adam Dratwiński wrote: > Hello everyone, > > I am writing a custom logical replication decoder, and I took test decoder > from Postgres sources as an example. > > Could anyone tell me how to read "unchanged toast datum" in case it is > VARATT_IS_EXTERNAL_ONDISK. You can't by default. It's simply not available data - we'd have to log a lot more data to make that fully available. If you change the replica identity settings to FULL, the full old data (including toasted data) will be logged. Regards, Andres
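For reference, this is roughly how the test_decoding example plugin detects such datums in its tuple-printing loop; the fragment below is a sketch that only compiles inside a decoding plugin, not standalone code:

```c
/* per-attribute loop over the decoded tuple */
else if (typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
{
    /*
     * Unchanged toast datum: the toasted value was not WAL-logged, so
     * its contents are not available to the plugin.  With REPLICA
     * IDENTITY FULL on the table, the old tuple is logged in full and
     * this branch is not reached for it.
     */
    appendStringInfoString(s, "unchanged-toast-datum");
}
```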
[HACKERS] How to read a value when it is VARATT EXTERNAL ONDISK from logical replication decoder
Hello everyone, I am writing a custom logical replication decoder, and I took the test decoder from Postgres sources as an example. Could anyone tell me how to read an "unchanged toast datum" in case it is VARATT_IS_EXTERNAL_ONDISK. In the test decoder it is ignored: https://github.com/postgres/postgres/blob/master/contrib/test_decoding/test_decoding.c#L377 Cheers Adam
[HACKERS] powerpc(32) point/polygon regression failures on Debian Jessie
The point/polygon regression tests have started to fail on 32-bit powerpc on Debian Jessie. So far I could reproduce the problem with PostgreSQL 9.4.10+11 and 9.6.1, on several different machines. Debian unstable is unaffected. The failure looks like this: https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.6=powerpc=9.6.1-2~bpo8%2B1=1485184696=0 build/src/test/regress/regression.diffs *** /«PKGBUILDDIR»/build/../src/test/regress/expected/point.out Mon Oct 24 20:08:51 2016 --- /«PKGBUILDDIR»/build/src/test/regress/results/point.out Mon Jan 23 15:17:51 2017 *** *** 125,131 | (-3,4) |5 | (-10,0)| 10 | (-5,-12) | 13 ! | (10,10)| 14.142135623731 | (5.1,34.5) | 34.8749193547455 (6 rows) --- 125,131 | (-3,4) |5 | (-10,0)| 10 | (-5,-12) | 13 ! | (10,10)| 14.1421356237309 | (5.1,34.5) | 34.8749193547455 (6 rows) *** *** 150,157 | (-5,-12) | (-10,0)| 13 | (-5,-12) | (0,0) | 13 | (0,0) | (-5,-12) | 13 !| (0,0) | (10,10)| 14.142135623731 !| (10,10)| (0,0) | 14.142135623731 | (-3,4) | (10,10)| 14.3178210632764 | (10,10)| (-3,4) | 14.3178210632764 | (-5,-12) | (-3,4) | 16.1245154965971 --- 150,157 | (-5,-12) | (-10,0)| 13 | (-5,-12) | (0,0) | 13 | (0,0) | (-5,-12) | 13 !| (0,0) | (10,10)| 14.1421356237309 !| (10,10)| (0,0) | 14.1421356237309 | (-3,4) | (10,10)| 14.3178210632764 | (10,10)| (-3,4) | 14.3178210632764 | (-5,-12) | (-3,4) | 16.1245154965971 *** *** 221,227 | (-10,0)| (0,0) | 10 | (-10,0)| (-5,-12) | 13 | (-5,-12) | (0,0) | 13 ! | (0,0) | (10,10)| 14.142135623731 | (-3,4) | (10,10)| 14.3178210632764 | (-5,-12) | (-3,4) | 16.1245154965971 | (-10,0)| (10,10)| 22.3606797749979 --- 221,227 | (-10,0)| (0,0) | 10 | (-10,0)| (-5,-12) | 13 | (-5,-12) | (0,0) | 13 ! 
| (0,0) | (10,10)| 14.1421356237309 | (-3,4) | (10,10)| 14.3178210632764 | (-5,-12) | (-3,4) | 16.1245154965971 | (-10,0)| (10,10)| 22.3606797749979 == *** /«PKGBUILDDIR»/build/../src/test/regress/expected/polygon.out Mon Oct 24 20:08:51 2016 --- /«PKGBUILDDIR»/build/src/test/regress/results/polygon.out Mon Jan 23 15:17:51 2017 *** *** 222,229 '(2,2)'::point <-> '((0,0),(1,4),(3,1))'::polygon as inside, '(3,3)'::point <-> '((0,2),(2,0),(2,2))'::polygon as near_corner, '(4,4)'::point <-> '((0,0),(0,3),(4,0))'::polygon as near_segment; ! on_corner | on_segment | inside | near_corner | near_segment ! ---+++-+-- ! 0 | 0 | 0 | 1.4142135623731 | 3.2 (1 row) --- 222,229 '(2,2)'::point <-> '((0,0),(1,4),(3,1))'::polygon as inside, '(3,3)'::point <-> '((0,2),(2,0),(2,2))'::polygon as near_corner, '(4,4)'::point <-> '((0,0),(0,3),(4,0))'::polygon as near_segment; ! on_corner | on_segment | inside | near_corner| near_segment ! ---+++--+-- ! 0 | 0 | 0 | 1.41421356237309 | 3.2 (1 row) The 9.4.11 log contains the same point.out diff, but not polygon.out: https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.4=powerpc=9.4.11-0%2Bdeb8u1=1487517299=0 Does that ring any bell? As Debian unstable is unaffected, it's likely the toolchain to be blamed, but it worked for Debian Jessie before. Christoph -- Senior Berater, Tel.: +49 2166 9901 187 credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dropping partitioned tables without CASCADE
Thanks for working on all the follow-on work for the partitioning feature. Maybe you should add all those patches in the next commitfest, so that we don't forget those. On Mon, Feb 20, 2017 at 7:46 AM, Amit Langote wrote: > Re-posting the patch I posted in a nearby thread [0]. > > On 2017/02/16 2:08, Robert Haas wrote: >> On Wed, Feb 15, 2017 at 11:34 AM, Alvaro Herrera >> wrote: >>> I think new-style partitioning is supposed to consider each partition as >>> an implementation detail of the table; the fact that you can manipulate >>> partitions separately does not really mean that they are their own >>> independent object. You don't stop to think "do I really want to drop >>> the TOAST table attached to this main table?" and attach a CASCADE >>> clause if so. You just drop the main table, and the toast one is >>> dropped automatically. I think new-style partitions should behave >>> equivalently. >> >> That's a reasonable point of view. I'd like to get some more opinions >> on this topic. I'm happy to have us do whatever most people want, but >> I'm worried that having table inheritance and table partitioning work >> differently will create confusion. I'm also suspicious that there >> may be some implementation difficulties. On the other hand, it does seem a >> little silly to say that DROP TABLE partitioned_table should always >> fail except in the degenerate case where there are no partitions, so >> maybe changing it is for the best. > > So I count more than a few votes saying that we should be able to DROP > partitioned tables without specifying CASCADE. > > I tried to implement that using the attached patch by having > StoreCatalogInheritance1() create a DEPENDENCY_AUTO dependency between > parent and child if the child is a partition, instead of the DEPENDENCY_NORMAL > that would otherwise be created. 
Now it seems that that is one way of > making sure that partitions are dropped when the root partitioned table is > dropped, not sure if the best; why create the pg_depend entries at all one > might ask. I chose it for now because that's the one with fewest lines of > change. Adjusted regression tests as well, since we recently tweaked > tests [1] to work around the irregularities of test output when using CASCADE. The patch applies cleanly and regression does not show any failures. Here are some comments. For the sake of readability you may want to reverse the if and else order. -recordDependencyOn(, , DEPENDENCY_NORMAL); +if (!child_is_partition) +recordDependencyOn(, , DEPENDENCY_NORMAL); +else +recordDependencyOn(, , DEPENDENCY_AUTO); like +if (child_is_partition) +recordDependencyOn(, , DEPENDENCY_AUTO); +else +recordDependencyOn(, , DEPENDENCY_NORMAL); It's weird that somebody can perform DROP TABLE on the partition without referring to its parent. That may be a useful feature as it allows one to detach the partition as well as remove the table in one command. But it looks weird for someone familiar with the partitioning features of other DBMSes. But then our partition creation command is CREATE TABLE, so maybe this is an expected difference. --- cleanup: avoid using CASCADE -DROP TABLE list_parted, part_1; -DROP TABLE list_parted2, part_2, part_5, part_5_a; -DROP TABLE range_parted, part1, part2; +-- cleanup +DROP TABLE list_parted, list_parted2, range_parted; Testcases usually drop one table at a time, I guess, to reduce the differences when we add or remove tables from testcases. All such blocks should probably follow the same policy. drop table list_parted cascade; -NOTICE: drop cascades to 3 other objects -DETAIL: drop cascades to table part_ab_cd probably we should remove cascade from there, unless you are testing CASCADE functionality. 
Similarly for other blocks like drop table range_parted cascade; BTW, I noticed that although we are allowing foreign tables to be partitions, there are no tests in foreign_data.sql for testing it. If there were, we would also have tested DROP TABLE on a partitioned table with foreign partitions. That file has testcases for testing foreign table inheritance, and should have tests for foreign table partitions. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] SCRAM authentication, take three
And a few more things I've noticed after a closer look: * build_client_first_message does not free `state->client_nonce` if second malloc (for `buf`) fails * same for `state->client_first_message_bare` * ... and most other procedures declared in fe-auth-scram.c file (see malloc and strdup calls) * scram_Normalize doesn't check malloc return value Sorry for lots of emails. On Mon, Feb 20, 2017 at 03:15:14PM +0300, Aleksander Alekseev wrote: > Speaking about flaws, it looks like there is a memory leak in > array_to_utf procedure - result is allocated twice. > > On Mon, Feb 20, 2017 at 02:51:13PM +0300, Aleksander Alekseev wrote: > > Hi! > > > > Currently I don't see any significant flaws in these patches. However I > > would like to verify that implemented algorithms are compatible with > > algorithms implemented by third party. > > > > For instance, for user 'eax' and password 'pass' I got the following > > record in pg_authid: > > > > ``` > > scram-sha-256: > > xtznkRN/nc/1DQ==: > > 4096: > > 2387c124a3139a276b848c910f43ece05dd670d0977ace4f20d724af522312e4: > > 5e3bdd6584880198b0375acabd8754c460af2389499f71a756660a10a8aaa843 > > ``` > > > > Let's say I would like to implement SCRAM in pure Python, for instance > > add it to pg8000 driver. Firstly I need to know how to generate server > > key and client key having only user name and password. Somehow like > > this: > > > > ``` > > >>> import base64 > > >>> import hashlib > > >>> base64.b16encode(hashlib.pbkdf2_hmac('sha256', b'pass', > > ...base64.b64decode(b'xtznkRN/nc/1DQ=='), 4096)) > > b'14B90CFCF690120399A0E6D30C60DD9D9D221CD9C2E31EA0A00514C41823E6C3' > > ``` > > > > Naturally it doesn't work because both SCRAM_SERVER_KEY_NAME and > > SCRAM_CLIENT_KEY_NAME should also be involved. I see PostgreSQL > > implementation just in front of me but unfortunately I'm still having > > problems calculating exactly the same server key and client key. It makes > > me worry whether PostgreSQL implementation is OK. 
> > > > Could you please give an example of how to do it? > > > > On Mon, Feb 20, 2017 at 03:29:19PM +0900, Michael Paquier wrote: > > > On Sun, Feb 19, 2017 at 10:07 PM, Michael Paquier > > >wrote: > > > > There is something that I think is still unwelcome in this patch: the > > > > interface in pg_hba.conf. I mentioned that in the previous thread but > > > > now if you want to match a user and a database with a scram password > > > > you need to do that with the current set of patches: > > > > local $dbname $user scram > > > > That's not really portable as SCRAM is one protocol in the SASL > > > > family, and even worse in our case we use SCRAM-SHA-256. I'd like to > > > > change pg_hba.conf to be as follows: > > > > local $dbname $user sasl protocol=scram_sha_256 > > > > This is extensible for the future, and protocol is a mandatory option > > > > that would have now just a single value: scram_sha_256. Heikki, > > > > others, are you fine with that? > > > > > > I have implemented that as 0009 which is attached, and need to be > > > applied on the rest of upthread. I am not sure if we want to make the > > > case where no protocol is specified map to everything. This would be a > > > tricky support for users in the future if new authentication > > > mechanisms for SASL are added in the future. > > > > > > Another issue that I have is: do we really want to have > > > password_encryption being set to "scram" for verifiers of > > > SCRAM-SHA-256? I would think that scram_sha_256 makes the most sense. > > > Who knows, perhaps there could be in a couple of years a SHA-SHA-512.. > > > > > > At the same time, attached is a new version of 0008 that implements > > > SASLprep, I have stabilized the beast after fixing some allocation > > > calculations when converting the decomposed pg_wchar array back to a > > > utf8 string. 
> > > -- > > > Michael > > -- > > Best regards, > > Aleksander Alekseev > > -- > Best regards, > Aleksander Alekseev -- Best regards, Aleksander Alekseev
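For what it's worth, the step missing from the pure-Python attempt upthread is described in RFC 5802: the two keys are HMACs of the PBKDF2-salted password keyed with the fixed strings "Client Key" and "Server Key", and a SCRAM verifier stores StoredKey = H(ClientKey) together with ServerKey, not the salted password itself. A sketch using the thread's example values (whether the last two pg_authid fields are exactly StoredKey and ServerKey is an assumption to be checked against the patch):

```python
import base64
import hashlib
import hmac

password = b'pass'
salt = base64.b64decode(b'xtznkRN/nc/1DQ==')
iterations = 4096

# RFC 5802: SaltedPassword := Hi(Normalize(password), salt, i)
salted = hashlib.pbkdf2_hmac('sha256', password, salt, iterations)

# ClientKey := HMAC(SaltedPassword, "Client Key"); StoredKey := H(ClientKey)
client_key = hmac.new(salted, b'Client Key', hashlib.sha256).digest()
stored_key = hashlib.sha256(client_key).digest()

# ServerKey := HMAC(SaltedPassword, "Server Key")
server_key = hmac.new(salted, b'Server Key', hashlib.sha256).digest()

print(stored_key.hex())
print(server_key.hex())
```

Note that SASLprep normalization of the password (the 0008 patch above) happens before the PBKDF2 step; for an all-ASCII password like this one, normalization is a no-op.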
Re: [HACKERS] Reply: [HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query. (Internet mail)
jasonysli writes: > Yes, it seems the pg_stat_sql function can fit the individual need of > collecting tags of query. However the new function can not return other > values of query at the same time, such as block number info, run time and so > on. Returning these values at the same time are very important. Meh ... that seems like a justification that was made up on the fly, rather than being a demonstrable requirement. What's the specific use case that requires it? After thinking about this some more I'm really pretty dubious that storing the command tag will get you anything you can't get as well or better by looking at the first word or two of the query text. I don't believe the original claim that doing so is "too expensive for a monitoring system". It does not take that much to pull out a couple of whitespace-separated keywords and perhaps case-fold them, and by the time you've fetched the string out of the database you've spent way more cycles than that. A slightly better argument is that WITH will confuse matters, but really it does anyway: consider WITH x AS (INSERT INTO ... RETURNING *) SELECT * FROM x; This will get a command tag of "SELECT", but any reasonable person would deem that this would be better characterized as an INSERT. So I think if you want useful results for WITH cases you're going to need to spend effort looking at the querystring anyway. regards, tom lane
Re: [HACKERS] Should we cacheline align PGXACT?
On 15 February 2017 at 19:15, Andres Freund wrote: > I think I previously > mentioned, even just removing the MyPgXact->xmin assignment in > SnapshotResetXmin() is measurable performance wise and cache-hit ratio > wise. Currently, we issue SnapshotResetXmin() pointlessly at end of xact, so patch attached to remove that call, plus some comments to explain that. This reduces the cause. Also, another patch to reduce the calls to SnapshotResetXmin() using a simple heuristic to reduce the effects. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services skip_SnapshotResetXmin_if_idle_timeout.v1.patch Description: Binary data reduce_pgxact_access_AtEOXact.v1.patch Description: Binary data
Re: [HACKERS] SCRAM authentication, take three
Speaking about flaws, it looks like there is a memory leak in array_to_utf procedure - result is allocated twice. On Mon, Feb 20, 2017 at 02:51:13PM +0300, Aleksander Alekseev wrote: > Hi! > > Currently I don't see any significant flaws in these patches. However I > would like to verify that implemented algorithms are compatible with > algorithms implemented by third party. > > For instance, for user 'eax' and password 'pass' I got the following > record in pg_authid: > > ``` > scram-sha-256: > xtznkRN/nc/1DQ==: > 4096: > 2387c124a3139a276b848c910f43ece05dd670d0977ace4f20d724af522312e4: > 5e3bdd6584880198b0375acabd8754c460af2389499f71a756660a10a8aaa843 > ``` > > Let's say I would like to implement SCRAM in pure Python, for instance > add it to pg8000 driver. Firstly I need to know how to generate server > key and client key having only user name and password. Somehow like > this: > > ``` > >>> import base64 > >>> import hashlib > >>> base64.b16encode(hashlib.pbkdf2_hmac('sha256', b'pass', > ...base64.b64decode(b'xtznkRN/nc/1DQ=='), 4096)) > b'14B90CFCF690120399A0E6D30C60DD9D9D221CD9C2E31EA0A00514C41823E6C3' > ``` > > Naturally it doesn't work because both SCRAM_SERVER_KEY_NAME and > SCRAM_CLIENT_KEY_NAME should also be involved. I see PostgreSQL > implementation just in front of me but unfortunately I'm still having > problems calculating exactly the same server key and client key. It makes > me worry whether PostgreSQL implementation is OK. > > Could you please give an example of how to do it? > > On Mon, Feb 20, 2017 at 03:29:19PM +0900, Michael Paquier wrote: > > On Sun, Feb 19, 2017 at 10:07 PM, Michael Paquier > >wrote: > > > There is something that I think is still unwelcome in this patch: the > > > interface in pg_hba.conf. 
I mentioned that in the previous thread but > > > now if you want to match a user and a database with a scram password > > > you need to do that with the current set of patches: > > > local $dbname $user scram > > > That's not really portable as SCRAM is one protocol in the SASL > > > family, and even worse in our case we use SCRAM-SHA-256. I'd like to > > > change pg_hba.conf to be as follows: > > > local $dbname $user sasl protocol=scram_sha_256 > > > This is extensible for the future, and protocol is a mandatory option > > > that would have now just a single value: scram_sha_256. Heikki, > > > others, are you fine with that? > > > > I have implemented that as 0009 which is attached, and need to be > > applied on the rest of upthread. I am not sure if we want to make the > > case where no protocol is specified map to everything. This would be a > > tricky support for users in the future if new authentication > > mechanisms for SASL are added in the future. > > > > Another issue that I have is: do we really want to have > > password_encryption being set to "scram" for verifiers of > > SCRAM-SHA-256? I would think that scram_sha_256 makes the most sense. > > Who knows, perhaps there could be in a couple of years a SHA-SHA-512.. > > > > At the same time, attached is a new version of 0008 that implements > > SASLprep, I have stabilized the beast after fixing some allocation > > calculations when converting the decomposed pg_wchar array back to a > > utf8 string. > > -- > > Michael > > -- > Best regards, > Aleksander Alekseev -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 20/02/17 12:04, Andres Freund wrote:
> On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
>> That being said, I did wonder myself if we should just deprecate float
>> timestamps as well.
>
> I think we need a proper deprecation period for that, given that the
> conversion away will be painful for pg_upgrade using people with big
> clusters. So I think we should fix this regardless... :(
>

That's a good point.

Attached should fix the logical replication problems. I am not quite sure if there is anything in physical that needs changing. I opted for GetCurrentIntegerTimestamp() in the reply code as that's the same coding walreceiver uses.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 142cd99..dd5bdcc 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -46,7 +46,7 @@ logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
-	pq_sendint64(out, txn->commit_time);
+	pq_sendint64(out, TimestampTzToIntegerTimestamp(txn->commit_time));
 	pq_sendint(out, txn->xid, 4);
 }
@@ -60,7 +60,7 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
 	begin_data->final_lsn = pq_getmsgint64(in);
 	if (begin_data->final_lsn == InvalidXLogRecPtr)
 		elog(ERROR, "final_lsn not set in begin message");
-	begin_data->committime = pq_getmsgint64(in);
+	begin_data->committime = IntegerTimestampToTimestampTz(pq_getmsgint64(in));
 	begin_data->xid = pq_getmsgint(in, 4);
 }
@@ -82,7 +82,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 	/* send fields */
 	pq_sendint64(out, commit_lsn);
 	pq_sendint64(out, txn->end_lsn);
-	pq_sendint64(out, txn->commit_time);
+	pq_sendint64(out, TimestampTzToIntegerTimestamp(txn->commit_time));
 }
 
 /*
@@ -100,7 +100,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
 	/* read fields */
 	commit_data->commit_lsn = pq_getmsgint64(in);
 	commit_data->end_lsn = pq_getmsgint64(in);
-	commit_data->committime = pq_getmsgint64(in);
+	commit_data->committime = IntegerTimestampToTimestampTz(pq_getmsgint64(in));
 }
 
 /*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 0b19fec..225ea4c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1183,7 +1183,7 @@ send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
 	pq_sendint64(reply_message, recvpos);	/* write */
 	pq_sendint64(reply_message, flushpos);	/* flush */
 	pq_sendint64(reply_message, writepos);	/* apply */
-	pq_sendint64(reply_message, now);	/* sendTime */
+	pq_sendint64(reply_message, GetCurrentIntegerTimestamp());	/* sendTime */
 	pq_sendbyte(reply_message, requestReply);	/* replyRequested */
 
 	elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X",
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 9b4c012..1dd469d 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -1749,6 +1749,20 @@ IntegerTimestampToTimestampTz(int64 timestamp)
 #endif
 
 /*
+ * TimestampTzToIntegerTimestamp -- convert a native format timestamp to int64
+ *
+ * When compiled with --enable-integer-datetimes, this is implemented as a
+ * no-op macro.
+ */
+#ifndef HAVE_INT64_TIMESTAMP
+int64
+TimestampTzToIntegerTimestamp(TimestampTz timestamp)
+{
+	return timestamp * USECS_PER_SEC;
+}
+#endif
+
+/*
  * GetSQLCurrentTimestamp -- implements CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(n)
  */
 TimestampTz
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 21651b1..765fa81 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -108,9 +108,11 @@ extern bool TimestampDifferenceExceeds(TimestampTz start_time,
 #ifndef HAVE_INT64_TIMESTAMP
 extern int64 GetCurrentIntegerTimestamp(void);
 extern TimestampTz IntegerTimestampToTimestampTz(int64 timestamp);
+extern int64 TimestampTzToIntegerTimestamp(TimestampTz timestamp);
 #else
 #define GetCurrentIntegerTimestamp() GetCurrentTimestamp()
 #define IntegerTimestampToTimestampTz(timestamp) (timestamp)
+#define TimestampTzToIntegerTimestamp(timestamp) (timestamp)
 #endif
 
 extern TimestampTz time_t_to_timestamptz(pg_time_t tm);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
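Outside the patch, the conversion it adds is just a unit change: float-timestamp builds store seconds since the PostgreSQL epoch as a C double, while the replication wire protocol always carries whole microseconds as an int64. A minimal Python model of the two directions (illustration only, not PostgreSQL code; `int()` truncates toward zero like the C double-to-int64 cast):

```python
USECS_PER_SEC = 1_000_000

def float_ts_to_integer_ts(seconds: float) -> int:
    # Model of TimestampTzToIntegerTimestamp on a float-timestamp build:
    # seconds since the PostgreSQL epoch -> whole microseconds.
    return int(seconds * USECS_PER_SEC)

def integer_ts_to_float_ts(usecs: int) -> float:
    # Model of IntegerTimestampToTimestampTz on a float-timestamp build.
    return usecs / USECS_PER_SEC
```

On an integer-timestamp build both directions are no-op macros, which is why the bug only bites when the two sides of a replication connection disagree on the storage format.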
Re: [HACKERS] SCRAM authentication, take three
Hi! Currently I don't see any significant flaws in these patches. However I would like to verify that implemented algorithms are compatible with algorithms implemented by third party. For instance, for user 'eax' and password 'pass' I got the following record in pg_authid: ``` scram-sha-256: xtznkRN/nc/1DQ==: 4096: 2387c124a3139a276b848c910f43ece05dd670d0977ace4f20d724af522312e4: 5e3bdd6584880198b0375acabd8754c460af2389499f71a756660a10a8aaa843 ``` Let's say I would like to implement SCRAM in pure Python, for instance add it to pg8000 driver. Firstly I need to know how to generate server key and client key having only user name and password. Somehow like this: ``` >>> import base64 >>> import hashlib >>> base64.b16encode(hashlib.pbkdf2_hmac('sha256', b'pass', ...base64.b64decode(b'xtznkRN/nc/1DQ=='), 4096)) b'14B90CFCF690120399A0E6D30C60DD9D9D221CD9C2E31EA0A00514C41823E6C3' ``` Naturally it doesn't work because both SCRAM_SERVER_KEY_NAME and SCRAM_CLIENT_KEY_NAME should also be involved. I see PostgreSQL implementation just in front of me but unfortunately I'm still having problems calculating exactly the same server key and client key. It makes me worry whether PostgreSQL implementation is OK. Could you please give an example of how to do it? On Mon, Feb 20, 2017 at 03:29:19PM +0900, Michael Paquier wrote: > On Sun, Feb 19, 2017 at 10:07 PM, Michael Paquier >wrote: > > There is something that I think is still unwelcome in this patch: the > > interface in pg_hba.conf. I mentioned that in the previous thread but > > now if you want to match a user and a database with a scram password > > you need to do that with the current set of patches: > > local $dbname $user scram > > That's not really portable as SCRAM is one protocol in the SASL > > family, and even worse in our case we use SCRAM-SHA-256. 
I'd like to > > change pg_hba.conf to be as follows: > > local $dbname $user sasl protocol=scram_sha_256 > > This is extensible for the future, and protocol is a mandatory option > > that would have now just a single value: scram_sha_256. Heikki, > > others, are you fine with that? > > I have implemented that as 0009 which is attached, and need to be > applied on the rest of upthread. I am not sure if we want to make the > case where no protocol is specified map to everything. This would be a > tricky support for users in the future if new authentication > mechanisms for SASL are added in the future. > > Another issue that I have is: do we really want to have > password_encryption being set to "scram" for verifiers of > SCRAM-SHA-256? I would think that scram_sha_256 makes the most sense. > Who knows, perhaps there could be in a couple of years a SHA-SHA-512.. > > At the same time, attached is a new version of 0008 that implements > SASLprep, I have stabilized the beast after fixing some allocation > calculations when converting the decomposed pg_wchar array back to a > utf8 string. > -- > Michael -- Best regards, Aleksander Alekseev
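For what it's worth, the step missing from the pure-Python attempt above is that the stored keys are not raw PBKDF2 output; RFC 5802 layers HMACs on top of it. A sketch of the derivation (two assumptions are mine and should be checked against the patch: that the two 32-byte hex fields in pg_authid are StoredKey and ServerKey, and that a password like 'pass' is unchanged by SASLprep):

```python
import base64
import hashlib
import hmac

def scram_sha256_keys(password: bytes, salt: bytes, iterations: int):
    # RFC 5802 key derivation for SCRAM-SHA-256:
    #   SaltedPassword := PBKDF2-HMAC-SHA-256(password, salt, i)
    #   ClientKey      := HMAC(SaltedPassword, "Client Key")
    #   StoredKey      := SHA-256(ClientKey)
    #   ServerKey      := HMAC(SaltedPassword, "Server Key")
    salted = hashlib.pbkdf2_hmac('sha256', password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    return stored_key, server_key

# Inputs taken from the pg_authid record quoted in the mail; the resulting
# hex values are not verified here against that record.
stored, server = scram_sha256_keys(
    b'pass', base64.b64decode(b'xtznkRN/nc/1DQ=='), 4096)
```

The plain PBKDF2 call in the mail computes only SaltedPassword, which is why its output matches neither stored field.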
[HACKERS] pg_monitor role
Further to the patch I just submitted (https://www.postgresql.org/message-id/CA%2BOCxow-X%3DD2fWdKy%2BHP%2BvQ1LtrgbsYQ%3DCshzZBqyFT5jOYrFw%40mail.gmail.com) I'd like to propose the addition of a default role, pg_monitor. The intent is to make it easy for users to setup a role for fully monitoring their servers, without requiring superuser level privileges which is a problem for many users working within strict security policies. At present, functions or system config info that divulge any installation path related info typically require superuser privileges. This makes monitoring for unexpected changes in configuration or filesystem level monitoring (e.g. checking for large numbers of WAL files or log file info) impossible for non-privileged roles. A similar example is the restriction on the pg_stat_activity.query column, which prevents non-superusers seeing any query strings other than their own. Using ACLs is a problem for a number of reasons: - Users often don't like their database schemas to be modified (cluttered with GRANTs). - ACL modifications would potentially have to be made in every database in a cluster. - Using a pre-defined role minimises the setup that different tools would have to require. - Not all functionality has an ACL (e.g. SHOW) Other DBMSs solve this problem in a similar way. Initially I would propose that permission be granted to the role to: - Execute pg_ls_logdir() and pg_ls_waldir() - Read pg_stat_activity, including the query column for all queries. - Allow "SELECT pg_tablespace_size('pg_global')" - Read all GUCs In the future I would also like to see us add additional roles for system administration functions, for example, a backup operator role that would have the appropriate rights to make and restore backups. Comments? 
-- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] pg_ls_waldir() & pg_ls_logdir()
Hi Following various conversations on list and in person, including the developer meeting in Brussels earlier this month, here is a patch that implements pg_ls_logdir() and pg_ls_waldir() functions. The ultimate aim of this (and followup work I'll be doing) is to provide functionality to enable monitoring of PostgreSQL without requiring a user with superuser permissions as many of us have users for whom security policies prevent this or make it very difficult. In order to achieve that, there are various pieces of functionality such as pg_ls_dir() that need to have superuser checks removed to allow permissions to be granted to a monitoring role. There were objections in previous discussions to doing this with such generic functions, hence this patch which adds two narrowly focussed functions to allow tools to monitor the contents of the log and WAL directories. Neither function has a hard-coded superuser check, but have ACLs that prevent public execution by default. Patch includes the code and doc updates. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d7738b18b7..ecd17a3528 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19360,6 +19360,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); +pg_ls_logdir() + + setof record + +List the name, last modification time and size of files in the log directory. + + + + +pg_ls_waldir() + + setof record + +List the name, last modification time and size of files in the WAL directory. 
+ + + + pg_read_file(filename text [, offset bigint, length bigint [, missing_ok boolean] ]) text @@ -19390,7 +19408,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); -All of these functions take an optional missing_ok parameter, +Some of these functions take an optional missing_ok parameter, which specifies the behavior when the file or directory does not exist. If true, the function returns NULL (except pg_ls_dir, which returns an empty result set). If diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 38be9cf1a0..4b67102439 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1096,3 +1096,6 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset() FROM public; REVOKE EXECUTE ON FUNCTION pg_stat_reset_shared(text) FROM public; REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public; REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public; + +REVOKE EXECUTE ON FUNCTION pg_ls_logdir() FROM public; +REVOKE EXECUTE ON FUNCTION pg_ls_waldir() FROM public; diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 66d09bcb0c..a8cdf3bcbf 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -15,12 +15,14 @@ #include "postgres.h" #include +#include #include #include #include #include #include "access/sysattr.h" +#include "access/xlog_internal.h" #include "catalog/pg_authid.h" #include "catalog/catalog.h" #include "catalog/pg_tablespace.h" @@ -46,6 +48,8 @@ #define atooid(x) ((Oid) strtoul((x), NULL, 10)) +/* Generic function to return a directory listing of files */ +Datum pg_ls_dir_files(FunctionCallInfo fcinfo, char *dir); /* * Common subroutine for num_nulls() and num_nonnulls(). 
@@ -892,3 +896,109 @@ parse_ident(PG_FUNCTION_ARGS) PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext)); } + + +typedef struct +{ + char*location; + DIR *dirdesc; +} directory_fctx; + +/* Generic function to return a directory listing of files */ +Datum +pg_ls_dir_files(FunctionCallInfo fcinfo, char *dir) +{ + FuncCallContext *funcctx; + struct dirent *de; + directory_fctx *fctx; + + if (SRF_IS_FIRSTCALL()) + { + MemoryContext oldcontext; + TupleDesc tupdesc; + + funcctx = SRF_FIRSTCALL_INIT(); + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); + + fctx = palloc(sizeof(directory_fctx)); + + tupdesc = CreateTemplateTupleDesc(3, false); + TupleDescInitEntry(tupdesc, (AttrNumber) 1, "mtime", + TIMESTAMPTZOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 2, "size", + INT8OID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "file", + TEXTOID, -1, 0); + + funcctx->attinmeta =
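To make the intended result set concrete, here is a rough Python model of the rows the proposed functions return. The (mtime, size, file) column order is taken from the TupleDesc in the patch; everything else (helper name, demo directory and file) is illustrative:

```python
import os
import tempfile

def ls_dir_files(path):
    # Model of pg_ls_logdir()/pg_ls_waldir(): one (mtime, size, name)
    # tuple per entry in the directory, analogous to the SRF's rows.
    rows = []
    for name in sorted(os.listdir(path)):
        st = os.stat(os.path.join(path, name))
        rows.append((st.st_mtime, st.st_size, name))
    return rows

# Demo against a throwaway directory standing in for the log directory.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "postgresql.log"), "w") as f:
        f.write("log line\n")
    rows = ls_dir_files(d)
```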
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote: > That being said, I did wonder myself if we should just deprecate float > timestamps as well. I think we need a proper deprecation period for that, given that the conversion away will be painful for pg_upgrade using people with big clusters. So I think we should fix this regardless... :(
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 20/02/17 08:03, Andres Freund wrote: > On 2017-02-19 10:49:29 -0500, Tom Lane wrote: >> Robert Haas writes: >>> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane wrote: Thoughts? Should we double down on trying to make this work according to the "all integer timestamps" protocol specs, or cut our losses and change the specs? >> >>> I vote for doubling down. It's bad enough that we have so many >>> internal details that depend on this setting; letting that cascade >>> into the wire protocol seems like it's just letting the chaos spread >>> farther and wider. >> >> How do you figure that it's not embedded in the wire protocol already? >> Not only the replicated data for a timestamp column, but also the >> client-visible binary I/O format, depend on this. I think having some >> parts of the protocol use a different timestamp format than other parts >> is simply weird, and as this exercise has shown, it's bug-prone as all >> get out. > > I don't think it's that closely tied together atm. Things like > pg_basebackup, pg_receivexlog etc should work, without having to match > timestamp storage. Logical replication, unless your output plugin dumps > data in binary / "raw" output, also works just fine across the timestamp > divide. > > It doesn't sound that hard to add a SystemToIntTimestamp() function, > given it only needs to do something if float timestamps are enabled? > It's definitely not hard, we already have IntegerTimestampToTimestampTz() which does the opposite conversion anyway. That being said, I did wonder myself if we should just deprecate float timestamps as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] wait events for disk I/O
My colleague Rahila reported compilation issue with the patch. Issue was only coming with we do the clean build on the branch. Fixed the same into latest version of patch. Thanks, On Tue, Jan 31, 2017 at 11:09 AM, Rushabh Lathiawrote: > > > On Tue, Jan 31, 2017 at 8:54 AM, Michael Paquier < > michael.paqu...@gmail.com> wrote: > >> On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia >> wrote: >> > Attached is the patch, which extend the existing wait event >> infrastructure >> > to implement the wait events for the disk I/O. Basically >> pg_stat_activity's >> > wait event information to show data about disk I/O as well as IPC >> primitives. >> > >> > Implementation details: >> > >> > - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO >> > - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch, >> > FileWriteback, FileSync, and FileTruncate. Set this wait event just >> before >> > performing the file system operation and clear it just after. >> > - Pass down an appropriate wait event from caller of any of those >> > functions. >> > - Also set and clear a wait event around standalone calls to read(), >> > write(), fsync() in other parts of the system. >> > - Added documentation for all newly added wait event. >> >> Looks neat, those are unlikely to overlap with other wait events. >> > > Thanks. > > >> >> > Open issue: >> > - Might missed few standalone calls to read(), write(), etc which need >> > to pass the wait_event_info. >> >> It may be an idea to use LD_PRELOAD with custom versions of read(), >> write() and fsync(), and look at the paths where no flags are set in >> MyProc->wait_event_info, and log information when that happens. >> >> > Yes, may be I will try this. > > >> > Thanks to my colleague Robert Haas for his help in design. >> > Please let me know your thought, and thanks for reading. >> >> Did you consider a wrapper of the type pg_read_event() or >> pg_write_event(), etc.? 
>> > > I thought on that, but eventually stick to this approach as it looks > more neat and uniform with other wait event implementation. > > > >> -- >> Michael >> > > > > Thanks, > Rushabh Lathia > www.EnterpriseDB.com > -- Rushabh Lathia
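The set-just-before, clear-just-after discipline described in the patch summary can be modeled outside the backend. A toy Python sketch (class and event names are hypothetical, not taken from the patch) showing that the event is visible while the operation runs and is cleared afterwards even on error paths, thanks to the finally block:

```python
import contextlib

class PGProc:
    # Toy stand-in for MyProc: holds the advertised wait event, if any.
    def __init__(self):
        self.wait_event_info = None

my_proc = PGProc()

@contextlib.contextmanager
def wait_event(proc, event):
    # Pattern from the patch description: set the wait event just before
    # the file-system call and clear it just after, even on error.
    proc.wait_event_info = event
    try:
        yield
    finally:
        proc.wait_event_info = None

observed = []
with wait_event(my_proc, "WAIT_EVENT_DATA_FILE_READ"):
    # Inside the block is where read()/write()/fsync() would run;
    # this is what a concurrent pg_stat_activity scan would see.
    observed.append(my_proc.wait_event_info)
```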
Re: [HACKERS] Partitioned tables and relfilenode
On 2017/02/19 18:53, Robert Haas wrote: > On Fri, Feb 17, 2017 at 1:12 PM, Amit Langote wrote: >> Do you mean that if database-wide analyze is to be run, we should also >> exclude those RELKIND_RELATION relations that are partitions? >> >> So the only way to update a partition's statistics is to directly specify >> it in the command or by autovacuum. > > I think if you type: > > ANALYZE; > > ...that should process all partitioned tables and all tables that are > not themselves partitions. OK. > If you type: > > ANALYZE name; > > ...that should ANALYZE that relation, whatever it is. If it's a > partitioned table, it should recurse. To be clear, by "recurse" I assume you mean to perform ANALYZE on individual partitions, not just collect the inheritance statistics. So ANALYZE partitioned_table would both a) collect the inheritance statistics for the specified table and other partitioned tables in the hierarchy, b) ANALYZE every leaf partitions updating their statistics in pg_class. While working on this, I noticed that autovacuum.c does not collect RELKIND_PARTITIONED_TABLE relations, which I think is not right. It should match what get_rel_oids() does, which in the database-wide VACUUM/ANALYZE case collects them. Attached updated patches: Updated 0001 addresses the following comments: - should recurse when vacuum/analyze is performed on a partitioned table - database-wide vacuum should ignore partitioned tables - database-wide analyze should ignore partitions; only the inheritance statistics of the partitioned tables must be collected in this case Thanks, Amit >From 982366eb019585438513b0c7b6e86acc74e459d1 Mon Sep 17 00:00:00 2001 From: amitDate: Mon, 6 Feb 2017 18:03:28 +0900 Subject: [PATCH 1/3] Partitioned tables are empty themselves So, there is not much point in trying to do things to them that need accessing files (a later commit will make it so that there is no file at all to access.) 
Things that needed attention are: vacuum, analyze, truncate, ATRewriteTables. --- src/backend/commands/analyze.c | 39 ++-- src/backend/commands/tablecmds.c| 15 ++-- src/backend/commands/vacuum.c | 71 + src/backend/postmaster/autovacuum.c | 20 +++ 4 files changed, 118 insertions(+), 27 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index ed3acb1673..12cd0110b0 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -201,8 +201,7 @@ analyze_rel(Oid relid, RangeVar *relation, int options, * locked the relation. */ if (onerel->rd_rel->relkind == RELKIND_RELATION || - onerel->rd_rel->relkind == RELKIND_MATVIEW || - onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + onerel->rd_rel->relkind == RELKIND_MATVIEW) { /* Regular table, so we'll use the regular row acquisition function */ acquirefunc = acquire_sample_rows; @@ -234,6 +233,12 @@ analyze_rel(Oid relid, RangeVar *relation, int options, return; } } + else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + /* + * For partitioned tables, we want to do the recursive ANALYZE below. + */ + } else { /* No need for a WARNING if we already complained during VACUUM */ @@ -253,13 +258,15 @@ analyze_rel(Oid relid, RangeVar *relation, int options, LWLockRelease(ProcArrayLock); /* - * Do the normal non-recursive ANALYZE. + * Do the normal non-recursive ANALYZE, non-partitioned relations. */ - do_analyze_rel(onerel, options, params, va_cols, acquirefunc, relpages, - false, in_outer_xact, elevel); + if (onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + do_analyze_rel(onerel, options, params, va_cols, acquirefunc, + relpages, false, in_outer_xact, elevel); /* - * If there are child tables, do recursive ANALYZE. + * If there are child tables, do recursive ANALYZE. This includes + * partitioned tables. 
*/ if (onerel->rd_rel->relhassubclass) do_analyze_rel(onerel, options, params, va_cols, acquirefunc, relpages, @@ -1316,10 +1323,20 @@ acquire_inherited_sample_rows(Relation onerel, int elevel, continue; } + /* Ignore partitioned tables as there are no tuples to collect */ + if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + /* Don't try to unlock the passed-in root table. */ + if (childrel == onerel) +heap_close(childrel, NoLock); + else +heap_close(childrel, AccessShareLock); + continue; + } + /* Check table type (MATVIEW can't happen, but might as well allow) */ if (childrel->rd_rel->relkind == RELKIND_RELATION || - childrel->rd_rel->relkind == RELKIND_MATVIEW || - childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + childrel->rd_rel->relkind == RELKIND_MATVIEW) { /* Regular table, so use the regular row acquisition function */ acquirefunc = acquire_sample_rows; @@ -1366,9 +1383,11 @@
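Robert's rule for a bare ANALYZE, namely process every partitioned table and every plain table that is not itself a partition, reaching leaf partitions only by recursing from their parent, amounts to a one-line filter over the relation list. A Python sketch with invented dictionary keys, for illustration only:

```python
def database_wide_analyze_targets(relations):
    # Rule being discussed: a bare ANALYZE processes every partitioned
    # table and every plain table that is not itself a partition; leaf
    # partitions are reached by recursion from their parent, not listed
    # directly.  Keys 'name', 'is_partitioned', 'is_partition' are
    # invented for this sketch.
    return [r['name'] for r in relations
            if r['is_partitioned'] or not r['is_partition']]

rels = [
    {'name': 'measurements',      'is_partitioned': True,  'is_partition': False},
    {'name': 'measurements_2017', 'is_partitioned': False, 'is_partition': True},
    {'name': 'plain_table',       'is_partitioned': False, 'is_partition': False},
]
```

Running the filter over `rels` keeps the partitioned parent and the plain table while skipping the leaf partition, matching the semantics of the updated 0001 patch.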
Re: [HACKERS] GUC for cleanup indexes threshold.
On Mon, Feb 20, 2017 at 3:01 PM, Simon Riggs wrote: > On 20 February 2017 at 09:15, Amit Kapila wrote: >> On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada >> wrote: >>> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas wrote: On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs wrote: > On 15 February 2017 at 08:07, Masahiko Sawada > wrote: >> It's a bug. Attached latest version patch, which passed make check. > > 2. The current btree vacuum code requires 2 vacuums to fully reuse > half-dead pages. So skipping an index vacuum might mean that second > index scan never happens at all, which would be bad. Maybe. If there are a tiny number of those half-dead pages in a huge index, it probably doesn't matter. Also, I don't think it would never happen, unless the table just never gets any more updates or deletes - but that case could also happen today. It's just a matter of happening less frequently. >>> >> >> Yeah thats right and I am not sure if it is worth to perform a >> complete pass to reclaim dead/deleted pages unless we know someway >> that there are many such pages. > > Agreed which is why > On 16 February 2017 at 11:17, Simon Riggs wrote: >> I suggest that we store the number of half-dead pages in the metapage >> after each VACUUM, so we can decide whether to skip the scan or not. > > >> Also, I think we do reclaim the >> complete page while allocating a new page in btree. > > That's not how it works according to the README at least. > I am referring to code (_bt_getbuf()->if (_bt_page_recyclable(page))), won't that help us in reclaiming the space? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Should we cacheline align PGXACT?
Hi, >> On Thu, Feb 16, 2017 at 5:07 AM, Alexander Korotkov > >>wrote: > >> > On Wed, Feb 15, 2017 at 8:49 PM, Alvaro Herrera > >> > > >> > wrote: > >> >> Alexander Korotkov wrote: > >> >> > >> >> > Difference between master, pgxact-align-2 and pgxact-align-3 > doesn't > >> >> > exceed > >> >> > per run variation. > >> >> > >> >> FWIW this would be more visible if you added error bars to each data > >> >> point. Should be simple enough in gnuplot ... > >> > > >> > Good point. > >> > Please find graph of mean and errors in attachment. > >> > >> So ... no difference? > > > > > > Yeah, nothing surprising. It's just another graph based on the same > data. > > I wonder how pgxact-align-3 would work on machine of Ashutosh Sharma, > > because I observed regression there in write-heavy benchmark of > > pgxact-align-2. > > I am yet to benchmark pgxact-align-3 patch on a read-write workload. I > could not do it as our benchmarking machine was already reserved for > some other test work. But, I am planning to do it on this weekend. > Will try to post my results by Monday evening. Thank you and sorry for > the delayed response. > Following are the pgbench results for read-write workload, I got with pgxact-align-3 patch. The results are for 300 scale factor with 8GB of shared buffer i.e. when data fits into the shared buffer. For 1000 scale factor with 8GB shared buffer the test is still running, once it is completed I will share the results for that as well. *pgbench settings:* pgbench -i -s 300 postgres pgbench -M prepared -c $thread -j $thread -T $time_for_reading postgres where, time_for_reading = 30mins *non default GUC param:* shared_buffers=8GB max_connections=300 pg_xlog is located in SSD. 
*Machine details:*

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz

CLIENT COUNT | TPS (HEAD) | TPS (PATCH) | % IMPROVEMENT
           4 |       4283 |        4220 |  -1.47093159
           8 |       7291 |        7261 |  -0.4114661912
          16 |      11909 |       12149 |   2.015282559
          32 |      20789 |       20745 |  -0.211650392
          64 |      28412 |       27831 |  -2.044910601
         128 |      29369 |       28765 |  -2.056590282
         156 |      27949 |       27189 |  -2.719238613
         180 |      27876 |       27171 |  -2.529057254
         196 |      28849 |       27872 |  -3.386599189
         256 |      30321 |       28188 |  -7.034728406

Also, Excel sheet (results-read-write-300-SF) containing the results for all the 3 runs is attached.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
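The % IMPROVEMENT column is simply the relative TPS change of the patch against HEAD, so it can be recomputed from the raw numbers as a sanity check:

```python
# (HEAD TPS, patched TPS) per client count, copied from the table above.
head_vs_patch = {
    4: (4283, 4220), 8: (7291, 7261), 16: (11909, 12149), 32: (20789, 20745),
    64: (28412, 27831), 128: (29369, 28765), 156: (27949, 27189),
    180: (27876, 27171), 196: (28849, 27872), 256: (30321, 28188),
}

def pct_improvement(head, patch):
    # Percentage change of the patched TPS relative to HEAD.
    return (patch - head) / head * 100.0

results = {c: round(pct_improvement(h, p), 2)
           for c, (h, p) in head_vs_patch.items()}
```

The recomputed values agree with the posted column: a small gain at 16 clients and a roughly 7% regression at 256 clients.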
Re: [HACKERS] UPDATE of partition key
On Mon, Feb 20, 2017 at 3:36 PM, Thomas Munro wrote: > On Thu, Feb 16, 2017 at 8:53 PM, Robert Haas wrote: >> Generally speaking, we don't throw >> serialization errors today at READ COMMITTED, so if we do so here, >> that's going to be a noticeable and perhaps unwelcome change. > > Yes we do: > > https://www.postgresql.org/docs/9.6/static/transaction-iso.html#XACT-REPEATABLE-READ Oops -- please ignore, I misread that as repeatable read. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] UPDATE of partition key
On Thu, Feb 16, 2017 at 8:53 PM, Robert Haas wrote: > Generally speaking, we don't throw > serialization errors today at READ COMMITTED, so if we do so here, > that's going to be a noticeable and perhaps unwelcome change. Yes we do: https://www.postgresql.org/docs/9.6/static/transaction-iso.html#XACT-REPEATABLE-READ -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] case_preservation_and_insensitivity = on
On Mon, Feb 20, 2017 at 1:45 AM, Tom Lane wrote: > The versions of autocommit that have actually stood the test of time were > implemented on the client side (in psql and JDBC, and I think ODBC as > well), where the scope of affected code was lots smaller. I wonder > whether there's any hope of providing something useful for case-folding > in that way. psql's lexer is already smart enough that you could teach it > rules like "smash any unquoted identifier to lower case" (probably it > would fold keywords too, but that seems OK). That's probably not much > help for custom applications, which aren't likely to be going through > psql scripts; but the fact that such behavior is in reach at all on the > client side seems encouraging. This sounds like a really good solution to me, since there is actually nothing missing on the PostgreSQL server-side, it's merely a matter of inconvenience on the client-side. As long as the definitions of the database objects when stored in the git repo can be written without the double-quotes, i.e. CREATE TABLE Users ( instead of CREATE TABLE "Users" ( where the object would be created as "Users" with capital "U", then I see no problem. Most people probably use psql to initiate a db instance of their project locally, so if psql would have a --preserve-case option, that would solve the problem of creating new objects. Or maybe --no-case-folding is a better name for the option.
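The "smash any unquoted identifier to lower case" rule is simple enough to sketch. A deliberately naive Python version that lower-cases everything outside double-quoted identifiers and single-quoted literals (it ignores dollar quoting, comments and escape strings, all of which psql's real lexer would have to handle):

```python
def fold_unquoted_identifiers(sql: str) -> str:
    # Copy double-quoted identifiers and single-quoted literals verbatim;
    # lower-case everything else (keywords get folded too, as Tom notes).
    out = []
    i = 0
    while i < len(sql):
        c = sql[i]
        if c in ('"', "'"):
            j = sql.find(c, i + 1)          # find the closing quote
            j = len(sql) - 1 if j == -1 else j
            out.append(sql[i:j + 1])        # quoted region, untouched
            i = j + 1
        else:
            out.append(c.lower())
            i += 1
    return ''.join(out)
```

With this folding done client-side, scripts could write `CREATE TABLE Users (...)` while the server still sees the lower-cased form, which is the behaviour the proposed `--preserve-case` / `--no-case-folding` switch would toggle.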
Re: [HACKERS] GUC for cleanup indexes threshold.
On 20 February 2017 at 09:15, Amit Kapila wrote: > On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada > wrote: >> On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas wrote: >>> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs wrote: On 15 February 2017 at 08:07, Masahiko Sawada wrote: > It's a bug. Attached latest version patch, which passed make check. 2. The current btree vacuum code requires 2 vacuums to fully reuse half-dead pages. So skipping an index vacuum might mean that second index scan never happens at all, which would be bad. >>> >>> Maybe. If there are a tiny number of those half-dead pages in a huge >>> index, it probably doesn't matter. Also, I don't think it would never >>> happen, unless the table just never gets any more updates or deletes - >>> but that case could also happen today. It's just a matter of >>> happening less frequently. >> > > Yeah thats right and I am not sure if it is worth to perform a > complete pass to reclaim dead/deleted pages unless we know someway > that there are many such pages. Agreed which is why On 16 February 2017 at 11:17, Simon Riggs wrote: > I suggest that we store the number of half-dead pages in the metapage > after each VACUUM, so we can decide whether to skip the scan or not. > Also, I think we do reclaim the > complete page while allocating a new page in btree. That's not how it works according to the README at least. You might be referring to cleaning out killed tuples just before a page split? That's something different. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] case_preservation_and_insensitivity = on
On Mon, Feb 20, 2017 at 2:40 AM, Jim Nasby wrote: > Even if the project decided that "Users" and users is stupid and that we > should deprecate it, I think the odds of also deciding to tell existing > users to re-write their apps are zero. But if the feature can't be turned on without also enforcing lowercase uniqueness, then the described problem situation will never happen. Any existing projects who want to use the new feature but can't due to conflicting names will simply have to live without it, just like they already do. > So no matter how this is designed, there has to be some way for existing > users to be able to continue relying on "Users" and users being different. There is a way: simply don't switch on the feature, and "Users" and "users" will continue to be different. > What would work is an initdb option that controls this: when ignoring case > for uniqueness is disabled, your new column would simply be left as NULL. > With some extra effort you could probably allow changing that on a running > database as well, just not with something as easy to change as a GUC. An initdb option sounds good to me, just like you specify e.g. --encoding. Also, I think the --lowercase-uniqueness feature would be useful by itself even without the --case-preserving feature, since it might be a good way to enforce a good design of new databases, as a mix of "users" and "Users" is probably considered ugly by many system designers.
Re: [HACKERS] UPDATE of partition key
On 16 February 2017 at 20:53, Robert Haas wrote: > On Thu, Feb 16, 2017 at 5:47 AM, Greg Stark wrote: >> On 13 February 2017 at 12:01, Amit Khandekar wrote: >>> There are a few things that can be discussed about : >> >> If you do a normal update the new tuple is linked to the old one using >> the ctid forming a chain of tuple versions. This tuple movement breaks >> that chain. So the question I had reading this proposal is what >> behaviour depends on ctid and how is it affected by the ctid chain >> being broken. > > I think this is a good question. > >> I think the concurrent update case is just a symptom of this. If you >> try to update a row that's locked for a concurrent update you normally >> wait until the concurrent update finishes, then follow the ctid chain >> and recheck the where clause on the target of the link and if it still >> matches you perform the update there. > > Right. EvalPlanQual behavior, in short. > >> At least you do that if you have isolation_level set to >> repeatable_read. If you have isolation level set to serializable then >> you just fail with a serialization failure. I think that's what you >> should do if you come across a row that's been updated with a broken >> ctid chain even in repeatable read mode. Just fail with a >> serialization failure and document that in partitioned tables if you >> perform updates that move tuples between partitions then you need to >> ensure your updates are prepared for serialization failures. > > Now, this part I'm not sure about. What's pretty clear is that, > barring some redesign of the heap format, we can't keep the CTID chain > intact when the tuple moves to a different relfilenode. What's less > clear is what to do about that. We can either (1) give up on > EvalPlanQual behavior in this case and act just as we would if the row > had been deleted; no update happens. This is what the current patch has done. > or (2) throw a serialization > error.
> You're advocating for #2, but I'm not sure that's right, > because: > > 1. It's a lot more work, > > 2. Your proposed implementation needs an on-disk format change that > uses up a scarce infomask bit, and > > 3. It's not obvious to me that it's clearly preferable from a user > experience standpoint. I mean, either way the user doesn't get the > behavior that they want. Either they're hoping for EPQ semantics and > they instead do a no-op update, or they're hoping for EPQ semantics > and they instead get an ERROR. Generally speaking, we don't throw > serialization errors today at READ COMMITTED, so if we do so here, > that's going to be a noticeable and perhaps unwelcome change. > > More opinions welcome. I am inclined to at least have some option for the user to decide the behaviour. In the future we can even consider support for walking through the ctid chain across multiple relfilenodes. But till then, we need to decide what default behaviour to keep. My inclination is more towards erroring out in the unfortunate event where there is an UPDATE while the row movement is happening. One option is to not get into finding whether the DELETE was part of partition row movement or was indeed a DELETE, and always error out the UPDATE when heap_update() returns HeapTupleUpdated, but only if the table is a leaf partition. But this will obviously cause annoyance, because of the chance of getting such errors when there are concurrent updates and deletes in the same partition. But we can keep a table-level option for determining whether to error out or silently lose the UPDATE. Another option I was thinking of: when the UPDATE is on a partition key, acquire ExclusiveLock (not AccessExclusiveLock) only on that partition, so that selects will continue to execute, but UPDATE/DELETE will wait before opening the table for scan.
An UPDATE of the partition key is not going to be a very routine operation; it sounds more like a DBA maintenance operation, so it does not look like it would come in between usual transactions.
Re: [HACKERS] Gather Merge
On Mon, Feb 20, 2017 at 1:58 PM, Dilip Kumar wrote: > On Mon, Feb 20, 2017 at 12:05 PM, Rushabh Lathia > wrote: >> Thanks Amit for raising this point. I was not at all aware of mark/restore. >> I tried to come up with the case, but haven't found such a case. >> >> For now here is the patch with the comment update. > > I think for reproducing this you need a plan something like below (I > think this is a really bad plan, but you can use it to test this > particular case). > > MergeJoin > -> Index Scan > -> Gather Merge > -> Parallel Index Scan > > So if only an IndexScan node is there as the inner node, which supports > Mark/Restore, then we don't need to insert any materialize node. But > after we put Gather Merge (which doesn't support Mark/Restore) there, we > need a materialize node on top of that. Therefore, the plan should become > like this, I think. > (But anyway, if we had a Gather instead of the GatherMerge we would > require a Sort node on top of the Gather, and Materialize is obviously > cheaper than the Sort.) > > MergeJoin > -> Index Scan > -> Materialize > -> Gather Merge (does not support mark/restore) > -> Parallel Index Scan > Yes, that's exactly what will happen; however, we are not sure how many times such a plan (Gather Merge on the inner side of a merge join) will be helpful, and whether adding Mark/Restore support will make it any faster than just adding Materialize on top of Gather Merge. So it seems better not to go there unless we see some use for it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] increasing the default WAL segment size
Hello, On Thu, Feb 16, 2017 at 3:37 PM, Kuntal Ghosh wrote: > On Mon, Feb 6, 2017 at 11:09 PM, Beena Emerson > wrote: > > > > Hello, > > PFA the updated patches. > I've started reviewing the patches. > 01-add-XLogSegmentOffset-macro.patch looks clean to me. I'll post my > detailed review after looking into the second patch. But both the > patches need a rebase based on the commit 85c11324cabaddcfaf3347df7 > (Rename user-facing tools with "xlog" in the name to say "wal"). PFA the rebased patches. Thank you, Beena Emerson Have a Great Day! Attachments: 01-add-XLogSegmentOffset-macro.patch, 02-initdb-walsegsize-v2.patch
Re: [HACKERS] GUC for cleanup indexes threshold.
On Mon, Feb 20, 2017 at 7:26 AM, Masahiko Sawada wrote: > On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas wrote: >> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs wrote: >>> On 15 February 2017 at 08:07, Masahiko Sawada wrote: It's a bug. Attached latest version patch, which passed make check. >>> >>> In its current form, I'm not sure this is a good idea. Problems... >>> >>> 1. I'm pretty sure the world doesn't need another VACUUM parameter >>> >>> I suggest that we use the existing vacuum scale factor/4 to reflect >>> that indexes are more sensitive to bloat. >> >> I do not think it's a good idea to control multiple behaviors with a >> single GUC. We don't really know that dividing by 4 will be right for >> everyone, or even for most people. It's better to have another >> parameter with a sensible default than to hardcode a ratio that might >> work out poorly for some people. >> >>> 2. The current btree vacuum code requires 2 vacuums to fully reuse >>> half-dead pages. So skipping an index vacuum might mean that second >>> index scan never happens at all, which would be bad. >> >> Maybe. If there are a tiny number of those half-dead pages in a huge >> index, it probably doesn't matter. Also, I don't think it would never >> happen, unless the table just never gets any more updates or deletes - >> but that case could also happen today. It's just a matter of >> happening less frequently. Yeah, that's right, and I am not sure it is worth performing a complete pass to reclaim dead/deleted pages unless we know somehow that there are many such pages. Also, I think we do reclaim the complete page while allocating a new page in btree. > The half-dead pages are never cleaned up if the ratio of pages > containing garbage is always lower than threshold. > Which threshold are you referring to here? -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Documentation improvements for partitioning
On Mon, Feb 20, 2017 at 2:09 AM, Simon Riggs wrote: > On 15 February 2017 at 15:46, Robert Haas wrote: >>> It leaves me asking what else is missing. >> >> There is certainly a lot of room for improvement here but I don't >> understand your persistent negativity about what's been done thus far. >> I think it's pretty clearly a huge step forward, and I think Amit >> deserves a ton of credit for making it happen. The improvements in >> bulk loading performance alone are stupendous. You apparently have >> the idea that somebody could have written an even larger patch that >> solved even more problems at once, but this was already a really big >> patch, and IMHO quite a good one. > > Please explain these personal comments against me. Several of your emails, including your first post to this thread, seemed to me to be quite negative about the state of this feature. I don't think that's warranted, though perhaps I am misreading your tone, as I have been known to do. I also don't think that expressing the opinion that the feature is better than you're giving it credit for is a personal comment against you. Where exactly do you see a personal comment against you in what I wrote? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Gather Merge
On Mon, Feb 20, 2017 at 12:05 PM, Rushabh Lathia wrote: > Thanks Amit for raising this point. I was not at all aware of mark/restore. > I tried to come up with the case, but haven't found such a case. > > For now here is the patch with the comment update. I think for reproducing this you need a plan something like below (I think this is a really bad plan, but you can use it to test this particular case). MergeJoin -> Index Scan -> Gather Merge -> Parallel Index Scan So if only an IndexScan node is there as the inner node, which supports Mark/Restore, then we don't need to insert any materialize node. But after we put Gather Merge (which doesn't support Mark/Restore) there, we need a materialize node on top of that. Therefore, the plan should become like this, I think. (But anyway, if we had a Gather instead of the GatherMerge we would require a Sort node on top of the Gather, and Materialize is obviously cheaper than the Sort.) MergeJoin -> Index Scan -> Materialize -> Gather Merge (does not support mark/restore) -> Parallel Index Scan -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw
On 2017/02/13 18:24, Rushabh Lathia wrote:

I started reviewing the patch again. The patch applies cleanly on the latest source, and the regression tests pass with it. I also performed some manual testing and haven't found any regressions. The patch looks much cleaner than the earlier version, and I don't have any major concerns as such.

Thanks for the review!

Here are a few comments:

1)

@@ -211,6 +211,12 @@ typedef struct PgFdwDirectModifyState
 	PGresult   *result;		/* result for query */
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
+	Relation	resultRel;		/* relcache entry for the target table */

Why do we need resultRel? Can't we directly use dmstate->rel?

The reason we need it is that in get_returning_data, we pass dmstate->rel to make_tuple_from_result_row, which requires that dmstate->rel be NULL when the scan tuple is described by fdw_scan_tlist. So in that case we set dmstate->rel to NULL in postgresBeginDirectModify and use dmstate->resultRel to hold the relcache entry for the target relation.

2) In some places the scanrelid condition is written as fscan->scan.scanrelid == 0, whereas elsewhere it is fsplan->scan.scanrelid > 0. In fact it is used differently even within the same function, for example postgresBeginDirectModify. Can you make this consistent?

Ok, done.

3)

+	/*
+	 * If UPDATE/DELETE on a join, create a RETURNING list used in the remote
+	 * query.
+	 */
+	if (fscan->scan.scanrelid == 0)
+		returningList = make_explicit_returning_list(resultRelation, rel,
+													 returningList);

The above block can be moved inside the if (plan->returningLists) condition above it. Like this:

	/*
	 * Extract the relevant RETURNING list if any.
	 */
	if (plan->returningLists)
	{
		returningList = (List *) list_nth(plan->returningLists,
										  subplan_index);

		/*
		 * If UPDATE/DELETE on a join, create a RETURNING list used in the
		 * remote query.
		 */
		if (fscan->scan.scanrelid == 0)
			returningList = make_explicit_returning_list(resultRelation, rel,
														 returningList);
	}

Done that way.

Another thing I noticed is duplicate work in apply_returning_filter; it initializes the tableoid of an updated/deleted tuple if needed, but the core will do that (see ExecProcessReturning). I removed that work from apply_returning_filter.

I am still doing some more testing with the patch; if I find anything else, I will raise it in another mail.

Thanks again! Attached is an updated version of the patch.

Best regards, Etsuro Fujita

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 130,142 ****
  static void deparseTargetList(StringInfo buf,
  				  Bitmapset *attrs_used,
  				  bool qualify_col,
  				  List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist,
  				  List **retrieved_attrs,
  				  deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  				  Index rtindex, Relation rel,
  				  bool trig_after_row,
  				  List *returningList,
  				  List **retrieved_attrs);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
  				  PlannerInfo *root, bool qualify_col);
  static void deparseRelation(StringInfo buf, Relation rel);
--- 130,151 ----
  				  Bitmapset *attrs_used,
  				  bool qualify_col,
  				  List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist,
! 				  bool is_returning,
! 				  List **retrieved_attrs,
  				  deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  				  Index rtindex, Relation rel,
  				  bool trig_after_row,
  				  List *returningList,
  				  List **retrieved_attrs);
+ static void deparseExplicitReturningList(List *rlist,
+ 				  List **retrieved_attrs,
+ 				  deparse_expr_cxt *context);
+ static void pull_up_target_conditions(PlannerInfo *root, RelOptInfo *foreignrel,
+ 				  Index target_rel, List **target_conds);
+ static void extract_target_conditions(List **joinclauses, Index target_rel,
+ 				  List **target_conds);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
  				  PlannerInfo *root, bool qualify_col);
  static void deparseRelation(StringInfo buf, Relation rel);
***************
*** 165,171 ****
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
! 				  RelOptInfo *joinrel, bool use_alias, List **params_list);
  static