Re: [HACKERS] September 2015 Commitfest
On Thu, Sep 3, 2015 at 10:06 PM, Andres Freund wrote: > To review (copying Heikki's message): > > 1. Pick a patch from the list at: > > https://commitfest.postgresql.org/6/ > > 2. Review it. Test it. Try to break it. Do we want the feature? Any > weird interactions in corner-cases? Does it have the intended or an > unintended performance impact? > > 3. Reply on the thread on pgsql-hackers with your findings. > > It is perfectly OK to review a patch that someone else has signed up for as > reviewer - different people tend to pay attention to different things. We are close to the end of October, and the numbers are a bit more encouraging than at the beginning: Needs review: 27. Waiting on Author: 12. Ready for Committer: 5. Committed: 30 Among the five patches marked as ready for committer, one is a bug fix that should be back-patched (ahem). Shouldn't we move on with those entries first? Also, to all reviewers and authors, please be sure that the status of your patch is correctly updated. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
On Thu, Oct 22, 2015 at 8:37 AM, Petr Jelinek wrote: > BTW I think this is ready for committer, except for the need to check docs > by native speaker. If so, could you update the entry of this patch accordingly? https://commitfest.postgresql.org/6/353/ -- Michael
Re: [HACKERS] On columnar storage (2)
On Tue, Sep 1, 2015 at 8:53 AM, Alvaro Herrera wrote: > As discussed in > https://www.postgresql.org/message-id/20150611230316.gm133...@postgresql.org > we've been working on implementing columnar storage for Postgres. > Here's some initial code to show our general idea, and to gather > comments related to what we're building. This is not a complete patch, > and we don't claim that it works! This is in very early stages, and we > have a lot of work to do to get this in working shape. > > This was proposed during the Developer's Unconference in Ottawa earlier > this year. While some questions were raised about some elements of our > design, we don't think they were outright objections, so we have pressed > forward on the expectation that any limitations can be fixed before this > is final if they are critical, or in subsequent commits if not. > > The commit messages for each patch should explain what we've done in > enough technical detail, and hopefully provide a high-level overview of > what we're developing. > > The first few pieces are "ready for comment" -- feel free to speak up > about the catalog additions, the new COLUMN STORE bits we added to the > grammar, the way we handle column stores in the relcache, or the > mechanics to create column store catalog entries. > > The later half of the patch series is much less well cooked yet; for > example, the colstore_dummy module is just a simple experiment to let us > verify that the API is working. The planner and executor code are > mostly stubs, and we are not yet sure of what are the executor nodes > that we would like to have: while we have discussed this topic > internally a lot, we haven't yet formed final opinions, and of course > the stub implementations are not doing the proper things, and in many > cases they are even not doing anything at all. Fujitsu is also interested in implementing a columnar storage extension. 
First we thought of implementing this extension using index access methods. The following is the basic design idea of the columnar extension; it may need to be redesigned to fit the columnar access methods approach. The user creates a vertical columnar index (VCI) on a table, specifying the columns that should be stored in columnar storage format. To provide performance benefits for both read and write operations, the data is stored in two formats: 1) write-optimized storage (WOS) and 2) read-optimized storage (ROS). This is useful for users whose newly added data has a high chance of being modified. Because of the two storages, we need to maintain two entries in the pg_class table: one for the WOS, and one for the columns in columnar storage.

Insert: in the write-optimized storage, the data of all columns that are part of the VCI is stored in row-wise format. All newly added data is stored in the WOS relation along with xmin/xmax information. If the user updates or deletes the newly added data, it doesn't hurt performance much compared to deleting the data from columnar storage. Tuples that don't have multiple copies, or whose data is frozen, are moved from WOS to ROS periodically by a background worker process or the autovacuum process. Each column's data is stored separately in its own relation file. No transaction information is present in the ROS; the data in the ROS is referenced by tuple ID. In this approach, the column data is present in both the heap and the columnar storage, whereas with columnar access methods the column data would not be present in the heap.

Select: because of the two storage formats, during a select operation the data in the WOS is converted into a local ROS for the statement being executed. The conversion cost depends on the number of tuples present in the WOS file, so this may add some performance overhead to select statements.

Delete: during a delete operation, whenever data is deleted from the heap, the corresponding data in the WOS file is marked as deleted, similar to the heap. If the data has already been migrated from WOS to ROS, we instead maintain a delete vector that stores the tuple ID, transaction information, etc. When data is read from the ROS file, it is checked against the delete vector to decide whether each record is visible. The delete-vector data is applied to the ROS periodically.

The concept of this columnar extension is from Fujitsu Labs, Japan. Any comments on how to further evaluate this approach in terms of columnar access methods? Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] WAL logging problem in 9.4.3?
On Wed, Oct 21, 2015 at 11:53 PM, Alvaro Herrera wrote: > Heikki Linnakangas wrote: > >> Thanks. For comparison, I wrote a patch to implement what I had in mind. >> >> When a WAL-skipping COPY begins, we add an entry for that relation in a >> "pending-fsyncs" hash table. Whenever we perform any action on a heap that >> would normally be WAL-logged, we check if the relation is in the hash table, >> and skip WAL-logging if so. > > I think this wasn't applied, was it? No, it was not applied. -- Michael
Re: [HACKERS] Making tab-complete.c easier to maintain
Here is a new version merging the recent CREATE EXTENSION ... VERSION patch from master. (Apologies for sending so many versions. tab-complete.c keeps moving and I want to keep a version that applies on top of master out there, for anyone interested in looking at this. As long as no one objects and there is interest in the patch, I'll keep doing that.) -- Thomas Munro http://www.enterprisedb.com

tab-complete-macrology-v6.patch.gz
Re: [HACKERS] WIP: Rework access method interface
Petr Jelinek writes: > On 2015-10-12 14:32, Alexander Korotkov wrote: >> Also, it was planned that documentation changes would be reviewed by >> native english speaker. > BTW I think this is ready for committer, except for the need to check > docs by native speaker. I'm working at Salesforce this week, but will take a look after I get home. regards, tom lane
Re: [HACKERS] WIP: Rework access method interface
On 2015-10-12 14:32, Alexander Korotkov wrote: Also, it was planned that documentation changes would be reviewed by native english speaker. BTW I think this is ready for committer, except for the need to check docs by native speaker. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Duplicated assignment of slot_name in walsender.c
Alvaro Herrera writes: > Andres Freund wrote: >> for another every decent compiler can optimize those away. Note that >> those duplicate strlen() calls are there in a lot of places in >> walsender.c > It can? Tom has repeatedly argued the opposite, in the past. I'm prepared to believe that *some* compilers do that, but I think it's doubtful that they all do. Even if they do, it would have to be a very tightly constrained optimization, since the compiler would have to be able to prove that there is no way for the referenced string to get changed between the two call sites. That would likely mean that some places where you think this will happen will actually end up doing the strlen() twice. I'm willing to buy the argument that performance doesn't matter in this particular context; but if we think it does, I'd rather see us explicitly save and re-use the strlen() result, so that the code behaves the same on every platform. regards, tom lane
Re: [HACKERS] bugs and bug tracking
On Thu, Oct 22, 2015 at 2:04 AM, Nathan Wagner wrote: > https://granicus.if.org/pgbugs/ for anyone who hasn't and wants to take a > look. FYI, there seems to be an encoding problem somewhere. "Đặng Minh Dũng" is showing up as "Äặng Minh DÅ©ng" on this page: https://granicus.if.org/pgbugs/13691 -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Duplicated assignment of slot_name in walsender.c
On 2015-10-21 19:36:16 -0300, Alvaro Herrera wrote: > diff --git a/src/backend/replication/walsender.c > b/src/backend/replication/walsender.c > index c6043cd..5487cc0 100644 > --- a/src/backend/replication/walsender.c > +++ b/src/backend/replication/walsender.c > @@ -762,10 +762,10 @@ logical_read_xlog_page(XLogReaderState *state, > XLogRecPtr targetPagePtr, int req > static void > CreateReplicationSlot(CreateReplicationSlotCmd *cmd) > { > - const char *slot_name; > const char *snapshot_name = NULL; > char xpos[MAXFNAMELEN]; > StringInfoData buf; > + int len; If you want to do that, I'm unenthusiastically not objecting. But if so, let's also do it in IdentifySystem(), SendTimeLineHistory(), StartReplication(), SendBackupHeader(), SendXLogRecPtResult() - they're modeled after each other. Do you want to commit the slot_name fix with the rest or do you want me to do that? Greetings, Andres Freund
Re: [HACKERS] Duplicated assignment of slot_name in walsender.c
Andres Freund wrote: > That seems fairly insignificant. For one this is a rather infrequent and > expensive operation, for another every decent compiler can optimize > those away. Note that those duplicate strlen() calls are there in a lot > of places in walsender.c

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index c6043cd..5487cc0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -762,10 +762,10 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 static void
 CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 {
-	const char *slot_name;
 	const char *snapshot_name = NULL;
 	char		xpos[MAXFNAMELEN];
 	StringInfoData buf;
+	int			len;
 
 	Assert(!MyReplicationSlot);
 
@@ -791,14 +791,11 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 
 	initStringInfo(&output_message);
 
-	slot_name = NameStr(MyReplicationSlot->data.name);
-
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
 	{
 		LogicalDecodingContext *ctx;
 
-		ctx = CreateInitDecodingContext(
-										cmd->plugin, NIL,
+		ctx = CreateInitDecodingContext(cmd->plugin, NIL,
 										logical_read_xlog_page,
 										WalSndPrepareWrite,
 										WalSndWriteData);
@@ -834,7 +831,6 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		ReplicationSlotSave();
 	}
 
-	slot_name = NameStr(MyReplicationSlot->data.name);
 	snprintf(xpos, sizeof(xpos), "%X/%X",
 			 (uint32) (MyReplicationSlot->data.confirmed_flush >> 32),
 			 (uint32) MyReplicationSlot->data.confirmed_flush);
@@ -885,18 +881,21 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	pq_sendint(&buf, 4, 2);		/* # of columns */
 
 	/* slot_name */
-	pq_sendint(&buf, strlen(slot_name), 4); /* col1 len */
-	pq_sendbytes(&buf, slot_name, strlen(slot_name));
+	len = strlen(NameStr(MyReplicationSlot->data.name));
+	pq_sendint(&buf, len, 4);	/* col1 len */
+	pq_sendbytes(&buf, NameStr(MyReplicationSlot->data.name), len);
 
 	/* consistent wal location */
-	pq_sendint(&buf, strlen(xpos), 4);	/* col2 len */
-	pq_sendbytes(&buf, xpos, strlen(xpos));
+	len = strlen(xpos);
+	pq_sendint(&buf, len, 4);	/* col2 len */
+	pq_sendbytes(&buf, xpos, len);
 
 	/* snapshot name */
 	if (snapshot_name != NULL)
 	{
-		pq_sendint(&buf, strlen(snapshot_name), 4); /* col3 len */
-		pq_sendbytes(&buf, snapshot_name, strlen(snapshot_name));
+		len = strlen(snapshot_name);
+		pq_sendint(&buf, len, 4);	/* col3 len */
+		pq_sendbytes(&buf, snapshot_name, len);
 	}
 	else
 		pq_sendint(&buf, -1, 4);	/* col3 len, NULL */
@@ -904,8 +903,9 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	/* plugin */
 	if (cmd->plugin != NULL)
 	{
-		pq_sendint(&buf, strlen(cmd->plugin), 4);	/* col4 len */
-		pq_sendbytes(&buf, cmd->plugin, strlen(cmd->plugin));
+		len = strlen(cmd->plugin);
+		pq_sendint(&buf, len, 4);	/* col4 len */
+		pq_sendbytes(&buf, cmd->plugin, len);
 	}
 	else
 		pq_sendint(&buf, -1, 4);	/* col4 len, NULL */

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Duplicated assignment of slot_name in walsender.c
Andres Freund wrote: > On 2015-10-21 17:21:25 -0300, Alvaro Herrera wrote: > > I think the first assignment is also pointless -- I mean can't we just > > use MyReplicationSlot->data.name in both places where slot_name is used? > > Makes the lines a bit long. strlen(NameStr(MyReplicationSlot->data.name)) ... Meh. Assign the strlen somewhere? > > In the same routine, it seems wasteful to be doing strlen() twice for > > every string sent over the wire. > > That seems fairly insignificant. For one this is a rather infrequent and > expensive operation, OK, I can buy that. > for another every decent compiler can optimize those away. Note that > those duplicate strlen() calls are there in a lot of places in > walsender.c It can? Tom has repeatedly argued the opposite, in the past. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_recvlogical fixes
On 2015-10-21 11:52:31 -0300, Euler Taveira wrote: > While testing wal2json, I faced some problems with pg_recvlogical. Attached > is a series of patches that can improve pg_recvlogical. Patches #2 and #3 are > bugfixes (and should be applied to 9.5 too). Patch #1 is not mandatory for > 9.5. > #1: add a bunch of checks to complain when using an option that is not > available in the specified action; I'm not a fan of doing that. Doing that for every option tends to be more annoying than helpful. E.g. pgbench's checks require you to pointlessly remove a lot of harmless options just to be able to pass -i. > #2: there is a wrong check because startpos option can be specified with > --create-slot; > #3: doesn't ignore startpos in --create-slot because that action could be > specified together with --start action (that uses that option); It doesn't make sense to specify it with --create-slot though - you can't even know what the start position would mean before the slot is created. You can't get older records than the ones at slot creation time, and you can't know what a future xlog position would mean. Greetings, Andres Freund
Re: [HACKERS] Duplicated assignment of slot_name in walsender.c
On 2015-10-21 17:21:25 -0300, Alvaro Herrera wrote: > > It seems that the 2nd assignment was an oversight. Spotted by Bernd > > Helmle. Yea, that's obviously redundant. Will remove. > I think the first assignment is also pointless -- I mean can't we just > use MyReplicationSlot->data.name in both places where slot_name is used? Makes the lines a bit long. strlen(NameStr(MyReplicationSlot->data.name)) ... > In the same routine, it seems wasteful to be doing strlen() twice for > every string sent over the wire. That seems fairly insignificant. For one this is a rather infrequent and expensive operation, for another every decent compiler can optimize those away. Note that those duplicate strlen() calls are there in a lot of places in walsender.c Greetings, Andres Freund
Re: [HACKERS] Duplicated assignment of slot_name in walsender.c
Euler Taveira wrote: > It seems that the 2nd assignment was an oversight. Spotted by Bernd > Helmle. I think the first assignment is also pointless -- I mean can't we just use MyReplicationSlot->data.name in both places where slot_name is used? In the same routine, it seems wasteful to be doing strlen() twice for every string sent over the wire. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Freezing without cleanup lock
Jim Nasby wrote: > While warning a client that just did a Slony-based version upgrade to make > sure to freeze the new database, it occurred to me that it should be safe to > freeze without the cleanup lock. This is interesting because it would allow > a scan_all vacuum to do its job without blocking on the cleanup lock. > > Does anyone have a feel for whether scan_all vacuums blocking on the cleanup > lock is an actual problem? Yeah, I remember we discussed this and some other possible improvements related to freezing. I think other ideas proposed were that (1) during an emergency (uncancellable) autovacuum run, we process only the tables that are past the age limit, and (2) we remove the cost-based sleep so that it finishes as quickly as possible. (Yours is (3) only freeze and not do any actual pruning -- did I get that right?) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Re: [BUGS] BUG #13694: Row Level Security by-passed with CREATEUSER permission
On 10/21/2015 12:46 PM, Tom Lane wrote: > Attached patch rips out CREATEUSER and NOCREATEUSER options lock, stock, > and barrel. Looks good to me. > Another possibility is to change them to actually mean CREATEROLE and > NOCREATEROLE. I think probably a clean break is better though. I think that would be too confusing. I'd rather see them go away ala your patch. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [HACKERS] Duplicated assignment of slot_name in walsender.c
On 20-10-2015 08:28, Bernd Helmle wrote: > The 2nd assignment to slot_name looks unnecessary? Yes, it is. Seems to be an oversight. Patch attached. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

>From 87570993d29f2c98121c3a0a75c85cdc4211f24f Mon Sep 17 00:00:00 2001
From: Euler Taveira
Date: Wed, 21 Oct 2015 16:52:26 -0300
Subject: [PATCH] Fix a duplicated assignment in walsender code

It seems that the 2nd assignment was an oversight. Spotted by Bernd
Helmle.
---
 src/backend/replication/walsender.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index c6043cd..ca1b4b9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -834,7 +834,6 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		ReplicationSlotSave();
 	}
 
-	slot_name = NameStr(MyReplicationSlot->data.name);
 	snprintf(xpos, sizeof(xpos), "%X/%X",
 			 (uint32) (MyReplicationSlot->data.confirmed_flush >> 32),
 			 (uint32) MyReplicationSlot->data.confirmed_flush);
-- 
2.1.4
Re: [HACKERS] Freezing without cleanup lock
Jim Nasby writes: > While warning a client that just did a Slony-based version upgrade to > make sure to freeze the new database, it occurred to me that it should > be safe to freeze without the cleanup lock. What's your argument for that being safe? regards, tom lane
Re: [HACKERS] Freezing without cleanup lock
On October 21, 2015 9:47:45 PM GMT+02:00, Tom Lane wrote: > Jim Nasby writes: >> While warning a client that just did a Slony-based version upgrade to make >> sure to freeze the new database, it occurred to me that it should be safe >> to freeze without the cleanup lock. > > What's your argument for that being safe? It doesn't affect tuple contents and thus backends having a pin can continue looking at tuple contents. The reason we need a cleanup lock is IIRC repairing page fragmentation / HOT pruning, not freezing. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone.
Re: [HACKERS] [BUGS] BUG #13694: Row Level Security by-passed with CREATEUSER permission
Joe Conway writes: > On 10/21/2015 11:26 AM, Andres Freund wrote: >> On 2015-10-21 11:17:44 -0700, Tom Lane wrote: >>> I wonder if it's time yet to remove those keywords. We've had the >>> SUPERUSER spelling since 8.1, and this report should remind us that >>> people get confused by the old spellings. >> +1 for doing that in 9.6. > 1++ Attached patch rips out CREATEUSER and NOCREATEUSER options lock, stock, and barrel. Another possibility is to change them to actually mean CREATEROLE and NOCREATEROLE. I think probably a clean break is better though. regards, tom lane diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml index 7638817b830a1eb73ab59174591fe6eaf3378453..da36ad96967e7c842c28f5787b5ec0dc48ec5a60 100644 *** a/doc/src/sgml/ref/alter_role.sgml --- b/doc/src/sgml/ref/alter_role.sgml *** ALTER ROLE NOCREATEDB CREATEROLE NOCREATEROLE - CREATEUSER - NOCREATEUSER INHERIT NOINHERIT LOGIN --- 159,164 diff --git a/doc/src/sgml/ref/alter_user.sgml b/doc/src/sgml/ref/alter_user.sgml index 0ffaa16da2fc75fcd1121dcde627837d6a8e9166..84a0c52191fc6c82687ef49df99ac30fdbfd8f1d 100644 *** a/doc/src/sgml/ref/alter_user.sgml --- b/doc/src/sgml/ref/alter_user.sgml *** ALTER USER password' --- 28,33 diff --git a/doc/src/sgml/ref/create_role.sgml b/doc/src/sgml/ref/create_role.sgml index 240c21ce85fcd1bd4cddd4bb6af109ef319f8175..38cd4c83696053f5b70fc514e9b3939cb842fb3a 100644 *** a/doc/src/sgml/ref/create_role.sgml --- b/doc/src/sgml/ref/create_role.sgml *** CREATE ROLE - CREATEUSER - NOCREATEUSER - - - These clauses are an obsolete, but still accepted, spelling of - SUPERUSER and NOSUPERUSER. - Note that they are not equivalent to - CREATEROLE as one might naively expect! 
- - - - - INHERIT NOINHERIT --- 124,129 diff --git a/doc/src/sgml/ref/create_user.sgml b/doc/src/sgml/ref/create_user.sgml index 065999c85a4df610908c1e762d7bc3435406986b..6c690b36df561573f234de073e4ddbb10e6e9ec7 100644 *** a/doc/src/sgml/ref/create_user.sgml --- b/doc/src/sgml/ref/create_user.sgml *** CREATE USER
[HACKERS] Freezing without cleanup lock
While warning a client that just did a Slony-based version upgrade to make sure to freeze the new database, it occurred to me that it should be safe to freeze without the cleanup lock. This is interesting because it would allow a scan_all vacuum to do its job without blocking on the cleanup lock. Does anyone have a feel for whether scan_all vacuums blocking on the cleanup lock is an actual problem? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Freeze avoidance of very large table.
Jim Nasby wrote: > On 10/21/15 8:11 AM, Andres Freund wrote: > >Meh. Adding complexity definitely needs to be weighed against the > >benefits. As pointed out e.g. by all the multixact issues you mentioned > >upthread. In this case your argument for changing the name doesn't seem > >to hold much water. > > ISTM VISIBILITY_MAP_FROZEN_BIT_CAT_VER should be defined in catversion.h > instead of pg_upgrade.h, though, to ensure it's correctly updated when this > gets committed. That would be untidy and pointless. pg_upgrade.h contains other catversions. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Freeze avoidance of very large table.
On 10/21/15 8:11 AM, Andres Freund wrote: Meh. Adding complexity definitely needs to be weighed against the benefits. As pointed out e.g. by all the multixact issues you mentioned upthread. In this case your argument for changing the name doesn't seem to hold much water. ISTM VISIBILITY_MAP_FROZEN_BIT_CAT_VER should be defined in catversion.h instead of pg_upgrade.h, though, to ensure it's correctly updated when this gets committed. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
[HACKERS] Change behavior of (m)xid_age
Currently, xid_age() returns INT_MAX for a permanent xid. The comment in the function that 'Permanent XIDs are always infinitely old' may be technically correct, but returning INT_MAX is a useless behavior because it actually makes it look like that XID is in immediate wraparound danger. I think we should change it to return either 0, -1, or INT_MIN. To me, 0 makes the most sense for monitoring relfrozenxid. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] COPY FREEZE and PD_ALL_VISIBLE
On Tue, Oct 20, 2015 at 7:02 AM, Robert Haas wrote: > On Sun, Oct 18, 2015 at 5:23 PM, Jeff Janes wrote: > > I'm planning on adding a todo item to have COPY FREEZE set > PD_ALL_VISIBLE. > > Or is there some reason this can't be done? > > > > Since the whole point of COPY FREEZE is to avoid needing to rewrite the > > entire table, it seems rather perverse that the first time the table is > > vacuumed, it needs to rewrite the entire table. > > *facepalm* > > I don't know how hard that is to implement, but +1 for trying to > figure out a way. > It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages, since the code in hio that requests the relation to be extended already has info on the tuple's intended freeze status. Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple is actually written into the page. Not only because clearing would defeat the purpose, but also because it will cause an error--apparently the incipient page is not yet in a state where visibilitymap_clear is willing to deal with it. With this patch, you get a table which has PD_ALL_VISIBLE set for all pages, but which doesn't have a _vm file. Index-only scans will visit the heap for each tuple until the first VACUUM is done. The first vacuum will read the entire table, but not need to write it anymore. And will create the _vm file. I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that. Set a flag somewhere and then create it in bulk at the end of the transaction? Set it bit by bit as the pages are extended and initialized? Cheers, Jeff

copy_freeze_all_visible_v1.patch
Re: [HACKERS] [PATCH v3] GSSAPI encryption support
Michael Paquier writes: > On Tue, Oct 20, 2015 at 3:01 AM, Robbie Harwood wrote: >> Stephen Frost writes: >>> psql: lost synchronization with server: got message type "S", length 22 >> >> which unfortunately could be a great many things. I've said this a >> couple times now, but I really do need more information - a traffic >> dump, a list of commands that were run, etc.; unfortunately, the surface >> here is pretty large, and while I totally am willing to believe there >> are bugs in the code I've written, I do not yet see them. > > --- a/src/interfaces/libpq/fe-protocol3.c > +++ b/src/interfaces/libpq/fe-protocol3.c > @@ -129,6 +129,58 @@ pqParseInput3(PGconn *conn) > return; > } > > +#ifdef ENABLE_GSS > + /* We want to be ready in both IDLE and BUSY states > for encryption */ > + if (id == 'g' && !conn->gss_disable_enc && conn->gctx) > + { > + ssize_t encEnd, next; > [...] > + } > + else if (!conn->gss_disable_enc && conn->gss_auth_done && > +!conn->gss_decrypted_cur && id != 'E') > + /* This could be a sync error, so let's handle > it as such. */ > + handleSyncLoss(conn, id, msgLength); > +#endif > > Hm. The out-of-sync error I am seeing in my environment is caused by > this block when parsing 'g' messages coming from the backend that are > considered as being GSSAPI-encrypted messages. I am still looking at > that... If you're hitting the else-block, that suggests a GSSAPI context is not present at the time a GSSAPI message was received, I think.
Re: [HACKERS] WAL logging problem in 9.4.3?
Heikki Linnakangas wrote: > Thanks. For comparison, I wrote a patch to implement what I had in mind. > > When a WAL-skipping COPY begins, we add an entry for that relation in a > "pending-fsyncs" hash table. Whenever we perform any action on a heap that > would normally be WAL-logged, we check if the relation is in the hash table, > and skip WAL-logging if so. I think this wasn't applied, was it? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
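The "pending-fsyncs" idea quoted above can be sketched roughly as follows (a model only, not the actual patch; function names are invented): relations whose COPY skips WAL are remembered, and heap actions consult that set before emitting WAL.

```python
# Rough model of the "pending-fsyncs" approach (invented names, not
# the actual patch): relations with a WAL-skipping COPY in progress
# are tracked, and WAL emission is skipped for them.

pending_fsyncs = set()          # relations to fsync instead of WAL-logging

def begin_wal_skipping_copy(rel):
    pending_fsyncs.add(rel)

def heap_action(rel, wal_records):
    if rel in pending_fsyncs:
        return 0                # skip WAL; the relation gets fsync'd later
    wal_records.append(rel)     # normal path: the action is WAL-logged
    return 1

wal = []
begin_wal_skipping_copy("t1")
print(heap_action("t1", wal))   # 0: WAL skipped
print(heap_action("t2", wal))   # 1: normally logged
print(wal)                      # only t2 produced a WAL record
```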
[HACKERS] Supporting multiple recursive table reads
I am trying to patch postgres to accept this query: a recursive query referencing the recursive table twice. By removing some checks it accepts the query and generates a plan. ### Query create table base(k1,k2,k3) as (values (1,null,null),(2,null,null),(3,1,2)); with recursive x(k1,k2,k3) as ( select k1,null::record,null::record from base union all select b.k1,x1,x2 from base as b join x as x1 on x1.k1=b.k2 join x as x2 on x2.k1=b.k3 where b.k2 is not null or b.k3 is not null) select * from x; drop table base; ### Plan CTE Scan on x (cost=7745569.00..11990441.80 rows=212243640 width=68) CTE x -> Recursive Union (cost=0.00..7745569.00 rows=212243640 width=60) -> Seq Scan on base (cost=0.00..30.40 rows=2040 width=4) -> Merge Join (cost=31081.98..350066.58 rows=21224160 width=60) Merge Cond: (x2.k1 = b.k3) -> Sort (cost=1868.26..1919.26 rows=20400 width=32) Sort Key: x2.k1 -> WorkTable Scan on x x2 (cost=0.00..408.00 rows=20400 width=32) -> Materialize (cost=29213.72..30254.12 rows=208080 width=36) -> Sort (cost=29213.72..29733.92 rows=208080 width=36) Sort Key: b.k3 -> Merge Join (cost=2010.80..5142.20 rows=208080 width=36) Merge Cond: (b.k2 = x1.k1) -> Sort (cost=142.54..147.64 rows=2040 width=12) Sort Key: b.k2 -> Seq Scan on base b (cost=0.00..30.40 rows=2040 width=12) Filter: ((k2 IS NOT NULL) OR (k3 IS NOT NULL)) -> Sort (cost=1868.26..1919.26 rows=20400 width=32) Sort Key: x1.k1 -> WorkTable Scan on x x1 (cost=0.00..408.00 rows=20400 width=32) But when the query is executed, it looks like the working table can't be read twice, so the second join returns empty. Reading the executor code for the WorkTable Scan, I see some notes about a private read pointer and copying. It's my first time in the postgres codebase. Can someone point out whether the working table really can't be read twice, and whether there is similar code in another executor node that allows it? -- Wesley S. Massuda
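The symptom described above can be simulated with a toy model (not the executor code): if two WorkTable scans share one read pointer into the working table, the second scan starts where the first one stopped and sees nothing, which is why private read pointers matter.

```python
# Simulation of the "second reader sees an empty working table"
# symptom (a model, not PostgreSQL's tuplestore code): a shared read
# pointer is exhausted by the first scan, while private read pointers
# let each scan see all rows.

class WorkingTable:
    def __init__(self, rows):
        self.rows = rows
        self.shared_pos = 0

    def scan_shared(self):
        # both scan nodes advance the same pointer
        out = self.rows[self.shared_pos:]
        self.shared_pos = len(self.rows)
        return out

    def scan_private(self):
        # a private read pointer per scan node avoids the problem
        return list(self.rows)

wt = WorkingTable([1, 2, 3])
print(wt.scan_shared())    # [1, 2, 3]
print(wt.scan_shared())    # []  (the second reader finds it "empty")
print(wt.scan_private())   # [1, 2, 3]
print(wt.scan_private())   # [1, 2, 3]
```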
Re: [HACKERS] [PATCH v3] GSSAPI encryption support
Michael Paquier writes: > Robbie, > > +#ifdef ENABLE_GSS > + if (pggss_encrypt(conn) < 0) > + return EOF; > +#endif > > @@ -1528,10 +1541,20 @@ socket_putmessage(char msgtype, const char *s, > size_t len) > if (internal_putbytes(s, len)) > goto fail; > PqCommBusy = false; > +#ifdef ENABLE_GSS > + /* if we're GSSAPI encrypting, s was allocated in be_gss_encrypt */ > + if (msgtype == 'g') > + pfree((char *)s); > +#endif > > Looking at this patch in more details... Why is it necessary to wrap > all the encrypted messages into a dedicated message type 'g'? Cannot > we rely on the context on client and backend that should be set up > after authentication to perform the encryption and decryption > operations? This patch is enforcing the message type in > socket_putmessage for backend and pggss_encrypt/pqPutMsgEnd for > frontend, this just feels wrong and I think that the patch could be > really simplified, this includes the crafting added in fe-protocol3.c > that should be IMO reserved only for messages received from the > backend and should not be made aware of any protocol-level logic. Well, it's not strictly necessary in the general case, but it makes debugging a *lot* easier to have it present, and it simplifies some of the handling logic. For instance, in the part quoted above, with socket_putmessage() and socket_putmessage_noblock(), we need some way to tell whether a message blob has already been encrypted. Some enforcement of message type *will need to be carried out* anyway to avoid security issues with tampering on the wire, whether that be by sanity-checking the stated message type and then handling accordingly, or trying to decrypt and detonating the connection if it fails. GSSAPI does not define a wire protocol for the transport of messages the way, e.g., TLS does, so there must be some handling of incomplete messages, multiple messages at once, etc. 
There is already adequate handling of these in postgres, so I have structured the code to take advantage of it. That said, I could absolutely reimplement them - but it would not be simpler, I'm reasonably sure. signature.asc Description: PGP signature
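The framing problem discussed above (incomplete messages, multiple messages per read) is generic; a minimal sketch of type-plus-length framing, unrelated to the patch's actual wire format, looks like this:

```python
# Generic sketch of message framing (not the GSSAPI patch's actual
# wire format): a one-byte type plus a 4-byte length prefix lets the
# receiver split a byte stream back into messages, coping with
# partial and coalesced reads.

import struct

def frame(msgtype: bytes, payload: bytes) -> bytes:
    return msgtype + struct.pack("!I", len(payload)) + payload

def parse(buf: bytearray):
    msgs = []
    while len(buf) >= 5:
        mlen = struct.unpack("!I", bytes(buf[1:5]))[0]
        if len(buf) < 5 + mlen:
            break                      # incomplete message: wait for more bytes
        msgs.append((bytes(buf[:1]), bytes(buf[5:5 + mlen])))
        del buf[:5 + mlen]             # consume one complete message
    return msgs

buf = bytearray()
buf += frame(b"g", b"ciphertext-1")
stream = frame(b"g", b"ciphertext-2")
buf += stream[:3]                      # partial second message arrives
print(parse(buf))                      # only the first message parses
buf += stream[3:]
print(parse(buf))                      # the second one completes now
```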
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Tue, Oct 20, 2015 at 8:10 PM, Beena Emerson wrote: > > On Mon, Oct 19, 2015 at 8:47 PM, Masahiko Sawada > wrote: >> >> >> Hi, >> >> Attached is a rough patch which supports multi sync replication >> by another approach I sent before. >> >> The new GUC parameters are: >> * synchronous_standby_num, which specifies the number of standby >> servers using sync rep. (default is 0) >> * synchronous_replication_method, which specifies the replication method; >> priority or quorum. (default is priority) >> >> The behaviour of 'priority' and 'quorum' is the same as what we've been >> discussing. >> But I will give an overview of them again here. >> >> [Priority Method] >> Each standby server has a different priority, and the active standby >> servers with the top N priorities become sync standby servers. >> If synchronous_standby_names = '*', all active standby servers >> would be sync standby servers. >> If you want to set up standbys as in 9.5 or before, you can set >> synchronous_standby_num = 1. >> > > > I used the following setting with 2 servers A and D connected: > > synchronous_standby_names = 'A,B,C,D' > synchronous_standby_num = 2 > synchronous_replication_method = 'priority' > > Though s_r_m = 'quorum' worked fine, changing it to 'priority' caused a > segmentation fault. > Thank you for taking a look! This patch is a tool for discussion, so I'm not going to fix this bug until we reach consensus. We are still discussing to find a solution that can get consensus. I felt that it's difficult to select between the two approaches within this development cycle, and there would not be time to implement such a big feature even if we did. But this feature is obviously needed by many users. So I'm considering a simpler and more extensible solution; the idea I posted is one of them. Another approach worth considering is just specifying the number of sync standbys. It can also cover the main use cases.
Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
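The two selection methods discussed in this thread can be sketched as a toy model (my own reading of the proposal, not the patch's code; the selection and commit rules below are assumptions for illustration): "priority" takes the first N active standbys in synchronous_standby_names order, while "quorum" lets any N acknowledgements release the commit.

```python
# Toy model of priority vs. quorum sync standby selection (my reading
# of the proposal, not the patch's code).

def choose_sync_standbys(names, active, num, method):
    if method == "priority":
        # the top-N active standbys, in synchronous_standby_names order
        return [n for n in names if n in active][:num]
    elif method == "quorum":
        # any 'num' of the active standbys may satisfy the commit
        return [n for n in names if n in active]
    raise ValueError(method)

def commit_releases(acked, sync_set, num, method):
    if method == "priority":
        # all chosen sync standbys must acknowledge
        return all(s in acked for s in sync_set)
    # quorum: any 'num' acknowledgements suffice
    return len(acked & set(sync_set)) >= num

names = ["A", "B", "C", "D"]
active = {"A", "D"}   # the setup from Beena's test: only A and D connected
print(choose_sync_standbys(names, active, 2, "priority"))  # ['A', 'D']
print(commit_releases({"A"}, ["A", "D"], 2, "quorum"))     # False
print(commit_releases({"A", "D"}, ["A", "D"], 2, "quorum"))  # True
```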
[HACKERS] pg_recvlogical fixes
Hi, While testing wal2json, I faced some problems with pg_recvlogical. Attached is a serie of patches that can improve pg_recvlogical. Patches #2 and #3 are bugfixes (and should be applied to 9.5 too). Patch #1 is not mandatory to 9.5. Short description: #1: add a bunch of checks to complain when using an option that is not available in the specified action; #2: there is a wrong check because startpos option can be specified with --create-slot; #3: doesn't ignore startpos in --create-slot because that action could be specified together with --start action (that uses that option); -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento >From 9b3d63d1744e2b65a7f1a1d44ed166f4c9684771 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Tue, 1 Sep 2015 23:52:55 -0300 Subject: [PATCH 3/3] Fix a startpos override. Since --create-slot and --start can be specified together, we can't override startpos while creating a slot (it'll ignore the replication start position, if specified). --- src/bin/pg_basebackup/pg_recvlogical.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c index b59fe13..a2284d2 100644 --- a/src/bin/pg_basebackup/pg_recvlogical.c +++ b/src/bin/pg_basebackup/pg_recvlogical.c @@ -969,7 +969,6 @@ main(int argc, char **argv) if (!CreateReplicationSlot(conn, replication_slot, plugin, false, slot_exists_ok)) disconnect_and_exit(1); - startpos = InvalidXLogRecPtr; } if (!do_start_slot) -- 2.1.4 >From 8393aa62cddeaff8cc66b19c6ba36a371b090db1 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Tue, 1 Sep 2015 23:46:33 -0300 Subject: [PATCH 2/3] Fix startpos parameter check. startpos parameter can be specified together with --create-slot (--start must be specified too). This check is wrong since --create-slot and --start could be specified together. 
--- src/bin/pg_basebackup/pg_recvlogical.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c index 9ce237f..b59fe13 100644 --- a/src/bin/pg_basebackup/pg_recvlogical.c +++ b/src/bin/pg_basebackup/pg_recvlogical.c @@ -848,9 +848,9 @@ main(int argc, char **argv) exit(1); } - if (startpos != InvalidXLogRecPtr && (do_create_slot || do_drop_slot)) + if (startpos != InvalidXLogRecPtr && !do_start_slot) { - fprintf(stderr, _("%s: cannot use --create-slot or --drop-slot together with --startpos\n"), progname); + fprintf(stderr, _("%s: can only use --startpos together with --start\n"), progname); fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); exit(1); -- 2.1.4 >From da241d9e2529888297c0f68bd89706fb59ae5884 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Tue, 1 Sep 2015 23:34:51 -0300 Subject: [PATCH 1/3] pg_recvlogical: Tighten checks for action parameters. Until now, the mistaken parameters are ignored (which is bad because that parameter won't take the desired effect). Complaining about the "wrong" parameter will help the user to fix the problem immediately. --- doc/src/sgml/ref/pg_recvlogical.sgml | 2 +- src/bin/pg_basebackup/pg_recvlogical.c | 82 +++--- 2 files changed, 77 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml index 9d0b58b..a513047 100644 --- a/doc/src/sgml/ref/pg_recvlogical.sgml +++ b/doc/src/sgml/ref/pg_recvlogical.sgml @@ -149,7 +149,7 @@ PostgreSQL documentation In --start mode, start replication from the given LSN. For details on the effect of this, see the documentation in -and . Ignored in other modes. +and . 
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c index 93f61c3..9ce237f 100644 --- a/src/bin/pg_basebackup/pg_recvlogical.c +++ b/src/bin/pg_basebackup/pg_recvlogical.c @@ -30,12 +30,17 @@ /* Time to sleep between reconnection attempts */ #define RECONNECT_SLEEP_TIME 5 +#define DEFAULT_MESSAGE_TIMEOUT (10 * 1000) /* 10 secs = default */ +#define DEFAULT_FSYNC_INTERVAL (10 * 1000) /* 10 secs = defualt */ +#define DEFAULT_PLUGIN "test_decoding" + /* Global Options */ static char *outfile = NULL; +static char *plugin = NULL; static int verbose = 0; static int noloop = 0; -static int standby_message_timeout = 10 * 1000; /* 10 sec = default */ -static int fsync_interval = 10 * 1000; /* 10 sec = default */ +static int standby_message_timeout = -1; +static int fsync_interval = -1; static XLogRecPtr startpos = InvalidXLogRecPtr; static bool do_create_slot = false; static bool slot_exists_ok = false; @@ -45,7 +50,6 @@ static bool do_drop_slot = false; /* filled pairwise with option, value. value may be NULL */ static char **options; static size_t nopti
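The option check corrected by patch #2 in this series can be modelled in a few lines (argument handling only, not the real pg_recvlogical code): --startpos is legal whenever --start is among the requested actions, even if --create-slot is requested at the same time.

```python
# Minimal model of the corrected --startpos check (not the real
# pg_recvlogical code): reject --startpos only when --start is absent,
# regardless of whether --create-slot or --drop-slot is also given.

def check_options(startpos, do_create_slot, do_drop_slot, do_start_slot):
    if startpos is not None and not do_start_slot:
        return "error: can only use --startpos together with --start"
    return "ok"

# --create-slot together with --start and a start position: allowed
print(check_options("0/1000000", True, False, True))   # ok
# --drop-slot with a start position but no --start: rejected
print(check_options("0/1000000", False, True, False))
```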
Re: [HACKERS] checkpointer continuous flushing
On 2015-10-21 07:49:23 +0200, Fabien COELHO wrote: > > Hello Andres, > > >>>In my performance testing it showed that calling PerformFileFlush() only > >>>at segment boundaries and in CheckpointWriteDelay() can lead to rather > >>>spikey IO - not that surprisingly. The sync in CheckpointWriteDelay() is > >>>problematic because it only is triggered while on schedule, and not when > >>>behind. > >> > >>When behind, the PerformFileFlush should be called on segment > >>boundaries. > > > >That means it's flushing up to a gigabyte of data at once. Far too > >much. > > Hmmm. I do not get it. There would not be gigabytes, I said 'up to a gigabyte' not gigabytes. But it actually can be more than one if you're unlucky. > there would be as much as was written since the last sleep, about 100 > ms ago, which is not likely to be gigabytes? In many cases we don't sleep all that frequently - after one 100ms sleep we're already behind a lot. And even so, it's pretty easy to get into checkpoint scenarios with ~500 mbyte/s as a writeout rate. Only issuing a sync_file_range() 10 times for that is obviously problematic. > >The implementation pretty much always will go behind schedule for some > >time. Since sync_file_range() doesn't flush in the foreground I don't > >think it's important to do the flushing in concert with sleeping. > > For me it is important to avoid accumulating too large flushes, and that is > the point of the call before sleeping. I don't follow this argument. It's important to avoid large flushes, therefore we potentially allow large flushes to accumulate? > >>>My testing seems to show that just adding a limit of 32 buffers to > >>>FileAsynchronousFlush() leads to markedly better results. > >> > >>Hmmm. 32 buffers means 256 KB, which is quite small. > > > >Why? > > Because the point of sorting is to generate sequential writes so that the > HDD has a lot of aligned stuff to write without moving the head, and 32 is > rather small for that.
A sync_file_range(SYNC_FILE_RANGE_WRITE) doesn't synchronously write data back. It just puts it into the write queue. You can have merging between IOs from either side. But more importantly you can't merge that many requests together anyway. > >The aim is to not overwhelm the request queue - which is where the > >coalescing is done. And usually that's rather small. > > That is an argument. How small, though? It seems to be 128 by default, so > I'd rather have 128? Also, it can be changed, so maybe it should really be a > guc? I couldn't see any benefits above (and below) 32 on a 20 drive system, so I doubt it's worthwhile. It's actually good for interactivity to allow other requests into the queue concurrently - otherwise other reads/writes will obviously have a higher latency... > >If you flush much more sync_file_range starts to do work in the > >foreground. > > Argh, too bad. I would have hoped that it would just deal with it in an > asynchronous way, It's even in the man page: "Note that even this may block if you attempt to write more than request queue size." > this is not a "fsync" call, just flush advice. sync_file_range isn't fadvise(). > Because it should be in shared buffers where pg needs it? Huh? I'm just suggesting p = mmap(fd, offset, bytes); msync(p, bytes); munmap(p); instead of sync_file_range(). > >>Hmmm. I'll check. I'm still unconvinced that using a tree for a 2-3 element > >>set in most cases is an improvement. > > > >Yes, it'll not matter that much in many cases. But I rather disliked the > >NextBufferToWrite() implementation, especially that it walks the array > >multiple times. And I did see setups with ~15 tablespaces. > > ISTM that it is rather an argument for taking the tablespace into the > sorting, not necessarily for a binary heap. I don't understand your problem with that. The heap specific code is small, smaller than your NextBufferToWrite() implementation?
ts_heap = binaryheap_allocate(nb_spaces, ts_progress_cmp, NULL); spcContext = (FileFlushContext *) palloc(sizeof(FileFlushContext) * nb_spaces); for (i = 0; i < nb_spaces; i++) { TableSpaceCheckpointStatus *spc = &spcStatus[i]; spc->progress_slice = ((float8) num_to_write) / (float8) spc->num_to_write; ResetFileFlushContext(&spcContext[i]); spc->flushContext = &spcContext[i]; binaryheap_add_unordered(ts_heap, PointerGetDatum(&spcStatus[i])); } binaryheap_build(ts_heap); and then while (!binaryheap_empty(ts_heap)) { TableSpaceCheckpointStatus *ts = (TableSpaceCheckpointStatus *) DatumGetPointer(binaryheap_first(ts_heap)); ... ts->progress += ts->progress_slice; ts->num_written++; ... if (ts->num_written == ts->num_to_write) { ... binaryheap_remove_first(ts_heap); } else { /* update heap with the new progress */
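The balancing logic in the snippet above can be modelled with Python's heapq (a sketch of the idea only; the real code uses src/backend/lib/binaryheap.c): each tablespace advances by its progress_slice per written buffer, and the checkpointer always writes next for the tablespace that is least far along, which interleaves writes proportionally to each tablespace's num_to_write.

```python
# Model of the progress_slice balancing idea (a sketch, not the
# patch's C code): the least-progressed tablespace is always written
# next, so writes interleave proportionally across tablespaces.

import heapq

def interleave_writes(num_to_write_per_spc):
    total = sum(num_to_write_per_spc.values())
    heap = []
    for spc, n in num_to_write_per_spc.items():
        slice_ = total / n          # progress_slice for this tablespace
        heapq.heappush(heap, (0.0, spc, slice_, n))
    order = []
    while heap:
        progress, spc, slice_, remaining = heapq.heappop(heap)
        order.append(spc)
        remaining -= 1
        if remaining > 0:
            # update the heap with the new progress
            heapq.heappush(heap, (progress + slice_, spc, slice_, remaining))
    return order

# one big and one small tablespace: writes are spread, not back-to-back
print(interleave_writes({"ts_big": 4, "ts_small": 2}))
```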
Re: [HACKERS] Getting sorted data from foreign server
Here's patch with the regression fixed. On Wed, Oct 21, 2015 at 2:53 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > > > On Fri, Oct 16, 2015 at 11:33 PM, Robert Haas > wrote: > >> On Thu, Oct 15, 2015 at 6:28 AM, Ashutosh Bapat >> wrote: >> > Attached is the patch which takes care of above comments. >> >> I spent some time on this patch today. But it's still not right. >> >> I've attached a new version which fixes a serious problem with your >> last version - having postgresGetForeignPaths do the costing of the >> sorted path itself instead of delegating that to >> estimate_path_cost_size is wrong. In your version, 10% increment gets >> applied to the network transmission costs as well as the cost of >> generating the tupes - but only when use_remote_estimate == false. I >> fixed this and did some cosmetic cleanup. >> > > That's better. > > >> >> But you'll notice if you try this some of postgres_fdw's regression >> tests fail. This is rather mysterious: >> >> *** >> *** 697,715 >>Sort >> Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 >> Sort Key: t1.c1 >> !-> Nested Loop Semi Join >>Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 >> ! Join Filter: (t1.c3 = t2.c3) >>-> Foreign Scan on public.ft1 t1 >> Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, >> t1.c8 >> Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 >> FROM "S 1"."T 1" WHERE (("C 1" < 20)) >> ! -> Materialize >> Output: t2.c3 >> !-> Foreign Scan on public.ft2 t2 >>Output: t2.c3 >> ! Filter: (date(t2.c4) = '01-17-1970'::date) >> ! Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" >> WHERE (("C 1" > 10)) >> ! (15 rows) >> >> EXECUTE st2(10, 20); >>c1 | c2 | c3 | c4 |c5 >> | c6 | c7 | c8 >> --- 697,718 >>Sort >> Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 >> Sort Key: t1.c1 >> !-> Hash Join >>Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 >> ! 
Hash Cond: (t1.c3 = t2.c3) >>-> Foreign Scan on public.ft1 t1 >> Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, >> t1.c8 >> Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 >> FROM "S 1"."T 1" WHERE (("C 1" < 20)) >> ! -> Hash >> Output: t2.c3 >> !-> HashAggregate >>Output: t2.c3 >> ! Group Key: t2.c3 >> ! -> Foreign Scan on public.ft2 t2 >> !Output: t2.c3 >> !Filter: (date(t2.c4) = '01-17-1970'::date) >> !Remote SQL: SELECT c3, c4 FROM "S 1"."T >> 1" WHERE (("C 1" > 10)) >> ! (18 rows) >> >> What I think is happening here is that the planner notices that >> instead of doing a parameterized nestloop, it could pull down the data >> already sorted from the remote side, cheaply unique-ify it by using >> the ordering provided by the remote side, and then do a standard hash >> join. That might well be a sensible approach, but the ORDER BY that >> would make it correct doesn't show up in the Remote SQL. I don't know >> why that's happening, but it's not good. >> >> > The unique-ifying is happening through HashAggregate, so we do not need > ORDER BY clause in RemoteSQL. So the plan is correct. > > Careful examination of paths created revealed that the change in plan is > result of our fuzzy path selection logic. Let me explain it using the cost > values I got on my machine. Please note that the costs are described as a > tuple (startup cost, total cost, number of rows, present of pathkeys). > > With your patch, base relation paths have following costs: > > ft1 path without pathkeys - (100, 123.9, 20, no) > ft1 path with pathkeys (100, 126.25, 20, yes) > ft2 path without pathkeys (100, 143.645, 4, no) > ft2 path without pathkeys with params (100.01, 125.365, 1, no) > > Notice the sorted path is only marginally costly (2.5%) compared to the > non-sorted path for ft1. During the path creation process several nested > loop paths are created, but except for two, they are too costly. 
The two > nested loop paths of interest have costs (200, 268.755, 10, no) and (200, > 271.105, 10, yes). The path with pathkeys kicks out the one without > pathkeys because the later's costs are "fuzzily" equal and pathkeys give > extra advantage. The "fuzzy" equality is effect of the marginally extra > sorting cost. The nested loop path with pathkeys remains in the path list. > Several hash join paths and merge join paths are created but are costlier > than one particular hash path which has costs (243.677, 267.6624, 10, no). > This hash path is not beaten by the nested loop path sin
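The fuzzy path selection described above can be sketched as follows (my reading of the planner's compare_path_costs_fuzzily; the 1% fuzz factor here is an assumption for illustration, not taken from the thread): costs within the fuzz factor are treated as equal, and the path with useful pathkeys then wins.

```python
# Sketch of fuzzy path cost comparison (modelled on the planner's
# compare_path_costs_fuzzily; the 1% fuzz factor is an assumption).

FUZZ = 1.01   # assumed fuzz factor for illustration

def better_path(a_cost, a_keys, b_cost, b_keys):
    if a_cost > b_cost * FUZZ:
        return "b"
    if b_cost > a_cost * FUZZ:
        return "a"
    # fuzzily equal: prefer the path with pathkeys
    if a_keys and not b_keys:
        return "a"
    if b_keys and not a_keys:
        return "b"
    return "tie"

# nested loop with pathkeys (271.105) vs without (268.755), from the
# mail: the costs are fuzzily equal, so the pathkeys decide
print(better_path(271.105, True, 268.755, False))   # a
```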
Re: [HACKERS] Freeze avoidance of very large table.
On 2015-10-20 20:35:31 -0400, Simon Riggs wrote: > On 9 October 2015 at 15:20, Robert Haas wrote: > > > On Thu, Oct 8, 2015 at 1:52 PM, Andres Freund wrote: > > > I don't see the problem? I mean catversion will reliably tell you which > > format the vm is in? > > > > Totally agreed. > > > > This isn't an agreement competition, it's a cool look at what might cause > problems for all of us. Uh, we form rough consensuses all the time. > If we want to avoid bugs in future then we'd better start acting like that > is actually true in practice. > Why should we wave away this concern? Will we wave away a concern next time > you personally raise one? Bruce would have me believe that we added months > onto 9.5 to improve robustness. So let's actually do that. Starting at the > first opportunity. Meh. Adding complexity definitely needs to be weighed against the benefits. As pointed out e.g. by all the multixact issues you mentioned upthread. In this case your argument for changing the name doesn't seem to hold much water. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a raft of parallelism-related bug fixes
On Wednesday, 21 October 2015, Amit Kapila wrote: > On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas > wrote: > > > > On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas > wrote: > > > It's good to have your perspective on how this can be improved, and > > > I'm definitely willing to write more documentation. Any lack in that > > > area is probably due to being too close to the subject area, having > > > spent several years on parallelism in general, and 200+ emails on > > > parallel sequential scan in particular. Your point about the lack of > > > a good header file comment for execParallel.c is a good one, and I'll > > > rectify that next week. > > > > Here is a patch to add a hopefully-useful file header comment to > > execParallel.c. I included one for nodeGather.c as well, which seems > > to be contrary to previous practice, but actually it seems like > > previous practice is not the greatest: surely it's not self-evident > > what all of the executor nodes do. > > > > + * any ParamListInfo associated witih the query, buffer usage info, and > + * the actual plan to be passed down to the worker. > > typo 'witih'. > > + * return the results. Therefore, a plan used with a single-copy Gather > + * node not be parallel-aware. > > "node not" seems to be incomplete. > ... node *need* not be parallel aware? Thanks, Amit
Re: [HACKERS] bugs and bug tracking
On Tue, Oct 20, 2015 at 10:39:55AM -0700, Joshua D. Drake wrote: > So where are we at on this? Well, I can't speak to where we are, but my system is up, running, and seems to work well. It even attracts a few visitors. I have been meaning to write a triage interface, but I have been stuck doing postgis work for an anthropology paper I am working on for the last week. I hope to be able to take a break from that tomorrow and through the weekend. I have read the various comments, and will also do some simple scanning for phrases in emails and commit messages indicating status changes, which I will reflect on the main web page. Once I write that code, I will email the list with what I have done. If people want to use it, or criticize it, they would be welcome to do so. If not, well, the interface is useful for me in any event, so I will probably maintain it for the foreseeable future. https://granicus.if.org/pgbugs/ for anyone who hasn't and wants to take a look. -- nw -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a raft of parallelism-related bug fixes
On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas wrote: > > On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas wrote: > > It's good to have your perspective on how this can be improved, and > > I'm definitely willing to write more documentation. Any lack in that > > area is probably due to being too close to the subject area, having > > spent several years on parallelism in general, and 200+ emails on > > parallel sequential scan in particular. Your point about the lack of > > a good header file comment for execParallel.c is a good one, and I'll > > rectify that next week. > > Here is a patch to add a hopefully-useful file header comment to > execParallel.c. I included one for nodeGather.c as well, which seems > to be contrary to previous practice, but actually it seems like > previous practice is not the greatest: surely it's not self-evident > what all of the executor nodes do. > + * any ParamListInfo associated witih the query, buffer usage info, and + * the actual plan to be passed down to the worker. typo 'witih'. + * return the results. Therefore, a plan used with a single-copy Gather + * node not be parallel-aware. "node not" seems to be incomplete. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [Proposal] Table partition + join pushdown
Hello, KaiGai-san and Horiguchi-san. I created v2 patch. Please find attached. I believe this patch will fix the most of issues mentioned by Horiguchi-san except naming. In this v2 patch, scan node which is originally inner relation of Join node must be SeqScan (or SampleScan). This limitation is due to implementation of try_join_pushdown(), which copies Path nodes to attach new filtering conditions converted from CHECK() constraints. It uses copyObject() for this purpose, so I must implement copy functions for scan Path nodes like IndexPath, BitmapHeapPath, TidPath and so on. By the way, let me introduce the performance of this feature. Here are the results I tested in my environment. These results were got by "pushdown_test.v1.large.sql" running on the environment that "work_mem" set to "1536kB". (This file is also attached in this mail.) [Normal] QUERY PLAN - Hash Join (cost=1851.02..14638.11 rows=34 width=20) (actual time=88.188..453.926 rows=22 loops=1) Hash Cond: (check_test_div.id = inner_t.id) -> Append (cost=0.00..4911.03 rows=34 width=20) (actual time=0.089..133.456 rows=33 loops=1) -> Seq Scan on check_test_div (cost=0.00..0.00 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=1) -> Seq Scan on check_test_div_0 (cost=0.00..1637.01 rows=11 width=20) (actual time=0.085..40.741 rows=11 loops=1) -> Seq Scan on check_test_div_1 (cost=0.00..1637.01 rows=11 width=20) (actual time=0.023..29.213 rows=11 loops=1) -> Seq Scan on check_test_div_2 (cost=0.00..1637.01 rows=11 width=20) (actual time=0.021..28.592 rows=11 loops=1) -> Hash (cost=866.01..866.01 rows=60001 width=8) (actual time=87.970..87.970 rows=60001 loops=1) Buckets: 32768 Batches: 2 Memory Usage: 1446kB -> Seq Scan on inner_t (cost=0.00..866.01 rows=60001 width=8) (actual time=0.030..39.133 rows=60001 loops=1) Planning time: 0.867 ms Execution time: 470.269 ms (12 rows) [With this feature] QUERY PLAN - Append (cost=0.01..10651.37 rows=34 width=20) (actual time=55.548..377.615 rows=22 loops=1) -> 
Hash Join (cost=0.01..1091.04 rows=1 width=20) (actual time=0.017..0.017 rows=0 loops=1) Hash Cond: (inner_t.id = check_test_div.id) -> Seq Scan on inner_t (cost=0.00..866.01 rows=60001 width=8) (never executed) -> Hash (cost=0.00..0.00 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on check_test_div (cost=0.00..0.00 rows=1 width=20) (actual time=0.002..0.002 rows=0 loops=1) -> Hash Join (cost=1169.76..3186.78 rows=11 width=20) (actual time=55.530..149.205 rows=11 loops=1) Hash Cond: (check_test_div_0.id = inner_t.id) -> Seq Scan on check_test_div_0 (cost=0.00..1637.01 rows=11 width=20) (actual time=0.058..34.268 rows=11 loops=1) -> Hash (cost=1166.01..1166.01 rows=300 width=8) (actual time=55.453..55.453 rows=20001 loops=1) Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1038kB -> Seq Scan on inner_t (cost=0.00..1166.01 rows=300 width=8) (actual time=0.031..43.590 rows=20001 loops=1) Filter: ((id % 3) = 0) Rows Removed by Filter: 4 -> Hash Join (cost=1169.76..3186.78 rows=11 width=20) (actual time=27.942..97.582 rows=6 loops=1) Hash Cond: (check_test_div_1.id = inner_t.id) -> Seq Scan on check_test_div_1 (cost=0.00..1637.01 rows=11 width=20) (actual time=0.030..25.514 rows=11 loops=1) -> Hash (cost=1166.01..1166.01 rows=300 width=8) (actual time=27.890..27.890 rows=2 loops=1) Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1038kB -> Seq Scan on inner_t (cost=0.00..1166.01 rows=300 width=8) (actual time=0.014..21.688 rows=2 loops=1) Filter: ((id % 3) = 1) Rows Removed by Filter: 40001 -> Hash Join (cost=1169.76..3186.78 rows=11 width=20) (actual time=27.651..97.755 rows=5 loops=1) Hash Cond: (check_test_div_2.id = inner_t.id) -> Seq Scan on check_test_div_2 (cost=0.00..1637.01 rows=11 width=20) (actual time=0.026..25.620 rows=11 loops=1) -> Hash (cost=1166.01..1166.01 rows=300 width=8) (actual time=27.599..27.599 rows=2 loops=1) Buckets: 32768 
(originally 1024) Batches: 1 (originally 1)
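The pushdown idea in this patch, as I understand it, can be modelled in a few lines (a sketch with invented helper names, not the patch's code): each partition's CHECK() constraint is converted into a filter on the inner relation, so each per-partition join hashes only the inner rows that could possibly match.

```python
# Model of join pushdown through partition CHECK() constraints (a
# sketch with invented names, not the patch's code): the inner side
# is filtered per partition before joining, mirroring the
# "Filter: ((id % 3) = 0)" lines in the EXPLAIN output above.

def pushdown_join(partitions, inner_rows, key):
    results = []
    for part_rows, check in partitions:
        # filter the inner side with this partition's CHECK() predicate
        inner_subset = {r[key] for r in inner_rows if check(r)}
        results += [r for r in part_rows if r[key] in inner_subset]
    return results

inner = [{"id": i} for i in range(9)]
parts = [
    ([{"id": 0}, {"id": 3}], lambda r: r["id"] % 3 == 0),
    ([{"id": 1}, {"id": 4}], lambda r: r["id"] % 3 == 1),
]
print(pushdown_join(parts, inner, "id"))
```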
Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support
21.10.2015 01:37, Jim Nasby wrote: On 10/20/15 9:00 AM, Artur Zakirov wrote: The internal representation of the dictionary in PostgreSQL doesn't impose too strict a limit on the number of affix rules. There is a flagval array, whose size must be increased from 256 to 65000. Is that per dictionary entry, fixed at 64k? That seems pretty excessive, if that's the case... This is per dictionary only. The flagval array is used for the whole dictionary; it is not duplicated for every dictionary word. There is also the flag field of the AFFIX structure, whose size must be increased from 8 bits to 16 bits. This structure is used for every affix in the affix file. -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
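A back-of-the-envelope check of the memory question above (entry sizes and affix counts below are illustrative assumptions, not measurements of the actual structs): since the flagval array is allocated once per dictionary, growing it from 256 to 65000 entries costs only kilobytes per loaded dictionary, while the per-affix flag grows by one byte per affix rule.

```python
# Rough memory estimate for the proposed Hunspell changes (sizes are
# illustrative assumptions, not the actual struct layouts).

def dictionary_overhead(flagval_entries, entry_bytes, num_affixes, flag_bytes):
    # flagval is per dictionary; the flag field is per affix rule
    return flagval_entries * entry_bytes + num_affixes * flag_bytes

old = dictionary_overhead(256, 1, 5000, 1)    # 8-bit flags, 256-entry array
new = dictionary_overhead(65000, 1, 5000, 2)  # 16-bit flags, 65000 entries
print(new - old)   # extra bytes for one assumed 5000-affix dictionary
```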
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/10/21 13:34, Kouhei Kaigai wrote: On 2015/10/20 13:11, Etsuro Fujita wrote: On 2015/10/20 5:34, Robert Haas wrote: No. You just need to populate fdw_recheck_quals correctly, same as for the scan case. As I said yesterday, that opinion of mine is completely wrong. Sorry for the incorrectness. Let me explain a little bit more. I still think that even if ROW_MARK_COPY is in use, we would need to locally rejoin the tuples populated from the whole-row images for the foreign tables involved in a remote join, using a secondary plan. Consider eg, SELECT localtab.*, ft2 from localtab, ft1, ft2 WHERE ft1.x = ft2.x AND ft1.y = localtab.y FOR UPDATE In this case, since the output of the foreign join would not include any ft1 columns, I don't think we could do the same thing as for the scan case, even if populating fdw_recheck_quals correctly. As an aside, could you explain the reason why you think so? It is a significant point in the discussion if we want to reach a consensus. On the other hand, the joined tuple we're talking about in this context is a tuple prior to projection, formed according to the fdw_scan_tlist. So, it contains all the necessary information to run scan/join qualifiers against the joined tuple. It is not affected by the target list of the user query. After research into the planner, I noticed that I was still wrong; IIUC, the planner requires that the output of the foreign join include the column ft1.y even for that case. (I don't understand the reason why the planner requires that.) So, as Robert mentioned, the clause ft1.y = localtab.y could be rechecked during an EPQ recheck, if populating fdw_recheck_quals correctly. Sorry again for the incorrectness. Even though I think the approach with joined-tuple reconstruction is a reasonable solution here, it is not a fair reason to introduce a disadvantage of Robert's suggestion. Agreed. Also, please don't mix up "what we do" and "how we do".
It is a "what we do" question to discuss which format of tuples shall be returned to the core backend from the extension, because it determines the role of the interface. If our consensus is to return a joined tuple, we need to design the interface according to that consensus. On the other hand, it is a "how we do it" question whether we should force every FDW/CSP extension to have an alternative plan, or not. Once we get a consensus in the "what we do" discussion, there are various options to satisfy the requirement set by that consensus; however, we cannot prioritize "how we do it" without "what we do". Agreed. Best regards, Etsuro Fujita
Re: [HACKERS] Getting sorted data from foreign server
On Fri, Oct 16, 2015 at 11:33 PM, Robert Haas wrote: > On Thu, Oct 15, 2015 at 6:28 AM, Ashutosh Bapat > wrote: > > Attached is the patch which takes care of the above comments. > > I spent some time on this patch today. But it's still not right. > > I've attached a new version which fixes a serious problem with your > last version - having postgresGetForeignPaths do the costing of the > sorted path itself instead of delegating that to > estimate_path_cost_size is wrong. In your version, the 10% increment gets > applied to the network transmission costs as well as the cost of > generating the tuples - but only when use_remote_estimate == false. I > fixed this and did some cosmetic cleanup. > That's better. > > But you'll notice if you try this that some of postgres_fdw's regression > tests fail. This is rather mysterious: > > *** > *** 697,715 >Sort > Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 > Sort Key: t1.c1 > !-> Nested Loop Semi Join >Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 > ! Join Filter: (t1.c3 = t2.c3) >-> Foreign Scan on public.ft1 t1 > Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, > t1.c8 > Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 > FROM "S 1"."T 1" WHERE (("C 1" < 20)) > ! -> Materialize > Output: t2.c3 > !-> Foreign Scan on public.ft2 t2 >Output: t2.c3 > ! Filter: (date(t2.c4) = '01-17-1970'::date) > ! Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" > WHERE (("C 1" > 10)) > ! (15 rows) > > EXECUTE st2(10, 20); >c1 | c2 | c3 | c4 |c5 > | c6 | c7 | c8 > --- 697,718 >Sort > Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 > Sort Key: t1.c1 > !-> Hash Join >Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 > ! Hash Cond: (t1.c3 = t2.c3) >-> Foreign Scan on public.ft1 t1 > Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, > t1.c8 > Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 > FROM "S 1"."T 1" WHERE (("C 1" < 20)) > ! 
-> Hash > Output: t2.c3 > !-> HashAggregate >Output: t2.c3 > ! Group Key: t2.c3 > ! -> Foreign Scan on public.ft2 t2 > !Output: t2.c3 > !Filter: (date(t2.c4) = '01-17-1970'::date) > !Remote SQL: SELECT c3, c4 FROM "S 1"."T > 1" WHERE (("C 1" > 10)) > ! (18 rows) > > What I think is happening here is that the planner notices that > instead of doing a parameterized nestloop, it could pull down the data > already sorted from the remote side, cheaply unique-ify it by using > the ordering provided by the remote side, and then do a standard hash > join. That might well be a sensible approach, but the ORDER BY that > would make it correct doesn't show up in the Remote SQL. I don't know > why that's happening, but it's not good. > > The unique-ifying is happening through a HashAggregate, so we do not need an ORDER BY clause in the Remote SQL. So the plan is correct. Careful examination of the paths created revealed that the change in plan is a result of our fuzzy path selection logic. Let me explain it using the cost values I got on my machine. Please note that the costs are described as a tuple (startup cost, total cost, number of rows, presence of pathkeys). With your patch, the base relation paths have the following costs: ft1 path without pathkeys - (100, 123.9, 20, no) ft1 path with pathkeys (100, 126.25, 20, yes) ft2 path without pathkeys (100, 143.645, 4, no) ft2 path without pathkeys with params (100.01, 125.365, 1, no) Notice that the sorted path is only marginally costlier (2.5%) than the non-sorted path for ft1. During the path creation process several nested loop paths are created, but except for two, they are too costly. The two nested loop paths of interest have costs (200, 268.755, 10, no) and (200, 271.105, 10, yes). The path with pathkeys kicks out the one without pathkeys because the latter's costs are "fuzzily" equal and the pathkeys give an extra advantage. The "fuzzy" equality is an effect of the marginally higher sorting cost. The nested loop path with pathkeys remains in the path list. 
Several hash join paths and merge join paths are created but are costlier than one particular hash path, which has costs (243.677, 267.6624, 10, no). This hash path is not beaten by the nested loop path because of its lower total cost (which is beyond the fuzzy margin), and it ultimately gets chosen by the planner to perform ft1 join ft2. Interestingly, if the nested loop path with pathkeys had not kicked out the one without pathkeys, the nested loop path would have emerged as w
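The fuzzy comparison driving the path selection described above can be sketched as a standalone function. This mirrors the idea behind the planner's compare_path_costs_fuzzily (the real code also weighs startup cost and is more involved); the fuzz factor is passed in explicitly here, and the function name is invented for the sketch:

```c
/* Sketch of fuzzy total-cost comparison: costs within fuzz_factor of each
 * other count as equal, and ties are then broken by other criteria such as
 * pathkeys.  Returns +1 if path1 is fuzzily more expensive, -1 if path2 is,
 * and 0 if the two are fuzzily equal. */
static int
fuzzy_cost_cmp(double total1, double total2, double fuzz_factor)
{
    if (total1 > total2 * fuzz_factor)
        return +1;              /* path1 loses outright */
    if (total2 > total1 * fuzz_factor)
        return -1;              /* path2 loses outright */
    return 0;                   /* fuzzily equal: pathkeys break the tie */
}
```

With the costs from the mail and a 1% fuzz factor, the two nested loop paths (totals 268.755 and 271.105) compare as fuzzily equal, so the one with pathkeys wins the tie, while the hash path's 267.6624 beats 271.105 outside the margin.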
[HACKERS] Patch (2): Implement failover on libpq connect level.
On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: > > Attached patch which implements client library failover and > loadbalancing as was described in the proposal > <20150818041850.ga5...@wagner.pp.ru>. > I'm sending an improved version of the patch. As Olexander Shulgin noted, the previous version of the patch lacked support for service files. Support for service files is now implemented, and multiple host statements in the service file are allowed. -- Victor Wagner diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 0ee018e..ec12fce 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10 The general form for a connection URI is: -postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...] +postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1&...] @@ -809,6 +809,7 @@ postgresql://localhost/mydb postgresql://user@localhost postgresql://user:secret@localhost postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp +postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=random&readonly=1 Components of the hierarchical part of the URI can also be given as parameters. For example: @@ -831,7 +832,9 @@ postgresql:///mydb?host=localhost&port=5433 For improved compatibility with JDBC connection URIs, instances of parameter ssl=true are translated into -sslmode=require. +sslmode=require and +loadBalanceHosts=true into +hostorder=random. @@ -841,6 +844,10 @@ postgresql:///mydb?host=localhost&port=5433 postgresql://[2001:db8::1234]/database + + There can be several host specifications, optionally accompanied + by a port, separated by commas. + The host component is interpreted as described for the parameter PostgreSQL was built). On machines without Unix-domain sockets, the default is to connect to localhost. + + There can be more than one host parameter in + the connect string. 
In this case these hosts would be considered + alternate entries for the same database, and if the connection to the first one + fails, the library will try to connect to the second, etc. This can be used + for a high-availability cluster or for load balancing. See +parameter. + + + A network host name can be accompanied by a port number, separated by a + colon. If so, this port number is used only when connecting to + this host. If there is no port number, the port specified in the +parameter is used. + @@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - + + + hostorder + + + Specifies how to choose a host from the list of alternate hosts + specified in the parameter. + + + If the value of this argument is sequential (the + default), the library connects to the hosts in the order they are specified, + and tries to connect to the second one only if the connection to the first + fails. + + + If the value is random, the host to connect to is randomly + picked from the list. This allows balancing the load between several + cluster nodes. However, PostgreSQL currently doesn't support + multimaster clusters, so without the use of third-party products + only read-only connections can take advantage of the + load balancing. See + + + + + readonly + + + If this parameter is 0 (the default), upon successful connection the + library checks whether the host is in recovery state, and if so, + tries the next host in the connect string. If this parameter is + non-zero, connections to warm standby nodes are considered + successful. + + + + port @@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - connect_timeout @@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - + + failover_timeout + + + Maximum time to cyclically retry all the hosts in the connect string + (as a decimal integer number of seconds). If not specified, the + hosts are tried just once. + + + If we have a replicating cluster and the master node fails, it might + take some time to promote one of the standby nodes to be the new master. 
+ So clients which notice that the connection to the master fails might + give up their attempts to reestablish a connection before the new master + becomes available. + + + Setting this parameter to a reasonable time makes the library retry + all the hosts cyclically until a new master appears. + + + client_encoding @@ -7161,6 +7237,18 @@ user=admin
Re: [HACKERS] [PATCH v3] GSSAPI encryption support
Robbie, On Wed, Oct 21, 2015 at 3:54 PM, Michael Paquier wrote: > On Tue, Oct 20, 2015 at 3:01 AM, Robbie Harwood wrote: >> Stephen Frost writes: >>> psql: lost synchronization with server: got message type "S", length 22 >> >> which unfortunately could be a great many things. I've said this a >> couple times now, but I really do need more information - a traffic >> dump, a list of commands that were run, etc.; unfortunately, the surface >> here is pretty large, and while I totally am willing to believe there >> are bugs in the code I've written, I do not yet see them. > > --- a/src/interfaces/libpq/fe-protocol3.c > +++ b/src/interfaces/libpq/fe-protocol3.c > @@ -129,6 +129,58 @@ pqParseInput3(PGconn *conn) > return; > } > > +#ifdef ENABLE_GSS > + /* We want to be ready in both IDLE and BUSY states > for encryption */ > + if (id == 'g' && !conn->gss_disable_enc && conn->gctx) > + { > + ssize_t encEnd, next; > [...] > + } > + else if (!conn->gss_disable_enc && conn->gss_auth_done && > +!conn->gss_decrypted_cur && id != 'E') > + /* This could be a sync error, so let's handle > it as such. */ > + handleSyncLoss(conn, id, msgLength); > +#endif > > Hm. The out-of-sync error I am seeing in my environment is caused by > this block when parsing 'g' messages coming from the backend that are > considered as being GSSAPI-encrypted messages. I am still looking at > that... @@ -604,6 +604,11 @@ pqPutMsgEnd(PGconn *conn) memcpy(conn->outBuffer + conn->outMsgStart, &msgLen, 4); } +#ifdef ENABLE_GSS + if (pggss_encrypt(conn) < 0) + return EOF; +#endif @@ -1528,10 +1541,20 @@ socket_putmessage(char msgtype, const char *s, size_t len) if (internal_putbytes(s, len)) goto fail; PqCommBusy = false; +#ifdef ENABLE_GSS + /* if we're GSSAPI encrypting, s was allocated in be_gss_encrypt */ + if (msgtype == 'g') + pfree((char *)s); +#endif Looking at this patch in more details... Why is it necessary to wrap all the encrypted messages into a dedicated message type 'g'? 
Can't we rely on the contexts on the client and backend, which should be set up after authentication, to perform the encryption and decryption operations? This patch enforces the message type in socket_putmessage for the backend and in pggss_encrypt/pqPutMsgEnd for the frontend; this just feels wrong, and I think the patch could be simplified considerably. That includes the special handling added in fe-protocol3.c, which should IMO be reserved for messages received from the backend and should not be made aware of any protocol-level logic. -- Michael
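For reference, the 'g' framing being questioned amounts to an ordinary protocol-3 message: a one-byte type followed by a four-byte length that counts itself, with the GSSAPI ciphertext as the payload. A minimal sketch of that framing follows (buffer layout only; the actual gss_wrap/gss_unwrap calls are elided, and the helper names are invented for illustration):

```c
#include <arpa/inet.h>  /* htonl/ntohl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Wrap ciphertext in a 'g' message: type byte, then a network-order
 * 4-byte length that includes itself (per the FE/BE protocol), then the
 * payload.  Returns the total number of bytes written to out. */
static size_t
frame_gss_message(const uint8_t *ciphertext, uint32_t clen, uint8_t *out)
{
    uint32_t msglen = htonl(clen + 4);  /* length word counts itself */

    out[0] = 'g';                       /* dedicated encrypted-message type */
    memcpy(out + 1, &msglen, 4);
    memcpy(out + 5, ciphertext, clen);
    return (size_t) clen + 5;
}

/* Read back the header, returning the payload length and the type byte. */
static uint32_t
parse_gss_header(const uint8_t *in, uint8_t *type_out)
{
    uint32_t msglen;

    *type_out = in[0];
    memcpy(&msglen, in + 1, 4);
    return ntohl(msglen) - 4;           /* payload bytes that follow */
}
```

The alternative Michael suggests would drop this extra envelope entirely: once both sides know the GSSAPI context is established, every byte on the wire could be wrapped/unwrapped below the protocol layer, so fe-protocol3.c would never need to recognize a special message type.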