Re: [HACKERS] Quorum commit for multiple synchronous replication.
On 06/04/17 03:51, Noah Misch wrote: > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch wrote: >>> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: Regarding this feature, there are some loose ends. We should work on and complete them until the release. (1) Which synchronous replication method, priority or quorum, should be chosen when neither FIRST nor ANY is specified in s_s_names? Right now, a priority-based sync replication is chosen for keeping backward compatibility. However some hackers argued to change this decision so that a quorum commit is chosen because they think that most users prefer to a quorum. (2) There will be still many source comments and documentations that we need to update, for example, in high-availability.sgml. We need to check and update them throughly. (3) The priority value is assigned to each standby listed in s_s_names even in quorum commit though those priority values are not used at all. Users can see those priority values in pg_stat_replication. Isn't this confusing? If yes, it might be better to always assign 1 as the priority, for example. >>> >>> [Action required within three days. This is a generic notification.] >>> >>> The above-described topic is currently a PostgreSQL 10 open item. Fujii, >>> since you committed the patch believed to have created it, you own this open >>> item. If some other commit is more relevant or if this does not belong as a >>> v10 open item, please let us know. Otherwise, please observe the policy on >>> open item ownership[1] and send a status update within three calendar days >>> of >>> this message. Include a date for your subsequent status update. Testers >>> may >>> discover new open items at any time, and I want to plan to get them all >>> fixed >>> well in advance of shipping v10. Consequently, I will appreciate your >>> efforts >>> toward speedy resolution. Thanks. >>> >>> [1] >>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >> >> Thanks for the notice! >> >> Regarding the item (2), Sawada-san told me that he will work on it after >> this CommitFest finishes. So we would receive the patch for the item from >> him next week. If there will be no patch even after the end of next week >> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. > > Sounds reasonable; I will look for your update on 14Apr or earlier. > >> The items (1) and (3) are not bugs. So I don't think that they need to be >> resolved before the beta release. After the feature freeze, many users >> will try and play with many new features including quorum-based syncrep. >> Then if many of them complain about (1) and (3), we can change the code >> at that timing. So we need more time that users can try the feature. > > I've moved (1) to a new section for things to revisit during beta. If someone > feels strongly that the current behavior is Wrong and must change, speak up as > soon as you reach that conclusion. Absent such arguments, the behavior won't > change. > I was one of the people who said in original thread that I think the default behavior should change to quorum and I am still of that opinion. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
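For reference, the synchronous_standby_names syntax at issue in item (1) can be sketched as follows; the standby names s1, s2 and s3 are invented for illustration:

-- quorum commit: wait for replies from any 2 of the listed standbys
ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (s1, s2, s3)';

-- priority-based: wait for the 2 highest-priority connected standbys
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 2 (s1, s2, s3)';

-- neither keyword given: currently treated as priority-based (FIRST)
-- for backward compatibility, which is the behavior being debated above
ALTER SYSTEM SET synchronous_standby_names = '2 (s1, s2, s3)';
SELECT pg_reload_conf();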
Re: [HACKERS] Interval for launching the table sync worker
I was thinking the same. At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada wrote in > Hi all, > > While testing table sync worker for logical replication I noticed that > if the table sync worker of logical replication failed to insert the > data for whatever reason, the table sync worker process exits with > error. And then the main apply worker launches the table sync worker > again soon without interval. This routine is executed at very high > frequency without interval. > > Should we do put a interval (wal_retrieve_interval or make a new GUC > parameter?) for launching the table sync worker? After introducing encoding conversion, untranslatable characters in a published table cause this situation. An interval can reduce the frequency of reconnecting, but I think that the walreceiver should refrain from reconnecting on an unrecoverable (or repeating) error in the walsender. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
2017-04-06 0:55 GMT+02:00 Andres Freund : > Hi, > > On 2017-04-02 22:28:40 +0200, Jan Michálek wrote: > > 2017-03-23 17:26 GMT+01:00 Pierre Ducroquet : > > > The new status of this patch is: Waiting on Author > > > > > > > Corrected problem with \pset linestyle when format is set to markdown, or > > rst. > > > > Corrected tuples only for markdown and rst (normal and expanded) > > This patch still is undergoing development and review, and the code > freeze is closing in rapidly. As this patch has been submitted just > before the last commitfest in the v10 cycle and has received plenty > feedback, I think it's fair to move this to the next commitfest. > > Regards, > > Andres > There are only a few last things to complete (aligning by |:- in markdown, pset title, and a few words about pandoc pipe tables in the documentation). I hope I'm able to do this on Sunday or Monday. Jan -- Jelen Starší čeledín datovýho chlíva
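As a sketch only, the proposed formats would presumably be selected like the existing psql output formats; this is not committed syntax and the query is invented:

\pset format markdown
SELECT 1 AS id, 'example' AS note;
\pset format rst
SELECT 1 AS id, 'example' AS note;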
Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')
On 04/06/2017 08:33 AM, Noah Misch wrote: On Fri, Mar 17, 2017 at 10:10:59AM +0900, Michael Paquier wrote: On Fri, Mar 17, 2017 at 2:30 AM, Jeff Janes wrote: On Thu, Mar 9, 2017 at 4:59 AM, Michael Paquier wrote: On Thu, Mar 9, 2017 at 1:17 AM, Joe Conway wrote: On 03/07/2017 08:29 PM, Tom Lane wrote: Michael Paquier writes: here is a separate thread dedicated to the following extension for CREATE/ALTER ROLE: PASSWORD ('value' USING 'method'). The parentheses seem weird ... do we really need those? +1 Seeing 3 opinions in favor of that, let's do so then. I have updated the patch to not use parenthesis. The regression tests only exercise the CREATE ROLE...USING version, not the ALTER ROLE...USING version. Done. +and plain for an non-hashed password. If the password +string is already in MD5-hashed or SCRAM-hashed, then it is +stored hashed as-is. In the last line, I think "stored as-is" sounds better. Okay. Other than that, it looks good to me. Thanks for the review. Attached is an updated patch. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. This isn't critical, SCRAM is fully functional without this new syntax. I think we should drop this, and revisit for Postgres 11, if it feels like we still want this then. I'll remove this from the open items list. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
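For context, the syntax discussed in this thread (and ultimately dropped for v10) would look roughly like the following; the role name, password values and method strings are illustrative only:

-- store the password hashed with the named method
CREATE ROLE alice PASSWORD 'secret' USING 'scram-sha-256';
-- an already-hashed value is stored as-is
ALTER ROLE alice PASSWORD 'md5a3556571e93b0d20722ba62be61e8c2d' USING 'md5';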
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
2017-04-06 8:08 GMT+02:00 Pavel Stehule : > > > 2017-04-05 23:22 GMT+02:00 Tom Lane : > >> Andres Freund writes: >> > I'd like some input from other committers whether we want this. I'm >> > somewhat doubtful, but don't have particularly strong feelings. >> >> I don't really want to expose the workings of the plancache at user level. >> The heuristics it uses certainly need work, but it'll get hard to change >> those once there are SQL features depending on it. >> > > I am very sceptical about enhancing heuristics - but I am open to any > proposals. > > The advanced users disable a plan cache with dynamic SQL. But this > workaround has strong disadvantages: > > 1. it is vulnerable to SQL injection > 2. it is less readable > > > >> >> Also, as you note, there are debatable design decisions in this particular >> patch. There are already a couple of ways in which control knobs can be >> attached to plgsql functions (i.e. custom GUCs and the comp_option stuff), >> so why is this patch wanting to invent yet another fundamental mechanism? >> And I'm not very happy about it imposing a new reserved keyword, either. >> > > 1. > > custom GUC has not local scope - so it doesn't allow precious settings. > With PRAGMA I have perfect control what will be impacted. > > #option has function scope > > 2. I'll not introduce a PRAGMA keyword just for this feature. We would to > implement autonomous transactions. There was not any objection against this > feature. The PRAGMA allows to share PL/SQL syntax and functionality. > > >> A bigger-picture question is why we'd only provide such functionality >> in plpgsql, and not for other uses of prepared plans. >> > > It is out of scope of this patch. > The scope of this patch can be enhanced - but it is different task because user interface should be different. > > >> >> Lastly, it doesn't look to me like the test cases prove anything at all >> about whether the feature does what it's claimed to. >> > > I can enhance regress tests - currently there are not direct access to > these attributes - so the tests can be indirect only :( > > Regards > > Pavel > > >> >> regards, tom lane >> > >
Re: [HACKERS] Outdated comments around HandleFunctionRequest
On 04/05/2017 10:58 AM, Heikki Linnakangas wrote: On 04/05/2017 04:05 AM, Andres Freund wrote: PostgresMain() has the following blurb for fastpath functions: /* * Note: we may at this point be inside an aborted * transaction. We can't throw error for that until we've * finished reading the function-call message, so * HandleFunctionRequest() must check for it after doing so. * Be careful not to do anything that assumes we're inside a * valid transaction here. */ and in HandleFunctionRequest() there's: * INPUT: * In protocol version 3, postgres.c has already read the message body * and will pass it in msgBuf. * In old protocol, the passed msgBuf is empty and we must read the * message here. which is not true anymore. Followed by: /* * Now that we've eaten the input message, check to see if we actually * want to do the function call or not. It's now safe to ereport(); we * won't lose sync with the frontend. */ which is also not really meaningful, because there's no previous code in the function. You're right, I missed those comments in commit 2b3a8b20c2. In fact, HandleFunctionRequest() now always return 0, so we can also remove the dead code to check the return value from the caller. Barring objections, I'll commit the attached to do that and fix the comments. Committed, thanks! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
> > That echoes my perception - so let's move this to the next CF? It's not > like this patch has been pending for very long. > sure Regards Pavel > > - Andres >
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
2017-04-05 23:22 GMT+02:00 Tom Lane : > Andres Freund writes: > > I'd like some input from other committers whether we want this. I'm > > somewhat doubtful, but don't have particularly strong feelings. > > I don't really want to expose the workings of the plancache at user level. > The heuristics it uses certainly need work, but it'll get hard to change > those once there are SQL features depending on it. > I am very sceptical about enhancing the heuristics - but I am open to any proposals. Advanced users disable the plan cache with dynamic SQL, but this workaround has strong disadvantages: 1. it is vulnerable to SQL injection 2. it is less readable > > Also, as you note, there are debatable design decisions in this particular > patch. There are already a couple of ways in which control knobs can be > attached to plgsql functions (i.e. custom GUCs and the comp_option stuff), > so why is this patch wanting to invent yet another fundamental mechanism? > And I'm not very happy about it imposing a new reserved keyword, either. > 1. A custom GUC has no local scope - so it doesn't allow precise settings. With PRAGMA I have perfect control over what will be impacted. #option has function scope 2. I won't introduce a PRAGMA keyword just for this feature. We would like to implement autonomous transactions, and there was no objection against that feature. The PRAGMA allows us to share PL/SQL syntax and functionality. > A bigger-picture question is why we'd only provide such functionality > in plpgsql, and not for other uses of prepared plans. > It is out of the scope of this patch. > > Lastly, it doesn't look to me like the test cases prove anything at all > about whether the feature does what it's claimed to. > I can enhance the regression tests - currently there is no direct access to these attributes, so the tests can only be indirect :( Regards Pavel > > regards, tom lane >
Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats
At Thu, 6 Apr 2017 13:10:48 +1200, David Rowley wrote in > On 6 April 2017 at 13:05, David Rowley wrote: > > I tested with the attached, and it does not seem to hurt planner > > performance executing: > > Here's it again, this time with a comment on the > find_relation_from_clauses() function. It seems to work the same as the previous version, with additional cost to scan over the restrict clauses. But a separate loop over the clauses is additional overhead in any case, even when functional dependencies are irrelevant. The more columns there are in the relation, the longer bms_get_singleton_member takes. Although I'm not sure how much it hurts performance and I can't think of a good alternative right now, I think that the overhead should be avoided anyhow. At Thu, 6 Apr 2017 13:05:24 +1200, David Rowley wrote in > > And you measured the overhead of doing it the other way to be ... ? > > Premature optimization and all that. > > I tested with the attached, and it does not seem to hurt planner > performance executing: Here, bms_singleton_member takes longer if the relation has many columns and there's a functional dependency covering the columns at the very tail. Maybe only two columns are not practical for testing. Even if it doesn't impact performance detectably, if only one attribute is needed, an AttrNumber member in the context will be sufficient. No bitmap operation seems required in dependency_compatible_walker, and it can bail out at the second attribute. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
2017-04-05 22:33 GMT+02:00 Andres Freund : > Hi, > > > I'd like some input from other committers whether we want this. I'm > somewhat doubtful, but don't have particularly strong feelings. > > > > + > > + > > + Block level PRAGMA > > + > > + > > +PRAGMA > > +in PL/pgSQL > > + > > + > > + > > +The block level PRAGMA allows to change the > > +PL/pgSQL compiler behavior. Currently > > +only PRAGMA PLAN_CACHE is supported. > > Why are we doing this on a block level? > There are few reasons: 1. it is practical for some cases to mix more plan strategies in one function a) FOR IN simple_select LOOP ENFORCE ONE SHOT PLANS BEGIN .. queries .. END; END LOOP; b) ENFORCE ONE SHOT PLANS BEGIN FOR IN complex query requires one shot plan LOOP RETURNS TO DEFAULT PLAN CACHE BEGIN .. queries .. END; END LOOP; 2. This behave is defined in Ada language, and in PL/SQL too. If we will have autonomous transactions, then we can have a equal functionality a) run complete function under autonomous transaction b) run some parts of function (some blocks) under autonomous transaction It is not necessary, but it can avoid to generate auxiliary functions. > > > > + > > +CREATE FUNCTION enforce_fresh_plan(_id text) RETURNS boolean AS $$ > > +DECLARE > > + PRAGMA PLAN_CACHE(force_custom_plan); > > +BEGIN > > + -- in this block every embedded query uses one shot plan > > *plans > > > > + > > + PRAGMA PLAN_CACHE > > + > > + > > + The plan cache behavior can be controlled using > > + PRAGMA PLAN_CACHE. This PRAGMA can be > used both > > + for whole function or in individual blocks. The following options > are > > *functions > > > > + possible: DEFAULT - default > > + PL/pgSQL implementation - the system > tries > > + to decide between custom plan and generic plan after five query > > + executions, FORCE_CUSTOM_PLAN - the chosen > execution > > + plan will be the one shot plan - it is specific for every set of > > + used paramaters, FORCE_GENERIC_PLAN - the > generic > > + plan will be used from the start. > > I don't think it's a good idea to explain this here, this'll just get > outdated. I think we should rather have a link here. > > > > + > > + > > + > > + > > + PRAGMA PLAN_CACHE > > + in PL/pgSQL > > + > > + The plan for INSERT is always a generic > > plan. > > That's this specific insert, right? Should be mentioned, sounds more > generic to me. > > > +/* -- > > + * Returns pointer to current compiler settings > > + * -- > > + */ > > +PLpgSQL_settings * > > +plpgsql_current_settings(void) > > +{ > > + return current_settings; > > +} > > + > > + > > +/* -- > > + * Setup default compiler settings > > + * -- > > + */ > > +void > > +plpgsql_settings_init(PLpgSQL_settings *settings) > > +{ > > + current_settings = settings; > > +} > > Hm. This is only ever set to &default_settings. > > > > +/* -- > > + * Set compiler settings > > + * -- > > + */ > > +void > > +plpgsql_settings_set(PLpgSQL_settings *settings) > > +{ > > + PLpgSQL_nsitem *ns_cur = ns_top; > > + > > + /* > > + * Modify settings directly, when ns has local settings data. > > + * When ns uses shared settings, create settings first. 
> > + */ > > + while (ns_cur->itemtype != PLPGSQL_NSTYPE_LABEL) > > + ns_cur = ns_cur->prev; > > + > > + if (ns_cur->local_settings == NULL) > > + { > > + ns_cur->local_settings = palloc(sizeof(PLpgSQL_settings)); > > + ns_cur->local_settings->prev = current_settings; > > + current_settings = ns_cur->local_settings; > > + } > > + > > + current_settings->cursor_options = settings->cursor_options; > > +} > > This seems like a somewhat weird method. Why do we have a global > settings, when we essentially just want to use something in the current > ns? > > I am not sure if I understand the question. This setting is implemented lazily. If the ns has no settings of its own, nothing is done. It requires a global variable, because some ns can be skipped. My first implementation was 1:1 ns:settings - but it added some overhead for every ns even when the ns has no settings of its own. Regards Pavel > > - Andres >
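To make the block-scoping point concrete, a block-level pragma as proposed here might be used along these lines; this is a sketch of the proposed (uncommitted) syntax, with invented table and function names:

CREATE FUNCTION refresh_rows(_id text) RETURNS void AS $$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT part_key FROM control_table LOOP
    DECLARE
      PRAGMA PLAN_CACHE(force_custom_plan);
    BEGIN
      -- every query in this inner block would use a one-shot (custom) plan
      DELETE FROM big_table WHERE part_key = r.part_key AND id = _id;
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;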
Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files
2017-04-06 3:34 GMT+02:00 Stephen Frost : > Andres, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2017-04-05 21:07:59 -0400, Stephen Frost wrote: > > > * Andres Freund (and...@anarazel.de) wrote: > > > > I don't yet have a good idea how to deal with moving individual cells > > > > into files, so they can be loaded. One approach would be to to have > > > > something like > > > > > > > > \storequeryresult filename_template.%row.%column > > > > > > > > which'd then print the current query buffer into the relevant file > after > > > > doing replacement on %row and %column. > > > > > > I don't actually agree that there's a problem having the output from a > > > query stored direclty in binary form into a single file. The above > > > approach seems to imply that every binary result must go into an > > > independent file, and perhaps that would be useful in some cases, but I > > > don't see it as required. > > > > Well, it'd not be enforced - it'd depend on your template. But for a > > lot of types of files, it'd not make sense to store multiple > > columns/rows in file. Particularly for ones where printing them out to > > files is actually meaningful (i.e. binary ones). > > Having the template not require the row/column place-holders included > strikes me as more likely to be confusing. My initial thinking around > this was that users who actually want independent files would simply > issue independent queries, while users who want to take a bunch of int4 > columns and dump them into a single binary file would be able to do so > easily. > > I'm not against adding the ability for a single query result to be saved > into independent files, but it strikes me as feature creep on this basic > capability. Further, I don't see any particular reason why splitting up > the output from a query into multiple files is only relevant for binary > data. > The files can be simply joined together outside psql Personally I prefer relation type: single field, single file in special g command - because I can simply off all formatting and result should be correct every time. Stephen, have you some use case for your request? Regards Pavel > Thanks! > > Stephen >
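As a purely hypothetical illustration of the template idea quoted above (no such command exists today, and the name and template syntax come from the proposal being discussed), usage might look like:

\storequeryresult /tmp/image_%row.%column.bin
SELECT raw_png FROM images WHERE id IN (1, 2, 3);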
Re: [HACKERS] BRIN cost estimate
On 5 April 2017 at 17:34, Emre Hasegeli wrote: > >> Interested to hear comments on this. > > > I don't have chance to test it right now, but I am sure it would be an > improvement over what we have right now. There is no single correct > equation with so many unknowns we have. > >> *indexTotalCost += (numTuples * *indexSelectivity) * >> (cpu_index_tuple_cost + qual_op_cost); > > Have you preserved this line intentionally? This would make BRIN index scan > cost even higher, but the primary property of BRIN is to be cheap to scan. > Shouldn't bitmap heap scan node take into account of checking the tuples in > its cost? Good point. That's wrong, but I'm confused at why you kept the: + *indexTotalCost += selec * numTuples * cpu_index_tuple_cost; at all if that's the case. All the BRIN scan is going to do is build a bitmap of the matching ranges found. I've ended up with: + /* + * Charge a small amount per range tuple which we expect to match to. This + * is meant to reflect the costs of manipulating the bitmap. The BRIN scan + * will set a bit for each page in the range when we find a matching + * range, so we must multiply the charge by the number of pages in the + * range. + */ + *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges * + pagesPerRange; Which is inspired from cost_bitmap_tree_node(), and seems like a fair cost for setting a single bit. I also noticed that you were doing: + if (get_index_stats_hook && + (*get_index_stats_hook) (root, index->indexoid, 1, &vardata)) and + vardata.statsTuple = SearchSysCache3(STATRELATTINH, + ObjectIdGetDatum(index->indexoid), + Int16GetDatum(1), I wondered why you picked to hardcode that as 1, and I thought that's surely a bug. But then looking deeper it seems to be copied from btcostestimate(), which also seems to be a long-standing bug introduced in a536ed53bca40cb0d199824e358a86fcfd5db7f2. I'll go write a patch for that one now. I've been looking at the estimates output by the following: create table ab (a int, b int); insert into ab select x/1000,x%1000 from generate_series(1,100)x; create index ab_a_idx on ab using brin (a) with (pages_per_range = 128); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; create index ab_a_idx on ab using brin (a) with (pages_per_range = 64); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; create index ab_a_idx on ab using brin (a) with (pages_per_range = 32); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; create index ab_a_idx on ab using brin (a) with (pages_per_range = 16); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; create index ab_a_idx on ab using brin (a) with (pages_per_range = 8); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; create index ab_a_idx on ab using brin (a) with (pages_per_range = 4); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; create index ab_a_idx on ab using brin (a) with (pages_per_range = 2); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; create index ab_a_idx on ab using brin (a) with (pages_per_range = 1); analyze ab; explain analyze select * from ab where a = 1; drop index ab_a_idx; and I see that the bitmap index scan gets more expensive the more ranges that are in the index, which of course makes sense. 
The small bitmap maintenance cost I added should stay about the same, since I multiplied it by the pagesPerRange, it may only drop slightly as it may expect to set fewer bits in the bitmap due to the ranges being more specific to the tuples in the heap. The row estimates seem pretty sane there too, based on how many were filtered out on the recheck. I do still have concerns about how the correlation value is used, and the fact that we use the highest value, but then the whole use of this value is far from perfect, and is just a rough indication of how things are. Apart from that, I'm pretty happy with this now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services brin-correlation-drowley_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SCRAM authentication, take three
On Tue, Mar 07, 2017 at 02:36:13PM +0200, Heikki Linnakangas wrote: > I didn't include the last-minute changes to the way you specify this in > pg_hba.conf. So it's still just "scram". I agree in general that we should > think about how to extend that too, but I think the proposed syntax was > overly verbose for what we actually support right now. Let's discuss that as > a separate thread, as well. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Heikki, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')
On Fri, Mar 17, 2017 at 10:10:59AM +0900, Michael Paquier wrote: > On Fri, Mar 17, 2017 at 2:30 AM, Jeff Janes wrote: > > On Thu, Mar 9, 2017 at 4:59 AM, Michael Paquier > > wrote: > >> > >> On Thu, Mar 9, 2017 at 1:17 AM, Joe Conway wrote: > >> > On 03/07/2017 08:29 PM, Tom Lane wrote: > >> >> Michael Paquier writes: > >> >>> here is a separate thread dedicated to the following extension for > >> >>> CREATE/ALTER ROLE: PASSWORD ('value' USING 'method'). > >> >> > >> >> The parentheses seem weird ... do we really need those? > >> > > >> > +1 > >> > >> Seeing 3 opinions in favor of that, let's do so then. I have updated > >> the patch to not use parenthesis. > > > > The regression tests only exercise the CREATE ROLE...USING version, not the > > ALTER ROLE...USING version. > > Done. > > > +and plain for an non-hashed password. If the password > > +string is already in MD5-hashed or SCRAM-hashed, then it is > > +stored hashed as-is. > > > > In the last line, I think "stored as-is" sounds better. > > Okay. > > > Other than that, it looks good to me. > > Thanks for the review. Attached is an updated patch. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Heikki, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast Default WIP patch for discussion
Andres, Yes, I still want to push this in. However I have not had time to get back to it. I’m embarrassed to say that I don’t even know where the comments that were issued occurred. Cheers Serge On Wed, Apr 5, 2017 at 4:47 PM, Andres Freund wrote: Hi Serge, On 2016-10-28 08:28:11 -0700, Serge Rielau wrote: > Time for me to dig into that then. Are you planning to update your POC at some point? This'd be a very welcome improvement. Regards, Andres
Re: [HACKERS] Changing references of password encryption to hashing
On Thu, Mar 16, 2017 at 07:27:11AM -0700, Joe Conway wrote: > On 03/16/2017 06:19 AM, Robert Haas wrote: > > On Mon, Mar 13, 2017 at 4:48 AM, Craig Ringer wrote: > >> So I'm in favour of fixing the docs but I'm not keen on changing the > >> SQL syntax in a way that just kind of papers over part of the > >> problems. > > > > I agree. I think that trying to design new SQL syntax at this point > > is unlikely to be a good idea - we're just about out of time here, and > > some people who might care about this are busy on other things, and > > the deadline for patches that do new things has long since passed. > > But I like the idea of trying to improve the documentation. Seems like a good direction. > Agreed. I think the documentation fixes definitely should be done, but > understand that the grammar is a longer term issue with backward > compatibility implications. Acknowledging the problem is the first step ;-) Most of the user-visible (including doc/) proposed changes alter material predating v10. Therefore, I've removed this from open item status. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multiple false-positive warnings from Valgrind
On Wed, Mar 29, 2017 at 12:34:52PM +0900, Michael Paquier wrote: > On Thu, Mar 23, 2017 at 5:15 PM, Michael Paquier > wrote: > > On Tue, Mar 21, 2017 at 10:57 PM, Aleksander Alekseev > > wrote: > >> Recently I've decided to run PostgreSQL under Valgrind according to wiki > >> description [1]. Lots of warnings are generated [2] but it is my > >> understanding that all of them are false-positive. For instance I've > >> found these two reports particularly interesting: > >> > >> ``` > >> ==00:00:40:40.161 7677== Use of uninitialised value of size 8 > >> ==00:00:40:40.161 7677==at 0xA15FF5: pg_b64_encode (base64.c:68) > >> ==00:00:40:40.161 7677==by 0x6FFE85: scram_build_verifier > >> (auth-scram.c:348) > >> ==00:00:40:40.161 7677==by 0x6F3F76: encrypt_password (crypt.c:171) > >> ==00:00:40:40.161 7677==by 0x68F40C: CreateRole (user.c:403) > >> ==00:00:40:40.161 7677==by 0x85D53A: standard_ProcessUtility > >> (utility.c:716) > >> ==00:00:40:40.161 7677==by 0x85CCC7: ProcessUtility (utility.c:353) > >> ==00:00:40:40.161 7677==by 0x85BD22: PortalRunUtility (pquery.c:1165) > >> ==00:00:40:40.161 7677==by 0x85BF20: PortalRunMulti (pquery.c:1308) > >> ==00:00:40:40.161 7677==by 0x85B4A0: PortalRun (pquery.c:788) > >> ==00:00:40:40.161 7677==by 0x855672: exec_simple_query > >> (postgres.c:1101) > >> ==00:00:40:40.161 7677==by 0x8597BB: PostgresMain (postgres.c:4066) > >> ==00:00:40:40.161 7677==by 0x7C6322: BackendRun (postmaster.c:4317) > >> ==00:00:40:40.161 7677== Uninitialised value was created by a stack > >> allocation > >> ==00:00:40:40.161 7677==at 0x6FFDB7: scram_build_verifier > >> (auth-scram.c:328) > > > > I can see those warnings as well after calling a code path of > > scram_build_verifier(), and I have a hard time seeing that as nothing > > else than a false positive as you do. All those warnings go away if > > you just initialize just do MemSet(salt, 0, SCRAM_SALT_LEN) before > > calling pg_backend_random() but this data is filled in with > > RAND_bytes() afterwards (if built with openssl). The estimated lengths > > of the encoding are also correct. I don't see immediately what's wrong > > here, this deserves a second look... > > And it seems to me that this is caused by the routines of OpenSSL. > When building without --with-openssl, using the fallback > implementations of SHA256 and RAND_bytes I see no warnings generated > by scram_build_verifier... I think it makes most sense to discard that > from the list of open items. This defect has roughly the gravity of a compiler warning. Dropped from open items on that basis. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for Default partition in partitioning
On 2017/04/06 13:08, Keith Fiske wrote: > On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote: >> Only issue I see with this, and I'm not sure if it is an issue, is what >> happens to that default constraint clause when 1000s of partitions start >> getting added? From what I gather the default's constraint is built based >> off the cumulative opposite of all other child constraints. I don't >> understand the code well enough to see what it's actually doing, but if >> there are no gaps, is the method used smart enough to aggregate all the >> child constraints to make a simpler constraint that is simply outside the >> current min/max boundaries? If so, for serial/time range partitioning this >> should typically work out fine since there are rarely gaps. This actually >> seems more of an issue for list partitioning where each child is a distinct >> value or range of values that are completely arbitrary. Won't that check >> and re-evaluation of the default's constraint just get worse and worse as >> more children are added? Is there really even a need for the default to >> have an opposite constraint like this? Not sure on how the planner works >> with partitioning now, but wouldn't it be better to first check all >> non-default children for a match the same as it does now without a default >> and, failing that, then route to the default if one is declared? The >> default should accept any data then so I don't see the need for the >> constraint unless it's required for the current implementation. If that's >> the case, could that be changed? Unless I misread your last sentence, I think there might be some confusion. Currently, the partition constraint (think of these as you would of user-defined check constraints) is needed for two reasons: 1. to prevent direct insertion of rows into the default partition for which a non-default partition exists; no two partitions should ever have duplicate rows. 2. so that planner can use the constraint to determine if the default partition needs to be scanned for a query using constraint exclusion; no need, for example, to scan the default partition if the query requests only key=3 rows and a partition for the same exists (no other partition should have key=3 rows by definition, not even the default). As things stand today, planner needs to look at every partition individually for using constraint exclusion to possibly exclude it, *even* with declarative partitioning and that would include the default partition. > Actually, thinking on this more, I realized this does again come back to > the lack of a global index. Without the constraint, data could be put > directly into the default that could technically conflict with the > partition scheme elsewhere. Perhaps, instead of the constraint, inserts > directly to the default could be prevented on the user level. Writing to > valid children directly certainly has its place, but been thinking about > it, and I can't see any reason why one would ever want to write directly to > the default. It's use case seems to be around being a sort of temporary > storage until that data can be moved to a valid location. Would still need > to allow removal of data, though. As mentioned above, the default partition will not allow directly inserting a row whose key maps to some existing (non-default) partition. As far as tuple-routing is concerned, it will choose the default partition only if no other partition is found for the key. 
Tuple-routing doesn't use the partition constraints directly per se, the way the two things mentioned above do. One could say that tuple-routing assigns the incoming rows to partitions such that their individual partition constraints are not violated. Finally, we don't yet offer global guarantees for constraints like unique. The only guarantee that's in place is that no two partitions can contain the same partition key. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
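The behaviour described above can be sketched with the DEFAULT partition syntax proposed in this thread (table names are invented, and this reflects the design under discussion rather than committed v10 behaviour):

CREATE TABLE measurements (key int, val text) PARTITION BY LIST (key);
CREATE TABLE measurements_3   PARTITION OF measurements FOR VALUES IN (3);
CREATE TABLE measurements_def PARTITION OF measurements DEFAULT;

INSERT INTO measurements VALUES (3, 'a');      -- tuple routing picks measurements_3
INSERT INTO measurements VALUES (7, 'b');      -- no matching partition, so the default is chosen
INSERT INTO measurements_def VALUES (3, 'c');  -- rejected by the default partition's constraint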
Re: [HACKERS] Adding support for Default partition in partitioning
On 2017/04/06 0:19, Robert Haas wrote: > On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed wrote: >>> Could you briefly elaborate why you think the lack global index support >>> would be a problem in this regard? >> I think following can happen if we allow rows satisfying the new partition >> to lie around in the >> default partition until background process moves it. >> Consider a scenario where partition key is a primary key and the data in the >> default partition is >> not yet moved into the newly added partition. If now, new data is added into >> the new partition >> which also exists(same key) in default partition there will be data >> duplication. If now >> we scan the partitioned table for that key(from both the default and new >> partition as we >> have not moved the rows) it will fetch the both rows. >> Unless we have global indexes for partitioned tables, there is chance of >> data duplication between >> child table added after default partition and the default partition. > > Yes, I think it would be completely crazy to try to migrate the data > in the background: > > - The migration might never complete because of a UNIQUE or CHECK > constraint on the partition to which rows are being migrated. > > - Even if the migration eventually succeeded, such a design abandons > all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly > while the migration is in progress, unless the new partition has no > UNIQUE constraints. > > - Partition-wise join and partition-wise aggregate would need to have > special case handling for the case of an unfinished migration, as > would any user code that accesses partitions directly. > > - More generally, I think users expect that when a DDL command > finishes execution, it's done all of the work that there is to do (or > at the very least, that any remaining work has no user-visible > consequences, which would not be the case here). Thanks Robert for this explanation. This makes it more clear, why row movement by background is not sensible idea. On Thu, Apr 6, 2017 at 9:38 AM, Keith Fiske wrote: > > On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote: > >> >> >> Only issue I see with this, and I'm not sure if it is an issue, is what >> happens to that default constraint clause when 1000s of partitions start >> getting added? From what I gather the default's constraint is built based >> off the cumulative opposite of all other child constraints. I don't >> understand the code well enough to see what it's actually doing, but if >> there are no gaps, is the method used smart enough to aggregate all the >> child constraints to make a simpler constraint that is simply outside the >> current min/max boundaries? If so, for serial/time range partitioning this >> should typically work out fine since there are rarely gaps. This actually >> seems more of an issue for list partitioning where each child is a distinct >> value or range of values that are completely arbitrary. Won't that check >> and re-evaluation of the default's constraint just get worse and worse as >> more children are added? Is there really even a need for the default to >> have an opposite constraint like this? Not sure on how the planner works >> with partitioning now, but wouldn't it be better to first check all >> non-default children for a match the same as it does now without a default >> and, failing that, then route to the default if one is declared? 
The >> default should accept any data then so I don't see the need for the >> constraint unless it's required for the current implementation. If that's >> the case, could that be changed? >> >> Keith >> > > Actually, thinking on this more, I realized this does again come back to > the lack of a global index. Without the constraint, data could be put > directly into the default that could technically conflict with the > partition scheme elsewhere. Perhaps, instead of the constraint, inserts > directly to the default could be prevented on the user level. Writing to > valid children directly certainly has its place, but been thinking about > it, and I can't see any reason why one would ever want to write directly to > the default. It's use case seems to be around being a sort of temporary > storage until that data can be moved to a valid location. Would still need > to allow removal of data, though. > > Not sure if that's even a workable solution. Just trying to think of ways > around the current limitations and still allow this feature. > I like the idea of having a DEFAULT partition for range partitioning. With the way partitioning is designed, a range-partitioned table can have holes. I think DEFAULT for range partitioning is a good idea, particularly when the range has holes. When the range is serial then of course a DEFAULT partition doesn't make much sense. Regards, Rushabh Lathia www.EnterpriseDB.com
Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.
Hi, >> >> Based on the earlier discussions, I have prepared a patch that would >> allow pgstathashindex() to show the number of unused pages in hash >> index. Please find the attached patch for the same. Thanks. > > My idea is that we shouldn't end up with both a zero_pages column and > an unused_pages column. Instead, we should end up with just an > unused_pages column, which will include both pages that are all-zeroes > and pages that have a valid special space marked as LH_UNUSED. > > Also, I don't see why it's correct to test PageIsEmpty() here instead > of just testing the page type as we do in pageinspect. > > Attached is a revised patch that shows what I have in mind; please > review. Along the way I made the code for examining the page type > more similar to what pageinspect does, because I see no reason for > those things to be different, and I think the pageinspect code is > better. I have reviewed the patch and it looks good to me. Also, the idea of including both zero and unused pages in a single 'unused' column looks better. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
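For reference, with the revised patch the statistics would be read along these lines (a sketch assuming the pgstattuple extension and a hash index named my_hash_idx; the column list follows the discussion above):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT bucket_pages, overflow_pages, bitmap_pages, unused_pages
FROM pgstathashindex('my_hash_idx');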
[HACKERS] Re: Letting the client choose the protocol to use during a SASL exchange
On Tue, Apr 04, 2017 at 03:02:30PM +0900, Michael Paquier wrote: > There is still one open item pending for SCRAM that has not been > treated which is mentioned here: > https://www.postgresql.org/message-id/b081887e-1712-3aa4-7dbe-e012333d5...@iki.fi > > When doing an authentication with SASL, the server decides what is the > mechanism that the client has to use. As SCRAM-SHA-256 is only one of > such mechanisms, it would be nice to have something more generic and > have the server return to the client a list of protocols that the > client can choose from. And also the server achnowledge which protocol > is going to be used. > > Note that RFC4422 has some content on the matter > https://tools.ietf.org/html/rfc4422#section-3.1: >Mechanism negotiation is protocol specific. > >Commonly, a protocol will specify that the server advertises >supported and available mechanisms to the client via some facility >provided by the protocol, and the client will then select the "best" >mechanism from this list that it supports and finds suitable. > > So once the server sends back the list of mechanisms that are > supported, the client is free to use what it wants. > > On HEAD, a 'R' message with AUTH_REQ_SASL followed by > SCRAM_SHA256_NAME is sent to let the client know what is the mechanism > to use for the SASL exchange. In the future, this should be extended > so as a list of names is sent, for example a comma-separated list, but > we are free to choose the format we want here. With this list at hand, > the client can then choose the protocol it thinks is the best. Still, > there is a gap with our current implementation because the server > expects the first message from the client to have a SCRAM format, but > that's true only if SCRAM-SHA-256 is used as mechanism. > > In order to cover this gap, it seems to me that we need to have an > intermediate state before the server is switched to FE_SCRAM_INIT so > as the mechanism used is negotiated between the two parties. Once the > protocol negotiation is done, the server can then move on with the > mechanism to use. This would be important in the future to allow more > SASL mechanisms to work. I am adding an open item for that. If any SCRAM open item is a beta blocker, it's this one. (But SASLprep is also in or near that status.) Post-beta wire protocol changes are bad, considering beta is normally the time for projects like pgjdbc and npgsql to start adapting to such changes. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Heikki, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace
At Thu, 06 Apr 2017 00:59:49 +0200, Pierre Ducroquet wrote in <2008148.rxBNyNRHPZ@peanuts2> > But it all gets messy when we want to create a streaming standby server using > pg_basebackup. When backuping Pg 9.5, there is no issue, but backuping Pg 9.6 > afterwards will say "directory "/mnt/ssd/postgres" exists but is not empty". The documentation says as follows. https://www.postgresql.org/docs/9.6/static/manage-ag-tablespaces.html | The location must be an existing, empty directory that is owned | by the PostgreSQL operating system user. This explicitly prohibits to share one tablespace directory among multiple servers. The code is just missing the case of multiple servers with different versions. I think the bug is rather that Pg9.6 in the case allowed to create the tablespace. The current naming rule of tablespace directory was introduced as of 9.0 so that pg_upgrade (or pg_migrator at the time) can perform in-place migration. It is not intended to share a directory among multiple instances with different versions. That being said, an additional trick in the attached file will work for you. regards, -- Kyotaro Horiguchi NTT Open Source Software Center --- reproduce-bug-tblspace.sh 2017-04-06 13:49:10.657803383 +0900 +++ reproduce-bug-tblspace.sh.new 2017-04-06 13:48:45.870024586 +0900 @@ -48,3 +48,14 @@ echo " SECOND BASEBACKUP " -echo "Expecting it to fail with 'directory \"/tmp/pg_bug_backup_tblspace/dest_server/tblspc\" exists but is not empty'" + +echo "stashing tablespace directories of the first backup" +cd .. +mv tblspc tblspc.tmp +mkdir tblspc +cd datas + $PG2/pg_basebackup -h /tmp -p $PORT2 -D pg2 --tablespace-mapping=/tmp/pg_bug_backup_tblspace/source_server/tblspc=/tmp/pg_bug_backup_tblspace/dest_server/tblspc + +echo "merging tablespace directories" +cd .. +mv tblspc.tmp/* tblspc/ +rmdir tblspc.tmp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel bitmapscan isn't exercised in regression tests
On Tue, Apr 4, 2017 at 5:51 AM, Dilip Kumar wrote: > Sure I can do that, In attached patch, I only fixed the problem of not > executing the bitmap test. Now, I will add few cases to cover other > parts especially rescan and prefetching logic. I have added two test cases to cover rescan, prefetch and lossy pages logic for parallel bitmap. I have removed the existing case because these two new cases will be enough to cover that part as well. Now, nodeBitmapHeapScan.c has 95.5% of line coverage. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_bitmap_test_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
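A rough sketch of the kind of setup such tests use to force a parallel bitmap heap scan (not the exact committed test; tenk1 is the standard regression-test table):

SET enable_seqscan = off;
SET enable_indexscan = off;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1 WHERE hundred > 1;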
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 4/5/17 9:08 PM, Craig Ringer wrote: ... which I can't reproduce now. Even though I cleared ccache and "git reset -fdx" before I ran the above and got the crash. Glad to hear that, since I can't repro at all. :) Assume it's a local system peculiarity. If I can reproduce again I'll dig into it. Sounds good. Thanks! -- Jim Nasby, Chief Data Architect, Austin TX OpenSCG http://OpenSCG.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for Default partition in partitioning
On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote: > > > Only issue I see with this, and I'm not sure if it is an issue, is what > happens to that default constraint clause when 1000s of partitions start > getting added? From what I gather the default's constraint is built based > off the cumulative opposite of all other child constraints. I don't > understand the code well enough to see what it's actually doing, but if > there are no gaps, is the method used smart enough to aggregate all the > child constraints to make a simpler constraint that is simply outside the > current min/max boundaries? If so, for serial/time range partitioning this > should typically work out fine since there are rarely gaps. This actually > seems more of an issue for list partitioning where each child is a distinct > value or range of values that are completely arbitrary. Won't that check > and re-evaluation of the default's constraint just get worse and worse as > more children are added? Is there really even a need for the default to > have an opposite constraint like this? Not sure on how the planner works > with partitioning now, but wouldn't it be better to first check all > non-default children for a match the same as it does now without a default > and, failing that, then route to the default if one is declared? The > default should accept any data then so I don't see the need for the > constraint unless it's required for the current implementation. If that's > the case, could that be changed? > > Keith > Actually, thinking on this more, I realized this does again come back to the lack of a global index. Without the constraint, data could be put directly into the default that could technically conflict with the partition scheme elsewhere. Perhaps, instead of the constraint, inserts directly to the default could be prevented on the user level. Writing to valid children directly certainly has its place, but been thinking about it, and I can't see any reason why one would ever want to write directly to the default. It's use case seems to be around being a sort of temporary storage until that data can be moved to a valid location. Would still need to allow removal of data, though. Not sure if that's even a workable solution. Just trying to think of ways around the current limitations and still allow this feature.
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 6 April 2017 at 11:46, Craig Ringer wrote: > On 6 April 2017 at 09:40, Jim Nasby wrote: >> On 4/4/17 7:44 PM, Craig Ringer wrote: >>> >>> The patch crashes in initdb with --enable-cassert builds: >> >> >> Thanks for the review! I'll get to the rest of it in a bit, but I'm unable >> to reproduce the initdb failure. I looked at the assert line and I don't see >> anything obvious either. :/ >> >> Can you send your full configure call? uname -a? Mine is: >> >> ./configure --with-includes=/opt/local/include >> --with-libraries=/opt/local/lib --enable-debug --with-libxml --with-tcl >> --with-perl --with-python --enable-depend --enable-dtrace --enable-tap-tests >> --prefix=/Users/decibel/pgsql/HEAD/i/i --with-pgport=$PGC_PORT -C >> --enable-cassert --enable-debug CFLAGS='-ggdb -O0 -fno-omit-frame-pointer' > > > ./configure --prefix=/home/craig/pg/10 --enable-cassert --enable-debug > --enable-tap-tests --with-python > make -s clean > make -s -j4 > make check > > results in the crash here. ... which I can't reproduce now. Even though I cleared ccache and "git reset -fdx" before I ran the above and got the crash. Assume it's a local system peculiarity. If I can reproduce again I'll dig into it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 4/5/17 7:44 PM, Jim Nasby wrote: Updated patches attached, but I still need to update the docs. Attached is a complete series of patches that includes the docs patch. Right now, the docs don't include a concrete example, because adding one would be a pretty large if it demonstrated real usage, which presumably means Yet Another Contrib Module strictly for the purpose of demonstrating something. Rather than doing that, ISTM it'd be better to point the user at what plpythonu is doing. Another option would be to have a very simple example that only uses *receiveSlot, but that seems rather pointless to me. -- Jim Nasby, Chief Data Architect, Austin TX OpenSCG http://OpenSCG.com From 0a2ef661f55a047763a43b0eebd7483760e4a427 Mon Sep 17 00:00:00 2001 From: Jim Nasby Date: Wed, 5 Apr 2017 20:52:39 -0500 Subject: [PATCH 1/3] Add SPI_execute_callback Instead of placing results in a tuplestore, this method of execution uses the supplied callback when creating the Portal for a query. --- src/backend/executor/spi.c | 80 -- src/backend/tcop/dest.c| 15 + src/include/executor/spi.h | 4 +++ src/include/tcop/dest.h| 1 + 4 files changed, 90 insertions(+), 10 deletions(-) diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index ca547dc6d9..4f6c3011f9 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -55,7 +55,8 @@ static void _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan); static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, Snapshot snapshot, Snapshot crosscheck_snapshot, - bool read_only, bool fire_triggers, uint64 tcount); + bool read_only, bool fire_triggers, uint64 tcount, + DestReceiver *callback); static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes, Datum *Values, const char *Nulls); @@ -321,7 +322,35 @@ SPI_execute(const char *src, bool read_only, long tcount) res = _SPI_execute_plan(&plan, NULL, InvalidSnapshot, InvalidSnapshot, - read_only, true, tcount); + read_only, true, tcount, NULL); + + _SPI_end_call(true); + return res; +} + +int +SPI_execute_callback(const char *src, bool read_only, long tcount, + DestReceiver *callback) +{ + _SPI_plan plan; + int res; + + if (src == NULL || tcount < 0) + return SPI_ERROR_ARGUMENT; + + res = _SPI_begin_call(true); + if (res < 0) + return res; + + memset(&plan, 0, sizeof(_SPI_plan)); + plan.magic = _SPI_PLAN_MAGIC; + plan.cursor_options = 0; + + _SPI_prepare_oneshot_plan(src, &plan); + + res = _SPI_execute_plan(&plan, NULL, + InvalidSnapshot, InvalidSnapshot, + read_only, true, tcount, callback); _SPI_end_call(true); return res; @@ -355,7 +384,34 @@ SPI_execute_plan(SPIPlanPtr plan, Datum *Values, const char *Nulls, _SPI_convert_params(plan->nargs, plan->argtypes, Values, Nulls), InvalidSnapshot, InvalidSnapshot, - read_only, true, tcount); + read_only, true, tcount, NULL); + + _SPI_end_call(true); + return res; +} + +/* Execute a previously prepared plan with a callback */ +int +SPI_execute_plan_callback(SPIPlanPtr plan, Datum *Values, const char *Nulls, +bool read_only, long tcount, DestReceiver *callback) +{ + int res; + + if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || tcount < 0) + return SPI_ERROR_ARGUMENT; + + if (plan->nargs > 0 && Values == NULL) + return SPI_ERROR_PARAM; + + res = _SPI_begin_call(true); + if (res < 0) + return res; + + res = _SPI_execute_plan(plan, + _SPI_convert_params(plan->nargs, plan->argtypes, + Values, Nulls), + InvalidSnapshot, InvalidSnapshot, + read_only, true, tcount, callback); _SPI_end_call(true); return
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 6 April 2017 at 09:40, Jim Nasby wrote: > On 4/4/17 7:44 PM, Craig Ringer wrote: >> >> The patch crashes in initdb with --enable-cassert builds: > > > Thanks for the review! I'll get to the rest of it in a bit, but I'm unable > to reproduce the initdb failure. I looked at the assert line and I don't see > anything obvious either. :/ > > Can you send your full configure call? uname -a? Mine is: > > ./configure --with-includes=/opt/local/include > --with-libraries=/opt/local/lib --enable-debug --with-libxml --with-tcl > --with-perl --with-python --enable-depend --enable-dtrace --enable-tap-tests > --prefix=/Users/decibel/pgsql/HEAD/i/i --with-pgport=$PGC_PORT -C > --enable-cassert --enable-debug CFLAGS='-ggdb -O0 -fno-omit-frame-pointer' ./configure --prefix=/home/craig/pg/10 --enable-cassert --enable-debug --enable-tap-tests --with-python make -s clean make -s -j4 make check results in the crash here. if I add CFLAGS="" to the arguments (which suppresses the default "-O2"), or pass CFLAGS="-O0" then the crash goes away. $ python --version Python 2.7.11 $ lsb_release -a LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch Distributor ID: Fedora Description: Fedora release 23 (Twenty Three) Release: 23 Codename: TwentyThree $ gcc --version gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 4/4/17 7:44 PM, Craig Ringer wrote: On 5 April 2017 at 08:23, Craig Ringer wrote: On 5 April 2017 at 08:00, Craig Ringer wrote: This patch fails to update the documentation at all. https://www.postgresql.org/docs/devel/static/spi.html I'll fix that soon. missing newline Fixed. +/* Execute a previously prepared plan with a callback Destination */ caps "Destination" Hmm, I capitalized it since DestReceiver is capitalized. I guess it's best to just drop Destination. +// XXX throw error if callback is set Fixed (opted to just use an Assert). +static DestReceiver spi_callbackDR = { +donothingReceive, donothingStartup, donothingCleanup, donothingCleanup, +DestSPICallback +}; Presumably that's a default destination you're then supposed to modify with your own callbacks? There aren't any comments to indicate what's going on here. Correct. Added the following comment: /* * This is strictly a starting point for creating a callback. It should not * actually be used. */ Comments on patch 2: If this is the "bare minimum" patch, what is pending? Does it impose any downsides or limits? No limits. I'm not aware of any downsides. It's "bare minimum" because a follow-on step is to allow different methods of returning results. In particular, my testing indicates that patch 1 + returning a dict of lists (as opposed to the current list of dicts) results in a 460% improvement vs ~30% with patch 2. +/* Get switch execution contexts */ +extern PLyExecutionContext *PLy_switch_execution_context(PLyExecutionContext *new); Comment makes no sense to me. This seems something like memory context switch, where you supply the new and return the old. But the comment doesn't explain it at all. Comment changed to /* Switch execution context (similar to MemoryContextSwitchTo */ +void PLy_CSStartup(DestReceiver *self, int operation, TupleDesc typeinfo); +void PLy_CSDestroy(DestReceiver *self); These are declared in the plpy_spi.c. Why aren't these static? Derp. Fixed. +volatile MemoryContext oldcontext; +volatile ResourceOwner oldowner; int rv; -volatile MemoryContext oldcontext; -volatile ResourceOwner oldowner; Unnecessary code movement. IMHO it reads better that way. I've left it for now so $COMMITTER can decide, but if it really bugs you let me know and I'll swap it. In PLy_Callback_New, I think your use of a sub-context is sensible. Is it necessary to palloc0 though? Hrm, maybe not... but it seems like cheap insurance considering the amount of other stuff involved in just starting a new SPI call. And honestly, I'd rather not mess with it at this point. :) I have added an XXX comment though. +static CallbackState * +PLy_Callback_Free(CallbackState *callback) The code here looks like it could be part of spi.c's callback support, rather than plpy_spi specific, since you provide a destroy callback in the SPI callback struct. I added this for use in PG_CATCH() blocks, because it's not clear to me that the portal gets cleaned up in those cases. It's certainly possible that it's pointless. FWIW, I'm pretty sure I copied that pattern from somewhere else... probably plpgsql or pltcl. +/* We need to store this because the TupleDesc the receive function gets has no names. */ +myState->desc = typeinfo; Is it safe to just store the pointer to the TupleDesc here? What's the lifetime? Only Portal lifetime. + * will clean it up when the time is right. XXX This might result in a leak + * if an error happens and the result doesn't get dereferenced. 
+ */ +MemoryContextSwitchTo(TopMemoryContext); +result->tupdesc = CreateTupleDescCopy(typeinfo); especially given this XXX comment... I've changed the comment to the following. Hopefully this clarifies things: /* * Save tuple descriptor for later use by result set metadata * functions. Save it in TopMemoryContext so that it survives outside of * an SPI context. We trust that PLy_result_dealloc() will clean it up * when the time is right. The difference between result and everything * else is that result needs to survive after the portal is destroyed, * because result is what's handed back to the plpython function. While * it's tempting to use something other than TopMemoryContext, that won't * work: the user could potentially put result into the global dictionary, * which means it could survive as long as the session does. This might * result in a leak if an error happens and the result doesn't get * dereferenced, but if that happens it means the python GC has failed us, * at which point we probably have bigger problems. * * This still isn't perfect though; if something the result tupledesc * references has it's OID changed then the tupledesc will be invalid. I'm * not sure it's
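A minimal sketch of the pattern discussed above, i.e. copying a tuple descriptor into a long-lived context so that it survives after the portal and the SPI memory are gone. The wrapping function names are hypothetical; CreateTupleDescCopy() and FreeTupleDesc() are the existing APIs.

#include "postgres.h"
#include "access/tupdesc.h"
#include "utils/memutils.h"

static TupleDesc
save_result_tupdesc(TupleDesc typeinfo)
{
	MemoryContext oldcontext;
	TupleDesc	copy;

	/*
	 * The descriptor handed to a DestReceiver's rStartup callback lives in
	 * portal/SPI memory, so copy it into a context that outlives the portal.
	 * TopMemoryContext is the simplest choice; the price is that the copy
	 * must be freed explicitly, e.g. from the result object's destructor.
	 */
	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
	copy = CreateTupleDescCopy(typeinfo);
	MemoryContextSwitchTo(oldcontext);

	return copy;
}

static void
release_result_tupdesc(TupleDesc copy)
{
	if (copy != NULL)
		FreeTupleDesc(copy);
}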
[HACKERS] Interval for launching the table sync worker
Hi all, While testing the table sync worker for logical replication, I noticed that if the table sync worker fails to insert the data for whatever reason, the worker process exits with an error, and the main apply worker then relaunches the table sync worker again immediately. This cycle repeats at very high frequency with no delay between attempts. Should we put an interval (wal_retrieve_retry_interval, or a new GUC parameter?) between launches of the table sync worker? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
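For illustration only (this is not from any posted patch), the kind of guard being asked about could look roughly like this: the apply worker remembers when it last started a sync worker for a relation and refuses to relaunch it until wal_retrieve_retry_interval has elapsed. The SyncStartEntry type and the function name are assumptions.

#include "postgres.h"
#include "access/xlog.h"		/* wal_retrieve_retry_interval */
#include "utils/timestamp.h"

typedef struct SyncStartEntry
{
	Oid			relid;			/* relation the sync worker copies */
	TimestampTz last_start;		/* 0 if never started */
} SyncStartEntry;

static bool
sync_worker_can_start(SyncStartEntry *entry)
{
	TimestampTz now = GetCurrentTimestamp();

	/* wal_retrieve_retry_interval is in milliseconds */
	if (entry->last_start != 0 &&
		!TimestampDifferenceExceeds(entry->last_start, now,
									wal_retrieve_retry_interval))
		return false;			/* too soon since the previous attempt */

	entry->last_start = now;
	return true;
}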
Re: [HACKERS] Parallel Append implementation
On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote: > This is what the earlier versions of my patch had done : just add up > per-subplan parallel_workers (1 for non-partial subplan and > subpath->parallel_workers for partial subplans) and set this total as > the Append parallel_workers. I don't think that's great, consider e.g. the case that you have one very expensive query, and a bunch of cheaper ones. Most of those workers wouldn't do much while waiting for the the expensive query. What I'm basically thinking we should do is something like the following pythonesque pseudocode: best_nonpartial_cost = -1 best_nonpartial_nworkers = -1 for numworkers in 1...#max workers: worker_work = [0 for x in range(0, numworkers)] nonpartial_cost += startup_cost * numworkers # distribute all nonpartial tasks over workers. Assign tasks to the # worker with the least amount of work already performed. for task in all_nonpartial_subqueries: least_busy_worker = worker_work.smallest() least_busy_worker += task.total_nonpartial_cost # the nonpartial cost here is the largest amount any single worker # has to perform. nonpartial_cost += worker_work.largest() total_partial_cost = 0 for task in all_partial_subqueries: total_partial_cost += task.total_nonpartial_cost # Compute resources needed by partial tasks. First compute how much # cost we can distribute to workers that take shorter than the # "busiest" worker doing non-partial tasks. remaining_avail_work = 0 for i in range(0, numworkers): remaining_avail_work += worker_work.largest() - worker_work[i] # Equally divide up remaining work over all workers if remaining_avail_work < total_partial_cost: nonpartial_cost += (worker_work.largest - remaining_avail_work) / numworkers # check if this is the best number of workers if best_nonpartial_cost == -1 or best_nonpartial_cost > nonpartial_cost: best_nonpartial_cost = worker_work.largest best_nonpartial_nworkers = nworkers Does that make sense? > BTW all of the above points apply only for non-partial plans. Indeed. But I think that's going to be a pretty common type of plan, especially if we get partitionwise joins. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
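To make the cost model above easier to follow, here is a compilable C rendering of the same idea (plain C, not code from any posted patch); a couple of apparent variable mix-ups in the pseudocode, such as charging worker_work.largest rather than the leftover partial cost, are smoothed over.

#include <stdlib.h>

typedef double Cost;

static int
choose_append_nworkers(int max_workers, Cost startup_cost_per_worker,
					   const Cost *nonpartial_costs, int n_nonpartial,
					   const Cost *partial_costs, int n_partial)
{
	Cost		best_cost = -1.0;
	int			best_nworkers = -1;
	int			nworkers;

	for (nworkers = 1; nworkers <= max_workers; nworkers++)
	{
		Cost	   *work = calloc(nworkers, sizeof(Cost));
		Cost		total_cost = startup_cost_per_worker * nworkers;
		Cost		busiest;
		Cost		idle_capacity = 0.0;
		Cost		partial_total = 0.0;
		int			i,
					w;

		/* Assign each non-partial subplan to the least-loaded worker. */
		for (i = 0; i < n_nonpartial; i++)
		{
			int			least = 0;

			for (w = 1; w < nworkers; w++)
				if (work[w] < work[least])
					least = w;
			work[least] += nonpartial_costs[i];
		}

		/* The non-partial phase finishes when the busiest worker does. */
		busiest = work[0];
		for (w = 1; w < nworkers; w++)
			if (work[w] > busiest)
				busiest = work[w];
		total_cost += busiest;

		/* Partial subplans can be spread across all workers ... */
		for (i = 0; i < n_partial; i++)
			partial_total += partial_costs[i];

		/* ... first soaking up time where workers would otherwise sit idle. */
		for (w = 0; w < nworkers; w++)
			idle_capacity += busiest - work[w];

		/* Whatever doesn't fit is divided evenly over all workers. */
		if (partial_total > idle_capacity)
			total_cost += (partial_total - idle_capacity) / nworkers;

		if (best_cost < 0 || total_cost < best_cost)
		{
			best_cost = total_cost;
			best_nworkers = nworkers;
		}

		free(work);
	}

	return best_nworkers;
}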
Re: [HACKERS] Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)
On Wed, Apr 05, 2017 at 12:16:25AM -0700, Andres Freund wrote: > On 2017-04-05 02:47:55 -0400, Noah Misch wrote: > > On Fri, Mar 10, 2017 at 11:49:46AM -0800, Andres Freund wrote: > > > On 2017-03-09 13:34:22 -0500, Robert Haas wrote: > > > > On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane wrote: > > > > > Andres Freund writes: > > > > >> Wonder if we there's an argument to be made for implementing this > > > > >> roughly similarly to split_pathtarget_at_srf - instead of injecting a > > > > >> ProjectSet node we'd add a FunctionScan node below a Result node. > > > > > > > > > > Yeah, possibly. That would have the advantage of avoiding an > > > > > ExecProject > > > > > step when the SRFs aren't buried, which would certainly be the > > > > > expected > > > > > case. > > > > > > > > > > If you don't want to make ExecInitExpr responsible, then the planner > > > > > would > > > > > have to do something like split_pathtarget_at_srf anyway to decompose > > > > > the > > > > > expressions, no matter which executor representation we use. > > > > > > > > Did we do anything about this? Are we going to? > > > > > > Working on a patch. > > > > [Action required within three days. This is a generic notification.] > > > > The above-described topic is currently a PostgreSQL 10 open item. Andres, > > since you committed the patch believed to have created it, you own this open > > item. If some other commit is more relevant or if this does not belong as a > > v10 open item, please let us know. Otherwise, please observe the policy on > > open item ownership[1] and send a status update within three calendar days > > of > > this message. Include a date for your subsequent status update. Testers > > may > > discover new open items at any time, and I want to plan to get them all > > fixed > > well in advance of shipping v10. Consequently, I will appreciate your > > efforts > > toward speedy resolution. Thanks. > > I've a very preliminary patch. I'd like to only start polishing it up > once the code freeze is over, so I can work on getting some patches in - > note that I myself have no pending patches. Once frozen I'll polish it > up and send that within a few days. > > Ok? Okay; using my simplistic translator of "a few", I'll look for your next status update on or before 2017-04-11. As always, feel free to set a different date. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch wrote: > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > >> Regarding this feature, there are some loose ends. We should work on > >> and complete them until the release. > >> > >> (1) > >> Which synchronous replication method, priority or quorum, should be > >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, > >> a priority-based sync replication is chosen for keeping backward > >> compatibility. However some hackers argued to change this decision > >> so that a quorum commit is chosen because they think that most users > >> prefer to a quorum. > >> > >> (2) > >> There will be still many source comments and documentations that > >> we need to update, for example, in high-availability.sgml. We need to > >> check and update them throughly. > >> > >> (3) > >> The priority value is assigned to each standby listed in s_s_names > >> even in quorum commit though those priority values are not used at all. > >> Users can see those priority values in pg_stat_replication. > >> Isn't this confusing? If yes, it might be better to always assign 1 as > >> the priority, for example. > > > > [Action required within three days. This is a generic notification.] > > > > The above-described topic is currently a PostgreSQL 10 open item. Fujii, > > since you committed the patch believed to have created it, you own this open > > item. If some other commit is more relevant or if this does not belong as a > > v10 open item, please let us know. Otherwise, please observe the policy on > > open item ownership[1] and send a status update within three calendar days > > of > > this message. Include a date for your subsequent status update. Testers > > may > > discover new open items at any time, and I want to plan to get them all > > fixed > > well in advance of shipping v10. Consequently, I will appreciate your > > efforts > > toward speedy resolution. Thanks. > > > > [1] > > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > > Thanks for the notice! > > Regarding the item (2), Sawada-san told me that he will work on it after > this CommitFest finishes. So we would receive the patch for the item from > him next week. If there will be no patch even after the end of next week > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. Sounds reasonable; I will look for your update on 14Apr or earlier. > The items (1) and (3) are not bugs. So I don't think that they need to be > resolved before the beta release. After the feature freeze, many users > will try and play with many new features including quorum-based syncrep. > Then if many of them complain about (1) and (3), we can change the code > at that timing. So we need more time that users can try the feature. I've moved (1) to a new section for things to revisit during beta. If someone feels strongly that the current behavior is Wrong and must change, speak up as soon as you reach that conclusion. Absent such arguments, the behavior won't change. > BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL > as the priority if quorum-based sync rep is chosen. It's less confusing. Since you do want (3) to change, please own it like any other open item, including the mandatory status updates. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 4/4/17 7:44 PM, Craig Ringer wrote: The patch crashes in initdb with --enable-cassert builds: Thanks for the review! I'll get to the rest of it in a bit, but I'm unable to reproduce the initdb failure. I looked at the assert line and I don't see anything obvious either. :/ Can you send your full configure call? uname -a? Mine is: ./configure --with-includes=/opt/local/include --with-libraries=/opt/local/lib --enable-debug --with-libxml --with-tcl --with-perl --with-python --enable-depend --enable-dtrace --enable-tap-tests --prefix=/Users/decibel/pgsql/HEAD/i/i --with-pgport=$PGC_PORT -C --enable-cassert --enable-debug CFLAGS='-ggdb -O0 -fno-omit-frame-pointer' Darwin decina.local 15.6.0 Darwin Kernel Version 15.6.0: Mon Jan 9 23:07:29 PST 2017; root:xnu-3248.60.11.2.1~1/RELEASE_X86_64 x86_64 -- Jim Nasby, Chief Data Architect, Austin TX OpenSCG http://OpenSCG.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Thu, Apr 6, 2017 at 1:06 AM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 2:32 PM, Pavan Deolasee > wrote: > > > The other way is to pass old tuple values along with the new tuple > values to > > amwarminsert, build index tuples and then do a comparison. For duplicate > > index tuples, skip WARM inserts. > > This is more what I was thinking. But maybe one of the other ideas > you wrote here is better; not sure. > > Ok. I think I suggested this as one of the ideas upthread, to support hash indexes for example. This might be a good safety-net, but AFAIC what we have today should work since we pretty much construct index tuples in a consistent way before doing a comparison. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-04-05 21:07:59 -0400, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > I don't yet have a good idea how to deal with moving individual cells > > > into files, so they can be loaded. One approach would be to to have > > > something like > > > > > > \storequeryresult filename_template.%row.%column > > > > > > which'd then print the current query buffer into the relevant file after > > > doing replacement on %row and %column. > > > > I don't actually agree that there's a problem having the output from a > > query stored direclty in binary form into a single file. The above > > approach seems to imply that every binary result must go into an > > independent file, and perhaps that would be useful in some cases, but I > > don't see it as required. > > Well, it'd not be enforced - it'd depend on your template. But for a > lot of types of files, it'd not make sense to store multiple > columns/rows in file. Particularly for ones where printing them out to > files is actually meaningful (i.e. binary ones). Having the template not require the row/column place-holders included strikes me as more likely to be confusing. My initial thinking around this was that users who actually want independent files would simply issue independent queries, while users who want to take a bunch of int4 columns and dump them into a single binary file would be able to do so easily. I'm not against adding the ability for a single query result to be saved into independent files, but it strikes me as feature creep on this basic capability. Further, I don't see any particular reason why splitting up the output from a query into multiple files is only relevant for binary data. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Logical Replication and Character encoding
At Wed, 5 Apr 2017 11:33:51 -0400, Peter Eisentraut wrote in <5401fef6-c0c0-7e8a-d8b1-169e30cbd...@2ndquadrant.com> > After further thinking, I prefer the alternative approach of using > pq_sendcountedtext() as is and sticking the trailing zero byte on on the > receiving side. This is a more localized change, and keeps the logical > replication protocol consistent with the main FE/BE protocol. (Also, we > don't need to send a useless byte around.) I'm not sure about the significance of the trailing zero in the logical replication protocol. Anyway, the patch works. > Patch attached, and also a test case. The problem was revealed when a string is shortened by encoding conversion; the test covers that situation. - The patches apply on master cleanly. - The patch works for the UTF-8 => EUC_JP case. - The test seems proper. By the way, an untranslatable character in the publisher table stops walsender with the following error. > ERROR: character with byte sequence 0xe6 0xbc 0xa2 in encoding "UTF8" has no > equivalent in encoding "LATIN1" > STATEMENT: COPY public.t TO STDOUT > LOG: could not send data to client: Broken pipe > FATAL: connection to client lost walreceiver stops on the opposite side with the following complaint. > ERROR: could not receive data from WAL stream: ERROR: character with byte > sequence 0xe6 0xbc 0xa2 in encoding "UTF8" has no equivalent in encoding > "LATIN1" > CONTEXT: COPY t, line 1: "" > LOG: worker process: logical replication worker for subscription 16391 sync > 16384 (PID 26915) exited with exit code 1 After this, walreceiver repeatedly reconnects to the master with no wait. Maybe walreceiver had better refrain from reconnecting after certain kinds of failure, but that is not an urgent issue. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
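For clarity, the receiving-side change being described amounts to something like the following (illustrative fragment, not the committed code): the text arrives as a length-counted field with no trailing zero byte, so the subscriber terminates it itself before handing the value to input functions.

#include "postgres.h"
#include "libpq/pqformat.h"

static char *
read_counted_text(StringInfo msg)
{
	int			len = pq_getmsgint(msg, 4);
	char	   *result = palloc(len + 1);

	pq_copymsgbytes(msg, result, len);
	result[len] = '\0';			/* terminator is not sent on the wire */

	return result;
}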
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
On Wed, Apr 05, 2017 at 11:13:33AM -0400, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Stephen Frost writes: > > > I'm all for adding tests into pg_dump which do things like drop columns > > > and change column names and other cases which could impact if the > > > pg_dump is correct or not, and there's nothing preventing those tests > > > from being added in the existing structure. Certainly, before we remove > > > the coverage provided by running the serial test suite and then using > > > pg_upgrade, we should analyze what is being tested and ensure that we're > > > providing that same set of testing in the pg_dump TAP tests. > > > > I don't think you grasped my basic point, which is that I'm worried about > > emergent cases that we don't foresee needing to test (and that no amount > > of code coverage checking would have shown up as being overlooked). > > Admittedly, relying on the core regression tests to trigger such cases is > > a pretty haphazard strategy, but it's way better than no strategy at all. > > The tests that were added to the core regression suite were done so for > a reason and hopefully we can identify cases where it'd make sense to > also run those tests for pg_upgrade/pg_dump testing. I think you _are_ missing Tom's point. We've caught pg_dump and pg_upgrade bugs thanks to regression database objects created for purposes unrelated to pg_dump. It's true that there exist other test strategies that are more efficient or catch more bugs overall. None of them substitute 100% for the serendipity seen in testing dump/restore on the regression database. > More-or-less > anything that materially changes the catalog should be included, I would > think. Things that are only really only working with the heap/index > files don't really need to be performed because the pg_upgrade process > doesn't change those. That is formally true. Also, I agree with Andres that this is not a thread for discussing test changes beyond mechanical translation of the pg_upgrade test suite. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange parallel query behavior after OOM crashes
On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri wrote: > The problem here seem to be the change in the max_parallel_workers value > while the parallel workers are still under execution. So this poses two > questions: > > 1. From usecase point of view, why could there be a need to tweak the > max_parallel_workers exactly at the time when the parallel workers are at > play. > 2. Could there be a restriction on tweaking of max_parallel_workers while > the parallel workers are at play? At least do not allow setting the > max_parallel_workers less than the current # of active parallel workers. Well, that would be letting the tail wag the dog. The maximum value of max_parallel_workers is only 1024, and what we're really worried about here is seeing a value near PG_UINT32_MAX, which leaves a lot of daylight. How about just creating a #define that's used by guc.c as the maximum for the GUC, and here we assert that we're <= that value? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
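A sketch of that suggestion, with assumed names (the real patch may spell things differently): one #define supplies guc.c's upper bound for max_parallel_workers, and the register/terminate counters are sanity-checked against that constant rather than against the changeable GUC value.

#include "postgres.h"

#define MAX_PARALLEL_WORKER_LIMIT	1024	/* assumed name; also guc.c's max */

static void
sanity_check_parallel_worker_counts(uint32 parallel_register_count,
									uint32 parallel_terminate_count)
{
	/*
	 * Unsigned wraparound (terminate > register) or a nonsensically large
	 * number of active workers both indicate corrupted counters, e.g. a
	 * parallel_terminate_count left behind after a crash.  Checking against
	 * the compile-time ceiling avoids any dependence on the GUC's current
	 * setting, which other backends may change at any time.
	 */
	Assert(parallel_register_count >= parallel_terminate_count);
	Assert(parallel_register_count - parallel_terminate_count <=
		   MAX_PARALLEL_WORKER_LIMIT);
}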
Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files
Hi, On 2017-04-05 21:07:59 -0400, Stephen Frost wrote: > * Andres Freund (and...@anarazel.de) wrote: > > I don't like the API here much. Loading requires knowledge of some > > magic $1 value and allows only a single column, printing doesn't mean > > much when there's multiple columns / rows. > > > I think the loading side of things should be redesigned into a more > > general facility for providing query parameters. E.g. something like > > \setparam $1 'whateva' > > \setparamfromfile $2 'somefile' > > \setparamfromprogram $3 cat /frakbar > > > > which then would get used in the next query sent to the server. That'd > > allow importing multiple columns, and it'd be useful for other purposes > > than just loading binary data. > > I tend to agree that the loading side should probably be thought through > more. > > > I don't yet have a good idea how to deal with moving individual cells > > into files, so they can be loaded. One approach would be to to have > > something like > > > > \storequeryresult filename_template.%row.%column > > > > which'd then print the current query buffer into the relevant file after > > doing replacement on %row and %column. > > I don't actually agree that there's a problem having the output from a > query stored direclty in binary form into a single file. The above > approach seems to imply that every binary result must go into an > independent file, and perhaps that would be useful in some cases, but I > don't see it as required. Well, it'd not be enforced - it'd depend on your template. But for a lot of types of files, it'd not make sense to store multiple columns/rows in file. Particularly for ones where printing them out to files is actually meaningful (i.e. binary ones). - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for Default partition in partitioning
On 2017/04/06 0:19, Robert Haas wrote: > On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed wrote: >>> Could you briefly elaborate why you think the lack global index support >>> would be a problem in this regard? >> I think following can happen if we allow rows satisfying the new partition >> to lie around in the >> default partition until background process moves it. >> Consider a scenario where partition key is a primary key and the data in the >> default partition is >> not yet moved into the newly added partition. If now, new data is added into >> the new partition >> which also exists(same key) in default partition there will be data >> duplication. If now >> we scan the partitioned table for that key(from both the default and new >> partition as we >> have not moved the rows) it will fetch the both rows. >> Unless we have global indexes for partitioned tables, there is chance of >> data duplication between >> child table added after default partition and the default partition. > > Yes, I think it would be completely crazy to try to migrate the data > in the background: > > - The migration might never complete because of a UNIQUE or CHECK > constraint on the partition to which rows are being migrated. > > - Even if the migration eventually succeeded, such a design abandons > all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly > while the migration is in progress, unless the new partition has no > UNIQUE constraints. > > - Partition-wise join and partition-wise aggregate would need to have > special case handling for the case of an unfinished migration, as > would any user code that accesses partitions directly. > > - More generally, I think users expect that when a DDL command > finishes execution, it's done all of the work that there is to do (or > at the very least, that any remaining work has no user-visible > consequences, which would not be the case here). OK, I realize the background migration was a poorly thought out idea. And a *first* version that will handle the row-movement should be doing that as part of the same command anyway. > IMV, the question of whether we have efficient ways to move data > around between partitions is somewhat separate from the question of > whether partitions can be defined in a certain way in the first place. > The problems that Keith refers to upthread already exist for > subpartitioning; you've got to detach the old partition, create a new > one, and then reinsert the data. And for partitioning an > unpartitioned table: create a replacement table, insert all the data, > substitute it for the original table. The fact that we have these > limitation is not good, but we're not going to rip out partitioning > entirely because we don't have clever ways of migrating the data in > those cases, and the proposed behavior here is not any worse. > > Also, waiting for those problems to get fixed might be waiting for > Godot. I'm not really all that sanguine about our chances of coming > up with a really nice way of handling these cases. In a designed > based on table inheritance, you can leave it murky where certain data > is supposed to end up and migrate it on-line and you might get away > with that, but a major point of having declarative partitioning at all > is to remove that sort of murkiness. 
It's probably not that hard to > come up with a command that locks the parent and moves data around via > full table scans, but I'm not sure how far that really gets us; you > could do the same thing easily enough with a sequence of commands > generated via a script. And being able to do this in a general way > without a full table lock looks pretty hard - it doesn't seem > fundamentally different from trying to perform a table-rewriting > operation like CLUSTER without a full table lock, which we also don't > support. The executor is not built to cope with any aspect of the > table definition shifting under it, and that includes the set of child > tables with are partitions of the table mentioned in the query. Maybe > the executor can be taught to survive such definitional changes at > least in limited cases, but that's a much different project than > allowing default partitions. Agreed. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats
On 6 April 2017 at 13:05, David Rowley wrote: > I tested with the attached, and it does not seem to hurt planner > performance executing: Here it is again, this time with a comment on the find_relation_from_clauses() function. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services find_relation_from_clauses_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files
Andres, * Andres Freund (and...@anarazel.de) wrote: > I don't like the API here much. Loading requires knowledge of some > magic $1 value and allows only a single column, printing doesn't mean > much when there's multiple columns / rows. > I think the loading side of things should be redesigned into a more > general facility for providing query parameters. E.g. something like > \setparam $1 'whateva' > \setparamfromfile $2 'somefile' > \setparamfromprogram $3 cat /frakbar > > which then would get used in the next query sent to the server. That'd > allow importing multiple columns, and it'd be useful for other purposes > than just loading binary data. I tend to agree that the loading side should probably be thought through more. > I don't yet have a good idea how to deal with moving individual cells > into files, so they can be loaded. One approach would be to to have > something like > > \storequeryresult filename_template.%row.%column > > which'd then print the current query buffer into the relevant file after > doing replacement on %row and %column. I don't actually agree that there's a problem having the output from a query stored direclty in binary form into a single file. The above approach seems to imply that every binary result must go into an independent file, and perhaps that would be useful in some cases, but I don't see it as required. > I don't think we can find an API we agree upon in the next 48h... Now that there's more than one opinion being voiced on the API, I tend to agree with this. Hopefully we can keep the discussion moving forward, however, as I do see value in this capability in general. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Wed, Apr 5, 2017 at 2:42 AM, Ashutosh Bapat wrote: > Only inner join conditions have equivalence classes associated with > those. Outer join conditions create single element equivalence > classes. So, we can not associate equivalence classes as they are with > partition scheme. If we could do that, it makes life much easier since > checking whether equi-join between all partition keys exist, is simply > looking up equivalence classes that cover joining relations and find > em_member corresponding to partition keys. OK. > It looks like we should only keep strategy, partnatts, partopfamily > and parttypcoll in PartitionScheme. A partition-wise join between two > relations would be possible if all those match. Yes, I think so. Conceivably you could even exclude partnatts and strategy, since there's nothing preventing a partitionwise join between a list-partitioned table and a range-partitioned table, or between a table range-partitioned on (a) and another range-partitioned on (a, b), but there is probably not much benefit in trying to cover such cases. I think it's reasonable to tell users that this is only going to work when the partitioning strategy is the same and the join conditions include all of the partitioning columns on both sides. > There's a relevant comment in 0006, build_joinrel_partition_info() > (probably that name needs to change, but I will do that once we have > settled on design) > + /* > +* Construct partition keys for the join. > +* > +* An INNER join between two partitioned relations is partition by key > +* expressions from both the relations. For tables A and B > partitioned by a and b > +* respectively, (A INNER JOIN B ON A.a = B.b) is partitioned by both A.a > +* and B.b. > +* > +* An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with > +* B.b NULL. These rows may not fit the partitioning conditions imposed on > +* B.b. Hence, strictly speaking, the join is not partitioned by B.b. > +* Strictly speaking, partition keys of an OUTER join should include > +* partition key expressions from the OUTER side only. Consider a join > like > +* (A LEFT JOIN B on (A.a = B.b) LEFT JOIN C ON B.b = C.c. If we do not > +* include B.b as partition key expression for (AB), it prohibits us from > +* using partition-wise join when joining (AB) with C as there is no > +* equi-join between partition keys of joining relations. But two NULL > +* values are never equal and no two rows from mis-matching partitions can > +* join. Hence it's safe to include B.b as partition key expression for > +* (AB), even though rows in (AB) are not strictly partitioned by B.b. > +*/ > > I think that also needs to be reviewed carefully. The following passage from src/backend/optimizer/README seems highly relevant: === The planner's treatment of outer join reordering is based on the following identities: 1. (A leftjoin B on (Pab)) innerjoin C on (Pac) = (A innerjoin C on (Pac)) leftjoin B on (Pab) where Pac is a predicate referencing A and C, etc (in this case, clearly Pac cannot reference B, or the transformation is nonsensical). 2. (A leftjoin B on (Pab)) leftjoin C on (Pac) = (A leftjoin C on (Pac)) leftjoin B on (Pab) 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc) = A leftjoin (B leftjoin C on (Pbc)) on (Pab) Identity 3 only holds if predicate Pbc must fail for all-null B rows (that is, Pbc is strict for at least one column of B). If Pbc is not strict, the first form might produce some rows with nonnull C columns where the second form would make those entries null. 
=== In other words, I think your statement that null is never equal to null is a bit imprecise. Somebody could certainly create an operator that is named "=" which returns true in that case, and then they could say, hey, two nulls are equal (when you use that operator). The argument needs to be made in terms of the formal properties of the operator. The relevant logic is in have_partkey_equi_join: + /* Skip clauses which are not equality conditions. */ + if (rinfo->hashjoinoperator == InvalidOid && !rinfo->mergeopfamilies) + continue; Actually, I think the hashjoinoperator test is formally and practically unnecessary here; lower down there is a test to see whether the partitioning scheme's operator family is a member of rinfo->mergeopfamilies, which will certainly fail if we got through this test with rinfo->mergeopfamilies == NIL just on the strength of rinfo->hashjoinoperator != InvalidOid. So you can just bail out if rinfo->mergeopfamilies == NIL. But the underlying point here is that the only thing you really know about the function is that it's got to be a strategy-3 operator in some btree opclass; if that guarantees strictness, then so be it -- but I wasn't able to find anything in the code or documentation off-hand that supports that contention,
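A minimal sketch of the simplification suggested above for have_partkey_equi_join(): since the later code searches rinfo->mergeopfamilies for the partitioning opfamily anyway, an empty list is reason enough to skip the clause, and the separate hashjoinoperator test adds nothing. The helper name is hypothetical.

#include "postgres.h"
#include "nodes/relation.h"

static bool
clause_could_match_partkey(RestrictInfo *rinfo)
{
	/*
	 * Only mergejoinable (btree equality) clauses can possibly match the
	 * partition keys; a clause with no mergeopfamilies can be skipped
	 * outright, whether or not it happens to be hashjoinable.
	 */
	return rinfo->mergeopfamilies != NIL;
}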
Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats
On 6 April 2017 at 11:33, Tom Lane wrote: > David Rowley writes: >> On 6 April 2017 at 10:48, Tom Lane wrote: >>> The buildfarm is unhappy about the fact that this changed the API >>> for clauselist_selectivity(). I am not convinced that that change >>> was a good idea, so before telling FDW authors that they need to >>> change their code, I'd like to hear a defense of the API change. > >> Because varReliId is often passed in as 0, and that meant we'd have to >> write some code to check of the clause was made up of RestrictInfos >> from a single relation or not, and look for extended stats on that >> singleton rel. > > Generally, if it's passed as zero, that's a good clue that the clause > *is* a join clause. In any case, this defense fails to address my > other question, which is what's going to happen to this API when you > want to use extended stats in join-clause estimates, which I'd expect > to surely happen before very long. > > Also, I find it hard to believe that a bms_get_singleton_member call is > going to be material in comparison to all the work that will be invoked > indirectly via whatever selectivity estimation function gets called for > each clause. Even a single catcache fetch would swamp that. > > So no, you have not convinced me that this isn't a broken design. > >> FWIW, I found this function being called 72 times in a 5 way join >> search problem. > > And you measured the overhead of doing it the other way to be ... ? > Premature optimization and all that. I tested with the attached, and it does not seem to hurt planner performance executing: explain select * from ab ab1 inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b inner join ab ab3 on ab1.a = ab3.a and ab1.b = ab3.b inner join ab ab4 on ab1.a = ab4.a and ab1.b = ab4.b inner join ab ab5 on ab1.a = ab5.a and ab1.b = ab5.b inner join ab ab6 on ab1.a = ab6.a and ab1.b = ab6.b inner join ab ab7 on ab1.a = ab7.a and ab1.b = ab7.b inner join ab ab8 on ab1.a = ab8.a and ab1.b = ab8.b; after having executed: create table ab (a int, b int); I get: find_relation_from_clauses tps = 48.992918 (excluding connections establishing) tps = 49.060407 (excluding connections establishing) tps = 49.075815 (excluding connections establishing) Master tps = 48.938027 (excluding connections establishing) tps = 48.066274 (excluding connections establishing) tps = 48.727089 (excluding connections establishing) running pgbench -n -T 60 -f 8wayjoin.sql -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services find_relation_from_clauses.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files
On 2017-03-18 17:51:48 +0100, Pavel Stehule wrote: > What is done: > > create table foo foo(a bytea); > > -- import > insert into foo values($1) > \gloadfrom ~/xxx.jpg bytea > > -- export > \pset format binary > select a from foo > \g ~/xxx2.jpg > > tested on import 55MB binary file > > Comments, notes? I don't like the API here much. Loading requires knowledge of some magic $1 value and allows only a single column, printing doesn't mean much when there's multiple columns / rows. I think the loading side of things should be redesigned into a more general facility for providing query parameters. E.g. something like \setparam $1 'whateva' \setparamfromfile $2 'somefile' \setparamfromprogram $3 cat /frakbar which then would get used in the next query sent to the server. That'd allow importing multiple columns, and it'd be useful for other purposes than just loading binary data. I don't yet have a good idea how to deal with moving individual cells into files, so they can be loaded. One approach would be to to have something like \storequeryresult filename_template.%row.%column which'd then print the current query buffer into the relevant file after doing replacement on %row and %column. I don't think we can find an API we agree upon in the next 48h... - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow to store select results into variables
> Well, personally, as an all-ASCII guy I'm not too fussed about that, > but I can see that other people might be annoyed. > > The main problem in dealing with it seems to be whether you're willing > to support pgbench running in non-backend-safe encodings (eg SJIS). > If we rejected that case then it'd be a relatively simple change to allow > pgbench to treat any high-bit-set byte as a valid variable name character. > (I think anyway, haven't checked the code.) > > Although ... actually, psql allows any high-bit-set byte in variable > names (cf valid_variable_name()) without concern about encoding. > That means it's formally incorrect in SJIS, but it's been like that > for an awful lot of years and nobody's complained. Maybe it'd be fine > for pgbench to act the same. That's my thought too. > Having said all that, I think we're at the point in the commitfest > where if there's any design question at all about a patch, it should > get booted to the next cycle. We are going to have more than enough > to do to stabilize what's already committed, we don't need to be > adding more uncertainty. Ok, I will move the patch to the next cf. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow to store select results into variables
On 2017-04-05 20:24:19 -0400, Tom Lane wrote: > Having said all that, I think we're at the point in the commitfest > where if there's any design question at all about a patch, it should > get booted to the next cycle. We are going to have more than enough > to do to stabilize what's already committed, we don't need to be > adding more uncertainty. +1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow to store select results into variables
Tatsuo Ishii writes: > I still wonder whether I should commit this or not because this patch > does not allow none ascii column names. Well, personally, as an all-ASCII guy I'm not too fussed about that, but I can see that other people might be annoyed. The main problem in dealing with it seems to be whether you're willing to support pgbench running in non-backend-safe encodings (eg SJIS). If we rejected that case then it'd be a relatively simple change to allow pgbench to treat any high-bit-set byte as a valid variable name character. (I think anyway, haven't checked the code.) Although ... actually, psql allows any high-bit-set byte in variable names (cf valid_variable_name()) without concern about encoding. That means it's formally incorrect in SJIS, but it's been like that for an awful lot of years and nobody's complained. Maybe it'd be fine for pgbench to act the same. Having said all that, I think we're at the point in the commitfest where if there's any design question at all about a patch, it should get booted to the next cycle. We are going to have more than enough to do to stabilize what's already committed, we don't need to be adding more uncertainty. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
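For reference, relaxing pgbench's rule along the lines of psql's valid_variable_name() could look roughly like this (standalone sketch, hypothetical function name): any byte with the high bit set is accepted without trying to validate the encoding, which is exactly the property that makes it formally incorrect for SJIS but harmless in practice.

#include <stdbool.h>

#define IS_HIGHBIT_SET(ch)	((unsigned char) (ch) & 0x80)

static bool
is_valid_variable_char(unsigned char c)
{
	/* ASCII letters, digits, underscore, or any high-bit-set byte */
	return (c >= 'A' && c <= 'Z') ||
		(c >= 'a' && c <= 'z') ||
		(c >= '0' && c <= '9') ||
		c == '_' ||
		IS_HIGHBIT_SET(c);
}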
Re: [HACKERS] strange parallel query behavior after OOM crashes
On Wed, Apr 5, 2017 at 5:34 PM, Kuntal Ghosh wrote: > On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri > wrote: > > > I feel there should be an assert if > > (BackgroundWorkerData->parallel_register_count - > > BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers) > > > Backend 1 > SET max_parallel_worker = 8; > Backend 1 > Execute a long running parallel query q1 with number of > parallel worker spawned is say, 4. > Backend 2> SET max_parallel_worker = 3; > Backend 2 > Execute a long running parallel query q2 with number of > parallel worker spawned > 0. > > The above assert statement will bring down the server unnecessarily > while executing q2. Right, with multiple backends trying to fiddle with max_parallel_workers, that might bring the server down with the said assert: Assert(parallel_register_count - parallel_terminate_count <= max_parallel_workers) The problem here seem to be the change in the max_parallel_workers value while the parallel workers are still under execution. So this poses two questions: 1. From usecase point of view, why could there be a need to tweak the max_parallel_workers exactly at the time when the parallel workers are at play. 2. Could there be a restriction on tweaking of max_parallel_workers while the parallel workers are at play? At least do not allow setting the max_parallel_workers less than the current # of active parallel workers. Regards, Neha
Re: [HACKERS] pgbench - allow to store select results into variables
Tom and others, I still wonder whether I should commit this or not, because this patch does not allow non-ASCII column names. We know pgbench variable names have been restricted since the functionality was born. When users explicitly define a pgbench variable using \set, it is not too strong a limitation, because it's in a "pgbench world" anyway and nothing is related to PostgreSQL core specs. However, \gset is not happy with perfectly valid column names in PostgreSQL core, which looks too inconsistent and confusing for users. So the choices are: 1) commit the patch now, documenting the limitation (the patch looks good to me except the issue above), or 2) move it to the next CF, hoping that someone starts the implementation to eliminate the limitation on non-ASCII variable names. Comments? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers > Hello Tatsuo-san, > >> It seems the new feature \gset doesn't work with tables having none >> ascii column names: > > Indeed. The same error is triggered with the \set syntax, which does > not involve any query execution. > > I have added a sentence mentionning the restriction when variables are > first discussed in the documentation, see attached patch. > > -- > Fabien.
Re: [HACKERS] Fast Default WIP patch for discussion
Hi Serge, On 2016-10-28 08:28:11 -0700, Serge Rielau wrote: > Time for me to dig into that then. Are you planning to update your POC at some point? This'd be a very welcome improvement. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench more operators & functions
Hi, On 2017-03-16 12:21:31 -0400, David Steele wrote: > Any reviewers want to have a look? Unfortunately there wasn't much of that. I think that this patch atm doesn't have sufficient design agreement to be considered for v10. As the current CF has formally ended, I think we'll have to punt this to v11. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats
David Rowley writes: > On 6 April 2017 at 10:48, Tom Lane wrote: >> The buildfarm is unhappy about the fact that this changed the API >> for clauselist_selectivity(). I am not convinced that that change >> was a good idea, so before telling FDW authors that they need to >> change their code, I'd like to hear a defense of the API change. > Because varReliId is often passed in as 0, and that meant we'd have to > write some code to check of the clause was made up of RestrictInfos > from a single relation or not, and look for extended stats on that > singleton rel. Generally, if it's passed as zero, that's a good clue that the clause *is* a join clause. In any case, this defense fails to address my other question, which is what's going to happen to this API when you want to use extended stats in join-clause estimates, which I'd expect to surely happen before very long. Also, I find it hard to believe that a bms_get_singleton_member call is going to be material in comparison to all the work that will be invoked indirectly via whatever selectivity estimation function gets called for each clause. Even a single catcache fetch would swamp that. So no, you have not convinced me that this isn't a broken design. > FWIW, I found this function being called 72 times in a 5 way join > search problem. And you measured the overhead of doing it the other way to be ... ? Premature optimization and all that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
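A sketch of the alternative being argued for here, i.e. deriving the single relation (if any) from the RestrictInfos themselves instead of changing the clauselist_selectivity() signature. The wrapping function is hypothetical; bms_get_singleton_member() and clause_relids are existing planner facilities.

#include "postgres.h"
#include "nodes/relation.h"

static Index
single_rel_from_clauses(PlannerInfo *root, List *clauses)
{
	Relids		relids = NULL;
	ListCell   *lc;
	int			relid;

	foreach(lc, clauses)
	{
		Node	   *clause = (Node *) lfirst(lc);

		if (IsA(clause, RestrictInfo))
			relids = bms_union(relids,
							   ((RestrictInfo *) clause)->clause_relids);
		else
			return 0;			/* bare clause: give up, assume no single rel */
	}

	/* One member means every clause references only that relation. */
	if (bms_get_singleton_member(relids, &relid))
		return (Index) relid;

	return 0;					/* empty list, or clauses spanning relations */
}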
Re: [HACKERS] LWLock optimization for multicore Power machines
Andres Freund writes: > On 2017-04-03 11:56:13 -0700, Andres Freund wrote: >> Have you done x86 benchmarking? > I think unless such benchmarking is done in the next 24h we need to move > this patch to the next CF... In theory, inlining the _impl function should lead to exactly the same assembly code as before, on all non-PPC platforms. Confirming that would be a good substitute for benchmarking. Having said that, I'm not sure this patch is solid enough to push in right now, and would be happy with seeing it pushed to v11. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Hi, On 2017-03-30 14:11:28 +0200, Peter Moser wrote: > 2017-03-01 10:56 GMT+01:00 Peter Moser : > > A similar walkthrough for ALIGN will follow soon. > > > > We are thankful for any suggestion or ideas, to be used to write a > > good SGML documentation. > > The attached README explains the ALIGN operation step-by-step with a > TEMPORAL LEFT OUTER JOIN example. That is, we start from a query > input, show how we rewrite it during parser stage, and show how the > final execution generates result tuples. Unfortunately I don't think this patch has received sufficient design and implementation to consider merging it into v10. As code freeze is in two days, I think we'll have to move this to the next commitfest. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: recursive json_populate_record()
On 04/04/2017 05:18 PM, Andrew Dunstan wrote: > > On 04/03/2017 05:17 PM, Andres Freund wrote: >> On 2017-03-21 14:31:08 -0400, Andrew Dunstan wrote: >>> On 03/21/2017 01:37 PM, David Steele wrote: On 3/16/17 11:54 AM, David Steele wrote: > On 2/1/17 12:53 AM, Michael Paquier wrote: >> On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane wrote: >>> Nikita Glukhov writes: On 25.01.2017 23:58, Tom Lane wrote: > I think you need to take a second look at the code you're producing > and realize that it's not so clean either. This extract from > populate_record_field, for example, is pretty horrid: But what if we introduce some helper macros like this: #define JsValueIsNull(jsv) \ ((jsv)->is_json ? !(jsv)->val.json.str \ : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull) #define JsValueIsString(jsv) \ ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \ : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString) >>> Yeah, I was wondering about that too. I'm not sure that you can make >>> a reasonable set of helper macros that will fix this, but if you want >>> to try, go for it. >>> >>> BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have >>> to go back to the manual every darn time to convince myself whether >>> that means (a?b:c)||d or a?b:(c||d). It's better not to rely on >>> the reader (... or the author) having memorized C's precedence rules >>> in quite that much detail. Extra parens help. >> Moved to CF 2017-03 as discussion is going on and more review is >> needed on the last set of patches. >> > The current patches do not apply cleanly at cccbdde: > > $ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch > error: patch failed: src/backend/utils/adt/jsonb_util.c:328 > error: src/backend/utils/adt/jsonb_util.c: patch does not apply > error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266 > error: src/backend/utils/adt/jsonfuncs.c: patch does not apply > error: patch failed: src/include/utils/jsonb.h:218 > error: src/include/utils/jsonb.h: patch does not apply > > In addition, it appears a number of suggestions have been made by Tom > that warrant new patches. Marked "Waiting on Author". This thread has been idle for months since Tom's review. The submission has been marked "Returned with Feedback". Please feel free to resubmit to a future commitfest. >>> Please revive. I am planning to look at this later this week. >> Since that was 10 days ago - you're still planning to get this in before >> the freeze? >> > > Hoping to, other demands have intervened a bit, but I might be able to. > OK, I have been through this and I think it's basically committable. I am currently doing some cleanup, such as putting all the typedefs in one place, fixing redundant comments and adding more comments, declaring all the static functions, and minor code changes, but nothing major. I should be ready to commit tomorrow some time. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats
On 6 April 2017 at 10:48, Tom Lane wrote: > Simon Riggs writes: >> Collect and use multi-column dependency stats > > The buildfarm is unhappy about the fact that this changed the API > for clauselist_selectivity(). I am not convinced that that change > was a good idea, so before telling FDW authors that they need to > change their code, I'd like to hear a defense of the API change. > Why not just use the existing varRelid parameter for that? Why > is there an assumption that only one rel's extended stats will > ever be of interest? This function does get used for join clauses. Because varReliId is often passed in as 0, and that meant we'd have to write some code to check of the clause was made up of RestrictInfos from a single relation or not, and look for extended stats on that singleton rel. Given the number of times this function gets called during planning, especially so with queries involving many joins, I think it's best to not have to extract the correct relids each time. I coded it that way to reduce the overhead during planning, which is something that often comes up with planner patches. FWIW, I found this function being called 72 times in a 5 way join search problem. Perhaps there is a nicer way to do this, I just couldn't think of it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
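For context on the overhead David mentions, the per-call check he wants to avoid would look roughly like the sketch below, built from existing planner helpers (pull_varnos, bms_membership). This is illustrative only, not code from the committed patch.

/* Sketch: does this clause list reference exactly one relation, and if so
 * which one?  Doing this on every clauselist_selectivity() call is the
 * cost avoided by having the caller pass the relation in directly. */
static bool
clauses_reference_single_rel(List *clauses, int *relid)
{
	Relids		relids = pull_varnos((Node *) clauses);

	if (bms_membership(relids) == BMS_SINGLETON)
	{
		*relid = bms_singleton_member(relids);
		return true;
	}
	return false;
}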
Re: [HACKERS] LWLock optimization for multicore Power machines
Hi, On 2017-04-03 11:56:13 -0700, Andres Freund wrote: > > > +/* > > + * Generic implementation of pg_atomic_fetch_mask_add_u32() via loop > > + * of compare & exchange. > > + */ > > +static inline uint32 > > +pg_atomic_fetch_mask_add_u32_impl(volatile pg_atomic_uint32 *ptr, > > + uint32 mask_, > > uint32 add_) > > +{ > > + uint32 old_value; > > + > > + /* > > +* Read once outside the loop, later iterations will get the newer value > > +* via compare & exchange. > > +*/ > > + old_value = pg_atomic_read_u32_impl(ptr); > > + > > + /* loop until we've determined whether we could make an increment or > > not */ > > + while (true) > > + { > > + uint32 desired_value; > > + boolfree; > > + > > + desired_value = old_value; > > + free = (old_value & mask_) == 0; > > + if (free) > > + desired_value += add_; > > + > > + /* > > +* Attempt to swap in the value we are expecting. If we didn't > > see > > +* masked bits to be clear, that's just the old value. If we > > saw them > > +* as clear, we'll attempt to make an increment. The reason > > that we > > +* always swap in the value is that this doubles as a memory > > barrier. > > +* We could try to be smarter and only swap in values if we saw > > the > > +* maked bits as clear, but benchmark haven't shown it as > > beneficial > > +* so far. > > +* > > +* Retry if the value changed since we last looked at it. > > +*/ > > + if (pg_atomic_compare_exchange_u32_impl(ptr, &old_value, > > desired_value)) > > + return old_value; > > + } > > + pg_unreachable(); > > +} > > +#endif > > Have you done x86 benchmarking? I think unless such benchmarking is done in the next 24h we need to move this patch to the next CF... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
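For readers skimming the thread, the intent of the primitive proposed by the patch (pg_atomic_fetch_mask_add_u32 is the patch's new API, not an existing core function) is: add to the counter part of an atomic word only while the given mask bits are clear, in one atomic step. A hedged usage sketch, with a hypothetical flag name and function:

#define MY_EXCLUSIVE_FLAG	0x80000000u

/* Try to take a "shared" reference on a lock word: the increment only
 * happens if the exclusive flag is clear; the returned old value tells
 * us which case occurred. */
static bool
try_take_shared(pg_atomic_uint32 *lockword)
{
	uint32		old_value;

	old_value = pg_atomic_fetch_mask_add_u32(lockword, MY_EXCLUSIVE_FLAG, 1);
	return (old_value & MY_EXCLUSIVE_FLAG) == 0;
}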
Re: [HACKERS] [GENERAL] C++ port of Postgres
Hi Peter, On 2017-02-28 22:30:16 -0800, Andres Freund wrote: > On 2017-02-28 23:42:45 -0500, Peter Eisentraut wrote: > > On 1/26/17 22:46, Andres Freund wrote: > > > On 2016-09-30 15:24:09 -0400, Peter Eisentraut wrote: > > >> Yeah, I have committed a few of the patches now and I'll close the CF > > >> entry now. Thanks for your research. > > > > > > Are you planning to push more of these at some point? > > > > Sure, let's take a look. > [ partial review ] Are you planning to get any of this into v10? There's one or two patches here that could qualify as bugfixes (the GinTernary stuff), but the rest doesn't strike me as something that should be committed at the tail end of the merge window. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed
Hi, On 2017-03-11 13:06:13 +0100, Pavel Stehule wrote: > 2017-03-10 15:45 GMT+01:00 Alexander Korotkov : > > > On Fri, Mar 10, 2017 at 5:10 PM, Peter Eisentraut < > > peter.eisentr...@2ndquadrant.com> wrote: > > > >> On 2/24/17 16:32, Pavel Stehule wrote: > >> > set EXTENDED_DESCRIBE_SORT size_desc > >> > \dt+ > >> > \l+ > >> > \di+ > >> > > >> > Possible variants: schema_table, table_schema, size_desc, size_asc > >> > >> I can see this being useful, but I think it needs to be organized a > >> little better. > >> > >> Sort key and sort direction should be separate settings. > >> > > > > I agree. > > > > I'm not sure why we need to have separate settings to sort by schema > >> name and table name. > > > > > > I think sorting by schema name, object name makes sense for people, who > > have objects of same name in different schemas. > > > > I am sending a updated version with separated sort direction in special > variable > > There is a question. Has desc direction sense for columns like schema or > table name? > > Using desc, asc for size is natural. But for tablename? I think it's pretty clear that we don't have sufficient agreement on the design, not to speak of an implementation for an agreed upon design, to get this into v10. The patch also has been submitted late in the v10 cycle, and has received attention. I'm therefore moving it to the next commitfest. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] scram and \password
On Thu, Apr 6, 2017 at 2:11 AM, Heikki Linnakangas wrote: > At a quick glance, moving pg_frontend_random() to src/common looks like a > non-starter. It uses pglock_thread() which is internal to libpq, so it won't > compile as it is. I think I'm going to change scram_build_verifier() to take > a pre-generated salt as argument, to avoid the need for a random number > generator in src/common. Oops. Need an updated set of patches? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files
Hi, On 2017-03-18 17:51:48 +0100, Pavel Stehule wrote: > What is done: > > create table foo foo(a bytea); > > -- import > insert into foo values($1) > \gloadfrom ~/xxx.jpg bytea > > -- export > \pset format binary > select a from foo > \g ~/xxx2.jpg > > tested on import 55MB binary file > > Comments, notes? > > Available import formats are limited to text, bytea, xml - these formats > are safe for receiving data via recv function. I don't think we have design agreement on this at this point. Given the upcoming code freeze, I think we'll have to hash this out during the next development cycle. Any counterarguments? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug with pg_basebackup and 'shared' tablespace
Hi On our servers, we are running different PostgreSQL versions because we cannot migrate every application at the same time to Pg 9.6… Since we have several disks, we use tablespaces and, for historical reasons, we used the same folder for both Pg versions, say /mnt/ssd/postgres. The server has absolutely no issue with that, there is not a single warning when running CREATE TABLESPACE. But it all gets messy when we want to create a streaming standby server using pg_basebackup. When backing up Pg 9.5 there is no issue, but backing up Pg 9.6 afterwards will say "directory "/mnt/ssd/postgres" exists but is not empty". The script attached to this mail will easily reproduce the issue if my description is unclear. Just... read it before launching it, it's a dirty script to reproduce the issue. :) I have looked a bit at the pg_basebackup code and the replication protocol, and I did not find any simple way to solve the issue. PostgreSQL uses CATALOG_VERSION_NO to keep one folder per Pg version, but that constant is not accessible outside the PostgreSQL code and is not exposed in the replication protocol as described in the documentation: https://www.postgresql.org/docs/current/static/protocol-replication.html The only easy way I see to fix that issue is to alter the replication protocol to expose the CATALOG_VERSION_NO constant, making it possible for pg_basebackup to know the proper path to check. Another way would be to change the pg_basebackup logic and back up the main data folder in order to be able to read the pg_control file… but this seems ugly too. I am willing to write the patch for this issue, but I would appreciate some help finding a proper way to fix it. Thanks Pierre reproduce-bug-tblspace.sh Description: application/shellscript signature.asc Description: This is a digitally signed message part.
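For reference, the two servers can share /mnt/ssd/postgres in the first place because each major version creates its own catalog-versioned subdirectory inside the tablespace location. The naming scheme comes from a macro in the source tree, roughly as below; since CATALOG_VERSION_NO is compiled into each server and is not exposed through the replication protocol, pg_basebackup has no way to predict the directory name belonging to the other version.

/* roughly as defined in the PostgreSQL headers: tablespace contents live
 * under e.g. "PG_9.5_<catversion>" and "PG_9.6_<catversion>", one
 * subdirectory per major version */
#define TABLESPACE_VERSION_DIRECTORY	"PG_" PG_MAJORVERSION "_" \
										CppAsString2(CATALOG_VERSION_NO)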
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
Hi, On 2017-04-02 22:28:40 +0200, Jan Michálek wrote: > 2017-03-23 17:26 GMT+01:00 Pierre Ducroquet : > > The new status of this patch is: Waiting on Author > > > > Corrected problem with \pset linestyle when format is set to markdown, or > rst. > > Corrected tuples only for markdown and rst (normal and expanded) This patch still is undergoing development and review, and the code freeze is closing in rapidly. As this patch has been submitted just before the last commitfest in the v10 cycle and has received plenty feedback, I think it's fair to move this to the next commitfest. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GUC for cleanup indexes threshold.
Hi, On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote: > On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas wrote: > [ lots of valuable discussion ] I think this patch clearly still is in the design stage, and has received plenty feedback this CF. I'll therefore move this to the next commitfest. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats
Simon Riggs writes: > Collect and use multi-column dependency stats The buildfarm is unhappy about the fact that this changed the API for clauselist_selectivity(). I am not convinced that that change was a good idea, so before telling FDW authors that they need to change their code, I'd like to hear a defense of the API change. Why not just use the existing varRelid parameter for that? Why is there an assumption that only one rel's extended stats will ever be of interest? This function does get used for join clauses. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
On Wed, Apr 5, 2017 at 10:24 PM, Stephen Frost wrote: > * Michael Paquier (michael.paqu...@gmail.com) wrote: >> 1) Initialize the old cluster and start it. >> 2) create a bunch of databases with full range of ascii characters. >> 3) Run regression tests. >> 4) Take dump on old cluster. >> 4) Stop the old cluster. >> 5) Initialize the new cluster. >> 6) Run pg_upgrade. >> 7) Start new cluster. >> 8) Take dump from it. >> 9) Run deletion script (Oops forgot this part!) > > Presumably the check to match the old dump against the new one is also > performed? Yes. That's run with command_ok() at the end. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Hi, On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote: > Regarding test patch, I have corrected the test suite after David Steele's > comments. > Also, I would like to mention that a companion patch was submitted by David > Steele up-thread. > > Attached the latest code and test patch. My impression is that this'll need a couple more rounds of review. Given that this'll establish API we'll pretty much ever going to be able to change/remove, I think it'd be a bad idea to rush this into v10. Therefore I propose moving this to the next CF. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : For Auto-Prewarm.
On 2017-03-13 18:45:00 +0530, Mithun Cy wrote: > I have implemented a similar logic now. The prewarm bgworker will > launch a sub-worker per database in the dump file. And, each > sub-worker will load its database block info. The sub-workers will be > launched only after previous one is finished. All of this will only > start if the database has reached a consistent state. Hm. For replay performance it'd possibly be good to start earlier, before reaching consistency. Is there an issue starting earlier? > diff --git a/contrib/pg_prewarm/autoprewarm.c > b/contrib/pg_prewarm/autoprewarm.c > new file mode 100644 > index 000..f4b34ca > --- /dev/null > +++ b/contrib/pg_prewarm/autoprewarm.c > @@ -0,0 +1,1137 @@ > +/*- > + * > + * autoprewarm.c > + * > + * -- Automatically prewarm the shared buffer pool when server restarts. Don't think we ususally use -- here. > + * Copyright (c) 2013-2017, PostgreSQL Global Development Group Hm, that's a bit of a weird date range. > + * IDENTIFICATION > + * contrib/pg_prewarm.c/autoprewarm.c > + *- > + */ The pg_prewarm.c in there looks like some search & replace gone awry. > +#include "postgres.h" > +#include > + > +/* These are always necessary for a bgworker. */ > +#include "miscadmin.h" > +#include "postmaster/bgworker.h" > +#include "storage/ipc.h" > +#include "storage/latch.h" > +#include "storage/lwlock.h" > +#include "storage/proc.h" > +#include "storage/shmem.h" > + > +/* These are necessary for prewarm utilities. */ > +#include "pgstat.h" > +#include "storage/buf_internals.h" > +#include "storage/smgr.h" > +#include "utils/memutils.h" > +#include "utils/resowner.h" > +#include "utils/guc.h" > +#include "catalog/pg_class.h" > +#include "catalog/pg_type.h" > +#include "executor/spi.h" > +#include "access/xact.h" > +#include "utils/rel.h" > +#include "port/atomics.h" I'd rather just sort these alphabetically. I think this should rather be in the initial header. > +/* > + * autoprewarm : > + * > + * What is it? > + * === > + * A bgworker which automatically records information about blocks which were > + * present in buffer pool before server shutdown and then prewarm the buffer > + * pool upon server restart with those blocks. > + * > + * How does it work? > + * = > + * When the shared library "pg_prewarm" is preloaded, a > + * bgworker "autoprewarm" is launched immediately after the server has > reached > + * consistent state. The bgworker will start loading blocks recorded in the > + * format BlockInfoRecord > + * <> in > + * $PGDATA/AUTOPREWARM_FILE, until there is a free buffer left in the buffer > + * pool. This way we do not replace any new blocks which were loaded either > by > + * the recovery process or the querying clients. s/until there is a/until there is no/? > +/* > + * > > + * ===SIGNAL HANDLERS > === > + * > > + */ Hm... > +static void sigtermHandler(SIGNAL_ARGS); > +static void sighupHandler(SIGNAL_ARGS); I don't think that's a casing we commonly use. We mostly use CamelCase or underscore_case. > +/* > + * Signal handler for SIGUSR1. > + */ > +static void > +sigusr1Handler(SIGNAL_ARGS) > +{ > + int save_errno = errno; > + > + if (MyProc) > + SetLatch(&MyProc->procLatch); > + > + errno = save_errno; > +} Hm, what's this one for? > +/* > + * Shared state information about the running autoprewarm bgworker. > + */ > +typedef struct AutoPrewarmSharedState > +{ > + pg_atomic_uint32 current_task; /* current tasks performed by > + > * autoprewarm workers. */ > +} AutoPrewarmSharedState; Hm. Why do we need atomics here? 
I thought there's no concurrency? > +/* > + * sort_cmp_func - compare function used for qsort(). > + */ > +static int > +sort_cmp_func(const void *p, const void *q) > +{ rename to blockinfo_cmp? > +static AutoPrewarmTask > +get_autoprewarm_task(AutoPrewarmTask todo_task) > +{ > + boolfound; > + > + state = NULL; > + > + LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE); > + state = ShmemInitStruct("autoprewarm", > + > sizeof(AutoPrewarmSharedState), > + &found); > + if (!found) > + pg_atomic_write_u32(&(state->current_task), todo_task); Superflous parens (repeated a lot). > + LWLockRelease(AddinShmemInitLock); > + > + /* If found check if we can go ahead. */ > + if (found) > + { > + if (pg_atomic
Re: [HACKERS] multivariate statistics (v25)
On 6 April 2017 at 10:17, Simon Riggs wrote: > On 5 April 2017 at 10:47, David Rowley wrote: > >> I've attached an updated patch to address Tomas' concerns and yours too. > > Commited, with some doc changes and additions based upon my explorations. Great. Thanks for committing! -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v25)
On 5 April 2017 at 10:47, David Rowley wrote: > I've attached an updated patch to address Tomas' concerns and yours too. Committed, with some doc changes and additions based upon my explorations. For the record, I measured the time to calculate extended statistics as +800ms on a 2 million row sample. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v25)
On 6 April 2017 at 07:19, Tels wrote: > I know I'm a bit late, but isn't the syntax backwards? > > "CREATE STATISTICS s1 WITH (dependencies) ON (col_a, col_b) FROM table;" > > These do it the other way round: > > CREATE INDEX idx ON table (col_a); > > AND: > >CREATE TABLE t ( > id INT REFERENCES table_2 (col_b); >); > > Won't this be confusing and make things hard to remember? > > Sorry for not asking earlier, I somehow missed this. The reasoning is in [1] [1] https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On 2017-04-05 17:22:34 -0400, Tom Lane wrote: > Andres Freund writes: > > I'd like some input from other committers whether we want this. I'm > > somewhat doubtful, but don't have particularly strong feelings. > > I don't really want to expose the workings of the plancache at user level. > The heuristics it uses certainly need work, but it'll get hard to change > those once there are SQL features depending on it. > > Also, as you note, there are debatable design decisions in this particular > patch. There are already a couple of ways in which control knobs can be > attached to plgsql functions (i.e. custom GUCs and the comp_option stuff), > so why is this patch wanting to invent yet another fundamental mechanism? > And I'm not very happy about it imposing a new reserved keyword, either. > > A bigger-picture question is why we'd only provide such functionality > in plpgsql, and not for other uses of prepared plans. > > Lastly, it doesn't look to me like the test cases prove anything at all > about whether the feature does what it's claimed to. That echoes my perception - so let's move this to the next CF? It's not like this patch has been pending for very long. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] partitioned tables and contrib/sepgsql
On Wed, Apr 5, 2017 at 1:22 PM, Mike Palmiotto wrote: > On Wed, Apr 5, 2017 at 1:19 PM, Tom Lane wrote: >> Peter Eisentraut writes: >>> On 4/5/17 12:04, Tom Lane wrote: Conclusion: Fedora's gcc is playing fast and loose somehow with the command "#include "; that does not include the file you'd think it does, it does something magic inside the compiler. The magic evidently includes not complaining about duplicate macro definitions for true and false. >> >>> Perhaps -Wsystem-headers would change the outcome in your case. >> >> Hah, you're right: with that, >> >> In file included from /usr/include/selinux/label.h:9:0, >> from label.c:40: >> /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stdbool.h:34:0: warning: >> "true" redefined >> #define true 1 >> >> In file included from ../../src/include/postgres.h:47:0, >> from label.c:11: >> ../../src/include/c.h:206:0: note: this is the location of the previous >> definition >> #define true ((bool) 1) >> >> and likewise for "false". So that mystery is explained. >> >> I stand by my previous patch suggestion, except that we can replace >> the parenthetical comment with something like "(We don't care if >> redefines "true"/"false"; those are close enough.)". >> > > Sounds good. Updated patchset will include that verbiage, along with > some regression tests for partitioned tables. Looks like the label regression test is failing even without these changes. Weirdly enough, this only happens in one place: 84 SELECT objtype, objname, label FROM pg_seclabels 85 WHERE provider = 'selinux' AND objtype = 'column' AND (objname like 't3.%' OR objname like 't4.%'); I haven't figured out why this may be (yet), but it seems like we shouldn't assume the order of output from a SELECT. As I understand it, the order of the output is subject to change with changes to the planner/configuration. I'm going to hold the partition table regression changes for a separate patch and include some ORDER BY fixes. Will post tomorrow In the meantime, attached are the latest and greatest patches. Thanks, -- Mike Palmiotto Software Engineer Crunchy Data Solutions https://crunchydata.com From 50969159f0fd7c93a15bfb7b9046512399d0b099 Mon Sep 17 00:00:00 2001 From: Mike Palmiotto Date: Wed, 29 Mar 2017 14:59:37 + Subject: [PATCH 3/3] Add partitioned table support to sepgsql Account for RELKIND_PARTITIONED_RELATIONS in sepgsql and treat the objects like regular relations. This allows for proper object_access hook behavior for partitioned tables. 
--- contrib/sepgsql/label.c| 3 ++- contrib/sepgsql/relation.c | 30 +- 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c index f7d79ab..1a1ec57 100644 --- a/contrib/sepgsql/label.c +++ b/contrib/sepgsql/label.c @@ -789,7 +789,8 @@ exec_object_restorecon(struct selabel_handle * sehnd, Oid catalogId) case RelationRelationId: relForm = (Form_pg_class) GETSTRUCT(tuple); -if (relForm->relkind == RELKIND_RELATION) +if (relForm->relkind == RELKIND_RELATION || + relForm->relkind == RELKIND_PARTITIONED_TABLE) objtype = SELABEL_DB_TABLE; else if (relForm->relkind == RELKIND_SEQUENCE) objtype = SELABEL_DB_SEQUENCE; diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c index b794abe..0d8905c 100644 --- a/contrib/sepgsql/relation.c +++ b/contrib/sepgsql/relation.c @@ -54,12 +54,14 @@ sepgsql_attribute_post_create(Oid relOid, AttrNumber attnum) ObjectAddress object; Form_pg_attribute attForm; StringInfoData audit_name; + char relkind; /* * Only attributes within regular relation have individual security * labels. */ - if (get_rel_relkind(relOid) != RELKIND_RELATION) + relkind = get_rel_relkind(relOid); + if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) return; /* @@ -135,8 +137,10 @@ sepgsql_attribute_drop(Oid relOid, AttrNumber attnum) { ObjectAddress object; char *audit_name; + char relkind; - if (get_rel_relkind(relOid) != RELKIND_RELATION) + relkind = get_rel_relkind(relOid); + if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) return; /* @@ -167,8 +171,11 @@ sepgsql_attribute_relabel(Oid relOid, AttrNumber attnum, { ObjectAddress object; char *audit_name; + char relkind; - if (get_rel_relkind(relOid) != RELKIND_RELATION) + relkind = get_rel_relkind(relOid); + + if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot set security label on non-regular columns"))); @@ -209,8 +216,11 @@ sepgsql_attribute_setattr(Oid relOid, AttrNumber attnum) { ObjectAddress object; char *audit_name; + char relkind; + + relkind = get_rel_relkind(relOid); - if (get_rel_relkind(relOid) != RELKIND_RELATION) + if (relkind != RELKIND_RELATION && relkind
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
Andres Freund writes: > I'd like some input from other committers whether we want this. I'm > somewhat doubtful, but don't have particularly strong feelings. I don't really want to expose the workings of the plancache at user level. The heuristics it uses certainly need work, but it'll get hard to change those once there are SQL features depending on it. Also, as you note, there are debatable design decisions in this particular patch. There are already a couple of ways in which control knobs can be attached to plgsql functions (i.e. custom GUCs and the comp_option stuff), so why is this patch wanting to invent yet another fundamental mechanism? And I'm not very happy about it imposing a new reserved keyword, either. A bigger-picture question is why we'd only provide such functionality in plpgsql, and not for other uses of prepared plans. Lastly, it doesn't look to me like the test cases prove anything at all about whether the feature does what it's claimed to. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible encoding issues with libxml2 functions
2017-03-17 4:23 GMT+01:00 Noah Misch : > On Sun, Mar 12, 2017 at 10:26:33PM +0100, Pavel Stehule wrote: > > 2017-03-12 21:57 GMT+01:00 Noah Misch : > > > On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote: > > > > 2017-03-12 0:56 GMT+01:00 Noah Misch : > > > Please add a test case. > I am sending test case. I am afraid so this cannot be part of regress tests due strong dependency on locale win1250. > > > > It needs a application - currently there is not possibility to import XML > > document via recv API :( > > I think xml_in() can create every value that xml_recv() can create; > xml_recv() > is just more convenient given diverse source encodings. If you make your > application store the value into a table, does "pg_dump --inserts" emit > code > that reproduces the same value? If so, you can use that in your test case. > If not, please provide precise instructions (code, SQL commands) for > reproducing the bug manually. > I was wrong - both methods produces broken internal XML document 1 Alter Ego de Palmer 63 2012 0 0 0 0 1425 1085 0 0 51 % Merlot, 40 % Cabernet Sauvignon,9 % Petit Verdot 2017 - 2026 2 0 0 0 0 0 13,4 % Premiant oblasti Margaux Ch. Palmer tentokrát ve svých obou vínech těžil z dokonale zralého Merlotu, kterého do svých směsí naládoval více než Cabernet Sauvignonu. 2. víno Alter Ego de Palmer 2012 se může p ochlubit skvělou kondicí. Není pochyb o tom, že na Ch. Palmer sklízeli z vinice dokonale zralé hrozny. Alter Ego je mohutné, husté a syté víno, nabízí objemnou dávku ovoce, voní po ostružinách, chuť je krásně kulatá, pevná a svěží, taniny vykazují fenomenální strukturu, takto krásné spojení všech aromatických a chuťových složek s příměsí úžasných kyselin a alkoholu je radost mít ve sklepě. Robert Parker: /100 TOPVINO SCORE: 92-94/100 James Suckling: 92-93/100 Wine Spectator: 90-93/100 1 8 1 Document is well encoded from win1250 to UTF8 - it is well readable, but the header is wrong - it is not in win1250 now (after encoding). This xml is attached (in original form (win1250 encoding)). Import with create table target(x xml); \set xml `cat ~/Downloads/e6.xml` postgres=# set client_encoding to win1250; postgres=# insert into target values(:'xml'); disconnect or reset client encoding Document should be correctly inserted. Then run a query postgres=# select * from target, xpath('/enprimeur/vino', x); ERROR: could not parse XML document DETAIL: input conversion failed due to input error, bytes 0x88 0x73 0x6B 0x75 input conversion failed due to input error, bytes 0x88 0x73 0x6B 0x75 encoder error line 25: Premature end of data in tag remark line 25 line 25: Premature end of data in tag vino line 3 line 25: Premature end of data in tag enprimeur line 2 select xmltable.* from target, xmltable('/enprimeur/vino' passing x columns id int, nazev text, remark text); XMLTABLE is not vulnerable against this bug and result should be correct. when you do select x from target you can see a correct xml without header but select x::text from target; shows wrong header > > > > Why not use xml_parse() instead of calling xmlCtxtReadMemory() > directly? > > > The > > > answer is probably in the archives, because someone understood the > problem > > > enough to document "Some XML-related functions may not work at all on > > > non-ASCII data when the server encoding is not UTF-8. This is known to > be > > > an > > > issue for xpath() in particular." > > > > > > Probably there are two possible issues > > Would you research in the archives to confirm? > > > 1. 
what I touched - recv function does encoding to database encoding - but > document encoding is not updated. > > Using xml_parse() would fix that, right? > It should help, because it cuts the header - but it does a little more work than we need - it also checks whether the xml is a doc or not. In most of the functions the important work is done in the parse_xml_decl function. One possible fix - and a similar technique is used in several other places in xml.c .. xmlroot diff --git a/src/backend/utils/adt/xml.c
Re: [HACKERS] pg_stat_wal_write statistics view
Hi, On 2017-03-30 13:10:41 +1100, Haribabu Kommi wrote: > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/transam/xlog.c > index 5d58f09..a29c108 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -600,6 +600,9 @@ typedef struct XLogCtlData >*/ > XLogwrtResult LogwrtResult; > > + /* Protected by WALWriteLock */ > + PgStat_WalWritesCounts stats; > /* > + * Check whether the current process is a normal backend or not. > + * This function checks for the background processes that does > + * some WAL write activity only and other background processes > + * are not considered. It considers all the background workers > + * as WAL write activity workers. > + * > + * Returns FALSE - when the current process is a normal backend > + * TRUE - when the current process a background process/worker > + */ I don't think we commonly capitalize true/false these days. > +static bool > +am_background_process() > +{ > + /* check whether current process is a background process/worker? */ > + if (AmBackgroundWriterProcess() || > + AmWalWriterProcess() || > + AmCheckpointerProcess() || > + AmStartupProcess() || > + IsBackgroundWorker || > + am_walsender || > + am_autovacuum_worker) > + { > + return true; > + } > + > + return false; > +} Uhm, wouldn't inverting this be a lot easier and future proof? There's far fewer non-background processes. > +/* > * Write and/or fsync the log at least as far as WriteRqst indicates. > * > * If flexible == TRUE, we don't have to write as far as WriteRqst, but > @@ -2341,6 +2377,9 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) > int npages; > int startidx; > uint32 startoffset; > + instr_time io_start, > + io_time; > + boolis_background_process = am_background_process(); > > /* We should always be inside a critical section here */ > Assert(CritSectionCount > 0); > @@ -2458,6 +2497,11 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) > /* OK to write the page(s) */ > from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ; > nbytes = npages * (Size) XLOG_BLCKSZ; > + > + /* Start timer to acquire start time of the wal write */ > + if (track_io_timing) > + INSTR_TIME_SET_CURRENT(io_start); > + I'm more than a bit hesitant adding overhead to WAL writing - it's already quite a bit of a bottleneck. Normally track_io_timing just makes things a tiny bit slower, but doesn't cause a scalability issue, here it does. This is all done under an already highly contended lock. > nleft = nbytes; > do > { > @@ -2480,6 +2524,34 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) > from += written; > } while (nleft > 0); > > + /* calculate the total time spent for wal writing */ > + if (track_io_timing) > + { > + INSTR_TIME_SET_CURRENT(io_time); > + INSTR_TIME_SUBTRACT(io_time, io_start); > + > + if (is_background_process) > + XLogCtl->stats.total_write_time += > INSTR_TIME_GET_MILLISEC(io_time); > + else > + XLogCtl->stats.backend_total_write_time > += INSTR_TIME_GET_MILLISEC(io_time); > + } > + else > + { > + XLogCtl->stats.total_write_time = 0; > + XLogCtl->stats.backend_total_write_time = 0; > + } > + > + /* check whether writer is a normal backend or not? 
*/ > + if (is_background_process) > + XLogCtl->stats.writes++; > + else > + XLogCtl->stats.backend_writes++; > + > + if (is_background_process) > + XLogCtl->stats.write_blocks += npages; > + else > + XLogCtl->stats.backend_write_blocks += npages; > + > /* Update state for write */ > openLogOff += nbytes; > npages = 0; > @@ -2499,8 +2571,29 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) >*/ > if (finishing_seg) > { > + /* Start timer to acquire start time of the wal > sync */ > + if
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
Hi, I'd like some input from other committers whether we want this. I'm somewhat doubtful, but don't have particularly strong feelings. > + > + > + Block level PRAGMA > + > + > +PRAGMA > +in PL/pgSQL > + > + > + > +The block level PRAGMA allows to change the > +PL/pgSQL compiler behavior. Currently > +only PRAGMA PLAN_CACHE is supported. Why are we doing this on a block level? > + > +CREATE FUNCTION enforce_fresh_plan(_id text) RETURNS boolean AS $$ > +DECLARE > + PRAGMA PLAN_CACHE(force_custom_plan); > +BEGIN > + -- in this block every embedded query uses one shot plan *plans > + > + PRAGMA PLAN_CACHE > + > + > + The plan cache behavior can be controlled using > + PRAGMA PLAN_CACHE. This PRAGMA can be used both > + for whole function or in individual blocks. The following options are *functions > + possible: DEFAULT - default > + PL/pgSQL implementation - the system tries > + to decide between custom plan and generic plan after five query > + executions, FORCE_CUSTOM_PLAN - the chosen execution > + plan will be the one shot plan - it is specific for every set of > + used paramaters, FORCE_GENERIC_PLAN - the generic > + plan will be used from the start. I don't think it's a good idea to explain this here, this'll just get outdated. I think we should rather have a link here. > + > + > + > + > + PRAGMA PLAN_CACHE > + in PL/pgSQL > + > + The plan for INSERT is always a generic > plan. That's this specific insert, right? Should be mentioned, sounds more generic to me. > +/* -- > + * Returns pointer to current compiler settings > + * -- > + */ > +PLpgSQL_settings * > +plpgsql_current_settings(void) > +{ > + return current_settings; > +} > + > + > +/* -- > + * Setup default compiler settings > + * -- > + */ > +void > +plpgsql_settings_init(PLpgSQL_settings *settings) > +{ > + current_settings = settings; > +} Hm. This is only ever set to &default_settings. > +/* -- > + * Set compiler settings > + * -- > + */ > +void > +plpgsql_settings_set(PLpgSQL_settings *settings) > +{ > + PLpgSQL_nsitem *ns_cur = ns_top; > + > + /* > + * Modify settings directly, when ns has local settings data. > + * When ns uses shared settings, create settings first. > + */ > + while (ns_cur->itemtype != PLPGSQL_NSTYPE_LABEL) > + ns_cur = ns_cur->prev; > + > + if (ns_cur->local_settings == NULL) > + { > + ns_cur->local_settings = palloc(sizeof(PLpgSQL_settings)); > + ns_cur->local_settings->prev = current_settings; > + current_settings = ns_cur->local_settings; > + } > + > + current_settings->cursor_options = settings->cursor_options; > +} This seems like a somewhat weird method. Why do we have a global settings, when we essentially just want to use something in the current ns? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG_GETARG_GISTENTRY?
> On Apr 5, 2017, at 1:12 PM, Tom Lane wrote: > > Mark Dilger writes: >> I have written a patch to fix these macro definitions across src/ and >> contrib/. >> Find the patch, attached. All regression tests pass on my Mac laptop. > > Thanks for doing the legwork on that. You are welcome. > This seems a bit late for v10, > especially since it's only cosmetic Agreed. > , but please put it in the first > v11 commitfest. Done. > >> I don't find any inappropriate uses of _P where _PP would be called for. I >> do, >> however, notice that some datatypes' functions are written to use >> PG_GETARG_*_P >> where PG_GETARG_*_PP might be more efficient. > > Yeah. I think Noah did some work in that direction already, but I don't > believe he claimed to have caught everything. Feel free to push further. Thanks for clarifying. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG_GETARG_GISTENTRY?
Mark Dilger writes: > I have written a patch to fix these macro definitions across src/ and > contrib/. > Find the patch, attached. All regression tests pass on my Mac laptop. Thanks for doing the legwork on that. This seems a bit late for v10, especially since it's only cosmetic, but please put it in the first v11 commitfest. > I don't find any inappropriate uses of _P where _PP would be called for. I > do, > however, notice that some datatypes' functions are written to use > PG_GETARG_*_P > where PG_GETARG_*_PP might be more efficient. Yeah. I think Noah did some work in that direction already, but I don't believe he claimed to have caught everything. Feel free to push further. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG_GETARG_GISTENTRY?
> On Apr 5, 2017, at 9:23 AM, Tom Lane wrote: > > Robert Haas writes: >> On Wed, Mar 29, 2017 at 11:54 PM, Tom Lane wrote: >>> Andres Freund writes: we have a good number of '(GISTENTRY *) PG_GETARG_POINTER(n)' in our code - looks a bit better & shorter to have PG_GETARG_GISTENTRY(n). > >>> Should be PG_GETARG_GISTENTRY_P to match existing conventions, >>> otherwise +1 > >> I have never quite understood why some of those macros have _P or _PP >> on the end and others don't. > > _P means "pointer to". _PP was introduced later to mean "pointer to > packed (ie, possibly short-header) datum". Macros that mean to fetch > pointers to pass-by-ref data, but aren't using either of those naming > conventions, are violating project conventions, not least because you > don't know what they're supposed to do with short-header varlena input. > If I had a bit more spare time I'd run around and change any such macros. > > In short, if you are supposed to write > > FOO *val = PG_GETARG_FOO(n); > > then the macro designer blew it, because the name implies that it > returns FOO, not pointer to FOO. This should be > > FOO *val = PG_GETARG_FOO_P(n); > > regards, tom lane I have written a patch to fix these macro definitions across src/ and contrib/. Find the patch, attached. All regression tests pass on my Mac laptop. I don't find any inappropriate uses of _P where _PP would be called for. I do, however, notice that some datatypes' functions are written to use PG_GETARG_*_P where PG_GETARG_*_PP might be more efficient. Varbit's bitoctetlength function could detoast only the header ala PG_DETOAST_DATUM_SLICE to return the octet length, rather than detoasting the whole thing. But that seems a different issue, and patches to change that might have been rejected in the past so far as I know, so I did not attempt any such changes here. Mark Dilger PG_GETARG_foo_P.patch.1 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
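Concretely, the convention Tom describes means the GISTENTRY case from the start of the thread would end up as a one-line macro plus call sites of the following shape. This is only a sketch; the function name is invented for illustration.

/* fetch a pointer to a GISTENTRY argument, following the _P convention */
#define PG_GETARG_GISTENTRY_P(n)	((GISTENTRY *) PG_GETARG_POINTER(n))

Datum
my_gist_consistent(PG_FUNCTION_ARGS)
{
	GISTENTRY  *entry = PG_GETARG_GISTENTRY_P(0);

	/* ... use entry ... */
	PG_RETURN_BOOL(true);
}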
Re: [HACKERS] increasing the default WAL segment size
On 5 April 2017 at 08:36, Peter Eisentraut wrote: > On 4/5/17 06:04, Beena Emerson wrote: >> I suggest the next step is to dial up the allowed segment size in >> configure and run some tests about what a reasonable maximum value could >> be. I did a little bit of that, but somewhere around 256 MB, things got >> really slow. >> >> >> Would it be better if just increase the limit to 128MB for now? >> In next we can change the WAL file name format and expand the range? > > I don't think me saying it felt a bit slow around 256 MB is a proper > technical analysis that should lead to the conclusion that that upper > limit should be 128 MB. ;-) > > This tells me that there is a lot of explore and test here before we > should let it loose on users. Agreed > I think the best we should do now is spend a bit of time exploring > whether/how larger values of segment size behave, and bump the hardcoded > configure limit if we get positive results. Everything else should > probably be postponed. > > (Roughly speaking, to get started, this would mean compiling with > --with-wal-segsize 16, 32, 64, 128, 256, run make check-world both > sequentially and in parallel, and take note of a) passing, b) run time, > c) disk space.) I've committed the rest of Beena's patch to allow this testing to occur up to 1024MB. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Apr 5, 2017 at 2:32 PM, Pavan Deolasee wrote: >> The only other idea that I have for a really clean solution here is to >> support this only for index types that are amcanreturn, and actually >> compare the value stored in the index tuple with the one stored in the >> heap tuple, ensuring that new index tuples are inserted whenever they >> don't match and then using the exact same test to determine the >> applicability of a given index pointer to a given heap tuple. > > Just so that I understand, are you suggesting that while inserting WARM > index pointers, we check if the new index tuple will look exactly the same > as the old index tuple and not insert a duplicate pointer at all? Yes. > I considered that, but it will require us to do an index lookup during WARM > index insert and for non-unique keys, that may or may not be exactly cheap. I don't think it requires that. You should be able to figure out based on the tuple being updated and the corresponding new tuple whether this will bet true or not. > Or we need something like what Claudio wrote to sort all index entries by > heap TIDs. If we do that, then the recheck can be done just based on the > index and heap flags (because we can then turn the old index pointer into a > CLEAR pointer. Index pointer is set to COMMON during initial insert). Yeah, I think that patch is going to be needed for some of the storage work I'm interesting in doing, too, so I am tentatively in favor of it, but I wasn't proposing using it here. > The other way is to pass old tuple values along with the new tuple values to > amwarminsert, build index tuples and then do a comparison. For duplicate > index tuples, skip WARM inserts. This is more what I was thinking. But maybe one of the other ideas you wrote here is better; not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.
On Thu, Mar 23, 2017 at 1:54 PM, Ashutosh Sharma wrote: >> Yeah, but I think "unused" might be better. Because a page could be >> in use (as an overflow page or primary bucket page) and still be >> empty. > > Based on the earlier discussions, I have prepared a patch that would > allow pgstathashindex() to show the number of unused pages in hash > index. Please find the attached patch for the same. Thanks. My idea is that we shouldn't end up with both a zero_pages column and an unused_pages column. Instead, we should end up with just an unused_pages column, which will include both pages that are all-zeroes and pages that have a valid special space marked as LH_UNUSED. Also, I don't see why it's correct to test PageIsEmpty() here instead of just testing the page type as we do in pageinspect. Attached is a revised patch that shows what I have in mind; please review. Along the way I made the code for examining the page type more similar to what pageinspect does, because I see no reason for those things to be different, and I think the pageinspect code is better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company pgstathashindex-empty.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
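A hedged sketch of the classification Robert describes, not the committed code: treat all-zero pages and pages whose valid special space is marked LH_UNUSED_PAGE as "unused", and otherwise classify by the hash page type, the same way pageinspect does. The struct name and fields here are invented for illustration.

typedef struct ExampleHashStats
{
	uint64		unused_pages;
	uint64		bucket_pages;
	uint64		overflow_pages;
	uint64		bitmap_pages;
} ExampleHashStats;

static void
classify_hash_page(Page page, ExampleHashStats *stats)
{
	if (PageIsNew(page))
		stats->unused_pages++;	/* all-zeroes page */
	else
	{
		HashPageOpaque opaque = (HashPageOpaque) PageGetSpecialPointer(page);

		switch (opaque->hasho_flag & LH_PAGE_TYPE)
		{
			case LH_UNUSED_PAGE:
				stats->unused_pages++;	/* valid special space, marked unused */
				break;
			case LH_BUCKET_PAGE:
				stats->bucket_pages++;
				break;
			case LH_OVERFLOW_PAGE:
				stats->overflow_pages++;
				break;
			case LH_BITMAP_PAGE:
				stats->bitmap_pages++;
				break;
				/* metapage assumed to be handled separately by the caller */
		}
	}
}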
Re: [HACKERS] multivariate statistics (v25)
Moin, On Wed, April 5, 2017 2:52 pm, Simon Riggs wrote: > On 5 April 2017 at 10:47, David Rowley > wrote: > >>> I have some other comments. > > Me too. > > > CREATE STATISTICS should take ShareUpdateExclusiveLock like ANALYZE. > > This change is in line with other changes in this and earlier > releases. Comments and docs included. > > Patch ready to be applied directly barring objections. I know I'm a bit late, but isn't the syntax backwards? "CREATE STATISTICS s1 WITH (dependencies) ON (col_a, col_b) FROM table;" These do it the other way round: CREATE INDEX idx ON table (col_a); AND: CREATE TABLE t ( id INT REFERENCES table_2 (col_b); ); Won't this be confusing and make things hard to remember? Sorry for not asking earlier, I somehow missed this. Regard, Tels -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
Hi, I'm somewhat inclined to think that this really would be better done in an extension like pg_stat_statements. On 2017-03-08 14:39:23 +1100, Haribabu Kommi wrote: > On Wed, Feb 1, 2017 at 3:13 PM, Michael Paquier > + > + track_sql (boolean) > + > + track_sql configuration parameter > + > + > + > + > +Enables collection of different SQL statement statistics that are > +executed on the instance. This parameter is off by default. Only > +superusers can change this setting. > + > + > + > + I don't like this name much, seems a bit too generic to me. 'track_activities', 'track_io_timings' are at least a bit clearer. How about track_statement_statistics + corresponding view/function renaming? > + > + pg_stat_sql View > + > + > + > + Column > + Type > + Description > + > + > + > + > + > + tag > + text > + Name of the SQL statement > + It's not really the "name" of a statement. Internally and IIRC elsewhere in the docs we describe something like this as tag? > +/* -- > + * pgstat_send_sqlstmt(void) > + * > + * Send SQL statement statistics to the collector > + * -- > + */ > +static void > +pgstat_send_sqlstmt(void) > +{ > + PgStat_MsgSqlstmt msg; > + PgStat_SqlstmtEntry *entry; > + HASH_SEQ_STATUS hstat; > + > + if (pgstat_backend_sql == NULL) > + return; > + > + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_SQLSTMT); > + msg.m_nentries = 0; > + > + hash_seq_init(&hstat, pgstat_backend_sql); > + while ((entry = (PgStat_SqlstmtEntry *) hash_seq_search(&hstat)) != > NULL) > + { > + PgStat_SqlstmtEntry *m_ent; > + > + /* Skip it if no counts accumulated since last time */ > + if (entry->count == 0) > + continue; > + > + /* need to convert format of time accumulators */ > + m_ent = &msg.m_entry[msg.m_nentries]; > + memcpy(m_ent, entry, sizeof(PgStat_SqlstmtEntry)); > + > + if (++msg.m_nentries >= PGSTAT_NUM_SQLSTMTS) > + { > + pgstat_send(&msg, offsetof(PgStat_MsgSqlstmt, > m_entry[0]) + > + msg.m_nentries * > sizeof(PgStat_SqlstmtEntry)); > + msg.m_nentries = 0; > + } > + > + /* reset the entry's count */ > + entry->count = 0; > + } > + > + if (msg.m_nentries > 0) > + pgstat_send(&msg, offsetof(PgStat_MsgSqlstmt, m_entry[0]) + > + msg.m_nentries * > sizeof(PgStat_SqlstmtEntry)); Hm. So pgstat_backend_sql is never deleted, which'll mean that if a backend lives long we'll continually iterate over a lot of 0 entries. I think performance evaluation of this feature should take that into account. > +} > + > +/* > + * Count SQL statement for pg_stat_sql view > + */ > +void > +pgstat_count_sqlstmt(const char *commandTag) > +{ > + PgStat_SqlstmtEntry *htabent; > + boolfound; > + > + if (!pgstat_backend_sql) > + { > + /* First time through - initialize SQL statement stat table */ > + HASHCTL hash_ctl; > + > + memset(&hash_ctl, 0, sizeof(hash_ctl)); > + hash_ctl.keysize = NAMEDATALEN; > + hash_ctl.entrysize = sizeof(PgStat_SqlstmtEntry); > + pgstat_backend_sql = hash_create("SQL statement stat entries", > + > PGSTAT_SQLSTMT_HASH_SIZE, > + > &hash_ctl, > + > HASH_ELEM | HASH_BLOBS); > + } > + > + /* Get the stats entry for this SQL statement, create if necessary */ > + htabent = hash_search(pgstat_backend_sql, commandTag, > + HASH_ENTER, &found); > + if (!found) > + htabent->count = 1; > + else > + htabent->count++; > +} That's not really something in this patch, but a lot of this would be better if we didn't internally have command tags as strings, but as an enum. We should really only convert to a string with needed. That we're doing string comparisons on Portal->commandTag is just plain bad. 
If so, we could also make this whole patch a lot cheaper - have a fixed size array that has an entry for every possible tag (possibly even in shared memory, and then use atomics there). > +++ b/src/backend/tcop/postgres.c > @@ -1109,6 +1109,12 @@ exec_simple_query(const char *query_string) > > PortalDrop(portal, false); > > + /* > + * Count SQL statement for pg_stat_sql view > + */ > + if (pgstat_track_sql) > +
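To make the fixed-size-array idea above concrete, here is a hedged sketch with all names hypothetical: once command tags are an enum rather than strings, the counters can live in a plain array, optionally in shared memory and bumped with atomics, instead of a backend-local hash table keyed by the tag string.

typedef enum ExampleCommandTag
{
	EXAMPLE_TAG_SELECT,
	EXAMPLE_TAG_INSERT,
	EXAMPLE_TAG_UPDATE,
	EXAMPLE_TAG_DELETE,
	/* ... one entry per possible command tag ... */
	EXAMPLE_TAG_COUNT
} ExampleCommandTag;

typedef struct ExampleSqlStmtStats
{
	pg_atomic_uint64	counts[EXAMPLE_TAG_COUNT];
} ExampleSqlStmtStats;

static inline void
example_count_sqlstmt(ExampleSqlStmtStats *stats, ExampleCommandTag tag)
{
	pg_atomic_fetch_add_u64(&stats->counts[tag], 1);
}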
Re: [HACKERS] multivariate statistics (v25)
On 5 April 2017 at 10:47, David Rowley wrote: >> I have some other comments. Me too. CREATE STATISTICS should take ShareUpdateExclusiveLock like ANALYZE. This change is in line with other changes in this and earlier releases. Comments and docs included. Patch ready to be applied directly barring objections. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services create_statistics_lock_reduction.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for Default partition in partitioning
On Wed, Apr 5, 2017 at 11:19 AM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed > wrote: > >>Could you briefly elaborate why you think the lack global index support > >>would be a problem in this regard? > > I think following can happen if we allow rows satisfying the new > partition > > to lie around in the > > default partition until background process moves it. > > Consider a scenario where partition key is a primary key and the data in > the > > default partition is > > not yet moved into the newly added partition. If now, new data is added > into > > the new partition > > which also exists(same key) in default partition there will be data > > duplication. If now > > we scan the partitioned table for that key(from both the default and new > > partition as we > > have not moved the rows) it will fetch the both rows. > > Unless we have global indexes for partitioned tables, there is chance of > > data duplication between > > child table added after default partition and the default partition. > > Yes, I think it would be completely crazy to try to migrate the data > in the background: > > - The migration might never complete because of a UNIQUE or CHECK > constraint on the partition to which rows are being migrated. > > - Even if the migration eventually succeeded, such a design abandons > all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly > while the migration is in progress, unless the new partition has no > UNIQUE constraints. > > - Partition-wise join and partition-wise aggregate would need to have > special case handling for the case of an unfinished migration, as > would any user code that accesses partitions directly. > > - More generally, I think users expect that when a DDL command > finishes execution, it's done all of the work that there is to do (or > at the very least, that any remaining work has no user-visible > consequences, which would not be the case here). > > IMV, the question of whether we have efficient ways to move data > around between partitions is somewhat separate from the question of > whether partitions can be defined in a certain way in the first place. > The problems that Keith refers to upthread already exist for > subpartitioning; you've got to detach the old partition, create a new > one, and then reinsert the data. And for partitioning an > unpartitioned table: create a replacement table, insert all the data, > substitute it for the original table. The fact that we have these > limitation is not good, but we're not going to rip out partitioning > entirely because we don't have clever ways of migrating the data in > those cases, and the proposed behavior here is not any worse. > > Also, waiting for those problems to get fixed might be waiting for > Godot. I'm not really all that sanguine about our chances of coming > up with a really nice way of handling these cases. In a designed > based on table inheritance, you can leave it murky where certain data > is supposed to end up and migrate it on-line and you might get away > with that, but a major point of having declarative partitioning at all > is to remove that sort of murkiness. It's probably not that hard to > come up with a command that locks the parent and moves data around via > full table scans, but I'm not sure how far that really gets us; you > could do the same thing easily enough with a sequence of commands > generated via a script. 
And being able to do this in a general way > without a full table lock looks pretty hard - it doesn't seem > fundamentally different from trying to perform a table-rewriting > operation like CLUSTER without a full table lock, which we also don't > support. The executor is not built to cope with any aspect of the > table definition shifting under it, and that includes the set of child > tables with are partitions of the table mentioned in the query. Maybe > the executor can be taught to survive such definitional changes at > least in limited cases, but that's a much different project than > allowing default partitions. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Confirmed that v5 patch works with examples given in the original post but segfaulted when I tried the examples I used in my blog post (taken from the documentation at the time I wrote it). https://www.keithf4.com/postgresql-10-built-in-partitioning/ keith@keith=# drop table cities; DROP TABLE Time: 6.055 ms keith@keith=# CREATE TABLE cities ( city_id bigserial not null, name text not null, population int ) PARTITION BY LIST (initcap(name)); CREATE TABLE Time: 7.130 ms keith@keith=# CREATE TABLE cities_west PARTITION OF cities ( CONSTRAINT city_id_nonzero CHECK (city_id != 0) ) FOR VALUES IN ('Los Angeles', 'San Francisco'); CREATE TABLE Time: 6.690 ms keith@keith=# CREATE TABLE cities_default keith-# PARTITION OF cities FOR VALUES IN (DEFAULT); server closed the connection unexpec
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Apr 5, 2017 at 11:27 AM, Andres Freund wrote: > I propose we move this patch to the next CF. I agree. I think it's too late to be working out fine details around TOAST like this. This is a patch that touches the storage format in a fairly fundamental way. The idea of turning WARM on or off reminds me a little bit of the way it was at one time suggested that HOT not be used against catalog tables, a position that Tom pushed against. I'm not saying that it's necessarily a bad idea, but we should exhaust alternatives, and have a clear rationale for it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Tatsuo-san, It seems the new feature \gset doesn't work with tables having non-ASCII column names: Indeed. The same error is triggered with the \set syntax, which does not involve any query execution. I have added a sentence mentioning the restriction when variables are first discussed in the documentation, see attached patch. -- Fabien. diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 1eee8dc..c40f73e 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -773,6 +773,7 @@ pgbench options dbname There is a simple variable-substitution facility for script files. Variables can be set by the command-line -D option, explained above, or by the meta commands explained below. + Variable names are restricted to ASCII strings. In addition to any variables preset by -D command-line options, there are a few variables that are preset automatically, listed in . A value specified for these @@ -816,6 +817,51 @@ pgbench options dbname + + + \gcset [prefix] or + \gset [prefix] + + + + + These commands may be used to end SQL queries, replacing a semicolon. + \gcset replaces an embedded semicolon (\;) within + a compound SQL command, and \gset replaces a final + (;) semicolon which ends the SQL command. + + + + When these commands are used, the preceding SQL query is expected to + return one row, the columns of which are stored into variables named after + column names, and prefixed with prefix if provided. + + + + The following example puts the final account balance from the first query + into variable abalance, and fills variables + one, two and p_three with + integers from a compound query. + +UPDATE pgbench_accounts + SET abalance = abalance + :delta + WHERE aid = :aid + RETURNING abalance \gset +-- compound of two queries +SELECT 1 AS one, 2 AS two \gcset +SELECT 3 AS three \gset p_ + + + + + +\gcset and \gset commands do not work when +empty SQL queries appear within a compound SQL command.
+ + + + + \set varname expression diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 78f1e6b..ac9289b 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -375,11 +375,14 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"}; typedef struct { - char *line; /* text of command line */ + char *line; /* first line for short display */ + char *lines; /* full multi-line text of command */ int command_num; /* unique index of this Command struct */ int type; /* command type (SQL_COMMAND or META_COMMAND) */ int argc; /* number of command words */ char *argv[MAX_ARGS]; /* command word list */ + int compound; /* last compound command (number of \;) */ + char **gset; /* per-compound command prefix */ PgBenchExpr *expr; /* parsed expression, if needed */ SimpleStats stats; /* time spent in this command */ } Command; @@ -1225,6 +1228,105 @@ getQueryParams(CState *st, const Command *command, const char **params) params[i] = getVariable(st, command->argv[i + 1]); } +/* read all responses from backend */ +static bool +read_response(CState *st, char **gset) +{ + PGresult *res; + int compound = -1; + + while ((res = PQgetResult(st->con)) != NULL) + { + compound += 1; + + switch (PQresultStatus(res)) + { + case PGRES_COMMAND_OK: /* non-SELECT commands */ + case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */ +if (gset[compound] != NULL) +{ + fprintf(stderr, + "client %d file %d command %d compound %d: " + "\\gset expects a row\n", + st->id, st->use_file, st->command, compound); + st->ecnt++; + return false; +} +break; /* OK */ + + case PGRES_TUPLES_OK: +if (gset[compound] != NULL) +{ + /* store result into variables */ + int ntuples = PQntuples(res), + nfields = PQnfields(res), + f; + + if (ntuples != 1) + { + fprintf(stderr, +"client %d file %d command %d compound %d: " +"expecting one row, got %d\n", +st->id, st->use_file, st->command, compound, ntuples); + st->ecnt++; + PQclear(res); + discard_response(st); + return false; + } + + for (f = 0; f < nfields ; f++) + { + char *varname = PQfname(res, f); + if (*gset[compound] != '\0') + varname = psprintf("%s%s", gset[compound], varname); + + /* store result as a string */ + if (!putVariable(st, "gset", varname, + PQgetvalue(res, 0, f))) + { + /* internal error, should it rather abort? */ + fprintf(stderr, + "client %d file %d command %d compound %d: " + "error storing into var %s\n", + st->id, st->use_file, st->command, compound, + va
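For reference, a custom script exercising the syntax documented in the patch above might look like the following sketch. The file name and the trailing pgbench invocation are invented, and the syntax may still change before commit; the point is only that a column returned by one statement becomes a :variable usable in later statements.

-- gset_example.sql (hypothetical file name)
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts
   SET abalance = abalance + :delta
 WHERE aid = :aid
 RETURNING abalance \gset
-- :abalance was filled in by \gset above and can be reused here
SELECT CASE WHEN :abalance < 0 THEN 1 ELSE 0 END AS overdrawn \gset chk_
-- :chk_overdrawn is now 1 when the account went negative

It would be run with something like: pgbench -n -f gset_example.sql -T 10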
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On 2017-04-05 09:36:47 -0400, Robert Haas wrote: > By the way, the "Converting WARM chains back to HOT chains" section of > README.WARM seems to be out of date. Any chance you could update that > to reflect the current state and thinking of the patch? I propose we move this patch to the next CF. That shouldn't prevent you working on it, although focusing on review of patches that still might make it wouldn't hurt either. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgsql-www] Small issue in online devel documentation build
On Wed, Apr 5, 2017 at 2:18 PM, Fabien COELHO wrote: >>> [Action required within three days. This is a generic notification.] >> >> >> I will work on this next week. I believe I will be able to provide a >> patch for the web site CSS by April 12, but ultimate success will likely >> depend on the collaboration of the web team. > > > Just to point out that as it has been decided to fix the web site CSS, there > is no expected change for pg 10 source base, so ISTM that the item can be > closed as far as pg 10 is concerned. Well, the documentation doesn't look right, currently, so I think that should be fixed before we close this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect and hash indexes
On Fri, Mar 24, 2017 at 3:44 PM, Ashutosh Sharma wrote: > Agreed. Moreover, the previous approach might even change the current > behaviour of functions like hash_page_items() and hash_page_stats(), > particularly when we pass UNUSED pages to these functions. Attached is > the newer version of the patch. This patch doesn't completely solve the problem: pgbench -i -s 40 alter table pgbench_accounts drop constraint pgbench_accounts_pkey; create index on pgbench_accounts using hash (aid); insert into pgbench_accounts select * from pgbench_accounts; select hash_page_type, min(n), max(n), count(*) from (select n, hash_page_type(get_raw_page('pgbench_accounts_aid_idx'::text, n)) from generate_series(0, (pg_relation_size('pgbench_accounts_aid_idx')/8192)::integer - 1) n) x group by 1; ERROR: index table contains zero page This happens because of the sparse allocation forced by _hash_alloc_buckets. Perhaps the real fix is to have that function initialize all of the pages instead of just the last one, but unless and until we decide to do that, we've got to cope with zero pages in the index. Actually, even if we did fix that, I think we'd still need to do this, because the way relation extension works in general is that we first write a page of zeroes and then go back and fill in the data; an intervening crash can leave behind the intermediate state. A second problem is that the one page initialized by _hash_alloc_buckets needs to end up with a valid special space; otherwise, you get the same error Jeff complained about originally when you try to use hash_page_type() on it. I fixed those issues and committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
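For readers following along, the kind of check involved is roughly the following simplified sketch (not the exact code that was committed): an all-zeroes page left behind by _hash_alloc_buckets or by an interrupted relation extension is detected up front and reported as unused, and only pages with a plausible special area are classified further.

#include "postgres.h"
#include "access/hash.h"
#include "storage/bufpage.h"

/* Sketch of classifying a raw hash page while tolerating all-zero pages. */
static const char *
hash_page_type_sketch(Page page)
{
	HashPageOpaque opaque;

	/* An all-zeroes page has pd_upper == 0, which PageIsNew() detects. */
	if (PageIsNew(page))
		return "unused";

	if (PageGetSpecialSize(page) != MAXALIGN(sizeof(HashPageOpaqueData)))
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("input page is not a valid hash page")));

	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
	switch (opaque->hasho_flag & LH_PAGE_TYPE)
	{
		case LH_META_PAGE:
			return "metapage";
		case LH_BUCKET_PAGE:
			return "bucket";
		case LH_OVERFLOW_PAGE:
			return "overflow";
		case LH_BITMAP_PAGE:
			return "bitmap";
		default:
			return "unused";
	}
}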
Re: [HACKERS] [pgsql-www] Small issue in online devel documentation build
(https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain) [Action required within three days. This is a generic notification.] I will work on this next week. I believe I will be able to provide a patch for the web site CSS by April 12, but ultimate success will likely depend on the collaboration of the web team. Just to point out that as it has been decided to fix the web site CSS, there is no expected change for pg 10 source base, so ISTM that the item can be closed as far as pg 10 is concerned. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use American English spelling in pg_waldump error message
On 3/29/17 07:10, Daniel Gustafsson wrote: > We use “unrecognize” rather than “unrecognise” in all other error messages in > the tree, the attached patch fixes the one place where the latter spelling was > used. Committed, and I fixed a couple of code comments similarly. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgsql-www] Small issue in online devel documentation build
Peter, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 4/5/17 02:56, Noah Misch wrote: > > On Thu, Mar 23, 2017 at 11:21:39PM -0400, Peter Eisentraut wrote: > >> I think the fix belongs into the web site CSS, so there is nothing to > >> commit into PostgreSQL here. I will close the commit fest entry, but I > >> have added a section to the open items list so we keep track of it. > >> (https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain) > > > > [Action required within three days. This is a generic notification.] > > I will work on this next week. I believe I will be able to provide a > patch for the web site CSS by April 12, but ultimate success will likely > depend on the collaboration of the web team. I'm happy to try and help with this. Thanks! Stephen
Re: [HACKERS] [pgsql-www] Small issue in online devel documentation build
On 4/5/17 02:56, Noah Misch wrote: > On Thu, Mar 23, 2017 at 11:21:39PM -0400, Peter Eisentraut wrote: >> I think the fix belongs into the web site CSS, so there is nothing to >> commit into PostgreSQL here. I will close the commit fest entry, but I >> have added a section to the open items list so we keep track of it. >> (https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain) > > [Action required within three days. This is a generic notification.] I will work on this next week. I believe I will be able to provide a patch for the web site CSS by April 12, but ultimate success will likely depend on the collaboration of the web team. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] partitioned tables and contrib/sepgsql
On Wed, Apr 5, 2017 at 1:19 PM, Tom Lane wrote: > Peter Eisentraut writes: >> On 4/5/17 12:04, Tom Lane wrote: >>> Conclusion: Fedora's gcc is playing fast and loose somehow with the >>> command "#include <stdbool.h>"; that does not include the file >>> you'd think it does, it does something magic inside the compiler. >>> The magic evidently includes not complaining about duplicate macro >>> definitions for true and false. > >> Perhaps -Wsystem-headers would change the outcome in your case. > > Hah, you're right: with that, > > In file included from /usr/include/selinux/label.h:9:0, > from label.c:40: > /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stdbool.h:34:0: warning: "true" redefined > #define true 1 > > In file included from ../../src/include/postgres.h:47:0, > from label.c:11: > ../../src/include/c.h:206:0: note: this is the location of the previous > definition > #define true ((bool) 1) > > and likewise for "false". So that mystery is explained. > > I stand by my previous patch suggestion, except that we can replace > the parenthetical comment with something like "(We don't care if > <stdbool.h> redefines "true"/"false"; those are close enough.)". > Sounds good. The updated patchset will include that verbiage, along with some regression tests for partitioned tables. Thanks, -- Mike Palmiotto Software Engineer Crunchy Data Solutions https://crunchydata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
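The interaction Tom describes can be reproduced outside the tree. The following is a minimal standalone sketch, not the sepgsql patch itself; it uses a stand-in typedef rather than PostgreSQL's own headers, and only demonstrates why the duplicate-macro warning stays hidden until -Wsystem-headers is given.

/*
 * stdbool_clash.c -- illustration of the hidden redefinition warning.
 * Build with:  gcc -c stdbool_clash.c                   (silent)
 *              gcc -Wsystem-headers -c stdbool_clash.c  (warns: "true" redefined)
 */
typedef char pg_bool;           /* stand-in for the bool typedef in c.h */

#define true  ((pg_bool) 1)     /* mirrors the definitions in src/include/c.h */
#define false ((pg_bool) 0)

#include <stdbool.h>            /* system header: #define true 1, false 0 */

/* Both sets of definitions expand to the same values, so code using them
 * behaves identically -- hence "those are close enough". */
pg_bool example = true;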
Re: [HACKERS] logical replication and SIGHUP
On Wed, Apr 5, 2017 at 10:21 AM, Fujii Masao wrote: > Both launcher and worker don't handle SIGHUP signal and cannot > reload the configuration. I think that this is a bug. +1 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
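For context, the usual fix follows the standard background-process pattern sketched below: install a SIGHUP handler that sets a flag and pokes the process latch, and re-read the configuration file from the main loop. The function names here are assumed for illustration; this is not taken from an actual patch.

#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "storage/latch.h"
#include "utils/guc.h"

/* Sketch only: flag set from the signal handler, consumed in the main loop. */
static volatile sig_atomic_t got_SIGHUP = false;

static void
worker_sighup_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	got_SIGHUP = true;
	SetLatch(MyLatch);			/* wake the main loop promptly */

	errno = save_errno;
}

void
worker_setup_and_loop(void)
{
	pqsignal(SIGHUP, worker_sighup_handler);
	BackgroundWorkerUnblockSignals();

	for (;;)
	{
		/* ... wait on MyLatch and do the worker's real work here ... */

		if (got_SIGHUP)
		{
			got_SIGHUP = false;
			ProcessConfigFile(PGC_SIGHUP);	/* re-read postgresql.conf */
		}
	}
}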