Re: [HACKERS] Small improvement to parallel query docs
David Rowley wrote:
> I propose we just remove the whole paragraph, and mention about
> the planning and estimated number of groups stuff in another new paragraph.
>
> I've attached a patch to this effect ...

s/In a worse case scenario/In the worst case scenario,/

Other than that, the phrasing in the new patch reads very smoothly.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to parallel query docs
Robert Haas wrote:
> +COUNT(*), each worker must compute subtotals which later must
> +be combined to produce an overall total in order to produce the final
> +answer. If the query involves a GROUP BY clause,
> +separate subtotals must be computed for each group seen by each parallel
> +worker. Each of these subtotals must then be combined into an overall
> +total for each group once the parallel aggregate portion of the plan is
> +complete. This means that queries which produce a low number of groups
> +relative to the number of input rows are often far more attractive to the
> +query planner, whereas queries which don't collect many rows into each
> +group are less attractive, due to the overhead of having to combine the
> +subtotals into totals, of which cannot run in parallel.
>
> I don't think "of which cannot run in parallel" is good grammar. I'm
> somewhat unsure whether the rest is an improvement or not. Other opinions?

Does this read any more clearly?

+COUNT(*), each worker must compute subtotals which are later
+combined in order to produce an overall total for the final answer. If
+the query involves a GROUP BY clause, separate subtotals
+must be computed for each group seen by each parallel worker. After the
+parallel aggregate portion of the plan is complete, there is a serial step
+where the group subtotals from all of the parallel workers are combined
+into an overall total for each group. Because of the overhead of combining
+the subtotals into totals, plans which produce few groups relative to the
+number of input rows are often more attractive to the query planner
+than plans which produce many groups relative to the number of input rows.

I got rid of the ", of which cannot run in parallel" entirely. It was
already stated that the subtotals->totals step runs "once the parallel
aggregate portion of the plan is complete", which implies that it is
serial. I made that explicit with "there is a serial step".
Also, the purpose of the ", of which cannot run in parallel" sentence is
to communicate why the planner prefers one plan over another and, if I'm
reading this correctly, the subtotals->totals step is serial for both
plans, so that is not a reason to prefer one over the other.

I think that the planner prefers plans rather than queries, so I changed
that as well.
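The two-phase aggregation being documented above can be sketched outside
the database. This is only an illustration in plain Python (not planner or
executor code) of why the serial combine step makes plans with few groups
more attractive than plans with many groups:

```python
from collections import Counter

def worker_partial_count(rows):
    """Parallel phase: each worker computes per-group subtotals
    for the slice of the table it happens to scan."""
    subtotals = Counter()
    for group_key in rows:
        subtotals[group_key] += 1
    return subtotals

def combine(partials):
    """Serial phase: after the parallel portion of the plan is
    complete, merge every worker's subtotals into overall totals.
    The work here grows with (workers x groups), and none of it
    runs in parallel."""
    totals = Counter()
    for p in partials:
        totals.update(p)
    return totals

# Three "workers", each having scanned a slice of the table.
partials = [
    worker_partial_count(["a", "a", "b"]),
    worker_partial_count(["a", "b", "b"]),
    worker_partial_count(["b", "c", "c"]),
]
totals = combine(partials)
print(dict(totals))  # {'a': 3, 'b': 4, 'c': 2}
```

With only a handful of groups the combine step touches a handful of
entries regardless of how many input rows the workers counted in
parallel; with one group per input row, the serial step does work
proportional to the whole input.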
Re: [HACKERS] GSoC 2017
On January 27, 2017 07:08, Tom Lane wrote:
> ... The things I think are unique to the currency situation are: ...

Add the potential for regulatory requirements to change at any time - sort
of like timezone information. So no hard-coded behavior.

  rounding method/accuracy
  storage precision different than display precision
  conversion method (multiply, divide, triangulate, other)
  use of spot rates (multiple rate sources) rather than/in addition to
    time-varying rates

Responding to the overall idea of a currency type:

Numeric values with units so that you get a warning/error when you mix
different units in calculations? Ability to specify rounding methods and
intermediate precisions for calculations? +1 Good ideas with lots of
potential applications.

Built-in currency type? -1 I suspect this is one of those things that
seems like a good idea but really isn't.
Re: [HACKERS] Snapshot too old logging
On Wed, Nov 16, 2016 at 6:43 AM, Robert Haas wrote:
> I think I was suggesting: One or more rows required by this query may
> already have been removed from "%s".

I keep reading that as "you have data corruption because something removed
rows that your query needs" rather than "this query took too long to
complete but may succeed if you resubmit it".
Re: [HACKERS] Snapshot too old logging
Magnus wrote:
> Just to be clear, you're suggesting 'One or more rows may have already
> been removed from "%s"'?

Perhaps just 'This query attempted to access a page in "%s" that was
modified after the snapshot was acquired.'
Re: [HACKERS] Snapshot too old logging
On Tue, Nov 15, 2016 at 12:20 PM, Magnus Hagander wrote:
> Is there value in showing the snapshot as well?

I don't think so. Knowing the relname lets you look at your report/job and
figure out if the access to that relation can be moved. Having the exact
snapshot version isn't going to change that. And if you're doing
auto-retry, you probably only care whether it is happening on the same
relation consistently.
Re: [HACKERS] Snapshot too old logging
On Tue, Nov 15, 2016 at 10:30 AM, Tom Lane wrote:
> Kevin Grittner writes:
>> On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander wrote:
>>> Is there a reason why we don't log which relation triggered the
>>> snapshot too old error when it happens?
>
>> I would probably not want to mess with the text of the error itself,
>> in case any client-side software bases recovery on that rather than
>> the SQLSTATE value;
>
> Any such code is broken on its face because of localization.
> Perhaps including the relname in the main message would make it unduly
> long, but if not I'd vote for doing it that way.

+1 for relname in the main message.
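The auto-retry recovery discussed above should key on the SQLSTATE
(72000, snapshot_too_old) rather than on the localizable message text. A
minimal sketch of that pattern, using a stand-in exception class instead
of a real database driver:

```python
class SnapshotTooOld(Exception):
    """Stand-in for a driver error carrying PostgreSQL's SQLSTATE.
    72000 is the snapshot_too_old code (class 72, snapshot failure)."""
    pgcode = "72000"

def run_with_retry(query_fn, max_attempts=3):
    """Resubmit the query on snapshot_too_old. Recovery keys on the
    SQLSTATE, never on the message text, which may be localized."""
    for attempt in range(1, max_attempts + 1):
        try:
            return query_fn()
        except Exception as e:
            if getattr(e, "pgcode", None) != "72000" or attempt == max_attempts:
                raise

# Simulated query that fails once, then succeeds when resubmitted.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise SnapshotTooOld('snapshot too old')
    return "ok"

result = run_with_retry(flaky_query)
print(result)  # ok
```

Because the client matches on the SQLSTATE, putting the relname into the
main message (as voted above) costs such clients nothing, while giving a
human reader the relation to look at.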
Re: [HACKERS] WIP: Covering + unique indexes
Anastasia, et al,

This is a review of including_columns_9.7_v5.patch. I looked through the
commit fest list and this patch was interesting and I wanted to try it. I
have used include columns on Microsoft SQL Server; DB2 for Linux, Unix,
and Windows; and DB2 for z/OS.

After reading the e-mail discussions, there are still points that I am not
clear on. Given "create table foo (a int, b int, c int, d int)" and
"create unique index foo_a_b on foo (a, b) including (c)":

                                                  index only?  heap tuple needed?
  select a, b, c from foo where a = 1             yes          no
  select a, b, d from foo where a = 1             no           yes
  select a, b    from foo where a = 1 and c = 1   ?            ?

It was clear from the discussion that the index scan/access method would
not resolve the "c = 1" predicate, but it was not clear if "c = 1" could
be resolved without accessing the heap tuple.

Are included columns counted against the 32 column and 2712 byte index
limits? I did not see either explicitly mentioned in the discussion or the
documentation. I only ask because in SQL Server the limits are different
for include columns.

1. syntax - on 2016-08-14, Andrey Borodin wrote "I think MS SQL syntax
INCLUDE instead of INCLUDING would be better". I would go further than
that. This feature is already supported by 2 of the top 5 SQL databases
and they both use INCLUDE. Using different syntax because of an internal
implementation detail seems short-sighted.

2. The patch does not seem to apply cleanly anymore.

> git checkout master
Already on 'master'
> git pull
From http://git.postgresql.org/git/postgresql
   d49cc58..06f5fd2  master -> origin/master
Updating d49cc58..06f5fd2
Fast-forward
 src/include/port.h | 2 +-
 src/port/dirmod.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
> patch -p1 < including_columns_9.7_v5.patch
patching file contrib/dblink/dblink.c
...
patching file src/backend/parser/gram.y
...
Hunk #2 FAILED at 375.
...
1 out of 12 hunks FAILED -- saving rejects to file src/backend/parser/gram.y.rej
...
patching file src/bin/pg_dump/pg_dump.c
...
Hunk #8 FAILED at 6399.
Hunk #9 FAILED at 6426.
...
2 out of 13 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej
...

3. After "fixing" compilation errors (best guess based on similar change
in other location), "make check" failed.

> make check
...
parallel group (3 tests): index_including create_view create_index
     create_index ... ok
     create_view ... ok
     index_including ... FAILED
...

Looking at regression.diffs, it looks like the output format of \d tbl
changed (lines 288,300) from the expected "Column | Type | Modifiers" to
"Column | Type | Collation | Nullable | Default".

4. documentation - minor items (these are not actual diffs)

create_index.sgml

-[ INCLUDING ( column_name )]
+[ INCLUDING ( column_name [, ...] )]

 An optional INCLUDING clause allows a list of columns to be
-specified which will be included in the index, in the non-key portion of the index.
+specified which will be included in the non-key portion of the index.

The whole paragraph on INCLUDING does not include many of the points
raised in the feature discussions.

create_table.sgml (for both UNIQUE and PRIMARY KEY constraints)
(similar change could be made to nbtree/README)

-Optional clause INCLUDING allows to add into the index
-a portion of columns on which the uniqueness is not enforced upon.
-Note, that althogh constraint is not enforced on included columns, it still
-depends on them. Consequently, some operations on these columns (e.g. DROP COLUMN)
-can cause cascade constraint and index deletion.
+An optional INCLUDING clause allows a list of columns
+to be specified which will be included in the non-key portion of the index.
+Although uniqueness is not enforced on the included columns, the constraint
+still depends on them. Consequently, some operations on the included columns
+(e.g. DROP COLUMN) can cause cascading constraint and index deletion.
indexcmds.c

-* are key columns, and which are just part of the INCLUDING list by check
+* are key columns, and which are just part of the INCLUDING list by checking

ruleutils.c

- * meaningless there, so do not include them into the message.
+ * meaningless there, so do not include them in the message.

pg_dump.c (does "if (fout->remoteVersion >= 90600)" also need to change?)

- * In 9.6 we add
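The open question in the table above - whether "c = 1" can be resolved
without visiting the heap - comes down to the fact that a covering index
physically stores the included columns in its entries. A toy model in
plain Python (an illustration of the question, not nbtree code) shows the
"in principle" case where the filter is applied from index data alone:

```python
# Toy model of "create unique index foo_a_b on foo (a, b) including (c)":
# each entry holds the key columns (a, b) plus the included column c.
index_foo_a_b = [
    ((1, 10), {"c": 1}),
    ((1, 20), {"c": 5}),
    ((2, 30), {"c": 1}),
]

def index_only_scan(index, a_eq, c_eq=None):
    """Return (a, b) pairs matching a = a_eq. The c filter is applied
    from data stored in the index entry itself, so - in this toy model -
    no heap tuple is needed for columns a, b, or c. Whether PostgreSQL's
    executor actually does this was the open question in the review."""
    out = []
    for (a, b), included in index:
        if a != a_eq:
            continue  # key predicate, resolved by the access method
        if c_eq is not None and included["c"] != c_eq:
            continue  # non-key predicate, resolved from the entry
        out.append((a, b))
    return out

print(index_only_scan(index_foo_a_b, a_eq=1, c_eq=1))  # [(1, 10)]
```

In this model the access method still only navigates by (a, b); the
"c = 1" check is a filter over fetched entries, matching the review's
observation that the access method itself would not resolve it.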
Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows
Change "even large" to "even larger" because it is used in a comparison.

> ... a reasonable starting value for shared_buffers is 25%
> of the memory in your system. There are some workloads where even large
> settings for shared_buffers are effective, ...

... are some workloads where even larger settings ...

Regards,
Brad DeJong
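For concreteness, the 25% starting value quoted above works out as
follows; the memory size here is a fixed example figure, not read from a
running system:

```python
# Arithmetic sketch of the documented starting point for shared_buffers:
# 25% of system memory, for a hypothetical 16 GB machine.
mem_kb = 16 * 1024 * 1024                # 16 GB of RAM, in kB
shared_buffers_mb = mem_kb // 4 // 1024  # 25%, expressed in MB
print(f"shared_buffers = {shared_buffers_mb}MB")  # shared_buffers = 4096MB
```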