Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread Brad DeJong
David Rowley wrote:
> I propose we just remove the whole paragraph, and mention about
> the planning and estimated number of groups stuff in another new paragraph.
> 
> I've attached a patch to this effect ...

s/In a worse case scenario/In the worst case scenario,/

Other than that, the phrasing in the new patch reads very smoothly.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread Brad DeJong
Robert Haas wrote:

> +COUNT(*), each worker must compute subtotals which later must
> +be combined to produce an overall total in order to produce the final
> +answer.  If the query involves a GROUP BY clause,
> +separate subtotals must be computed for each group seen by each parallel
> +worker. Each of these subtotals must then be combined into an overall
> +total for each group once the parallel aggregate portion of the plan is
> +complete.  This means that queries which produce a low number of groups
> +relative to the number of input rows are often far more attractive to the
> +query planner, whereas queries which don't collect many rows into each
> +group are less attractive, due to the overhead of having to combine the
> +subtotals into totals, of which cannot run in parallel.

> I don't think "of which cannot run in parallel" is good grammar.  I'm 
> somewhat unsure whether the rest is an improvement or not.  Other opinions?

Does this read any more clearly?

+COUNT(*), each worker must compute subtotals which are later
+combined in order to produce an overall total for the final answer.  If
+the query involves a GROUP BY clause, separate subtotals
+must be computed for each group seen by each parallel worker.  After the
+parallel aggregate portion of the plan is complete, there is a serial step
+where the group subtotals from all of the parallel workers are combined
+into an overall total for each group.  Because of the overhead of combining
+the subtotals into totals, plans which produce few groups relative to the
+number of input rows are often more attractive to the query planner
+than plans which produce many groups relative to the number of input rows.
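As an aside, the two-phase shape described above (per-worker subtotals for each group, then a serial combine step) can be sketched like this. A toy illustration only, not PostgreSQL code; the rows and the two-worker split are made up:

```python
from collections import Counter

# Hypothetical input rows (group, value), partitioned between two "workers".
rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5), ("c", 6)]
chunks = [rows[0::2], rows[1::2]]

# Parallel phase: each worker computes subtotals for the groups it sees.
subtotals = []
for chunk in chunks:
    worker = Counter()
    for group, value in chunk:
        worker[group] += value
    subtotals.append(worker)

# Serial phase: combine per-worker subtotals into an overall total per group.
# The cost of this step grows with the number of groups, which is why plans
# producing few groups relative to input rows parallelize more attractively.
totals = Counter()
for worker in subtotals:
    totals.update(worker)

print(dict(totals))
```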


I got rid of the ", of which cannot run in parallel" clause entirely. It was
already stated that the subtotals->totals step runs "once the parallel
aggregate portion of the plan is complete", which implies that it is serial.
I made that explicit with "there is a serial step". Also, the purpose of the
", of which cannot run in parallel" clause is to communicate why the planner
prefers one plan over another and, if I'm reading this correctly, the
subtotals->totals step is serial for both plans, so that is not a reason to
prefer one over the other.

I think that the planner prefers plans rather than queries, so I changed that 
as well.



Re: [HACKERS] GSoC 2017

2017-01-27 Thread Brad DeJong
On January 27, 2017 07:08, Tom Lane wrote:
> ... The things I think are unique to the currency situation are: ...

Add the potential for regulatory requirements to change at any time - sort of
like timezone information. So no hard-coded behavior.
  - rounding method/accuracy
  - storage precision different than display precision
  - conversion method (multiply, divide, triangulate, other)
  - use of spot rates (multiple rate sources) rather than/in addition to
    time-varying rates

Responding to the overall idea of a currency type:

Numeric values with units so that you get a warning/error when you mix 
different units in calculations? Ability to specify rounding methods and 
intermediate precisions for calculations?
+1 Good ideas with lots of potential applications.

Built-in currency type? 
-1 I suspect this is one of those things that seems like a good idea but really 
isn't.
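To make the rounding-method point above concrete, here is a toy sketch using Python's decimal module. The amount is made up; it is chosen so that the configured rounding method changes the stored cent value:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

amount = Decimal("2.665")   # a made-up monetary amount
cent = Decimal("0.01")      # display precision: two decimal places

# Same value, two regulatory-style rounding choices, two different results.
half_up = amount.quantize(cent, rounding=ROUND_HALF_UP)
half_even = amount.quantize(cent, rounding=ROUND_HALF_EVEN)  # "banker's rounding"

print(half_up)    # 2.67
print(half_even)  # 2.66
```

Which is one reason a hard-coded currency type is risky: the "right" answer here depends on jurisdiction and application, not on the type.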




Re: [HACKERS] Snapshot too old logging

2016-11-18 Thread Brad DeJong
On Wed, Nov 16, 2016 at 6:43 AM, Robert Haas wrote:
> I think I was suggesting: One or more rows required by this query may
> already have been removed from "%s".

I keep reading that as "you have data corruption because something removed rows 
that your query needs" rather than "this query took too long to complete but 
may succeed if you resubmit it".





Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Brad DeJong
Magnus wrote:
> Just to be clear, you're suggesting 'One or more rows may have already been 
> removed from "%s"?

Perhaps just 'This query attempted to access a page in "%s" that was modified 
after the snapshot was acquired.'



Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Brad DeJong
On Tue, Nov 15, 2016 at 12:20 PM Magnus Hagander wrote:
> Is there value in showing the snapshot as well?

I don't think so. Knowing the relname lets you look at your report/job and 
figure out if the access to that relation can be moved. Having the exact 
snapshot version isn't going to change that. And if you're doing auto-retry, 
you probably only care if it is happening on the same relation consistently.


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Brad DeJong
On Tue, Nov 15, 2016 at 10:30 AM, Tom Lane wrote:
> 
> Kevin Grittner  writes:
> > On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander 
> wrote:
> >> Is there a reason why we don't log which relation triggered the
> >> snapshot too old error when it happens?
> 
> > I would probably not want to mess with the text of the error itself,
> > in case any client-side software bases recovery on that rather than
> > the SQLSTATE value;
> 
> Any such code is broken on its face because of localization.
> Perhaps including the relname in the main message would make it unduly
> long, but if not I'd vote for doing it that way.

+1 for relname in the main message.
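To make the SQLSTATE point above concrete, a client-side sketch (72000 is the SQLSTATE PostgreSQL assigns to "snapshot too old"; the helper function is hypothetical):

```python
# Recovery logic should key on the SQLSTATE, never on the message text:
# the message is localized and may change, the five-character code is stable.
SNAPSHOT_TOO_OLD = "72000"

def should_retry(sqlstate: str, message: str) -> bool:
    # Deliberately ignores `message`; matching on it breaks under localization.
    return sqlstate == SNAPSHOT_TOO_OLD

# The English and a localized message both map to the same decision,
# regardless of whether the relname is added to the message text.
assert should_retry("72000", 'snapshot too old')
assert should_retry("72000", 'Snapshot zu alt')
assert not should_retry("40001", 'could not serialize access')
```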




Re: [HACKERS] WIP: Covering + unique indexes

2016-11-14 Thread Brad DeJong
Anastasia, et al,

This is a review of including_columns_9.7_v5.patch.

I looked through the commitfest list; this patch looked interesting and I 
wanted to try it.

I have used include columns on Microsoft SQL Server; DB2 for Linux, Unix, 
Windows; and DB2 for z/OS.

After reading the e-mail discussions, I am still not clear on a few points.

Given "create table foo (a int, b int, c int, d int)" and "create unique index
foo_a_b on foo (a, b) including (c)":

                                                   index only?   heap tuple needed?
   select a, b, c from foo where a = 1             yes           no
   select a, b, d from foo where a = 1             no            yes
   select a, b    from foo where a = 1 and c = 1   ?             ?

It was clear from the discussion that the index scan/access method would not 
resolve the "c = 1" predicate but it was not clear if "c = 1" could be resolved 
without accessing the heap tuple.
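One way to check this empirically, assuming the syntax from the patch, would be to force an index scan and look at the plan:

```sql
-- Hypothetical check: with the filter on the included column c, does
-- EXPLAIN still show an Index Only Scan, with "c = 1" as a Filter applied
-- to the index tuple rather than a heap fetch?
CREATE TABLE foo (a int, b int, c int, d int);
CREATE UNIQUE INDEX foo_a_b ON foo (a, b) INCLUDING (c);
SET enable_seqscan = off;
EXPLAIN (COSTS OFF) SELECT a, b FROM foo WHERE a = 1 AND c = 1;
```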

Are included columns counted against the 32-column and 2712-byte index limits? 
I did not see either explicitly mentioned in the discussion or the 
documentation. I only ask because in SQL Server the limits are different for 
include columns.


1. syntax - on 2016-08-14, Andrey Borodin wrote "I think MS SQL syntax INCLUDE 
instead of INCLUDING would be better". I would go further than that. This 
feature is already supported by 2 of the top 5 SQL databases and they both use 
INCLUDE. Using different syntax because of an internal implementation detail 
seems short-sighted.

2. The patch does not seem to apply cleanly anymore.

> git checkout master
Already on 'master'
> git pull
From http://git.postgresql.org/git/postgresql
   d49cc58..06f5fd2  master -> origin/master
Updating d49cc58..06f5fd2
Fast-forward
 src/include/port.h | 2 +-
 src/port/dirmod.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
> patch -p1 < including_columns_9.7_v5.patch
patching file contrib/dblink/dblink.c
...
patching file src/backend/parser/gram.y
...
Hunk #2 FAILED at 375.
...
1 out of 12 hunks FAILED -- saving rejects to file 
src/backend/parser/gram.y.rej
...
patching file src/bin/pg_dump/pg_dump.c
...
Hunk #8 FAILED at 6399.
Hunk #9 FAILED at 6426.
...
2 out of 13 hunks FAILED -- saving rejects to file 
src/bin/pg_dump/pg_dump.c.rej
...

3. After "fixing" the compilation errors (best guess, based on a similar 
change in another location), "make check" failed.

> make check
...
parallel group (3 tests):  index_including create_view create_index
 create_index ... ok
 create_view  ... ok
 index_including  ... FAILED
...

Looking at regression.diffs, it looks like the output format of \d tbl 
changed (lines 288,300) from the expected "Column | Type | Modifiers" to 
"Column | Type | Collation | Nullable | Default".

4. documentation - minor items (these are not actual diffs)
  create_index.sgml
  -[ INCLUDING ( column_name )]
  +[ INCLUDING ( column_name [, ...] )]

   An optional INCLUDING clause allows a list of columns to be
  -specified which will be included in the index, in the non-key portion of the index.
  +specified which will be included in the non-key portion of the index.

  The whole paragraph on INCLUDING does not cover many of the points raised
  in the feature discussions.

  create_table.sgml (for both UNIQUE and PRIMARY KEY constraints) (a similar
  change could be made to nbtree/README)
  -Optional clause INCLUDING allows to add into the index
  -a portion of columns on which the uniqueness is not enforced upon.
  -Note, that althogh constraint is not enforced on included columns, it still
  -depends on them. Consequently, some operations on these columns (e.g. DROP COLUMN)
  -can cause cascade constraint and index deletion.

  +An optional INCLUDING clause allows a list of columns
  +to be specified which will be included in the non-key portion of the index.
  +Although uniqueness is not enforced on the included columns, the constraint
  +still depends on them. Consequently, some operations on the included columns
  +(e.g. DROP COLUMN) can cause cascading constraint and index deletion.

  indexcmds.c
  -* are key columns, and which are just part of the INCLUDING list by check
  +* are key columns, and which are just part of the INCLUDING list by checking

  ruleutils.c
  -   * meaningless there, so do not include them into the message.
  +   * meaningless there, so do not include them in the message.

  pg_dump.c (does "if (fout->remoteVersion >= 90600)" also need to change?)
  -   * In 9.6 we add 

Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-03 Thread Brad DeJong
Change "even large" to "even larger" because it is used in a comparison.

> ... a reasonable starting value for shared_buffers is 25%
> of the memory in your system. There are some workloads where even large 
> settings
> for shared_buffers are effective, ...

... are some workloads where even larger settings ...
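For context, the guidance being corrected amounts to something like the following postgresql.conf fragment (the 8GB figure is just an assumed 25% of a hypothetical 32GB server):

```
# postgresql.conf - illustrative only
shared_buffers = 8GB    # ~25% of RAM on an assumed 32GB machine
```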

Regards,
Brad DeJong


