Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> a) cast result of lfirst/lnext/whatnot.

Again, what we need here is something like

#define lfirst_node(_type_, l) (castNode(_type_, lfirst(l)))

etc.
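
A call site from the hunk quoted later in this thread would then read
like this (a sketch only, assuming the lfirst_node definition above;
castNode() asserts that the node really has the given tag):

/* before: bare assignment, no node-type check */
unhashable_rollup = lfirst(l_start);

/* after: checked cast via the proposed macro */
unhashable_rollup = lfirst_node(RollupData, l_start);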

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> We usually cast the result of palloc.

 >> Rough count in the backend has ~400 without casts to ~1350 with, so
 >> this doesn't seem to have been consistently enforced.

 Andres> Yea, but we're still trying.

Well, a lot of the uncasted ones are in fairly new code, from quite a
number of different committers.

So if this is a big deal, why don't we already have

#define palloc_array(etype, ecount) ((etype *) palloc((ecount) * sizeof(etype)))
#define palloc_object(otype) ((otype *) palloc(sizeof(otype)))

or something of that ilk?
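
To illustrate (a sketch, assuming the macro definitions just above), the
uncasted palloc calls quoted elsewhere in this thread would become:

double     *values = palloc_array(double, 1 + max_weight);
Bitmapset **sets = palloc_array(Bitmapset *, 1 + max_weight);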

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Is there a performance test case where this patch should shine
 Mark> brightest?  I'd like to load a schema with lots of data, and run
 Mark> a grouping sets query, both before and after applying the patch,
 Mark> to see what the performance advantage is.

The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.

So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):

create table sales (
  id serial,
  product_id integer,
  store_id integer,
  customer_id integer,
  qty integer);

-- random integer function
create function d(integer) returns integer language sql
 as $f$ select floor(random()*$1)::integer + 1; $f$;

-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty)
  select d(20), d(6), d(10), d(100) from generate_series(1,10000000);

-- example 2d cube:
select product_id, store_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id);

-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id);

-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id, (qty / 10));

On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andres Freund
On 2017-03-23 05:09:46 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  >> - Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
>  >> + Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);
> 
>  Andres> I think this somewhere in the file header needs an expanded
>  Andres> explanation about what these "special" phases mean.
> 
> I could move most of the block comment for ExecInitAgg up into the file
> header; would that be better?

Yes, I think so.


>  >> + values = palloc((1 + max_weight) * sizeof(double));
>  >> + sets = palloc((1 + max_weight) * sizeof(Bitmapset *));
> 
>  Andres> We usually cast the result of palloc.
> 
> Rough count in the backend has ~400 without casts to ~1350 with, so this
> doesn't seem to have been consistently enforced.

Yea, but we're still trying.

>  >> + RelOptInfo *grouped_rel,
>  >> + Path *path,
>  >> + bool is_sorted,
>  >> + bool can_hash,
>  >> + PathTarget *target,
>  >> + grouping_sets_data *gd,
>  >> + const AggClauseCosts 
> *agg_costs,
>  >> + double dNumGroups)
>  >> +{
>  >> + Query  *parse = root->parse;
>  >> +
>  >> + /*
>  >> +  * If we're not being offered sorted input, then only consider plans 
> that
>  >> +  * can be done entirely by hashing.
>  >> +  *
>  >> +  * We can hash everything if it looks like it'll fit in work_mem. But if
>  >> +  * the input is actually sorted despite not being advertised as such, we
>  >> +  * prefer to make use of that in order to use less memory.
>  >> +  * If none of the grouping sets are sortable, then ignore the work_mem
>  >> +  * limit and generate a path anyway, since otherwise we'll just fail.
>  >> +  */
>  >> + if (!is_sorted)
>  >> + {
>  >> + List   *new_rollups = NIL;
>  >> + RollupData *unhashable_rollup = NULL;
>  >> + List   *sets_data;
>  >> + List   *empty_sets_data = NIL;
>  >> + List   *empty_sets = NIL;
>  >> + ListCell   *lc;
>  >> + ListCell   *l_start = list_head(gd->rollups);
>  >> + AggStrategy strat = AGG_HASHED;
>  >> + Sizehashsize;
>  >> + double  exclude_groups = 0.0;
>  >> +
>  >> + Assert(can_hash);
>  >> +
>  >> + if (pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
>  >> + {
>  >> + unhashable_rollup = lfirst(l_start);
> 
>  Andres> a) cast result of lfirst/lnext/whatnot.
> 
> casting lfirst is defensible (and only about 10% of existing uses don't
> cast it), but why would you ever cast the result of lnext?

Obviously lnext is bogus, was thinking of linitial...

>  >> + if (can_hash && gd->any_hashable)
>  >> + {
>  >> + List   *rollups = NIL;
>  >> + List   *hash_sets = list_copy(gd->unsortable_sets);
>  >> + double  availspace = (work_mem * 1024.0);
> 
>  Andres> Why the parens, and why the .0?
> 
> The .0 is because the * should be done as a double, not an int.

I was just missing why - but now that I've not already been awake for 20
hours, 8 of those crammed into a plane, it's obviously beneficial
because of overflow concerns.


>  Andres> I think you need a higher level explanation of how you're
>  Andres> mapping the work_mem issue to knapsack somewhere.
> 
> The general overview ("work_mem is the knapsack size, and we need to
> figure out how best to pack it with hashtables"), or the specific issue
> of the scale factor for discrete chunking of the size?

The general overview - the scaling detail doesn't seem that important
for understanding the approach.  Briefly explaining what we're trying to
minimize (sort steps), what the constraint is (memory), and that the
problem is representable as the classic "knapsack problem" would be
enough.


>  Andres> Hm.  So we're solely optimizing for the number of sorts, rather
>  Andres> the cost of the sorts.  Probably an acceptable tradeoff.
> 
> The existing cost model for sorts doesn't actually seem to help us here;
> all of our sorts sort the same data, just with different comparison
> columns, but cost_sort doesn't actually account for that (all its
> callers except the CLUSTER planning code pass in 0.0 for
> comparison_cost).
> 
> If the sort costs were correct, we could compute a "value" for each
> rollup that reflected the cost difference between the sort+unique
> comparisons for it, and the hash functions that would be called if we
> hashed instead; and just pass that to the knapsack function.

That'd indeed be desirable - but I think we can punt on that for now.

Greetings,

Andres Freund



Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Mark Dilger

> On Mar 23, 2017, at 11:22 AM, Andrew Gierth  
> wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> You define DiscreteKnapsack to take integer weights and double
> Mark> values, and perform the usual Dynamic Programming algorithm to
> Mark> solve.  But the only place you call this, you pass in NULL for
> Mark> the values, indicating that all the values are equal.  I'm
> Mark> confused why in this degenerate case you bother with the DP at
> Mark> all.  Can't you just load the knapsack from lightest to heaviest
> Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
> Mark> been a long day.  Sorry if I am missing something obvious.
> 
> That's actually a very good question.
> 
> (Though in passing I should point out that the runtime cost is going to
> be negligible in all practical cases. Even an 8-way cube (256 grouping
> sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
> 4096 grouping sets was only chosen because it was a nice round number
> that was larger than comparable limits in other database products. Other
> stuff in the grouping sets code has worse time bounds; the
> bipartite-match code used to minimize the number of rollups is
> potentially O(n^2.5) in the number of grouping sets.)
> 
> Thinking about it, it's not at all obvious what we should be optimizing
> for here in the absence of accurate sort costs.

Is there a performance test case where this patch should shine
brightest?  I'd like to load a schema with lots of data, and run
a grouping sets query, both before and after applying the patch,
to see what the performance advantage is.

mark



Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andres Freund
On 2017-03-23 03:43:57 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  Andres> Changes to advance_aggregates() are, in my experience, quite
>  Andres> likely to have performance effects.  This needs some
>  Andres> performance tests.
>  [...]
>  Andres> Looks like it could all be noise, but it seems worthwhile to
>  Andres> look into some.
> 
> Trying to sort out signal from noise when dealing with performance
> impacts of no more than a few percent is _extremely hard_ these days.

Indeed. But that doesn't mean we needn't try.  With some determination
and profiling you can often separate signal from noise - I managed to
track down 0.12% regressions in the expression evaluation work...


> I will go ahead and do this, out of sheer curiosity if nothing else,
> but the preliminary results suggest there's probably nothing worth
> holding up the patch for.

Agreed. I'd want to run one more profile, checking whether the profiles
indicate new hotspots, but other than that...

Greetings,

Andres Freund




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> You define DiscreteKnapsack to take integer weights and double
 Mark> values, and perform the usual Dynamic Programming algorithm to
 Mark> solve.  But the only place you call this, you pass in NULL for
 Mark> the values, indicating that all the values are equal.  I'm
 Mark> confused why in this degenerate case you bother with the DP at
 Mark> all.  Can't you just load the knapsack from lightest to heaviest
 Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
 Mark> been a long day.  Sorry if I am missing something obvious.

That's actually a very good question.

(Though in passing I should point out that the runtime cost is going to
be negligible in all practical cases. Even an 8-way cube (256 grouping
sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
4096 grouping sets was only chosen because it was a nice round number
that was larger than comparable limits in other database products. Other
stuff in the grouping sets code has worse time bounds; the
bipartite-match code used to minimize the number of rollups is
potentially O(n^2.5) in the number of grouping sets.)

Thinking about it, it's not at all obvious what we should be optimizing
for here in the absence of accurate sort costs. Right now what matters
is saving as many sort steps as possible, since that's a saving likely
to be many orders of magnitude greater than the differences between two
sorts. The one heuristic that might be useful is to prefer the smaller
estimated size if other factors are equal, just on memory usage grounds,
but even that seems a bit tenuous.

I'm inclined to leave things as they are in the current patch, and maybe
revisit it during beta if we get any relevant feedback from people
actually using grouping sets?

 Mark> I'm guessing you intend to extend the code at some later date to
 Mark> have different values for items.  Is that right?

Well, as I mentioned in a reply to Andres, we probably should eventually
optimize for greatest reduction in cost, and the code as it stands would
allow that to be added easily, but that would require having more
accurate cost info than we have now. cost_sort doesn't take into
consideration the number or types of sort columns or the costs of their
comparison functions, so all sorts of the same input data are costed
equally.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> -   Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
 >> +   Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);

 Andres> I think this somewhere in the file header needs an expanded
 Andres> explanation about what these "special" phases mean.

I could move most of the block comment for ExecInitAgg up into the file
header; would that be better?

 Andres> I've not yet read up on the thread, or the whole patch, but an
 Andres> explanation of the memory management for all those tables would
 Andres> be good somewhere (maybe I've just not read that far).

I can expand on that.

 Andres> "mixed types" sounds a bit weird.

changing to "if there are both types present"

 >> +   values = palloc((1 + max_weight) * sizeof(double));
 >> +   sets = palloc((1 + max_weight) * sizeof(Bitmapset *));

 Andres> We usually cast the result of palloc.

Rough count in the backend has ~400 without casts to ~1350 with, so this
doesn't seem to have been consistently enforced.

 >> +static void
 >> +_outGroupingSetData(StringInfo str, const GroupingSetData *node)
 >> +{
 >> +   WRITE_NODE_TYPE("GSDATA");
 >> +
 >> +   WRITE_NODE_FIELD(set);
 >> +   WRITE_FLOAT_FIELD(numGroups, "%.0f");
 >> +}

 Andres> .0f?

Copied from every other node that uses "%.0f" to write out a numGroups
value.

 >> +static grouping_sets_data *
 >> +preprocess_grouping_sets(PlannerInfo *root)
 >> +{

 Andres> It'd not hurt to add a comment as to what this function is
 Andres> actually doing.

Sure.

 Andres> I hope there's tests for both these branches.

A number of tests for both unsortable and unhashable cases are in the
regression test.

 >> +   /*
 >> +* Is it hashable? We pretend empty sets are hashable even 
 >> though we
 >> +* actually force them not to be hashed later. But don't bother 
 >> if
 >> +* there's nothing but empty sets.
 >> +*/

 Andres> Why?

If there's no non-empty grouping set then there's nothing to hash (the
hash code assumes at least one column). I'll put that in the comment.

 >> @@ -3214,6 +3407,11 @@ get_number_of_groups(PlannerInfo *root,
 >> * estimate_hashagg_tablesize
 >> * estimate the number of bytes that a hash aggregate hashtable will
 >> * require based on the agg_costs, path width and dNumGroups.
 >> + *
 >> + * XXX this may be over-estimating the size now that hashagg knows to omit
 >> + * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
 >> + * grouping columns not in the hashed set are counted here even though 
 >> hashagg
 >> + * won't store them. Is this a problem?
 >> */

 Andres> Hm. This seems like it could move us to use sorting, even if
 Andres> not actually warranted.

This is largely a pre-existing issue that this patch doesn't try and
fix.  That would be an issue for a separate patch, I think.  I merely
added the comment to point it out.

 Andres> What's the newline after the comment about?

Old style habits die hard.

 >> +   RelOptInfo *grouped_rel,
 >> +   Path *path,
 >> +   bool is_sorted,
 >> +   bool can_hash,
 >> +   PathTarget *target,
 >> +   grouping_sets_data *gd,
 >> +   const AggClauseCosts 
 >> *agg_costs,
 >> +   double dNumGroups)
 >> +{
 >> +   Query  *parse = root->parse;
 >> +
 >> +   /*
 >> +* If we're not being offered sorted input, then only consider plans 
 >> that
 >> +* can be done entirely by hashing.
 >> +*
 >> +* We can hash everything if it looks like it'll fit in work_mem. But if
 >> +* the input is actually sorted despite not being advertised as such, we
 >> +* prefer to make use of that in order to use less memory.
 >> +* If none of the grouping sets are sortable, then ignore the work_mem
 >> +* limit and generate a path anyway, since otherwise we'll just fail.
 >> +*/
 >> +   if (!is_sorted)
 >> +   {
 >> +   List   *new_rollups = NIL;
 >> +   RollupData *unhashable_rollup = NULL;
 >> +   List   *sets_data;
 >> +   List   *empty_sets_data = NIL;
 >> +   List   *empty_sets = NIL;
 >> +   ListCell   *lc;
 >> +   ListCell   *l_start = list_head(gd->rollups);
 >> +   AggStrategy strat = AGG_HASHED;
 >> +   Sizehashsize;
 >> +   double  exclude_groups = 0.0;
 >> +
 >> +   Assert(can_hash);
 >> +
 >> +   if (pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
 >> +   {
 >> +   unhashable_rollup = lfirst(l_start);

 Andres> a) cast result of

Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Mark Dilger

> On Mar 22, 2017, at 8:09 AM, Mark Dilger  wrote:
> 
> 
>> On Mar 22, 2017, at 7:55 AM, Andrew Gierth  
>> wrote:
>> 
>> [snip]
>> 
>> This thread seems to have gone quiet - is it time for me to just go
>> ahead and commit the thing anyway? Anyone else want to weigh in?
> 
> Sorry for the delay.  I'll try to look at this today.  Others are most welcome
> to weigh in.

You define DiscreteKnapsack to take integer weights and double values,
and perform the usual Dynamic Programming algorithm to solve.  But
the only place you call this, you pass in NULL for the values, indicating
that all the values are equal.  I'm confused why in this degenerate case
you bother with the DP at all.  Can't you just load the knapsack from
lightest to heaviest after sorting, reducing the runtime complexity to O(nlogn)?
It's been a long day.  Sorry if I am missing something obvious.
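
Concretely, the greedy loading suggested above would be something like
this (a sketch under the stated assumption that all values are equal, in
which case taking the lightest items first maximizes the number of items
that fit):

#include <stdlib.h>

static int
cmp_weight(const void *a, const void *b)
{
    int wa = *(const int *) a;
    int wb = *(const int *) b;

    return (wa > wb) - (wa < wb);
}

/*
 * Degenerate (equal-values) knapsack: sort by weight ascending and take
 * items while they still fit.  O(n log n) overall; returns the number of
 * items loaded.
 */
static int
greedy_knapsack_count(int max_weight, int num_items, int *item_weights)
{
    int used = 0;
    int taken = 0;
    int i;

    qsort(item_weights, num_items, sizeof(int), cmp_weight);
    for (i = 0; i < num_items; i++)
    {
        if (used + item_weights[i] > max_weight)
            break;
        used += item_weights[i];
        taken++;
    }
    return taken;
}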

I'm guessing you intend to extend the code at some later date to have
different values for items.  Is that right?

I reapplied your patch and reran the regression tests on my laptop and they
all pass.

mark





Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> Changes to advance_aggregates() are, in my experience, quite
 Andres> likely to have performance effects.  This needs some
 Andres> performance tests.
 [...]
 Andres> Looks like it could all be noise, but it seems worthwhile to
 Andres> look into some.

Trying to sort out signal from noise when dealing with performance
impacts of no more than a few percent is _extremely hard_ these days.

Remember this, from a couple of years back?  http://tinyurl.com/op9qg8a

That's a graph of performance results from tests where the only change
being made was adding padding bytes to a function that was never called
during the test. Performance differences on the order of 3% were being
introduced by doing nothing more than changing the actual memory
locations of functions.

My latest round of tests on this patch shows a similar effect. Here's
one representative test timing (a select count(*) with no grouping sets
involved):

 master: 5727ms
 gsets_hash: 5949ms
 gsets_hash + 500 padding bytes: 5680ms

Since the effect of padding is going to vary over time as other,
unrelated, patches happen, I think I can safely claim that the
performance of the patch at least overlaps the noise range of the
performance of current code.  To get a more definitive result, it would
be necessary to run at least some dozens of tests, with different
padding sizes, and determine whether the average changes detectably
between the two versions.  I will go ahead and do this, out of sheer
curiosity if nothing else, but the preliminary results suggest there's
probably nothing worth holding up the patch for.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Andres Freund
Hi,

> +/*
> + * Switch to phase "newphase", which must either be 0 or 1 (to reset) or
>   * current_phase + 1. Juggle the tuplesorts accordingly.
>   */
>  static void
>  initialize_phase(AggState *aggstate, int newphase)
>  {
> - Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
> + Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);

I think this somewhere in the file header needs an expanded explanation
about what these "special" phases mean.


> @@ -838,17 +897,21 @@ advance_transition_function(AggState *aggstate,
>  /*
>   * Advance each aggregate transition state for one input tuple.  The input
>   * tuple has been stored in tmpcontext->ecxt_outertuple, so that it is
> - * accessible to ExecEvalExpr.  pergroup is the array of per-group structs to
> - * use (this might be in a hashtable entry).
> + * accessible to ExecEvalExpr.
> + *
> + * We have two sets of transition states to handle: one for sorted 
> aggregation
> + * and one for hashed; we do them both here, to avoid multiple evaluation of
> + * the inputs.
>   *
>   * When called, CurrentMemoryContext should be the per-query context.
>   */

Changes to advance_aggregates() are, in my experience, quite likely to
have performance effects.  This needs some performance tests.

I scheduled a small TPCH comparison run:

master q01 min: 28924.766 dev-hash min: 28893.923 [diff -0.11]
master q02 min: 2448.135 dev-hash min: 2430.589 [diff -0.72]
master q03 min: 14596.292 dev-hash min: 14421.054 [diff -1.22]
master q04 min: 1999.684 dev-hash min: 1958.727 [diff -2.09]
master q05 min: 10638.148 dev-hash min: 10753.075 [diff +1.07]
master q06 min: 3679.955 dev-hash min: 3649.114 [diff -0.85]
master q07 min: 10097.272 dev-hash min: 10358.089 [diff +2.52]
master q08 min: 3103.698 dev-hash min: 3078.108 [diff -0.83]
master q09 min: 13034.87 dev-hash min: 13049.402 [diff +0.11]
master q10 min: 11702.966 dev-hash min: 11535.05 [diff -1.46]
master q11 min: 577.114 dev-hash min: 586.767 [diff +1.65]
master q12 min: 9087.413 dev-hash min: 9272.874 [diff +2.00]
master q13 min: 16353.813 dev-hash min: 16473.882 [diff +0.73]
master q14 min: 1564.321 dev-hash min: 1552.31 [diff -0.77]
master q15 min: 3958.565 dev-hash min: 3946.728 [diff -0.30]
master q16 min: 3793.345 dev-hash min: 3784.996 [diff -0.22]
master q17 min: 828.286 dev-hash min: 834.929 [diff +0.80]
master q18 min: 13473.084 dev-hash min: 13777.327 [diff +2.21]
master q19 min: 654.241 dev-hash min: 673.863 [diff +2.91]
master q20 min: 2906.811 dev-hash min: 2882.793 [diff -0.83]
master q22 min: 1226.397 dev-hash min: 1262.045 [diff +2.82]

master total min: 154649.176 dev-hash min: 155175.645 [diff +0.34]

Looks like it could all be noise, but it seems worthwhile to look into
some.



>   * GROUP BY columns.  The per-group data is allocated in lookup_hash_entry(),
>   * for each entry.
>   *
> - * The hash table always lives in the aggcontext memory context.
> + * We have a separate hashtable and associated perhash data structure for 
> each
> + * grouping set for which we're doing hashing.
> + *
> + * The hash tables always live in the hashcontext's per-tuple memory context
> + * (there is only one of these for all tables together, since they are all
> + * reset at the same time).
>   */

I've not yet read up on the thread, or the whole patch, but an
explanation of the memory management for all those tables would be good
somewhere (maybe I've just not read that far).


> @@ -2388,7 +2597,36 @@ agg_retrieve_hash_table(AggState *aggstate)
>   * ExecInitAgg
>   *
>   *   Creates the run-time information for the agg node produced by the
> - *   planner and initializes its outer subtree
> + *   planner and initializes its outer subtree.
> + *
> + *   What we get from the planner is actually one "real" Agg node which is 
> part
> + *   of the plan tree proper, but which optionally has an additional list of 
> Agg
> + *   nodes hung off the side via the "chain" field.  This is because an Agg 
> node
> + *   happens to be a convenient representation of all the data we need for
> + *   grouping sets.
> + *
> + *   For many purposes, we treat the "real" node as if it were just the first
> + *   node in the chain.  The chain must be ordered such that hashed entries 
> come
> + *   before sorted/plain entries; the real node is marked AGG_MIXED if there 
> are
> + *   mixed types (in which case the real node describes one of the hashed

"mixed types" sounds a bit weird.


> + *   We collect all hashed nodes into a single "phase", numbered 0, and 
> create a
> + *   sorted phase (numbered 1..n) for each AGG_SORTED or AGG_PLAIN node.  
> Phase
> + *   0 is allocated even if there are no hashes, but remains unused in that
> + *   case.

Ah, here's the explanation I'd been looking for ;)



> +Bitmapset *
> +DiscreteKnapsack(int max_weight, int num_items,
> +  int *item_weights, double *item_values)
> +{
> + MemoryContext local_ctx = AllocSetContextCre

Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Andrew Gierth
[snip]

This thread seems to have gone quiet - is it time for me to just go
ahead and commit the thing anyway? Anyone else want to weigh in?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Hi Andrew,

 Mark> Reviewing the patch a bit more, I find it hard to understand the
 Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
 Mark> chance you can spend a bit more time word-smithing that code
 Mark> comment?

Actually, ignore that prior response, I'll refactor it a bit to remove
the need for the comment at all.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger

> On Mar 8, 2017, at 9:40 AM, Andrew Gierth  wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> Hi Andrew,
> 
> Mark> Reviewing the patch a bit more, I find it hard to understand the
> Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
> Mark> chance you can spend a bit more time word-smithing that code
> Mark> comment?
> 
> Sure.
> 
> How does this look for wording (I'll incorporate it into an updated
> patch later):

Much better.  Thanks!

> /*
> * Compute the final value of all aggregates for one group.
> *
> * This function handles only one grouping set at a time.  But in the hash
> * case, it's the caller's responsibility to have selected the set already, and
> * we pass in -1 as currentSet here to flag that; this also changes how we
> * handle the indexing into AggStatePerGroup as explained below.
> *
> * Results are stored in the output econtext aggvalues/aggnulls.
> */
> static void
> finalize_aggregates(AggState *aggstate,
>   AggStatePerAgg peraggs,
>   AggStatePerGroup pergroup,
>   int currentSet)
> {
>   ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
>   Datum  *aggvalues = econtext->ecxt_aggvalues;
>   bool   *aggnulls = econtext->ecxt_aggnulls;
>   int aggno;
>   int transno;
> 
>   /*
>* If currentSet >= 0, then we're doing sorted grouping, and pergroup 
> is an
>* array of size numTrans*numSets which we have to index into using
>* currentSet in addition to transno. The caller may not have selected 
> the
>* set, so we do that.
>*
>* If currentSet < 0, then we're doing hashed grouping, and pergroup is 
> an
>* array of only numTrans items (since for hashed grouping, each 
> grouping
>* set is in a separate hashtable).  We rely on the caller having done
>* select_current_set, and we fudge currentSet to 0 in order to make the
>* same indexing calculations work as for the grouping case.
>*/
>   if (currentSet >= 0)
>   select_current_set(aggstate, currentSet, false);
>   else
>   currentSet = 0;
> 
> 
> -- 
> Andrew (irc:RhodiumToad)





Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Hi Andrew,

 Mark> Reviewing the patch a bit more, I find it hard to understand the
 Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
 Mark> chance you can spend a bit more time word-smithing that code
 Mark> comment?

Sure.

How does this look for wording (I'll incorporate it into an updated
patch later):

/*
 * Compute the final value of all aggregates for one group.
 *
 * This function handles only one grouping set at a time.  But in the hash
 * case, it's the caller's responsibility to have selected the set already, and
 * we pass in -1 as currentSet here to flag that; this also changes how we
 * handle the indexing into AggStatePerGroup as explained below.
 *
 * Results are stored in the output econtext aggvalues/aggnulls.
 */
static void
finalize_aggregates(AggState *aggstate,
AggStatePerAgg peraggs,
AggStatePerGroup pergroup,
int currentSet)
{
ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
Datum  *aggvalues = econtext->ecxt_aggvalues;
bool   *aggnulls = econtext->ecxt_aggnulls;
int aggno;
int transno;

/*
 * If currentSet >= 0, then we're doing sorted grouping, and pergroup 
is an
 * array of size numTrans*numSets which we have to index into using
 * currentSet in addition to transno. The caller may not have selected 
the
 * set, so we do that.
 *
 * If currentSet < 0, then we're doing hashed grouping, and pergroup is 
an
 * array of only numTrans items (since for hashed grouping, each 
grouping
 * set is in a separate hashtable).  We rely on the caller having done
 * select_current_set, and we fudge currentSet to 0 in order to make the
 * same indexing calculations work as for the grouping case.
 */
if (currentSet >= 0)
select_current_set(aggstate, currentSet, false);
else
currentSet = 0;


-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger
Hi Andrew,

Reviewing the patch a bit more, I find it hard to understand the comment about
passing -1 as a flag for finalize_aggregates.  Any chance you can spend a bit
more time word-smithing that code comment?


@@ -1559,7 +1647,9 @@ prepare_projection_slot(AggState *aggstate, 
TupleTableSlot *slot, int currentSet
 /*
  * Compute the final value of all aggregates for one group.
  *
- * This function handles only one grouping set at a time.
+ * This function handles only one grouping set at a time.  But in the hash
+ * case, it's the caller's responsibility to have selected the set already, and
+ * we pass in -1 here to flag that and to control the indexing into pertrans.
  *
  * Results are stored in the output econtext aggvalues/aggnulls.
  */
@@ -1575,10 +1665,11 @@ finalize_aggregates(AggState *aggstate,
int aggno;
int transno;

-   Assert(currentSet == 0 ||
-  ((Agg *) aggstate->ss.ps.plan)->aggstrategy != AGG_HASHED);
-
-   aggstate->current_set = currentSet;
+   /* ugly hack for hash */
+   if (currentSet >= 0)
+   select_current_set(aggstate, currentSet, false);
+   else
+   currentSet = 0;


> On Mar 8, 2017, at 8:00 AM, Mark Dilger  wrote:
> 
> 
>> On Mar 8, 2017, at 5:47 AM, Andrew Gierth  
>> wrote:
>> 
>>> "Mark" == Mark Dilger  writes:
>> 
>> Mark> On my MacBook, `make check-world` gives differences in the
>> Mark> contrib modules:
>> 
>> Thanks! Latest cleaned up version of patch is attached.
> 
> This fixes both the warning and the contrib tests on linux and mac.
> 





Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger

> On Mar 8, 2017, at 5:47 AM, Andrew Gierth  wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> On my MacBook, `make check-world` gives differences in the
> Mark> contrib modules:
> 
> Thanks! Latest cleaned up version of patch is attached.

This fixes both the warning and the contrib tests on linux and mac.





Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger

> On Mar 8, 2017, at 2:30 AM, Andrew Gierth  wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> On linux/gcc the patch generates a warning in nodeAgg.c that is
> Mark> fairly easy to fix.  Using -Werror to make catching the error
> Mark> easier, I get:
> 
> what gcc version is this exactly?
> 

Linux version 2.6.32-573.18.1.el6.x86_64 (mockbu...@c6b8.bsys.dev.centos.org) 
(gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Tue Feb 9 
22:46:17 UTC 2016





Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> On linux/gcc the patch generates a warning in nodeAgg.c that is
 Mark> fairly easy to fix.  Using -Werror to make catching the error
 Mark> easier, I get:

what gcc version is this exactly?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-07 Thread Mark Dilger
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

On linux/gcc the patch generates a warning in nodeAgg.c that is fairly easy
to fix.  Using -Werror to make catching the error easier, I get:

make[3]: Entering directory `/home/mark/postgresql.clean/src/backend/executor'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -O2 -Werror -I../../../src/include -D_GNU_SOURCE   
-c -o nodeAgg.o nodeAgg.c -MMD -MP -MF .deps/nodeAgg.Po
cc1: warnings being treated as errors
nodeAgg.c: In function ‘ExecAgg’:
nodeAgg.c:2053: error: ‘result’ may be used uninitialized in this function
make[3]: *** [nodeAgg.o] Error 1
make[3]: Leaving directory `/home/mark/postgresql.clean/src/backend/executor'
make[2]: *** [executor-recursive] Error 2
make[2]: Leaving directory `/home/mark/postgresql.clean/src/backend'
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory `/home/mark/postgresql.clean/src'
make: *** [all-src-recurse] Error 2


There are two obvious choices to silence the warning:

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f059560..8959f5b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2066,6 +2066,7 @@ ExecAgg(AggState *node)
break;
case AGG_PLAIN:
case AGG_SORTED:
+   default:
result = agg_retrieve_direct(node);
break;
}

which I like better, because I don't expect it would create
an extra instruction in the compiled code, but also:

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f059560..eab2605 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2050,7 +2050,7 @@ lookup_hash_entries(AggState *aggstate)
 TupleTableSlot *
 ExecAgg(AggState *node)
 {
-   TupleTableSlot *result;
+   TupleTableSlot *result = NULL;
 
if (!node->agg_done)
{



Re: [HACKERS] Hash support for grouping sets

2017-03-07 Thread Mark Dilger

> On Mar 7, 2017, at 1:43 PM, Mark Dilger  wrote:
> 
> On my MacBook, `make check-world` gives differences in the contrib modules:

I get the same (or similar -- didn't check) regression failure on CentOS, so 
this
doesn't appear to be MacBook / hardware specific.

Mark Dilger



Re: [HACKERS] Hash support for grouping sets

2017-03-07 Thread Mark Dilger
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

On my MacBook, `make check-world` gives differences in the contrib modules:

cat contrib/postgres_fdw/regression.diffs
*** 
/Users/mark/hydra/postgresql.review/contrib/postgres_fdw/expected/postgres_fdw.out
  2017-03-03 13:33:47.0 -0800
--- 
/Users/mark/hydra/postgresql.review/contrib/postgres_fdw/results/postgres_fdw.out
   2017-03-07 13:27:56.0 -0800
***
*** 3148,3163 
  -- Grouping sets
  explain (verbose, costs off)
  select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls 
last;
!                        QUERY PLAN
! ---------------------------------------------------
!  GroupAggregate
!Output: c2, sum(c1)
!Group Key: ft1.c2
!Group Key: ()
!->  Foreign Scan on public.ft1
!  Output: c2, c1
!  Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3)) ORDER BY c2 ASC NULLS LAST
! (7 rows)
  
  select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls 
last;
   c2 |  sum   
--- 3148,3166 
  -- Grouping sets
  explain (verbose, costs off)
  select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls 
last;
!   QUERY PLAN  
! --
!  Sort
!Output: c2, (sum(c1))
!Sort Key: ft1.c2
!->  MixedAggregate
!  Output: c2, sum(c1)
!  Hash Key: ft1.c2
!  Group Key: ()
!  ->  Foreign Scan on public.ft1
!Output: c2, c1
!Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3))
! (10 rows)
  
  select c2, sum(c1) from ft1 where c2 < 3 group by rollup(c2) order by 1 nulls 
last;
   c2 |  sum   
***
*** 3170,3185 
  
  explain (verbose, costs off)
  select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls 
last;
!                        QUERY PLAN
! ---------------------------------------------------
!  GroupAggregate
!Output: c2, sum(c1)
!Group Key: ft1.c2
!Group Key: ()
!->  Foreign Scan on public.ft1
!  Output: c2, c1
!  Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3)) ORDER BY c2 ASC NULLS LAST
! (7 rows)
  
  select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls 
last;
   c2 |  sum   
--- 3173,3191 
  
  explain (verbose, costs off)
  select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls 
last;
!   QUERY PLAN  
! --
!  Sort
!Output: c2, (sum(c1))
!Sort Key: ft1.c2
!->  MixedAggregate
!  Output: c2, sum(c1)
!  Hash Key: ft1.c2
!  Group Key: ()
!  ->  Foreign Scan on public.ft1
!Output: c2, c1
!Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE ((c2 < 3))
! (10 rows)
  
  select c2, sum(c1) from ft1 where c2 < 3 group by cube(c2) order by 1 nulls 
last;
   c2 |  sum   
***
*** 3192,3211 
  
  explain (verbose, costs off)
  select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) 
order by 1 nulls last, 2 nulls last;
!                        QUERY PLAN
! ---------------------------------------------------
   Sort
 Output: c2, c6, (sum(c1))
 Sort Key: ft1.c2, ft1.c6
!->  GroupAggregate
   Output: c2, c6, sum(c1)
!  Group Key: ft1.c2
!  Sort Key: ft1.c6
!Group Key: ft1.c6
   ->  Foreign Scan on public.ft1
 Output: c2, c6, c1
!Remote SQL: SELECT "C 1", c2, c6 FROM "S 1"."T 1" WHERE ((c2 < 3)) ORDER BY c2 ASC NULLS LAST
! (11 rows)
  
  select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) 
order by 1 nulls last, 2 nulls last;
   c2 | c6 |  sum  
--- 3198,3216 
  
  explain (verbose, costs off)
  select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) 
order by 1 nulls last, 2 nulls last;
!                        QUERY PLAN
! ---------------------------------------------------
   Sort
 Output: c2, c6, (sum(c1))
 Sort Key: ft1.c2, ft1.c6
!->  HashAggregate
   Output: c2, c6, sum(c1)
!   

Re: [HACKERS] Hash support for grouping sets

2017-02-24 Thread Andrew Gierth
> "Thom" == Thom Brown  writes:

 Thom> This doesn't apply cleanly to latest master.  Could you please
 Thom> post a rebased patch?

Sure.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c9e0a3e..480a07e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -996,6 +996,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		pname = "HashAggregate";
 		strategy = "Hashed";
 		break;
+	case AGG_MIXED:
+		pname = "MixedAggregate";
+		strategy = "Mixed";
+		break;
 	default:
 		pname = "Aggregate ???";
 		strategy = "???";
@@ -1924,6 +1928,19 @@ show_grouping_set_keys(PlanState *planstate,
 	ListCell   *lc;
 	List	   *gsets = aggnode->groupingSets;
 	AttrNumber *keycols = aggnode->grpColIdx;
+	const char *keyname;
+	const char *keysetname;
+
+	if (aggnode->aggstrategy == AGG_HASHED || aggnode->aggstrategy == AGG_MIXED)
+	{
+		keyname = "Hash Key";
+		keysetname = "Hash Keys";
+	}
+	else
+	{
+		keyname = "Group Key";
+		keysetname = "Group Keys";
+	}
 
 	ExplainOpenGroup("Grouping Set", NULL, true, es);
 
@@ -1938,7 +1955,7 @@ show_grouping_set_keys(PlanState *planstate,
 			es->indent++;
 	}
 
-	ExplainOpenGroup("Group Keys", "Group Keys", false, es);
+	ExplainOpenGroup(keysetname, keysetname, false, es);
 
 	foreach(lc, gsets)
 	{
@@ -1962,12 +1979,12 @@ show_grouping_set_keys(PlanState *planstate,
 		}
 
 		if (!result && es->format == EXPLAIN_FORMAT_TEXT)
-			ExplainPropertyText("Group Key", "()", es);
+			ExplainPropertyText(keyname, "()", es);
 		else
-			ExplainPropertyListNested("Group Key", result, es);
+			ExplainPropertyListNested(keyname, result, es);
 	}
 
-	ExplainCloseGroup("Group Keys", "Group Keys", false, es);
+	ExplainCloseGroup(keysetname, keysetname, false, es);
 
 	if (sortnode && es->format == EXPLAIN_FORMAT_TEXT)
 		es->indent--;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index aa08152..f059560 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -122,12 +122,19 @@
  *	  specific).
  *
  *	  Where more complex grouping sets are used, we break them down into
- *	  "phases", where each phase has a different sort order.  During each
- *	  phase but the last, the input tuples are additionally stored in a
- *	  tuplesort which is keyed to the next phase's sort order; during each
- *	  phase but the first, the input tuples are drawn from the previously
- *	  sorted data.  (The sorting of the data for the first phase is handled by
- *	  the planner, as it might be satisfied by underlying nodes.)
+ *	  "phases", where each phase has a different sort order (except phase 0
+ *	  which is reserved for hashing).  During each phase but the last, the
+ *	  input tuples are additionally stored in a tuplesort which is keyed to the
+ *	  next phase's sort order; during each phase but the first, the input
+ *	  tuples are drawn from the previously sorted data.  (The sorting of the
+ *	  data for the first phase is handled by the planner, as it might be
+ *	  satisfied by underlying nodes.)
+ *
+ *	  Hashing can be mixed with sorted grouping.  To do this, we have an
+ *	  AGG_MIXED strategy that populates the hashtables during the first sorted
+ *	  phase, and switches to reading them out after completing all sort phases.
+ *	  We can also support AGG_HASHED with multiple hash tables and no sorting
+ *	  at all.
  *
  *	  From the perspective of aggregate transition and final functions, the
  *	  only issue regarding grouping sets is this: a single call site (flinfo)
@@ -139,8 +146,6 @@
  *	  sensitive to the grouping set for which the aggregate function is
  *	  currently being called.
  *
- *	  TODO: AGG_HASHED doesn't support multiple grouping sets yet.
- *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -432,6 +437,7 @@ typedef struct AggStatePerGroupData
  */
 typedef struct AggStatePerPhaseData
 {
+	AggStrategy aggstrategy;	/* strategy for this phase */
 	int			numsets;		/* number of grouping sets (or 0) */
 	int		   *gset_lengths;	/* lengths of grouping sets */
 	Bitmapset **grouped_cols;	/* column groupings for rollup */
@@ -440,7 +446,31 @@ typedef struct AggStatePerPhaseData
 	Sort	   *sortnode;		/* Sort node for input ordering for phase */
 }	AggStatePerPhaseData;
 
+/*
+ * AggStatePerHashData - per-hashtable state
+ *
+ * When doing grouping sets with hashing, we have one of these for each
+ * grouping set. (When doing hashing without grouping sets, we have just one of
+ * them.)
+ */
+
+typedef struct AggStatePerHashData
+{
+	TupleHashTable hashtable;	/* hash table with one entry per group */
+	TupleHashIterator hashiter; /* for iterating through hash table */
+	TupleTableSlot *hashslot;	/* slot for loading hash table */
+	FmgrInfo   *hashfunctions;	/* per-grouping-field hash fns */
+

Re: [HACKERS] Hash support for grouping sets

2017-02-22 Thread Thom Brown
On 6 January 2017 at 03:48, Andrew Gierth  wrote:
> Herewith a patch for doing grouping sets via hashing or mixed hashing
> and sorting.
>
> The principal objective is to pick whatever combination of grouping sets
> has an estimated size that fits in work_mem, and minimizes the number of
> sorting passes we need to do over the data, and hash those.  (Yes, this
> is a knapsack problem.)
>
> This is achieved by adding another strategy to the Agg node; AGG_MIXED
> means that both hashing and (sorted or plain) grouping happens.  In
> addition, support for multiple hash tables in AGG_HASHED mode is added.
>
> Sample plans look like this:
>
> explain select two,ten from onek group by cube(two,ten);
>   QUERY PLAN
> --
>  MixedAggregate  (cost=0.00..89.33 rows=33 width=8)
>Hash Key: two, ten
>Hash Key: two
>Hash Key: ten
>Group Key: ()
>->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=8)
>
> explain select two,ten from onek group by two, cube(ten,twenty);
>   QUERY PLAN
> ---
>  HashAggregate  (cost=86.50..100.62 rows=162 width=12)
>Hash Key: two, ten, twenty
>Hash Key: two, ten
>Hash Key: two
>Hash Key: two, twenty
>->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=12)
>
> set work_mem='64kB';
> explain select count(*) from tenk1
>   group by grouping sets (unique1,thousand,hundred,ten,two);
>QUERY PLAN
> 
>  MixedAggregate  (cost=1535.39..3023.89 rows=2 width=28)
>Hash Key: two
>Hash Key: ten
>Hash Key: hundred
>Group Key: unique1
>Sort Key: thousand
>  Group Key: thousand
>->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
>  Sort Key: unique1
>  ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)
> (10 rows)
>
> Columns with unhashable or unsortable data types are handled
> appropriately, though obviously every individual grouping set must end
> up containing compatible column types.
>
> There is one current weakness which I don't see a good solution for: the
> planner code still has to pick a single value for group_pathkeys before
> planning the input path.  This means that we sometimes can't choose a
> minimal set of sorts, because we may have chosen a sort order for a
> grouping set that should have been hashed, for example:
>
> explain select count(*) from tenk1
>   group by grouping sets (two,unique1,thousand,hundred,ten);
>QUERY PLAN
> 
>  MixedAggregate  (cost=1535.39..4126.28 rows=2 width=28)
>Hash Key: ten
>Hash Key: hundred
>Group Key: two
>Sort Key: unique1
>  Group Key: unique1
>Sort Key: thousand
>  Group Key: thousand
>->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
>  Sort Key: two
>  ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)
>
> (3 sorts, vs. 2 in the previous example that listed the same grouping
> sets in different order)
>
> Current status of the work: probably still needs cleanup, maybe some
> better regression tests, and Andres has expressed some concern over the
> performance impact of additional complexity in advance_aggregates; it
> would be useful if this could be tested.  But it should be reviewable
> as-is.

This doesn't apply cleanly to latest master.  Could you please post a
rebased patch?

Thanks

Thom




Re: [HACKERS] Hash support for grouping sets

2017-01-17 Thread Robert Haas
On Mon, Jan 16, 2017 at 10:59 AM, Finnerty, Jim  wrote:
> The ability to exploit hashed aggregation within sorted groups, when the 
> order of the input stream can be exploited this way, is potentially a useful 
> way to improve aggregation performance more generally.  This would 
> potentially be beneficial when the input size is expected to be larger than 
> the amount of working memory available for hashed aggregation, but where 
> there is enough memory to hash-aggregate just the unsorted grouping key 
> combinations, and when the cumulative cost of rebuilding the hash table for 
> each sorted subgroup is less than the cost of sorting the entire input.  In 
> other words, if most of the grouping key combinations are already segregated 
> by virtue of the input order, then hashing the remaining combinations within 
> each sorted group might be done in memory, at the cost of rebuilding the hash 
> table for each sorted subgroup.

Neat idea.

> I haven’t looked at the code for this change yet (I hope I will have the time 
> to do that).  Ideally the decision to choose the aggregation method as 
> sorted, hashed, or mixed hash/sort should be integrated into the cost model, 
> but given the notorious difficulty of estimating intermediate cardinalities 
> accurately it would be difficult to develop a cardinality model and a cost 
> model accurate enough to choose among these options consistently well.

Yes, that might be a little tricky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash support for grouping sets

2017-01-16 Thread Finnerty, Jim
The ability to exploit hashed aggregation within sorted groups, when the order 
of the input stream can be exploited this way, is potentially a useful way to 
improve aggregation performance more generally.  This would potentially be 
beneficial when the input size is expected to be larger than the amount of 
working memory available for hashed aggregation, but where there is enough 
memory to hash-aggregate just the unsorted grouping key combinations, and when 
the cumulative cost of rebuilding the hash table for each sorted subgroup is 
less than the cost of sorting the entire input.  In other words, if most of the 
grouping key combinations are already segregated by virtue of the input order, 
then hashing the remaining combinations within each sorted group might be done 
in memory, at the cost of rebuilding the hash table for each sorted subgroup.
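
As a toy sketch of that flow (purely illustrative, not PostgreSQL code:
the input is pre-sorted on a "prefix" key, the "suffix" key is
hash-aggregated within each sorted run, and the table is rebuilt at each
run boundary so it only ever holds one subgroup):

#include <stdio.h>
#include <string.h>

#define TABSIZE 16

typedef struct
{
    int  key;
    long count;
    int  used;
} Slot;

static void
flush_group(Slot *tab, int prefix)
{
    for (int i = 0; i < TABSIZE; i++)
        if (tab[i].used)
            printf("prefix=%d suffix=%d count=%ld\n",
                   prefix, tab[i].key, tab[i].count);
    /* "rebuild" the table cheaply for the next sorted subgroup */
    memset(tab, 0, TABSIZE * sizeof(Slot));
}

int
main(void)
{
    /* (prefix, suffix) pairs, pre-sorted on prefix only */
    int  rows[][2] = {{1, 7}, {1, 7}, {1, 3}, {2, 3}, {2, 3}, {2, 9}};
    int  nrows = 6;
    int  cur_prefix = rows[0][0];
    Slot tab[TABSIZE] = {{0}};

    for (int r = 0; r < nrows; r++)
    {
        if (rows[r][0] != cur_prefix)   /* sorted-run boundary */
        {
            flush_group(tab, cur_prefix);
            cur_prefix = rows[r][0];
        }

        /* toy open-addressing hash on the suffix key */
        int h = ((unsigned int) rows[r][1]) % TABSIZE;

        while (tab[h].used && tab[h].key != rows[r][1])
            h = (h + 1) % TABSIZE;
        tab[h].used = 1;
        tab[h].key = rows[r][1];
        tab[h].count++;
    }
    flush_group(tab, cur_prefix);
    return 0;
}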

I haven’t looked at the code for this change yet (I hope I will have the time 
to do that).  Ideally the decision to choose the aggregation method as sorted, 
hashed, or mixed hash/sort should be integrated into the cost model, but given 
the notorious difficulty of estimating intermediate cardinalities accurately it 
would be difficult to develop a cardinality model and a cost model accurate 
enough to choose among these options consistently well.

Jim Finnerty
Amazon Corp.

On 1/10/17, 10:22 AM, "pgsql-hackers-ow...@postgresql.org on behalf of Robert 
Haas"  
wrote:

On Thu, Jan 5, 2017 at 10:48 PM, Andrew Gierth
 wrote:
> Herewith a patch for doing grouping sets via hashing or mixed hashing
> and sorting.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash support for grouping sets

2017-01-10 Thread Robert Haas
On Thu, Jan 5, 2017 at 10:48 PM, Andrew Gierth
 wrote:
> Herewith a patch for doing grouping sets via hashing or mixed hashing
> and sorting.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Hash support for grouping sets

2017-01-05 Thread Andrew Gierth
Herewith a patch for doing grouping sets via hashing or mixed hashing
and sorting.

The principal objective is to pick whatever combination of grouping sets
has an estimated size that fits in work_mem, and minimizes the number of
sorting passes we need to do over the data, and hash those.  (Yes, this
is a knapsack problem.)

This is achieved by adding another strategy to the Agg node; AGG_MIXED
means that both hashing and (sorted or plain) grouping happens.  In
addition, support for multiple hash tables in AGG_HASHED mode is added.

Sample plans look like this:

explain select two,ten from onek group by cube(two,ten);
  QUERY PLAN  
--
 MixedAggregate  (cost=0.00..89.33 rows=33 width=8)
   Hash Key: two, ten
   Hash Key: two
   Hash Key: ten
   Group Key: ()
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=8)

explain select two,ten from onek group by two, cube(ten,twenty);
  QUERY PLAN   
---
 HashAggregate  (cost=86.50..100.62 rows=162 width=12)
   Hash Key: two, ten, twenty
   Hash Key: two, ten
   Hash Key: two
   Hash Key: two, twenty
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=12)

set work_mem='64kB';
explain select count(*) from tenk1
  group by grouping sets (unique1,thousand,hundred,ten,two);
   QUERY PLAN   

 MixedAggregate  (cost=1535.39..3023.89 rows=2 width=28)
   Hash Key: two
   Hash Key: ten
   Hash Key: hundred
   Group Key: unique1
   Sort Key: thousand
 Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
 Sort Key: unique1
 ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)
(10 rows)

Columns with unhashable or unsortable data types are handled
appropriately, though obviously every individual grouping set must end
up containing compatible column types.

There is one current weakness which I don't see a good solution for: the
planner code still has to pick a single value for group_pathkeys before
planning the input path.  This means that we sometimes can't choose a
minimal set of sorts, because we may have chosen a sort order for a
grouping set that should have been hashed, for example:

explain select count(*) from tenk1
  group by grouping sets (two,unique1,thousand,hundred,ten);
   QUERY PLAN   

 MixedAggregate  (cost=1535.39..4126.28 rows=2 width=28)
   Hash Key: ten
   Hash Key: hundred
   Group Key: two
   Sort Key: unique1
 Group Key: unique1
   Sort Key: thousand
 Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
 Sort Key: two
 ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)

(3 sorts, vs. 2 in the previous example that listed the same grouping
sets in different order)

Current status of the work: probably still needs cleanup, maybe some
better regression tests, and Andres has expressed some concern over the
performance impact of additional complexity in advance_aggregates; it
would be useful if this could be tested.  But it should be reviewable
as-is.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c762fb0..6c787ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -972,6 +972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		pname = "HashAggregate";
 		strategy = "Hashed";
 		break;
+	case AGG_MIXED:
+		pname = "MixedAggregate";
+		strategy = "Mixed";
+		break;
 	default:
 		pname = "Aggregate ???";
 		strategy = "???";
@@ -1900,6 +1904,19 @@ show_grouping_set_keys(PlanState *planstate,
 	ListCell   *lc;
 	List	   *gsets = aggnode->groupingSets;
 	AttrNumber *keycols = aggnode->grpColIdx;
+	const char *keyname;
+	const char *keysetname;
+
+	if (aggnode->aggstrategy == AGG_HASHED || aggnode->aggstrategy == AGG_MIXED)
+	{
+		keyname = "Hash Key";
+		keysetname = "Hash Keys";
+	}
+	else
+	{
+		keyname = "Group Key";
+		keysetname = "Group Keys";
+	}
 
 	ExplainOpenGroup("Grouping Set", NULL, true, es);
 
@@ -1914,7 +1931,7 @@ show_grouping_set_keys(PlanState *planstate,
 			es->indent++;
 	}
 
-	ExplainOpenGroup("Group Keys", "Group Keys", false, es);
+	ExplainOpenGroup(keysetname, keysetname, false, es);
 
 	foreach(lc, gsets)
 	{
@@ -1938,12 +1955,12 @@ show_grouping_set_keys(PlanState *planstate,
 		}
 
 		if (!result && es->format == EXPLAIN_FORMAT_TEXT)
-			ExplainPropertyText("Group Key", "()", es);
+			ExplainPropertyText(keyname, "()", es);
 		else
-			ExplainPropertyListNested("Group Key", result, es);
+