Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs
On Tue, Apr 12, 2016 at 5:38 PM, David Rowley wrote:
>>> One small point which I was a little unsure of in the attached is,
>>> should the "if (aggref->aggdirectargs)" part of
>>> count_agg_clauses_walker() be within the "if (!context->combineStates)".
>>> I simply couldn't decide. We currently have no aggregates which this
>>> affects anyway, per: select * from pg_aggregate where aggcombinefn <> 0
>>> and aggkind <> 'n'; so for now I've left it outwith.
>>
>> The direct arguments would be evaluated in the worker, but not in the
>> leader, right? Or am I confused?
>
> That seems right, but I just can't think of how it's possible to
> parallelise these aggregates anyway.

Well, if you could ensure that each worker would see a whole group, you
could do it, I think. But it's probably fine to just leave this for now.
It's not like it can't be changed if somebody figures out some cool thing
to do in this area.

>>> Another thing I thought of is that it's not too nice that I have to
>>> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
>>> was tempted to invent some bitmask flags for this, then modify
>>> create_agg_path() to use the same flags, but I thought I'd better not
>>> cause too much churn with this patch.
>>
>> I'm kinda tempted to say this should be using an enum. I note that
>> serialStates has a subtly different meaning here than in some other
>> places where you have used the same term.
>
> hmm, I'm not sure how it's subtly different. Do you mean the preference
> towards costing the finalfn when finalizeAggs is true, and ignoring the
> serialfn in this case? nodeAgg.c should do the same, although it'll
> deserialize in such a case. We can never finalize and serialize in the
> same node.

I mean that, IIUC, in some other places where you use serialStates, true
means that (de)serialization is known to be needed. Here, however, it
only means it might be needed, contingent on whether the serial/deserial
functions are actually present.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
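The two-phase flow this subthread is costing (per-row transition in the workers, then combine and finalize in the leader) can be sketched outside PostgreSQL. The `AvgState` type and function names below are invented for illustration and are not the server's actual executor API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical partial-aggregate state for AVG over ints: sum + count. */
typedef struct AvgState { long sum; long count; } AvgState;

/* transfn: applied once per input row, inside a worker */
void avg_trans(AvgState *st, int value)
{
    st->sum += value;
    st->count += 1;
}

/* combinefn: merges one worker's partial state into the leader's state;
 * this is the cost the committed patch starts accounting for */
void avg_combine(AvgState *leader, const AvgState *partial)
{
    leader->sum += partial->sum;
    leader->count += partial->count;
}

/* finalfn: run once, only in the leader (never in the same node that
 * serializes, as noted in the thread) */
double avg_final(const AvgState *st)
{
    return st->count ? (double) st->sum / (double) st->count : 0.0;
}

/* Aggregate vals[0..n) as if split across two workers, then combined. */
double parallel_avg(const int *vals, size_t n)
{
    AvgState w1 = {0, 0}, w2 = {0, 0}, leader = {0, 0};
    size_t i;

    for (i = 0; i < n / 2; i++)
        avg_trans(&w1, vals[i]);
    for (i = n / 2; i < n; i++)
        avg_trans(&w2, vals[i]);

    avg_combine(&leader, &w1);  /* combinefn cost: once per partial state */
    avg_combine(&leader, &w2);
    return avg_final(&leader);  /* finalfn cost: once, in the leader */
}
```

A serialfn/deserialfn pair would sit between the worker and the combine step when states cross process boundaries, which is the case whose costing the `serialStates` flag governs.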
Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs
On 13 April 2016 at 08:52, Robert Haas wrote:
> On Sun, Apr 10, 2016 at 8:47 AM, David Rowley wrote:
>> I realised a few days ago that the parallel aggregate code does not
>> cost for the combine, serialisation and deserialisation functions at
>> all.
>
> Oops.
>
>> I've attached a patch which fixes this.
>
> I've committed this patch. I wonder if it's going to produce compiler
> warnings for some people, complaining about possible use of an
> uninitialized variable. That would kind of suck. I don't much mind
> having to insert a dummy assignment to shut the compiler up; a smarter
> compiler will just throw it out anyway. I'm less enthused about a
> dummy MemSet. The compiler is less likely to be able to get rid of
> that, and it's more expensive if it doesn't. But let's see what
> happens.

Thanks for committing. I wondered that too, so I checked a couple of
compilers and got no warnings, but the buildfarm should let us know. The
other option would be to palloc() them and have them set to NULL
initially... that's not very nice either. Another option would be to
protect the final parallel path generation with
if (grouped_rel->partial_pathlist && grouped_rel->consider_parallel).
I'd imagine any compiler smart enough to work out that uninitialised is
not possible would also be able to remove the check for
grouped_rel->consider_parallel, but *shrug*, I don't often look at the
assembly that compilers generate, so I might be giving them too much
credit.

>> One small point which I was a little unsure of in the attached is,
>> should the "if (aggref->aggdirectargs)" part of
>> count_agg_clauses_walker() be within the "if (!context->combineStates)".
>> I simply couldn't decide. We currently have no aggregates which this
>> affects anyway, per: select * from pg_aggregate where aggcombinefn <> 0
>> and aggkind <> 'n'; so for now I've left it outwith.
>
> The direct arguments would be evaluated in the worker, but not in the
> leader, right? Or am I confused?
That seems right, but I just can't think of how it's possible to
parallelise these aggregates anyway.

>> Another thing I thought of is that it's not too nice that I have to
>> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
>> was tempted to invent some bitmask flags for this, then modify
>> create_agg_path() to use the same flags, but I thought I'd better not
>> cause too much churn with this patch.
>
> I'm kinda tempted to say this should be using an enum. I note that
> serialStates has a subtly different meaning here than in some other
> places where you have used the same term.

hmm, I'm not sure how it's subtly different. Do you mean the preference
towards costing the finalfn when finalizeAggs is true, and ignoring the
serialfn in this case? nodeAgg.c should do the same, although it'll
deserialize in such a case. We can never finalize and serialize in the
same node.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
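The "possibly uninitialized" pattern being discussed has a common shape: a variable is assigned only under one condition and read only under the same condition, so the code is safe, but a compiler that cannot prove the two tests match may still warn. A standalone sketch (names invented; this is not the PostgreSQL source):

```c
#include <assert.h>
#include <stdbool.h>

/* cheapest_cost() mimics the planner shape: the parallel cost is only
 * computed when consider_parallel is true, and only consulted under the
 * same guard later on. */
double cheapest_cost(bool consider_parallel, double serial_cost)
{
    double partial_cost = 0.0;  /* the "dummy assignment" fix: without this
                                 * initializer some compilers warn about the
                                 * guarded read below, even though it can
                                 * never see an uninitialized value */

    if (consider_parallel)
        partial_cost = serial_cost / 2.0;   /* pretend two workers halve it */

    /* ... other path generation would happen here ... */

    if (consider_parallel && partial_cost < serial_cost)
        return partial_cost;    /* read guarded by the same condition */
    return serial_cost;
}
```

A smarter compiler can see the initializer is dead and drop it, which is why a dummy scalar assignment is cheap, while a dummy MemSet over a struct is harder to eliminate.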
Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs
On Sun, Apr 10, 2016 at 8:47 AM, David Rowley wrote:
> I realised a few days ago that the parallel aggregate code does not
> cost for the combine, serialisation and deserialisation functions at
> all.

Oops.

> I've attached a patch which fixes this.

I've committed this patch. I wonder if it's going to produce compiler
warnings for some people, complaining about possible use of an
uninitialized variable. That would kind of suck. I don't much mind
having to insert a dummy assignment to shut the compiler up; a smarter
compiler will just throw it out anyway. I'm less enthused about a dummy
MemSet. The compiler is less likely to be able to get rid of that, and
it's more expensive if it doesn't. But let's see what happens.

> One small point which I was a little unsure of in the attached is,
> should the "if (aggref->aggdirectargs)" part of
> count_agg_clauses_walker() be within the "if (!context->combineStates)".
> I simply couldn't decide. We currently have no aggregates which this
> affects anyway, per: select * from pg_aggregate where aggcombinefn <> 0
> and aggkind <> 'n'; so for now I've left it outwith.

The direct arguments would be evaluated in the worker, but not in the
leader, right? Or am I confused?

> Another thing I thought of is that it's not too nice that I have to
> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
> was tempted to invent some bitmask flags for this, then modify
> create_agg_path() to use the same flags, but I thought I'd better not
> cause too much churn with this patch.

I'm kinda tempted to say this should be using an enum. I note that
serialStates has a subtly different meaning here than in some other
places where you have used the same term.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
Good news!

On Tuesday, 22 March 2016, David Rowley wrote:
> On 22 March 2016 at 02:35, Robert Haas wrote:
>> I have committed this after changing some of the comments.
>>
>> There might still be bugs ... but I don't see them. And the speedups
>> look very impressive.
>>
>> Really nice work, David.
>
> Thanks for that, and thank you for taking the time to carefully review
> it and commit it.

--
James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099
Re: [HACKERS] Parallel Aggregate
On 22 March 2016 at 02:35, Robert Haas wrote:
> I have committed this after changing some of the comments.
>
> There might still be bugs ... but I don't see them. And the speedups
> look very impressive.
>
> Really nice work, David.

Thanks for that, and thank you for taking the time to carefully review
it and commit it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Sun, Mar 20, 2016 at 7:30 PM, David Rowley wrote:
> An updated patch is attached. This hopefully addresses your concerns
> with the comment, and also the estimate_hashagg_tablesize() NULL
> checking.

I have committed this after changing some of the comments.

There might still be bugs ... but I don't see them. And the speedups
look very impressive.

Really nice work, David.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On Sun, Mar 20, 2016 at 11:24 PM, Alvaro Herrera wrote:
> David Rowley wrote:
>> On 21 March 2016 at 15:48, Alvaro Herrera wrote:
>>> David Rowley wrote:
>>>
>>>> I've rewritten the comment to become:
>>>>
>>>> /*
>>>>  * Providing that the estimated size of the hashtable does not exceed
>>>>  * work_mem, we'll generate a HashAgg Path, although if we were unable
>>>>  * to sort above, then we'd better generate a Path, so that we at least
>>>>  * have one.
>>>>  */
>>>>
>>>> How about that?
>>>
>>> I think "Providing" should be "Provided".
>>
>> Both make sense, although I do only see instances of "Provided that"
>> in the source.
>
> Interesting. "Providing that" seems awkward to me, and I had only seen
> the other wording thus far, but
> http://english.stackexchange.com/questions/149459/what-is-the-difference-between-providing-that-and-provided-that
> explains that I'm wrong.

Well, my instincts are the same as yours, actually...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 21 March 2016 at 15:48, Alvaro Herrera wrote:
> David Rowley wrote:
>
>> I've rewritten the comment to become:
>>
>> /*
>>  * Providing that the estimated size of the hashtable does not exceed
>>  * work_mem, we'll generate a HashAgg Path, although if we were unable
>>  * to sort above, then we'd better generate a Path, so that we at least
>>  * have one.
>>  */
>>
>> How about that?
>
> I think "Providing" should be "Provided".

Both make sense, although I do only see instances of "Provided that" in
the source. I'm not opposed to changing it; it just does not seem worth
emailing a complete patch to do that, but let me know if you feel
differently.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
David Rowley wrote:

> I've rewritten the comment to become:
>
> /*
>  * Providing that the estimated size of the hashtable does not exceed
>  * work_mem, we'll generate a HashAgg Path, although if we were unable
>  * to sort above, then we'd better generate a Path, so that we at least
>  * have one.
>  */
>
> How about that?

I think "Providing" should be "Provided".

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Parallel Aggregate
Hi,

On 03/21/2016 12:30 AM, David Rowley wrote:
> On 21 March 2016 at 09:47, Tomas Vondra wrote:
>> ...
>> I'm not sure changing the meaning of enable_hashagg like this is a good
>> idea. It worked as a hard switch before, while with this change that
>> would not be the case. Or more accurately - it would not be the case
>> for aggregates, but it would still work the old way for other types of
>> plans. Not sure that's a particularly good idea.
>
> Hmm, I don't see how it was a hard switch before. If we were unable to
> sort by the group by clause then hashagg would magically be enabled.

Sure, but that's not what I meant by the "hard switch" (sorry for the
inaccuracy). Let's assume we can actually do the sorted aggregate. In
that case enable_hashagg=off entirely skipped the hashagg path,
irrespective of the cost or whatever. With the new code we will add the
path, and it may actually win on the basis of cost (e.g. if there's
enable_sort=off at the same time). But I'm not convinced this is
actually wrong; I merely pointed out the behavior is not exactly the
same and may have unintended consequences.

> The reason I did this was to simplify the logic in
> create_grouping_paths(). What difference do you imagine that there
> actually is here?

That the hashed path may win over the sorted path, as explained above.

> The only thing I can think of is; we now generate a hashagg path where
> we previously didn't. This has disable_cost added to the startup_cost
> so is quite unlikely to win. Perhaps there is some difference if
> someone did SET enable_sort = false; SET enable_hashagg = false; I'm
> not sure if we should be worried there though. Also maybe there's
> going to be a difference if the plan costings were so high that
> disable_cost was drowned out by the other costings.

Ah, I see you came to the same conclusion ...

> Apart from that, it would actually be nice to be consistent with these
> enable_* GUCs, as to my knowledge the others all just add disable_cost
> to the startup_cost of the path.

Perhaps.
>> What about introducing a GUC to enable parallel aggregate, while still
>> allowing other types of parallel nodes (I guess that would be handy for
>> other types of parallel nodes - it's a bit blunt tool, but tweaking
>> max_parallel_degree is even blunter)? e.g. enable_parallelagg?
>
> Haribabu had this in his version of the patch and I didn't really
> understand the need for it; I assumed it was for testing only. We don't
> have enable_parallelseqscan, and would we plan on adding GUCs each time
> we enable a node for parallelism? I really don't think so; we already
> have parallel hash join and nested loop join without GUCs to disable
> them. I see no reason to add them there, and I also don't here.

I'm not so sure about that. I certainly think we'll be debugging queries
in a not so distant future, wishing for such GUCs.

>> I do also have a question about parallel aggregate vs. work_mem.
>> Nowadays we mostly say to users a query may allocate a multiple of
>> work_mem, up to one per node in the plan. Apparently with parallel
>> aggregate we'll have to say "multiplied by number of workers", because
>> each aggregate worker may allocate up to hashaggtablesize. Is that
>> reasonable? Shouldn't we restrict the total size of hash tables in all
>> workers somehow?
>
> I did think about this, but thought, either:
>
> 1) a project wide decision should be made on how to handle this, not
> just for parallel aggregate, but parallel hash join too, which as I
> understand it, for now builds an identical hashtable per worker.
>
> 2) the limit is per node, per connection, and parallel aggregates have
> multiple connections, so we might not be breaking our definition of how
> to define work_mem, since we're still limited by max_connections anyway.

I do agree this is not just a question for parallel aggregate, and that
perhaps we're not breaking the definition.
But I think in practice we'll be hitting the memory limits much more
often, because parallel queries are very synchronized (running the same
node on all parallel workers at exactly the same time). Not sure if we
have to reflect that in the planner, but it will probably impact what
work_mem values are considered safe.

>> create_grouping_paths also contains this comment:
>>
>> /*
>>  * Generate HashAgg Path providing the estimated hash table size is not
>>  * too big, although if no other Paths were generated above, then we'll
>>  * begrudgingly generate one so that we actually have a Path to work
>>  * with.
>>  */
>>
>> I'm not sure this is a particularly clear comment; I think the old one
>> was way more informative despite being a single line:
>>
>> /* Consider reasons to disable hashing, but only if we can sort instead */
>
> hmm, I find it quite clear, but perhaps that's because I wrote the code.
> I'm not really sure what you're not finding clear about it to be honest.
> Tom's original comment was quite generic to allow for more reasons, but
> I removed one of those reasons by simplifying the logic around
> enable_hashagg, so I didn't think Tom
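The work_mem concern above is simple arithmetic: the documented bound is roughly one work_mem per sort/hash node, and per-worker hash tables multiply that by the degree of parallelism. A toy helper (names and the "+1 for the leader" convention are invented for this sketch) makes the worst case explicit:

```c
#include <assert.h>

/* Worst-case memory for hash-based nodes under the "each worker builds its
 * own table" model discussed above: work_mem per hash node, per process.
 * Illustrative only; the real planner does not compute this. */
long worst_case_kb(long work_mem_kb, int hash_nodes, int workers)
{
    /* workers + 1: each parallel worker plus the leader may hold a table */
    return work_mem_kb * hash_nodes * (long) (workers + 1);
}
```

For example, with work_mem = 4MB, one hash aggregate node, and two workers, the bound is 12MB rather than 4MB, and the synchronization point Tomas raises means those allocations tend to be live simultaneously.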
Re: [HACKERS] Parallel Aggregate
On 21 March 2016 at 09:47, Tomas Vondra wrote:
> On 03/20/2016 09:58 AM, David Rowley wrote:
>> Thank you for the reviews. The only thing I can think to mention which
>> I've not already is that I designed estimate_hashagg_tablesize() to be
>> reusable in various places in planner.c, yet I've only made use of it
>> in create_grouping_paths(). I would imagine that it might be nice to
>> also modify the other places which perform a similar calculation to
>> use that function, but I don't think that needs to be done for this
>> patch... perhaps a follow-on cleanup.
>
> Hmmm, so how many places that could use the new function are there? I've
> only found these two:
>
>    create_distinct_paths (planner.c)
>    choose_hashed_setop (prepunion.c)
>
> That doesn't seem like an extremely high number, so perhaps doing it in
> this patch would be fine. However if the function is defined as static,
> choose_hashed_setop won't be able to use it anyway (well, it'll have to
> move the prototype into a header and remove the static).
>
> I wonder why we would need to allow cost_agg=NULL, though? I mean, if
> there is no costing information, wouldn't it be better to either raise
> an error, or at the very least do something like:
>
>    } else
>        hashentrysize += hash_agg_entry_size(0);
>
> which is what create_distinct_paths does?

Yes, it should do that... My mistake. I've ended up just removing the
NULL check as I don't want to touch create_distinct_paths() in this
patch. I'd rather leave that as a small cleanup patch for later,
although the code in create_distinct_paths() is simpler without the
aggregate sizes being calculated, so there's perhaps less of a need to
use the helper function there. If that cleanup patch materialises then
the

    else
        hashentrysize += hash_agg_entry_size(0);

can be added with that patch.

> I'm not sure changing the meaning of enable_hashagg like this is a good
> idea. It worked as a hard switch before, while with this change that
> would not be the case.
> Or more accurately - it would not be the case for aggregates, but it
> would still work the old way for other types of plans. Not sure that's
> a particularly good idea.

Hmm, I don't see how it was a hard switch before. If we were unable to
sort by the group by clause then hashagg would magically be enabled. The
reason I did this was to simplify the logic in create_grouping_paths().
What difference do you imagine that there actually is here? The only
thing I can think of is: we now generate a hashagg path where we
previously didn't. This has disable_cost added to the startup_cost so is
quite unlikely to win. Perhaps there is some difference if someone did
SET enable_sort = false; SET enable_hashagg = false; I'm not sure if we
should be worried there though. Also maybe there's going to be a
difference if the plan costings were so high that disable_cost was
drowned out by the other costings.

Apart from that, it would actually be nice to be consistent with these
enable_* GUCs; to my knowledge the others all just add disable_cost to
the startup_cost of the path.

> What about introducing a GUC to enable parallel aggregate, while still
> allowing other types of parallel nodes (I guess that would be handy for
> other types of parallel nodes - it's a bit blunt tool, but tweaking
> max_parallel_degree is even blunter)? e.g. enable_parallelagg?

Haribabu had this in his version of the patch and I didn't really
understand the need for it; I assumed it was for testing only. We don't
have enable_parallelseqscan, and would we plan on adding GUCs each time
we enable a node for parallelism? I really don't think so; we already
have parallel hash join and nested loop join without GUCs to disable
them. I see no reason to add them there, and I also don't here.

> I do also have a question about parallel aggregate vs. work_mem.
> Nowadays we mostly say to users a query may allocate a multiple of
> work_mem, up to one per node in the plan.
> Apparently with parallel aggregate we'll have to say "multiplied by
> number of workers", because each aggregate worker may allocate up to
> hashaggtablesize. Is that reasonable? Shouldn't we restrict the total
> size of hash tables in all workers somehow?

I did think about this, but thought, either:

1) a project wide decision should be made on how to handle this, not
just for parallel aggregate, but parallel hash join too, which as I
understand it, for now builds an identical hashtable per worker.

2) the limit is per node, per connection, and parallel aggregates have
multiple connections, so we might not be breaking our definition of how
to define work_mem, since we're still limited by max_connections anyway.

> create_grouping_paths also contains this comment:
>
> /*
>  * Generate HashAgg Path providing the estimated hash table size is not
>  * too big, although if no other Paths were generated above, then we'll
>  * begrudgingly generate one so that we actually have a Path to work
>  * with.
>
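The disable_cost behaviour described in the exchange above (the "hashagg path is still generated, just penalized" model) can be sketched in isolation. The constant and function names are invented for illustration; this is not the planner's actual costing code:

```c
#include <assert.h>

/* The usual enable_* GUC mechanism per the thread: a disabled path is still
 * built, but with a huge constant added to its startup cost, so it only
 * wins when every alternative is also penalized or absent. */
#define DISABLE_COST 1.0e10

double penalized_cost(double real_cost, int enabled)
{
    return enabled ? real_cost : real_cost + DISABLE_COST;
}

/* Pick between a hashed and a sorted grouping path by startup cost. */
int hash_wins(double hash_cost, int enable_hashagg,
              double sort_cost, int enable_sort)
{
    return penalized_cost(hash_cost, enable_hashagg)
         < penalized_cost(sort_cost, enable_sort);
}
```

Note the corner case raised in the thread: with both enable_sort and enable_hashagg off, both paths carry the penalty, so it cancels out and the genuinely cheaper path wins anyway; and if real costs approach 1e10, the penalty stops dominating the comparison.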
Re: [HACKERS] Parallel Aggregate
Hi,

On 03/20/2016 09:58 AM, David Rowley wrote:
> On 20 March 2016 at 03:19, Robert Haas wrote:
>> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley wrote:
>>> Updated patch is attached.
>>
>> I think this looks structurally correct now, and I think it's doing
>> the right thing as far as parallelism is concerned. I don't see any
>> obvious problems in the rest of it, either, but I haven't thought
>> hugely deeply about the way you are doing the costing, nor have I
>> totally convinced myself that all of the PathTarget and setrefs stuff
>> is correct. But I think it's probably pretty close. I'll study it
>> some more next week.
>
> Thank you for the reviews. The only thing I can think to mention which
> I've not already is that I designed estimate_hashagg_tablesize() to be
> reusable in various places in planner.c, yet I've only made use of it
> in create_grouping_paths(). I would imagine that it might be nice to
> also modify the other places which perform a similar calculation to
> use that function, but I don't think that needs to be done for this
> patch... perhaps a follow-on cleanup.

Hmmm, so how many places that could use the new function are there? I've
only found these two:

   create_distinct_paths (planner.c)
   choose_hashed_setop (prepunion.c)

That doesn't seem like an extremely high number, so perhaps doing it in
this patch would be fine. However if the function is defined as static,
choose_hashed_setop won't be able to use it anyway (well, it'll have to
move the prototype into a header and remove the static).

I wonder why we would need to allow cost_agg=NULL, though? I mean, if
there is no costing information, wouldn't it be better to either raise
an error, or at the very least do something like:

   } else
       hashentrysize += hash_agg_entry_size(0);

which is what create_distinct_paths does?

I'm not sure changing the meaning of enable_hashagg like this is a good
idea. It worked as a hard switch before, while with this change that
would not be the case.
Or more accurately - it would not be the case for aggregates, but it
would still work the old way for other types of plans. Not sure that's a
particularly good idea.

What about introducing a GUC to enable parallel aggregate, while still
allowing other types of parallel nodes (I guess that would be handy for
other types of parallel nodes - it's a bit blunt tool, but tweaking
max_parallel_degree is even blunter)? e.g. enable_parallelagg?

I do also have a question about parallel aggregate vs. work_mem.
Nowadays we mostly say to users a query may allocate a multiple of
work_mem, up to one per node in the plan. Apparently with parallel
aggregate we'll have to say "multiplied by number of workers", because
each aggregate worker may allocate up to hashaggtablesize. Is that
reasonable? Shouldn't we restrict the total size of hash tables in all
workers somehow?

create_grouping_paths also contains this comment:

/*
 * Generate HashAgg Path providing the estimated hash table size is not
 * too big, although if no other Paths were generated above, then we'll
 * begrudgingly generate one so that we actually have a Path to work
 * with.
 */

I'm not sure this is a particularly clear comment; I think the old one
was way more informative despite being a single line:

/* Consider reasons to disable hashing, but only if we can sort instead */

BTW create_grouping_paths has probably grown to a size where splitting
it into smaller pieces would be helpful?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
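The hashaggtablesize / work_mem check mentioned repeatedly in this exchange boils down to "per-entry bytes times estimated groups, compared to work_mem". A rough standalone sketch of that shape (the overhead constant and names are invented, not estimate_hashagg_tablesize()'s real arithmetic):

```c
#include <assert.h>

#define HASH_OVERHEAD_BYTES 64   /* hypothetical per-entry bookkeeping */

/* Per the discussion: entry size is the grouped tuple width plus the
 * per-aggregate transition state space plus hash-table overhead, scaled
 * by the estimated number of groups. */
double hashagg_table_bytes(double tuple_width,
                           double trans_space,
                           double num_groups)
{
    double entry = tuple_width + trans_space + HASH_OVERHEAD_BYTES;
    return entry * num_groups;
}

/* The planner-style gate: only generate the HashAgg path when the
 * estimated table fits in work_mem (unless no other path exists). */
int hashagg_fits(double table_bytes, long work_mem_kb)
{
    return table_bytes <= (double) work_mem_kb * 1024.0;
}
```

This is also where the parallel-aggregate work_mem question bites: each worker's table is estimated against work_mem individually, not against a shared budget.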
Re: [HACKERS] Parallel Aggregate
On Fri, Mar 18, 2016 at 9:16 AM, David Rowley wrote:
> I've attached an updated patch.

This looks substantially better than earlier versions, although I
haven't studied every detail of it yet.

+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial

I understand why partial aggregation doesn't work if you have an ORDER
BY clause attached to the aggregate itself, but it's not so obvious to
me that using DISTINCT should rule it out. I guess we can do it that way
for now, but it seems aggregate-specific - e.g. AVG() can't cope with
DISTINCT, but MIN() or MAX() wouldn't care. Maybe MIN() and MAX() are
the outliers in this regard, but they are a pretty common case.

+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate

I think you should adjust the punctuation to "with a single executor Agg
node; however, there are". And maybe drop the word "executor". And on
the next line, I'd add a comma: "such as parallel aggregation, when we
want".
     astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;

-    if (parent && IsA(parent, AggState))
+    if (!aggstate || !IsA(aggstate, AggState))
     {
-        AggState   *aggstate = (AggState *) parent;
-
-        aggstate->aggs = lcons(astate, aggstate->aggs);
-        aggstate->numaggs++;
+        /* planner messed up */
+        elog(ERROR, "Aggref found in non-Agg plan node");
     }
-    else
+    if (aggref->aggpartial == aggstate->finalizeAggs)
     {
         /* planner messed up */
-        elog(ERROR, "Aggref found in non-Agg plan node");
+        if (aggref->aggpartial)
+            elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+        else
+            elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
     }
+    aggstate->aggs = lcons(astate, aggstate->aggs);
+    aggstate->numaggs++;

This seems like it involves more code rearrangement than is really
necessary here.

+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel, set_grouped_rel_consider_parallel() will determine if it's
+ * going to be safe to do so.

Change comma to semicolon or period.

    /*
     * Generate a HashAgg Path atop of the cheapest partial path, once
     * again, we'll only do this if it looks as though the hash table won't
     * exceed work_mem.
     */

Same here. Commas are not the way to connect two independent sentences.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
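Robert's DISTINCT point can be demonstrated concretely: for MAX, duplicates never change the answer, so a max of per-worker maxes equals the global max even if no worker deduplicates, whereas a duplicate value seen by two different workers would be double-counted by AVG and no combine step could undo that. A toy demonstration (not PostgreSQL code; names invented):

```c
#include <assert.h>
#include <stddef.h>

/* plain serial max over v[0..n) */
int max_of(const int *v, size_t n)
{
    int m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* "Parallel" MAX(DISTINCT ...): split the input between two workers,
 * skip deduplication entirely, and combine by taking the larger partial
 * max. Duplicates are harmless because max is idempotent. */
int parallel_max(const int *v, size_t n)
{
    int m1 = max_of(v, n / 2);          /* worker 1's partial max */
    int m2 = max_of(v + n / 2, n - n / 2);  /* worker 2's partial max */
    return m1 > m2 ? m1 : m2;           /* leader combines */
}
```

Run this over an array containing a duplicate value split across the two halves and the combined result still matches the serial answer, which is the property AVG(DISTINCT) lacks.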
Re: [HACKERS] Parallel Aggregate
On 20 March 2016 at 03:19, Robert Haas wrote:
> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley wrote:
>> Updated patch is attached.
>
> I think this looks structurally correct now, and I think it's doing
> the right thing as far as parallelism is concerned. I don't see any
> obvious problems in the rest of it, either, but I haven't thought
> hugely deeply about the way you are doing the costing, nor have I
> totally convinced myself that all of the PathTarget and setrefs stuff
> is correct. But I think it's probably pretty close. I'll study it
> some more next week.

Thank you for the reviews. The only thing I can think to mention which
I've not already is that I designed estimate_hashagg_tablesize() to be
reusable in various places in planner.c, yet I've only made use of it in
create_grouping_paths(). I would imagine that it might be nice to also
modify the other places which perform a similar calculation to use that
function, but I don't think that needs to be done for this patch...
perhaps a follow-on cleanup.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On 17 March 2016 at 01:19, Amit Kapila wrote:
> Few assorted comments:
>
> 1.
> /*
> + * Determine if it's possible to perform aggregation in parallel using
> + * multiple worker processes. We can permit this when there's at least one
> + * partial_path in input_rel, but not if the query has grouping sets,
> + * (although this likely just requires a bit more thought). We must also
> + * ensure that any aggregate functions which are present in either the
> + * target list, or in the HAVING clause all support parallel mode.
> + */
> + can_parallel = false;
> +
> + if ((parse->hasAggs || parse->groupClause != NIL) &&
> +     input_rel->partial_pathlist != NIL &&
> +     parse->groupingSets == NIL &&
> +     root->glob->parallelModeOK)
>
> I think here you need to use has_parallel_hazard() with second parameter
> as false to ensure expressions are parallel safe. glob->parallelModeOK
> flag indicates that there is no parallel unsafe expression, but it can
> still contain parallel restricted expressions.

Yes, I'd not gotten to fixing that per Robert's original comment about
it, but I think I have now.

> 2.
> AggPath *
> create_agg_path(PlannerInfo *root,
> @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>   List *groupClause,
>   List *qual,
>   const AggClauseCosts *aggcosts,
> - double numGroups)
> + double numGroups,
> + bool combineStates,
> + bool finalizeAggs)
>
> Don't you need to set parallel_aware flag in this function as we do for
> create_seqscan_path()?

I don't really know the answer to that... I mean there's nothing special
done in nodeAgg.c if the node is running in a worker or in the main
process. So I guess the only difference is that EXPLAIN will read
"Parallel Partial (Hash|Group)Aggregate" instead of "Partial
(Hash|Group)Aggregate"; is that desired? What's the logic behind having
"Parallel" in EXPLAIN?

> 3.
> postgres=# explain select count(*) from t1; > QUERY PLAN > > > -- > Finalize Aggregate (cost=45420.57..45420.58 rows=1 width=8) >-> Gather (cost=45420.35..45420.56 rows=2 width=8) > Number of Workers: 2 > -> Partial Aggregate (cost=44420.35..44420.36 rows=1 width=8) >-> Parallel Seq Scan on t1 (cost=0.00..44107.88 rows=124988 > wid > th=0) > (5 rows) > > Isn't it better to call it as Parallel Aggregate instead of Partial > Aggregate. Initialy, we have kept Partial for seqscan, but later on we > changed to Parallel Seq Scan, so I am not able to think why it is better to > call Partial incase of Aggregates. I already commented on this. > 4. > /* > + * Likewise for any partial paths, although this case is more simple as > + > * we don't track the cheapest path. > + */ > + foreach(lc, current_rel->partial_pathlist) > + > { > + Path *subpath = (Path *) lfirst(lc); > + > + Assert(subpath->param_info == > NULL); > + lfirst(lc) = apply_projection_to_path(root, current_rel, > + > subpath, scanjoin_target); > + } > + > > > Can't we do this by teaching apply_projection_to_path() as done in the > latest patch posted by me to push down the target list beneath workers [1]. Probably, but I'm not sure I want to go changing that now. The patch is useful without it, so perhaps it can be a follow-on fix. > 5. > + /* > + * If we find any aggs with an internal transtype then we must ensure > + * that > pointers to aggregate states are not passed to other processes, > + * therefore we set the maximum degree > to PAT_INTERNAL_ONLY. > + */ > + if (aggform->aggtranstype == INTERNALOID) > + > context->allowedtype = PAT_INTERNAL_ONLY; > > In the above comment, you have refered maximum degree which is not making > much sense to me. If it is not a typo, then can you clarify the same. hmm. I don't quite understand the confusion. Perhaps you think of "degree" in the parallel sense? 
This is talking about the levels of degree of partial aggregation, which has nothing to do with parallel aggregation; parallel aggregation just requires partial aggregation in order to work. "Degree" in this sense just means that PAT_ANY is the highest degree, PAT_INTERNAL_ONLY lesser, and so on. I thought the "degree" thing was explained ok in the comment for aggregates_allow_partial(), but perhaps I should just remove it, if it's confusing. Changed to:

/*
 * If we find any aggs with an internal transtype then we must ensure
 * that pointers to aggregate states are not passed to other processes,
 * therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
 */

> 6.
> + * fix_combine_agg_expr
> + *   Like fix_upper_expr() but additionally adjusts the Aggref->args of
> + *   Aggrefs so that they references the corresponding Aggref in the subplan.
> + */
> +static Node *
> +fix_combine_agg_expr(PlannerInfo *root,
> +                     Node *node,
> +                     indexed_tlist *subplan_itlist,
> +                     Index newvarno,
> +                     int rtoffset)
> +{
> +    fix_upper_expr_context context;
> +
> +    con
Re: [HACKERS] Parallel Aggregate
On 19 March 2016 at 06:15, Robert Haas wrote: > On Fri, Mar 18, 2016 at 9:16 AM, David Rowley > wrote: >> I've attached an updated patch. > > This looks substantially better than earlier versions, although I > haven't studied every detail of it yet. > > + * Partial aggregation requires that each aggregate does not have a DISTINCT > or > + * ORDER BY clause, and that it also has a combine function set. Since > partial > > I understand why partial aggregation doesn't work if you have an ORDER > BY clause attached to the aggregate itself, but it's not so obvious to > me that using DISTINCT should rule it out. I guess we can do it that > way for now, but it seems aggregate-specific - e.g. AVG() can't cope > with DISTINCT, but MIN() or MAX() wouldn't care. Maybe MIN() and > MAX() are the outliers in this regard, but they are a pretty common > case. hmm? We'd have no way to ensure that two worker processes aggregated only values which other workers didn't. Of course this just happens to be equivalent for MIN() and MAX(), but today we don't attempt to transform MIN(DISTINCT col) to MIN(col), so I see no reason at all why this patch should go and add something along those lines. Perhaps it's something for the future though, although it's certainly not anything specific to parallel aggregate. > + * An Aggref can operate in one of two modes. Normally an aggregate > function's > + * value is calculated with a single executor Agg node, however there are > + * times, such as parallel aggregation when we want to calculate the > aggregate > > I think you should adjust the punctuation to "with a single executor > Agg node; however, there are". And maybe drop the word "executor". > > And on the next line, I'd add a comma: "such as parallel aggregation, > when we want". Fixed. Although I've revised that block a bit after getting rid of aggpartial. 
> > astate->xprstate.evalfunc = > (ExprStateEvalFunc) ExecEvalAggref; > - if (parent && IsA(parent, AggState)) > + if (!aggstate || !IsA(aggstate, AggState)) > { > - AggState *aggstate = > (AggState *) parent; > - > - aggstate->aggs = lcons(astate, > aggstate->aggs); > - aggstate->numaggs++; > + /* planner messed up */ > + elog(ERROR, "Aggref found in > non-Agg plan node"); > } > - else > + if (aggref->aggpartial == > aggstate->finalizeAggs) > { > /* planner messed up */ > - elog(ERROR, "Aggref found in > non-Agg plan node"); > + if (aggref->aggpartial) > + elog(ERROR, "Partial > type Aggref found in FinalizeAgg plan node"); > + else > + elog(ERROR, > "Non-Partial type Aggref found in Non-FinalizeAgg plan node"); > } > + aggstate->aggs = lcons(astate, > aggstate->aggs); > + aggstate->numaggs++; > > This seems like it involves more code rearrangement than is really > necessary here. This is mostly gone, as after removing aggpartial some of these checks are not possible. I just have some additional code: Aggref *aggref = (Aggref *) node; if (aggstate->finalizeAggs && aggref->aggoutputtype != aggref->aggtype) { /* planner messed up */ elog(ERROR, "Aggref aggoutputtype must match aggtype"); } But nothing to sanity check non-finalize nodes. > > +* Partial paths in the input rel could allow us to perform > aggregation in > +* parallel, set_grouped_rel_consider_parallel() will determine if > it's > +* going to be safe to do so. > > Change comma to semicolon or period. Changed. > /* > * Generate a HashAgg Path atop of the cheapest partial path, once > * again, we'll only do this if it looks as though the hash table > won't > * exceed work_mem. > */ > > Same here. Commas are not the way to connect two independent sentences. Changed. 
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On 18 March 2016 at 20:25, Amit Kapila wrote: > Few more comments: > > 1. > > + if (parse->groupClause) > + path = (Path *) create_sort_path(root, > + grouped_rel, > + path, > + root->group_pathkeys, > + -1.0); > > For final path, why do you want to sort just for group by case? If there's no GROUP BY then there will only be a single group, this does not require sorting, e.g SELECT SUM(col) from sometable; I added the comment: /* * Gather is always unsorted, so we'll need to sort, unless there's * no GROUP BY clause, in which case there will only be a single * group. */ > 2. > + path = (Path *) create_gather_path(root, partial_grouped_rel, path, > + NULL, &total_groups); > + > + if (parse->groupClause) > + path = (Path *) create_sort_path(root, > + grouped_rel, > + path, > + root->group_pathkeys, > + -1.0); > + > + if (parse->hasAggs) > + add_path(grouped_rel, (Path *) > + create_agg_path(root, > + grouped_rel, > + path, > + target, > + parse->groupClause ? AGG_SORTED : AGG_PLAIN, > + parse->groupClause, > + (List *) parse->havingQual, > + &agg_costs, > + partial_grouped_rel->rows, > + true, > + true)); > + else > + add_path(grouped_rel, (Path *) > + create_group_path(root, > + grouped_rel, > + path, > + target, > + parse->groupClause, > + (List *) parse->havingQual, > + total_groups)); > > In above part of patch, it seems you are using number of groups > differenetly; for create_group_path() and create_gather_path(), you have > used total_groups whereas for create_agg_path() partial_grouped_rel->rows is > used, is there a reason for the same? That's a mistake... too much code shuffling yesterday it seems. > 3. > + if (grouped_rel->partial_pathlist) > + { > + Path *path = (Path *) > linitial(grouped_rel->partial_pathlist); > + double total_groups; > + > + total_groups = > path->rows * path->parallel_degree; > + path = (Path *) create_gather_path(root, partial_grouped_rel, > path, > + NULL, &total_groups); > > A. 
> Won't passing partial_grouped_rel lead to incomplete information required
> by create_gather_path() w.r.t the case of parameterized path info?

There should be no parameterized path info after joins are over, but nevertheless I took your advice about passing PathTarget to create_gather_path(), so this partial_grouped_rel no longer exists.

> B. You have mentioned that passing grouped_rel will make gather path contain
> the information of final path target, but what is the problem with that? I
> mean to ask why Gather node is required to contain partial path target
> information instead of final path target.

Imagine a query such as:

SELECT col, SUM(this) FROM sometable GROUP BY col HAVING SUM(somecolumn) > 0;

In this case SUM(somecolumn) won't be in the final PathTarget. The partial grouping target will contain the Aggref from the HAVING clause. The other difference with the partial aggregate PathTarget is that the Aggrefs return the partial state in exprType() rather than the final value's type, which is required so the executor knows how to form and deform tuples, plus many other things.

> C. Can we consider passing pathtarget to create_gather_path() as that seems
> to save us from inventing new UpperRelationKind? If you are worried about
> adding the new parameter (pathtarget) to create_gather_path(), then I think
> we are already passing it in many other path generation functions, so why
> not for gather path generation as well?

That's a better idea... Changed to that...

> 4A.
> Overall function create_grouping_paths() looks better than previous, but I
> think still it is difficult to read. I think it can be improved by
> generating partial aggregate paths separately as we do for nestloop join
> refer function consider_parallel_nestloop

hmm, perhaps the partial path generation could be moved off to another static function, although we'd need to pass quite a few parameters to it, like can_sort, can_hash, partial_grouping_target, grouped_rel, root.
Perhaps it's worth doing, but we still need the partial_grouping_target for the Gather node, so it's not like that other function can do all of the parallel stuff... We'd still need some knowledge of that in create_grouping_paths().

> 4B.
> Rather than directly using create_gather_path(), can't we use
> generate_gather_paths as for all places where we generate gather node,
> generate_gather_paths() is used.

I don't think this is a good fit here, although it would be nice as it would save having to special-case generating the final aggregate paths on top of the partial paths. It does not seem that nice as it's not really that clear if we need to make a combine aggregate node, or a normal aggregate node on the path. The only way to determine that would be by checking if it was a GatherPath or not, and that does not seem like a nice way to go about doing that. Someone might go and invent something new like MergeGather one day.

> 5.
> +make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
> {
> ..
> ..
Re: [HACKERS] Parallel Aggregate
On 17 March 2016 at 18:05, David Rowley wrote:
> Updated patch attached.

Please disregard 0001-Allow-aggregation-to-happen-in-parallel_2016-03-17.patch. This contained a badly thought-through last-minute change to how the Gather path is generated, and is broken.

I played around with ways of generating the Gather node, as create_gather_path() is not really geared up for what's needed here: grouped_rel cannot be passed into create_gather_path(), since it contains the final aggregate PathTarget rather than the partial PathTarget, and also incorrect row estimates. I've ended up with an extra double *rows argument in this function to make it possible to override the rows from the rel. I'm not sure how this'll go down... It does not seem perfect.

In the end I've now added a new upper planner type to allow me to create a RelOptInfo for the partial aggregate relation, so that I can pass create_gather_path() a relation with the correct PathTarget. This seemed better than borrowing grouped_rel, then overriding the reltarget after create_gather_path() returned. Although I'm willing to consider the possibility that someone will disagree with how I've done things here.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

0001-Allow-aggregation-to-happen-in-parallel_2016-03-18.patch Description: Binary data
Re: [HACKERS] Parallel Aggregate
On Wed, Mar 16, 2016 at 6:49 AM, David Rowley wrote: > On 16 March 2016 at 15:04, Robert Haas wrote: >> I don't think I'd be objecting if you made PartialAggref a real >> alternative to Aggref. But that's not what you've got here. A >> PartialAggref is just a wrapper around an underlying Aggref that >> changes the interpretation of it - and I think that's not a good idea. >> If you want to have Aggref and PartialAggref as truly parallel node >> types, that seems cool, and possibly better than what you've got here >> now. Alternatively, Aggref can do everything. But I don't think we >> should go with this wrapper concept. > > Ok, I've now gotten rid of the PartialAggref node, and I'm actually > quite happy with how it turned out. I made > search_indexed_tlist_for_partial_aggref() to follow-on the series of > other search_indexed_tlist_for_* functions and have made it behave the > same way, by returning the newly created Var instead of doing that in > fix_combine_agg_expr_mutator(), as the last version did. > > Thanks for the suggestion. > > New patch attached. Cool! Why not initialize aggpartialtype always? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On Thu, Mar 17, 2016 at 2:13 PM, James Sewell wrote:
>
> Hi again,
>
> This is probably me missing something, but is there a reason parallel
> aggregate doesn't seem to ever create append nodes containing Index scans?
>
> SET random_page_cost TO 0.2;
> SET max_parallel_degree TO 8;
>
> postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
>                                  QUERY PLAN
> -
>  Finalize GroupAggregate  (cost=310596.32..310598.03 rows=31 width=16)
>    Group Key: view_time_day
>    ->  Sort  (cost=310596.32..310596.79 rows=186 width=16)
>          Sort Key: view_time_day
>          ->  Gather  (cost=310589.00..310589.31 rows=186 width=16)
>                Number of Workers: 5
>                ->  Partial HashAggregate  (cost=310589.00..310589.31 rows=31 width=16)
>                      Group Key: view_time_day
>                      ->  Parallel Seq Scan on base  (cost=0.00..280589.00 rows=600 width=12)
>
> SET max_parallel_degree TO 0;
>
> postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
>                                  QUERY PLAN
> ---
>  GroupAggregate  (cost=0.56..600085.92 rows=31 width=16)
>    Group Key: view_time_day
>    ->  Index Only Scan using base_view_time_day_count_i_idx on base  (cost=0.56..450085.61 rows=3000 width=12)
> (3 rows)

To get a good parallelism benefit, the workers have to execute most of the plan in parallel. If we run only some part of the upper plan in parallel, we may not get much benefit from parallelism. At present only the seq scan node is capable of running in parallel at the scan node level; index scan is not possible as of now. For this reason, the plan is chosen based on the overall cost of the parallel aggregate + parallel seq scan. If index scan is made parallel-capable in future, it is possible that a parallel aggregate + parallel index scan plan may be chosen.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Aggregate
On 17 March 2016 at 00:57, Robert Haas wrote: > On Wed, Mar 16, 2016 at 6:49 AM, David Rowley > wrote: >> On 16 March 2016 at 15:04, Robert Haas wrote: >>> I don't think I'd be objecting if you made PartialAggref a real >>> alternative to Aggref. But that's not what you've got here. A >>> PartialAggref is just a wrapper around an underlying Aggref that >>> changes the interpretation of it - and I think that's not a good idea. >>> If you want to have Aggref and PartialAggref as truly parallel node >>> types, that seems cool, and possibly better than what you've got here >>> now. Alternatively, Aggref can do everything. But I don't think we >>> should go with this wrapper concept. >> >> Ok, I've now gotten rid of the PartialAggref node, and I'm actually >> quite happy with how it turned out. I made >> search_indexed_tlist_for_partial_aggref() to follow-on the series of >> other search_indexed_tlist_for_* functions and have made it behave the >> same way, by returning the newly created Var instead of doing that in >> fix_combine_agg_expr_mutator(), as the last version did. >> >> Thanks for the suggestion. >> >> New patch attached. > > Cool! Why not initialize aggpartialtype always? Because the follow-on patch sets that to either the serialtype or the aggtranstype, depending on if serialisation is required. Serialisation is required for parallel aggregate, but if we're performing the partial agg in the main process, then we'd not need to do that. This could be solved by adding more fields to AggRef to cover the aggserialtype and perhaps expanding aggpartial into an enum mode which allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay attention to the mode and return 1 of the 3 possible types. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On Wed, Mar 16, 2016 at 5:05 PM, David Rowley wrote:
>> Cool! Why not initialize aggpartialtype always?
>
> Because the follow-on patch sets that to either the serialtype or the
> aggtranstype, depending on if serialisation is required. Serialisation
> is required for parallel aggregate, but if we're performing the
> partial agg in the main process, then we'd not need to do that. This
> could be solved by adding more fields to AggRef to cover the
> aggserialtype and perhaps expanding aggpartial into an enum mode which
> allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
> attention to the mode and return 1 of the 3 possible types.

Urk. That might still be better than what you have right now, but it's obviously not great. How about ditching aggpartialtype and adding aggoutputtype instead? Then you can always initialize that to whatever it's supposed to be based on the type of aggregation you are doing, and exprType() can simply return that field.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
I read this a bit, as an exercise to try to follow parallel query a bit. I think the most interesting thing I have to say is that the new error messages in ExecInitExpr do not conform to our style. Probably just downcase everything except Aggref and you're done, since they're can't-happen conditions anyway.

The comments below are mostly as I learn how the whole thing works, so if I'm talking nonsense, I'm happy to be ignored or, better yet, educated.

I think the way we set the "consider_parallel" flag is a bit odd (we just "return" out of the function in the cases where it mustn't be set); but that mechanism is already part of set_rel_consider_parallel and similar to (but not quite like) longstanding routines such as set_rel_width, so nothing new in this patch. I find this a bit funny coding, but then this is the planner so maybe it's okay.

I think the comment on search_indexed_tlist_for_partial_aggref is a bit bogus; it says it returns an existing Var, but what it does is manufacture one itself. I *think* the code is all right, but the comment seems misleading.

In set_combineagg_references(), there are two calls to fix_combine_agg_expr(); I think the one hanging on the search_indexed_tlist_for_sortgroupref call is useless; you could just have the "if newexpr != NULL" in the outer block (after initializing to NULL before the ressortgroupref check).

set_combineagg_references's comment says it does something "similar to set_upper_references, and additionally" some more stuff, but reading the code for both functions I think that's not quite true. I think the comment should say that both functions are parallel, but one works for partial aggs and the other doesn't. Actually, what happens if you feed an agg plan with combineStates to set_upper_references? If it still works but the result is not optimal, maybe we should check against that case, so as to avoid the case where somebody hacks this further and the plans are suddenly not agg-combined anymore.
What I actually expect to happen is that something would explode during execution; in that case perhaps it's better to add a comment? (In further looking at other similar setrefs.c functions, maybe it's fine the way you have it.)

Back at make_partialgroup_input_target, the comment says "so that they can be found later in Combine Aggregate nodes during setrefs". I think it's better to be explicit and say ".. can be found later during set_combineagg_references" or something.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Parallel Aggregate
On 17 March 2016 at 03:47, Robert Haas wrote: > More review comments: > > /* > +* Likewise for any partial paths, although this case > is more simple as > +* we don't track the cheapest path. > +*/ > > I think in the process of getting rebased over the rapidly-evolving > underlying substructure, this comment no longer makes much sense where > it is in the file. IIUC, the comment is referring back to "Forcibly > apply that target to all the Paths for the scan/join rel", but there's > now enough other stuff in the middle that it doesn't really make sense > any more. And actually, I think you should move the code up higher, > not change the comment. This belongs before setting > root->upper_targets[foo]. Yes, that's not where it was originally... contents may settle during transit... > The logic in create_grouping_paths() is too ad-hoc and, as Amit and I > have both complained about, wrong in detail because it doesn't call > has_parallel_hazard anywhere. Basically, you have the wrong design. > There shouldn't be any need to check parallelModeOK here. Rather, > what you should be doing is setting consider_parallel to true or false > on the upper rel. See set_rel_consider_parallel for how this is set > for base relations, set_append_rel_size() for append relations, and > perhaps most illustratively build_join_rel() for join relations. You > should have some equivalent of this logic for upper rels, or at least > the upper rels you care about: > >if (inner_rel->consider_parallel && outer_rel->consider_parallel && > !has_parallel_hazard((Node *) restrictlist, false)) > joinrel->consider_parallel = true; > > Then, you can consider parallel aggregation if consider_parallel is > true and any other conditions that you care about are also met. > > I think that the way you are considering sorted aggregation, hashed > aggregation, and mixed strategies does not make very much sense. It > seems to me that what you should do is: > > 1. Figure out the cheapest PartialAggregate path. 
> You will need to
> compare the costs of (a) a hash aggregate, (b) an explicit sort +
> group aggregate, and (c) grouping a presorted path. (We can
> technically skip (c) for right now since it can't occur.) I would go
> ahead and use add_partial_path() on these to stuff them into the
> partial_pathlist for the upper rel.
>
> 2. Take the first (cheapest) path in the partial_pathlist and stick a
> Gather node on top of it. Store this in a local variable, say,
> partial_aggregate_path.
>
> 3. Construct a finalize-hash-aggregate path for partial_aggregate_path
> and also a sort+finalize-group/plain-aggregate path for
> partial_aggregate_path, and add each of those to the upper rel. They
> will either beat out the non-parallel paths or they won't.
>
> The point is that the decision as to whether to use hashing or sorting
> below the Gather is completely independent from the choice of which
> one to use above the Gather. Pick the best strategy below the Gather;
> then pick the best strategy to stick on top of that above the Gather.

Good point. I've made local alterations to the patch so that partial paths are now generated on the grouped_rel. I also got rid of the enable_hashagg test in create_grouping_paths(). The use of this did seem rather old-school planner style. Instead I altered cost_agg() to add disable_cost appropriately, which simplifies the logic of when and when not to consider hash aggregate paths. The only downside of this that I can see is that the hash agg Path is still generated when enable_hashagg is off, and that means a tiny bit more work creating the path and calling add_path() for it. This change also allows nodeGroup to be used for parallel aggregate when there are no aggregate functions. This was a bit broken in the old version.

> * This is very similar to make_group_input_target(), only we do not recurse
> * into Aggrefs. Aggrefs are left intact and added to the target list.
> Here we
> * also add any Aggrefs which are located in the HAVING clause into the
> * PathTarget.
> *
> * Aggrefs are also setup into partial mode and the partial return types are
> * set to become the type of the aggregate transition state rather than the
> * aggregate function's return type.

> This comment isn't very helpful because it tells you what the function
> does, which you can find out anyway from reading the code. What it
> should do is explain why it does it. Just to take one particular
> point, why would we not want to recurse into Aggrefs in this case?

Good point. I've updated it to:

/*
 * make_partialgroup_input_target
 *   Generate appropriate PathTarget for input for Partial Aggregate nodes.
 *
 * Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
 * we need these to remain intact so that they can be found later in Combine
 * Aggregate nodes during setrefs. Vars will be still pulled out of
 * non-Aggref n
Re: [HACKERS] Parallel Aggregate
On 17 March 2016 at 01:29, Robert Haas wrote:
> On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila wrote:
>> Isn't it better to call it as Parallel Aggregate instead of Partial
>> Aggregate. Initially, we have kept Partial for seqscan, but later on we
>> changed to Parallel Seq Scan, so I am not able to think why it is better to
>> call Partial in case of Aggregates.
>
> I think partial is the right terminology. Unlike a parallel
> sequential scan, a partial aggregate isn't parallel-aware and could be
> used in contexts having nothing to do with parallelism. It's just
> that it outputs transition values instead of a finalized value.

+1. The reason the partial aggregate patches have been kept separate from the parallel aggregate patches is that partial aggregate will serve many other purposes. Parallel Aggregate is just one of many possible use cases for this, so it makes little sense to give it a name according to a single use case.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Fri, Mar 18, 2016 at 10:32 PM, David Rowley wrote:
> Updated patch is attached.

I think this looks structurally correct now, and I think it's doing the right thing as far as parallelism is concerned. I don't see any obvious problems in the rest of it, either, but I haven't thought hugely deeply about the way you are doing the costing, nor have I totally convinced myself that all of the PathTarget and setrefs stuff is correct. But I think it's probably pretty close. I'll study it some more next week.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On Thu, Mar 17, 2016 at 6:41 PM, David Rowley wrote: > > On 18 March 2016 at 01:22, Amit Kapila wrote: > > On Thu, Mar 17, 2016 at 10:35 AM, David Rowley > > wrote: > > Updated patch is attached. Thanks for the re-review. > Few more comments: 1. + if (parse->groupClause) + path = (Path *) create_sort_path(root, + grouped_rel, + path, + root->group_pathkeys, + -1.0); For final path, why do you want to sort just for group by case? 2. + path = (Path *) create_gather_path(root, partial_grouped_rel, path, + NULL, &total_groups); + + if (parse->groupClause) + path = (Path *) create_sort_path(root, + grouped_rel, + path, + root->group_pathkeys, + -1.0); + + if (parse->hasAggs) + add_path(grouped_rel, (Path *) + create_agg_path(root, + grouped_rel, + path, + target, + parse->groupClause ? AGG_SORTED : AGG_PLAIN, + parse->groupClause, + (List *) parse->havingQual, + &agg_costs, + partial_grouped_rel->rows, + true, + true)); + else + add_path(grouped_rel, (Path *) + create_group_path(root, + grouped_rel, + path, + target, + parse->groupClause, + (List *) parse->havingQual, + total_groups)); In above part of patch, it seems you are using number of groups differenetly; for create_group_path() and create_gather_path(), you have used total_groups whereas for create_agg_path() partial_grouped_rel->rows is used, is there a reason for the same? 3. + if (grouped_rel->partial_pathlist) + { + Path *path = (Path *) linitial(grouped_rel->partial_pathlist); + double total_groups; + + total_groups = path->rows * path->parallel_degree; + path = (Path *) create_gather_path(root, partial_grouped_rel, path, + NULL, &total_groups); A. Won't passing partial_grouped_rel lead to incomplete information required by create_gather_path() w.r.t the case of parameterized path info? B. You have mentioned that passing grouped_rel will make gather path contain the information of final path target, but what is the problem with that? 
I mean to ask why Gather node is required to contain partial path target information instead of final path target. C. Can we consider passing pathtarget to create_gather_path() as that seems to save us from inventing new UpperRelationKind? If you are worried about adding the new parameter (pathtarget) to create_gather_path(), then I think we are already passing it in many other path generation functions, so why not for gather path generation as well? 4A. Overall function create_grouping_paths() looks better than previous, but I think still it is difficult to read. I think it can be improved by generating partial aggregate paths separately as we do for nestloop join refer function consider_parallel_nestloop 4B. Rather than directly using create_gather_path(), can't we use generate_gather_paths as for all places where we generate gather node, generate_gather_paths() is used. 5. +make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target) { .. .. + foreach(lc, final_target->exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + i++; + + if (parse->groupClause) + { + Index sgref = final_target- >sortgrouprefs[i]; + + if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause) + != NULL) + { + /* + * It's a grouping column, so add it to the input target as-is. + */ + add_column_to_pathtarget(input_target, expr, sgref); + continue; + } + } + + /* + * Non-grouping column, so just remember the expression for later + * call to pull_var_clause. + */ + non_group_cols = lappend(non_group_cols, expr); + } .. } Do we want to achieve something different in the above foreach loop than the similar loop in make_group_input_target(), if not then why are they not exactly same? 6. + /* XXX this causes some redundant cost calculation ... */ + input_target = set_pathtarget_cost_width(root, input_target); + return input_target; Can't we use return set_pathtarget_cost_width() directly rather than fetching it in input_target and then returning input_target? 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Aggregate
On Wed, Mar 16, 2016 at 4:19 PM, David Rowley wrote: > > On 16 March 2016 at 15:04, Robert Haas wrote: > > I don't think I'd be objecting if you made PartialAggref a real > > alternative to Aggref. But that's not what you've got here. A > > PartialAggref is just a wrapper around an underlying Aggref that > > changes the interpretation of it - and I think that's not a good idea. > > If you want to have Aggref and PartialAggref as truly parallel node > > types, that seems cool, and possibly better than what you've got here > > now. Alternatively, Aggref can do everything. But I don't think we > > should go with this wrapper concept. > > Ok, I've now gotten rid of the PartialAggref node, and I'm actually > quite happy with how it turned out. I made > search_indexed_tlist_for_partial_aggref() to follow-on the series of > other search_indexed_tlist_for_* functions and have made it behave the > same way, by returning the newly created Var instead of doing that in > fix_combine_agg_expr_mutator(), as the last version did. > > Thanks for the suggestion. > > New patch attached. > Few assorted comments: 1. /* + * Determine if it's possible to perform aggregation in parallel using + * multiple worker processes. We can permit this when there's at least one + * partial_path in input_rel, but not if the query has grouping sets, + * (although this likely just requires a bit more thought). We must also + * ensure that any aggregate functions which are present in either the + * target list, or in the HAVING clause all support parallel mode. + */ + can_parallel = false; + + if ((parse->hasAggs || parse->groupClause != NIL) && + input_rel->partial_pathlist != NIL && + parse->groupingSets == NIL && + root->glob->parallelModeOK) I think here you need to use has_parallel_hazard() with second parameter as false to ensure expressions are parallel safe. glob->parallelModeOK flag indicates that there is no parallel unsafe expression, but it can still contain parallel restricted expression. 
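Amit's distinction here is worth spelling out: parallelModeOK only proves the query contains no parallel-unsafe expression, while pushing aggregation below a Gather additionally requires rejecting parallel-restricted expressions. A toy sketch of that distinction (Python, illustrative only — the expression representation is invented; only the has_parallel_hazard name and its second-parameter semantics come from the discussion):

```python
# Toy classification illustrating Amit's point. The labels mirror
# pg_proc.proparallel ('s' = safe, 'r' = restricted, 'u' = unsafe),
# but the expression representation here is invented for illustration.

def has_parallel_hazard(exprs, allow_restricted):
    """Return True if any expression forbids the planned use.

    With allow_restricted=True this behaves like the check backing
    parallelModeOK: only parallel-unsafe expressions count. With
    allow_restricted=False (what Amit suggests for parallel aggregation),
    parallel-restricted expressions are hazards too, since they must not
    run below a Gather.
    """
    for e in exprs:
        if e["parallel"] == "u":
            return True
        if e["parallel"] == "r" and not allow_restricted:
            return True
    return False

exprs = [{"name": "sum", "parallel": "s"},
         {"name": "my_restricted_fn", "parallel": "r"}]

print(has_parallel_hazard(exprs, allow_restricted=True))   # False
print(has_parallel_hazard(exprs, allow_restricted=False))  # True
```

The restricted function passes the weaker check but fails the stronger one, which is exactly why checking parallelModeOK alone is insufficient here.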
2. AggPath * create_agg_path(PlannerInfo *root, @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root, List *groupClause, List *qual, const AggClauseCosts *aggcosts, - double numGroups) + double numGroups, + bool combineStates, + bool finalizeAggs) Don't you need to set parallel_aware flag in this function as we do for create_seqscan_path()? 3. postgres=# explain select count(*) from t1; QUERY PLAN -- Finalize Aggregate (cost=45420.57..45420.58 rows=1 width=8) -> Gather (cost=45420.35..45420.56 rows=2 width=8) Number of Workers: 2 -> Partial Aggregate (cost=44420.35..44420.36 rows=1 width=8) -> Parallel Seq Scan on t1 (cost=0.00..44107.88 rows=124988 width=0) (5 rows) Isn't it better to call it Parallel Aggregate instead of Partial Aggregate? Initially, we kept Partial for seqscan, but later on we changed to Parallel Seq Scan, so I am not able to think why it is better to call it Partial in case of Aggregates. 4. /* + * Likewise for any partial paths, although this case is more simple as + * we don't track the cheapest path. + */ + foreach(lc, current_rel->partial_pathlist) + { + Path *subpath = (Path *) lfirst(lc); + + Assert(subpath->param_info == NULL); + lfirst(lc) = apply_projection_to_path(root, current_rel, + subpath, scanjoin_target); + } + Can't we do this by teaching apply_projection_to_path() as done in the latest patch posted by me to push down the target list beneath workers [1]? 5. + /* + * If we find any aggs with an internal transtype then we must ensure + * that pointers to aggregate states are not passed to other processes, + * therefore we set the maximum degree to PAT_INTERNAL_ONLY. + */ + if (aggform->aggtranstype == INTERNALOID) + context->allowedtype = PAT_INTERNAL_ONLY; In the above comment, you have referred to maximum degree, which is not making much sense to me. If it is not a typo, then can you clarify the same? 6. 
+ * fix_combine_agg_expr + * Like fix_upper_expr() but additionally adjusts the Aggref->args of + * Aggrefs so that they reference the corresponding Aggref in the subplan. + */ +static Node * +fix_combine_agg_expr(PlannerInfo *root, + Node *node, + indexed_tlist *subplan_itlist, + Index newvarno, + int rtoffset) +{ + fix_upper_expr_context context; + + context.root = root; + context.subplan_itlist = subplan_itlist; + context.newvarno = newvarno; + context.rtoffset = rtoffset; + return fix_combine_agg_expr_mutator(node, &context); +} + +static Node * +fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context) Don't we want to handle the case of context->subplan_itlist->has_non_vars as it is handled in fix_upper_expr_mutator()? If not, then I think adding the reason for this in comments above the function would be better. 7. tlist.c +} \ No newline at end of file There should be a newline at end of file.
Re: [HACKERS] Parallel Aggregate
On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila wrote: > Isn't it better to call it as Parallel Aggregate instead of Partial > Aggregate. Initialy, we have kept Partial for seqscan, but later on we > changed to Parallel Seq Scan, so I am not able to think why it is better to > call Partial incase of Aggregates. I think partial is the right terminology. Unlike a parallel sequential scan, a partial aggregate isn't parallel-aware and could be used in contexts having nothing to do with parallelism. It's just that it outputs transition values instead of a finalized value. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
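Robert's point — that a partial aggregate merely stops early and emits the transition value — can be modeled outside PostgreSQL entirely. The sketch below (Python, purely illustrative; it mimics the transfn/combinefn/finalfn division of labor from pg_aggregate, not any actual PostgreSQL code) shows an avg() whose partial phase outputs a (sum, count) state rather than a finalized value:

```python
# Illustrative model of the three-phase aggregate protocol: a partial
# aggregate applies the transition function and stops, emitting the
# transition state; a finalize aggregate combines states and finalizes.

def avg_transfn(state, value):
    s, n = state
    return (s + value, n + 1)

def avg_combinefn(a, b):
    return (a[0] + b[0], a[1] + b[1])

def avg_finalfn(state):
    s, n = state
    return s / n if n else None

def partial_agg(rows):
    """What a Partial Aggregate node does: emit a transition value."""
    state = (0, 0)
    for v in rows:
        state = avg_transfn(state, v)
    return state

def finalize_agg(states):
    """What a Finalize Aggregate node does: combine states, then finalize."""
    total = (0, 0)
    for st in states:
        total = avg_combinefn(total, st)
    return avg_finalfn(total)

# Two "workers" each scan a chunk; the leader combines their states.
chunks = [[1, 2, 3], [4, 5]]
states = [partial_agg(c) for c in chunks]   # [(6, 3), (9, 2)]
print(finalize_agg(states))                 # 3.0 == avg(1..5)
```

Note that partial_agg involves no parallelism at all, which is Robert's argument: "partial" describes what the node outputs, not where it runs.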
Re: [HACKERS] Parallel Aggregate
On Wed, Mar 16, 2016 at 7:57 AM, Robert Haas wrote: > On Wed, Mar 16, 2016 at 6:49 AM, David Rowley > wrote: >> On 16 March 2016 at 15:04, Robert Haas wrote: >>> I don't think I'd be objecting if you made PartialAggref a real >>> alternative to Aggref. But that's not what you've got here. A >>> PartialAggref is just a wrapper around an underlying Aggref that >>> changes the interpretation of it - and I think that's not a good idea. >>> If you want to have Aggref and PartialAggref as truly parallel node >>> types, that seems cool, and possibly better than what you've got here >>> now. Alternatively, Aggref can do everything. But I don't think we >>> should go with this wrapper concept. >> >> Ok, I've now gotten rid of the PartialAggref node, and I'm actually >> quite happy with how it turned out. I made >> search_indexed_tlist_for_partial_aggref() to follow-on the series of >> other search_indexed_tlist_for_* functions and have made it behave the >> same way, by returning the newly created Var instead of doing that in >> fix_combine_agg_expr_mutator(), as the last version did. >> >> Thanks for the suggestion. >> >> New patch attached. > > Cool! Why not initialize aggpartialtype always? More review comments: /* +* Likewise for any partial paths, although this case is more simple as +* we don't track the cheapest path. +*/ I think in the process of getting rebased over the rapidly-evolving underlying substructure, this comment no longer makes much sense where it is in the file. IIUC, the comment is referring back to "Forcibly apply that target to all the Paths for the scan/join rel", but there's now enough other stuff in the middle that it doesn't really make sense any more. And actually, I think you should move the code up higher, not change the comment. This belongs before setting root->upper_targets[foo]. 
The logic in create_grouping_paths() is too ad-hoc and, as Amit and I have both complained about, wrong in detail because it doesn't call has_parallel_hazard anywhere. Basically, you have the wrong design. There shouldn't be any need to check parallelModeOK here. Rather, what you should be doing is setting consider_parallel to true or false on the upper rel. See set_rel_consider_parallel for how this is set for base relations, set_append_rel_size() for append relations, and perhaps most illustratively build_join_rel() for join relations. You should have some equivalent of this logic for upper rels, or at least the upper rels you care about: if (inner_rel->consider_parallel && outer_rel->consider_parallel && !has_parallel_hazard((Node *) restrictlist, false)) joinrel->consider_parallel = true; Then, you can consider parallel aggregation if consider_parallel is true and any other conditions that you care about are also met. I think that the way you are considering sorted aggregation, hashed aggregation, and mixed strategies does not make very much sense. It seems to me that what you should do is: 1. Figure out the cheapest PartialAggregate path. You will need to compare the costs of (a) a hash aggregate, (b) an explicit sort + group aggregate, and (c) grouping a presorted path. (We can technically skip (c) for right now since it can't occur.) I would go ahead and use add_partial_path() on these to stuff them into the partial_pathlist for the upper rel. 2. Take the first (cheapest) path in the partial_pathlist and stick a Gather node on top of it. Store this in a local variable, say, partial_aggregate_path. 3. Construct a finalize-hash-aggregate path for partial_aggregate_path and also a sort+finalize-group/plain-aggregate path for partial_aggregate_path, and add each of those to the upper rel. They will either beat out the non-parallel paths or they won't. 
The point is that the decision as to whether to use hashing or sorting below the Gather is completely independent from the choice of which one to use above the Gather. Pick the best strategy below the Gather; then pick the best strategy to stick on top of that above the Gather. * This is very similar to make_group_input_target(), only we do not recurse * into Aggrefs. Aggrefs are left intact and added to the target list. Here we * also add any Aggrefs which are located in the HAVING clause into the * PathTarget. * * Aggrefs are also setup into partial mode and the partial return types are * set to become the type of the aggregate transition state rather than the * aggregate function's return type. This comment isn't very helpful because it tells you what the function does, which you can find out anyway from reading the code. What it should do is explain why it does it. Just to take one particular point, why would we not want to recurse into Aggrefs in this case? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
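The independence Robert describes can be sketched as two separate cost minimizations: choose the cheapest partial strategy, put the Gather on it, then separately choose the cheapest finalize strategy. A toy model follows (Python; the path names echo the thread, but all cost numbers are invented for illustration and bear no relation to PostgreSQL's costing):

```python
# Toy model of the two independent decisions: the below-Gather winner
# and the above-Gather winner are chosen separately. Costs are made up.

def cheapest(paths):
    return min(paths, key=lambda p: p[1])

partial_paths = [
    ("Partial HashAggregate", 310589.31),
    ("Sort + Partial GroupAggregate", 512000.00),
]
below, below_cost = cheapest(partial_paths)

gather_cost = below_cost + 1000.0  # pretend Gather overhead

finalize_paths = [
    ("Finalize HashAggregate", gather_cost + 0.40),
    ("Sort + Finalize GroupAggregate", gather_cost + 8.70),
]
above, total_cost = cheapest(finalize_paths)

# With different invented costs, hashing could win below the Gather
# while sorting wins above it, or vice versa -- the choices are decoupled.
print(below)   # Partial HashAggregate
print(above)   # Finalize HashAggregate
```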
Re: [HACKERS] Parallel Aggregate
On Thu, Mar 17, 2016 at 10:35 AM, David Rowley wrote: > > On 17 March 2016 at 01:19, Amit Kapila wrote: > > Few assorted comments: > > > > 2. > > AggPath * > > create_agg_path(PlannerInfo *root, > > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root, > > > > List *groupClause, > > List *qual, > > const AggClauseCosts > > *aggcosts, > > - double numGroups) > > + double numGroups, > > + > > bool combineStates, > > + bool finalizeAggs) > > > > Don't you need to set parallel_aware flag in this function as we do for > > create_seqscan_path()? > > I don't really know the answer to that... I mean there's nothing > special done in nodeAgg.c if the node is running in a worker or in the > main process. > Thinking about it again, I think it is okay to set the parallel_aware flag to false. This flag indicates whether that particular node has any parallel-aware behaviour, which is true for seqscan, but I think not for a partial aggregate node. Few other comments on latest patch: 1. + /* + * XXX does this need estimated for each partial path, or are they + * all going to be the same anyway? + */ + dNumPartialGroups = get_number_of_groups(root, + clamp_row_est(partial_aggregate_path->rows), + rollup_lists, + rollup_groupclauses); For considering partial groups, do we need the rollup-related lists? 2. + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path, + &agg_costs, + dNumPartialGroups); + + /* + * Generate a hashagg Path, if we can, but we'll skip this if the hash + * table looks like it'll exceed work_mem. + */ + if (can_hash && hashaggtablesize < work_mem * 1024L) hash table size should be estimated only if can_hash is true. 3. + foreach(lc, grouped_rel->partial_pathlist) + { + Path *path = (Path *) lfirst(lc); + double total_groups; + + total_groups = path->parallel_degree * path->rows; + + path = (Path *) create_gather_path(root, grouped_rel, path, NULL, + &total_groups); Do you need to perform it for each partial path or just do it for the first partial path? 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Aggregate
Hi again, This is probably me missing something, but is there a reason parallel aggregate doesn't seem to ever create append nodes containing Index scans? SET random_page_cost TO 0.2; SET max_parallel_degree TO 8; postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day; QUERY PLAN - Finalize GroupAggregate (cost=310596.32..310598.03 rows=31 width=16) Group Key: view_time_day -> Sort (cost=310596.32..310596.79 rows=186 width=16) Sort Key: view_time_day -> Gather (cost=310589.00..310589.31 rows=186 width=16) Number of Workers: 5 -> Partial HashAggregate (cost=310589.00..310589.31 rows=31 width=16) Group Key: view_time_day -> Parallel Seq Scan on base (cost=0.00..280589.00 rows=600 width=12) SET max_parallel_degree TO 0; postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day; QUERY PLAN --- GroupAggregate (cost=0.56..600085.92 rows=31 width=16) Group Key: view_time_day -> Index Only Scan using base_view_time_day_count_i_idx on base (cost=0.56..450085.61 rows=3000 width=12) (3 rows) Cheers, James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Thu, Mar 17, 2016 at 8:08 AM, David Rowley wrote: > On 17 March 2016 at 01:29, Robert Haas wrote: > > On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila > wrote: > >> Isn't it better to call it as Parallel Aggregate instead of Partial > >> Aggregate. Initialy, we have kept Partial for seqscan, but later on we > >> changed to Parallel Seq Scan, so I am not able to think why it is > better to > >> call Partial incase of Aggregates. > > > > I think partial is the right terminology. Unlike a parallel > > sequential scan, a partial aggregate isn't parallel-aware and could be > > used in contexts having nothing to do with parallelism. It's just > > that it outputs transition values instead of a finalized value. 
> > +1. The reason the partial aggregate patches have been kept separate > from the parallel aggregate patches is that partial aggregate will > serve for many other purposes. Parallel Aggregate is just one of many > possible use cases for this, so it makes little sense to give it a > name according to a single use case. > > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On 18 March 2016 at 01:22, Amit Kapila wrote: > On Thu, Mar 17, 2016 at 10:35 AM, David Rowley > wrote: >> >> On 17 March 2016 at 01:19, Amit Kapila wrote: >> > Few assorted comments: >> > >> > 2. >> > AggPath * >> > create_agg_path(PlannerInfo *root, >> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root, >> > >> > List *groupClause, >> > List *qual, >> > const AggClauseCosts >> > *aggcosts, >> > - double numGroups) >> > + double numGroups, >> > + >> > bool combineStates, >> > + bool finalizeAggs) >> > >> > Don't you need to set parallel_aware flag in this function as we do for >> > create_seqscan_path()? >> >> I don't really know the answer to that... I mean there's nothing >> special done in nodeAgg.c if the node is running in a worker or in the >> main process. >> > > On again thinking about it, I think it is okay to set parallel_aware flag as > false. This flag means whether that particular node has any parallelism > behaviour which is true for seqscan, but I think not for partial aggregate > node. > > Few other comments on latest patch: > > 1. > > + /* > + * XXX does this need estimated for each partial path, or are they > + * all > going to be the same anyway? > + */ > + dNumPartialGroups = get_number_of_groups(root, > + > clamp_row_est(partial_aggregate_path->rows), > + > rollup_lists, > + > rollup_groupclauses); > > For considering partial groups, do we need to rollup related lists? No it doesn't you're right. I did mean to remove these, but they're NIL anyway. Seems better to remove them to prevent confusion. > 2. > + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path, > + > &agg_costs, > + > dNumPartialGroups); > + > + /* > + * Generate a > hashagg Path, if we can, but we'll skip this if the hash > + * table looks like it'll exceed work_mem. > + > */ > + if (can_hash && hashaggtablesize < work_mem * 1024L) > > > hash table size should be estimated only if can_hash is true. Good point. Changed. > > 3. 
> + foreach(lc, grouped_rel->partial_pathlist) > + { > + Path *path = > (Path *) lfirst(lc); > + double total_groups; > + > + total_groups = path->parallel_degree * path->rows; > + > + path = (Path *) create_gather_path(root, grouped_rel, path, > NULL, > + &total_groups); > > Do you need to perform it for each partial path or just do it for > the first partial path? That's true. The order here does not matter since we're passing directly into a Gather node, so it's wasteful to consider anything apart from the cheapest path. -- Fixed. There was also a missing hash table size check on the Finalize HashAggregate Path consideration. I've added that now. Updated patch is attached. Thanks for the re-review. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Allow-aggregation-to-happen-in-parallel_2016-03-18a.patch Description: Binary data
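The two fixes David applies here can be sketched together: only bother estimating the hash table when hashing is possible at all, and build the Gather over just the cheapest partial path, since partial paths feed straight into a Gather and their sort order is irrelevant there. An illustrative Python sketch (not PostgreSQL source — the function and parameter names here are hypothetical stand-ins):

```python
# Illustrative sketch of the two review fixes:
# (1) estimate the hash table only when can_hash is true, and
# (2) build a Gather over just the cheapest partial path.

def consider_parallel_agg(partial_pathlist, can_hash, work_mem_bytes,
                          estimate_hashagg_tablesize):
    if not partial_pathlist:
        return None
    # The list is assumed sorted by cost; only the first entry matters,
    # because path order below a Gather buys us nothing.
    cheapest = partial_pathlist[0]
    chosen = []
    if can_hash:
        # Fix (1): only pay for the size estimate when hashing is possible.
        if estimate_hashagg_tablesize(cheapest) < work_mem_bytes:
            chosen.append(("Gather + Finalize HashAggregate", cheapest))
    chosen.append(("Gather + Sort + Finalize GroupAggregate", cheapest))
    return chosen

paths = [{"name": "hash", "cost": 10.0}, {"name": "sort", "cost": 12.0}]
result = consider_parallel_agg(paths, can_hash=True,
                               work_mem_bytes=4 * 1024 * 1024,
                               estimate_hashagg_tablesize=lambda p: 1024)
print([name for name, _ in result])
```

Both generated alternatives sit on the same cheapest partial path, mirroring David's "wasteful to consider anything apart from the cheapest path" conclusion.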
Re: [HACKERS] Parallel Aggregate
On 19 March 2016 at 05:46, Robert Haas wrote: > On Wed, Mar 16, 2016 at 5:05 PM, David Rowley > wrote: >>> Cool! Why not initialize aggpartialtype always? >> >> Because the follow-on patch sets that to either the serialtype or the >> aggtranstype, depending on if serialisation is required. Serialisation >> is required for parallel aggregate, but if we're performing the >> partial agg in the main process, then we'd not need to do that. This >> could be solved by adding more fields to AggRef to cover the >> aggserialtype and perhaps expanding aggpartial into an enum mode which >> allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay >> attention to the mode and return 1 of the 3 possible types. > > Urk. That might still be better than what you have right now, but > it's obviously not great. How about ditching aggpartialtype and > adding aggoutputtype instead? Then you can always initialize that to > whatever it's supposed to be based on the type of aggregation you are > doing, and exprType() can simply return that field. hmm, that might be better, but it kinda leaves aggpartial without much of a job to do. The only code which depends on that is the sanity checks that I added in execQual.c, and it does not really seem worth keeping it for that. The only sanity check that I can think to do here is if (aggstate->finalizeAggs && aggref->aggoutputtype != aggref->aggtype) -- we have a problem. Obviously we can't check that for non-finalize nodes since the aggtype can match the aggoutputtype for legitimate reasons. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On 19 March 2016 at 09:53, Alvaro Herrera wrote: > I read this a bit, as an exercise to try to follow parallel query a bit. Thanks for taking a look at this. > I think the most interesting thing I have to say is that the new error > messages in ExecInitExpr do not conform to our style. Probably just > downcase everything except Aggref and you're done, since they're > can't-happen conditions anyway. The comments below are mostly as I > learn how the whole thing works so if I'm talking nonsense, I'm happy to > be ignored or, better yet, educated. Per a comment above from Robert I ended up just removing most of that code due to the removal of Aggref->aggpartial. It did not seem worth keeping aggpartial around just for these errors either, but let me know if you think otherwise. > I think the way we set the "consider_parallel" flag is a bit odd (we > just "return" out of the function in the cases where it mustn't be set); > but that mechanism is already part of set_rel_consider_parallel and > similar to (but not quite like) longstanding routines such as > set_rel_width, so nothing new in this patch. I find this a bit funny > coding, but then this is the planner so maybe it's okay. hmm, I think it's very similar to set_rel_consider_parallel(), as that just does return for non-supported cases, and if it makes it until the end of the function consider_parallel is set to true. Would you rather see; /* we can do nothing in parallel if there's no aggregates or group by */ if (!parse->hasAggs && parse->groupClause == NIL) grouped_rel->consider_parallel = false; /* grouping sets are currently not supported by parallel aggregate */ else if (parse->groupingSets) grouped_rel->consider_parallel = false; ...? > I think the comment on search_indexed_tlist_for_partial_aggref is a bit > bogus; it says it returns an existing Var, but what it does is > manufacture one itself. I *think* the code is all right, but the > comment seems misleading. 
Ok, I've changed it to: search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist which seems to be equivalent to search_indexed_tlist_for_non_var()'s comment. > In set_combineagg_references(), there are two calls to > fix_combine_agg_expr(); I think the one hanging on the > search_indexed_tlist_for_sortgroupref call is useless; you could just > have the "if newexpr != NULL" in the outer block (after initializing to > NULL before the ressortgroupref check). Yes, but see set_upper_references(), it just follows the pattern there but calls fix_combine_agg_expr() instead of fix_upper_expr(). For simplicity of review it seems to be nice to keep it following the same pattern. > set_combineagg_references's comment says it does something "similar to > set_upper_references, and additionally" some more stuff, but reading the > code for both functions I think that's not quite true. I think the > comment should say that both functions are parallel, but one works for > partial aggs and the other doesn't. Ok, I've changed the comment to: /* * set_combineagg_references * This does a similar job to set_upper_references(), but treats Aggrefs * in a different way. Here we transform the Aggref's args to suit the * combine aggregate phase. This means that the Aggref->args are converted * to reference the corresponding aggregate function in the subplan rather * than simple Var(s), as would be the case for a non-combine aggregate * node. */ > Actually, what happens if you feed > an agg plan with combineStates to set_upper_references? If it still > works but the result is not optimal, maybe we should check against that > case, so as to avoid the case where somebody hacks this further and the > plans are suddenly not agg-combined anymore. What I actually expect to > happen is that something would explode during execution; in that case > perhaps it's better to add a comment? 
(In further looking at other > setrefs.c similar functions, maybe it's fine the way you have it.) This simply won't work as this is the code which causes the sum((sum(num))) in the combine aggregate's target list. The normal code is trying to look for Vars, but this code is trying to find that sum(num) inside the subplan target list. Example EXPLAIN VERBOSE output: Finalize Aggregate (cost=105780.67..105780.68 rows=1 width=8) Output: pg_catalog.sum((sum(num))) Try commenting out: if (aggplan->combineStates) set_combineagg_references(root, plan, rtoffset); else to see the horrors that ensue. It'll basically turn aggregate functions into a rather inefficient random number generator. This is because the combine Aggref->args still point to "num", instead of sum(num). > Back at make_partialgroup_input_target, the comment says "so that they > can be found later in Combine Aggregate nodes during setrefs". I think > it's better to be explicit and say ".. can be found later during > set_combineagg_references" or something. Changed. Thanks for the review. Updated patch is attached.
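The sum((sum(num))) shape David points to is just two-stage summation: each worker's partial aggregate emits sum(num), and the combine aggregate sums those partial sums. A hypothetical Python rendering (not PostgreSQL code) of why the combine node's args must reference the subplan's aggregate output rather than the raw column:

```python
# Why the combine Aggref's args must point at the subplan's aggregate
# output, not at the original column: the leader sums partial sums.

worker_chunks = [[1, 2, 3], [4, 5, 6]]

# Partial phase (per worker): sum(num) over that worker's rows.
partial_sums = [sum(chunk) for chunk in worker_chunks]   # [6, 15]

# Combine phase (leader): sum over the *partial sums* -- sum((sum(num))).
combined = sum(partial_sums)                             # 21

# If the combine node's args still referenced "num" (the bug David
# describes), the leader would try to re-read raw values it never
# actually receives, producing garbage rather than 21.
print(combined)  # 21
```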
Re: [HACKERS] Parallel Aggregate
On 16 March 2016 at 15:04, Robert Haas wrote: > I don't think I'd be objecting if you made PartialAggref a real > alternative to Aggref. But that's not what you've got here. A > PartialAggref is just a wrapper around an underlying Aggref that > changes the interpretation of it - and I think that's not a good idea. > If you want to have Aggref and PartialAggref as truly parallel node > types, that seems cool, and possibly better than what you've got here > now. Alternatively, Aggref can do everything. But I don't think we > should go with this wrapper concept. Ok, I've now gotten rid of the PartialAggref node, and I'm actually quite happy with how it turned out. I made search_indexed_tlist_for_partial_aggref() to follow-on the series of other search_indexed_tlist_for_* functions and have made it behave the same way, by returning the newly created Var instead of doing that in fix_combine_agg_expr_mutator(), as the last version did. Thanks for the suggestion. New patch attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Allow-aggregation-to-happen-in-parallel_2016-03-16.patch Description: Binary data
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 15, 2016 at 9:23 PM, David Rowley wrote: >>> Should I update the patch to use the method you describe? >> >> Well, my feeling is that is going to make this a lot smaller and >> simpler, so I think so. But if you disagree strongly, let's discuss >> further. > > Not strongly. It means that Aggref will need another field to store > the transtype and/or the serialtype (for the follow-on patches in > the Combining Aggregates thread) > The only other issue which I don't think I've addressed yet is target > list width estimates. Probably that can just pay attention to the > aggpartial flag too, and if I fix that then likely the Aggref needs to > carry around an aggtransspace field too, which we really only need to > populate when the Aggref is in partial mode... it would be wasteful to > bother looking that up from the catalogs if we're never going to use > the Aggref in partial mode, and it seems weird to leave it as zero and > only populate it when we need it. So... if I put on my object oriented > hat and think about this, my brain says that Aggref should inherit > from PartialAggref and that's what I have now... or at least some > C variant. So I really do think it's cleaner and easier to understand > by keeping the PartialAggref. > > I see on looking at exprType() that we do have other node types which > conditionally return a different type, but seeing that does not fill > me with joy. I don't think I'd be objecting if you made PartialAggref a real alternative to Aggref. But that's not what you've got here. A PartialAggref is just a wrapper around an underlying Aggref that changes the interpretation of it - and I think that's not a good idea. If you want to have Aggref and PartialAggref as truly parallel node types, that seems cool, and possibly better than what you've got here now. Alternatively, Aggref can do everything. But I don't think we should go with this wrapper concept. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 16 March 2016 at 14:08, Robert Haas wrote: > On Tue, Mar 15, 2016 at 8:55 PM, David Rowley > wrote: >> On 16 March 2016 at 13:42, Robert Haas wrote: >>> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley >>> wrote: On 16 March 2016 at 12:58, Robert Haas wrote: > ...and why would one be true and the other false? One would be the combine aggregate (having aggpartial = false), and the one in the subnode would be the partial aggregate (having aggpartial = true) Notice in create_grouping_paths() I build a partial aggregate version of the PathTarget named partial_group_target, this one goes into the partial agg node, and Gather node. In this case the aggpartial will be set differently for the Aggrefs in each of the two PathTargets, so it would not be possible in setrefs.c to find the correct target list entry in the subnode by using equal(). It'll just end up triggering the elog(ERROR, "Aggref not found in subplan target list"); error. >>> >>> OK, I get it now. I still don't like it very much. There's no >>> ironclad requirement that we use equal() here as opposed to some >>> bespoke comparison function with the exact semantics we need, and ISTM >>> that getting rid of PartialAggref would shrink this patch down quite a >>> bit. >> >> Well that might work. I'd not thought of doing it that way. The only >> issue that I can foresee with that is that when new fields are added >> to Aggref in the future, we might miss updating that custom comparison >> function to include them. > > That's true, but it doesn't seem like that big a deal. A code comment > in the Aggref definition seems like sufficient insurance against such > a mistake. > >> Should I update the patch to use the method you describe? > > Well, my feeling is that is going to make this a lot smaller and > simpler, so I think so. But if you disagree strongly, let's discuss > further. Not strongly. 
It means that Aggref will need another field to store the transtype and/or the serialtype (for the follow-on patches in the Combining Aggregates thread). The only other issue which I don't think I've addressed yet is target list width estimates. Probably that can just pay attention to the aggpartial flag too, and if I fix that then likely the Aggref needs to carry around an aggtransspace field too, which we really only need to populate when the Aggref is in partial mode... it would be wasteful to bother looking that up from the catalogs if we're never going to use the Aggref in partial mode, and it seems weird to leave it as zero and only populate it when we need it. So... if I put on my object oriented hat and think about this, my brain says that Aggref should inherit from PartialAggref and that's what I have now... or at least some C variant. So I really do think it's cleaner and easier to understand by keeping the PartialAggref. I see on looking at exprType() that we do have other node types which conditionally return a different type, but seeing that does not fill me with joy. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 15, 2016 at 8:55 PM, David Rowley wrote: > On 16 March 2016 at 13:42, Robert Haas wrote: >> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley >> wrote: >>> On 16 March 2016 at 12:58, Robert Haas wrote: ...and why would one be true and the other false? >>> >>> One would be the combine aggregate (having aggpartial = false), and >>> the one in the subnode would be the partial aggregate (having >>> aggpartial = true) >>> Notice in create_grouping_paths() I build a partial aggregate version >>> of the PathTarget named partial_group_target, this one goes into the >>> partial agg node, and Gather node. In this case the aggpartial will be >>> set differently for the Aggrefs in each of the two PathTargets, so it >>> would not be possible in setrefs.c to find the correct target list >>> entry in the subnode by using equal(). It'll just end up triggering >>> the elog(ERROR, "Aggref not found in subplan target list"); error. >> >> OK, I get it now. I still don't like it very much. There's no >> ironclad requirement that we use equal() here as opposed to some >> bespoke comparison function with the exact semantics we need, and ISTM >> that getting rid of PartialAggref would shrink this patch down quite a >> bit. > > Well that might work. I'd not thought of doing it that way. The only > issue that I can foresee with that is that when new fields are added > to Aggref in the future, we might miss updating that custom comparison > function to include them. That's true, but it doesn't seem like that big a deal. A code comment in the Aggref definition seems like sufficient insurance against such a mistake. > Should I update the patch to use the method you describe? Well, my feeling is that is going to make this a lot smaller and simpler, so I think so. But if you disagree strongly, let's discuss further. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 16 March 2016 at 13:42, Robert Haas wrote: > On Tue, Mar 15, 2016 at 8:04 PM, David Rowley > wrote: >> On 16 March 2016 at 12:58, Robert Haas wrote: >>> ...and why would one be true and the other false? >> >> One would be the combine aggregate (having aggpartial = false), and >> the one in the subnode would be the partial aggregate (having >> aggpartial = true) >> Notice in create_grouping_paths() I build a partial aggregate version >> of the PathTarget named partial_group_target, this one goes into the >> partial agg node, and Gather node. In this case the aggpartial will be >> set differently for the Aggrefs in each of the two PathTargets, so it >> would not be possible in setrefs.c to find the correct target list >> entry in the subnode by using equal(). It'll just end up triggering >> the elog(ERROR, "Aggref not found in subplan target list"); error. > > OK, I get it now. I still don't like it very much. There's no > ironclad requirement that we use equal() here as opposed to some > bespoke comparison function with the exact semantics we need, and ISTM > that getting rid of PartialAggref would shrink this patch down quite a > bit. Well that might work. I'd not thought of doing it that way. The only issue that I can foresee with that is that when new fields are added to Aggref in the future, we might miss updating that custom comparison function to include them. Should I update the patch to use the method you describe? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 15, 2016 at 8:04 PM, David Rowley wrote: > On 16 March 2016 at 12:58, Robert Haas wrote: >> On Tue, Mar 15, 2016 at 6:55 PM, David Rowley >> wrote: >>> On 16 March 2016 at 11:00, Robert Haas wrote: I don't see why we would need to leave aggpartial out of the equals() check. I must be missing something. >>> >>> See fix_combine_agg_expr_mutator() >>> >>> This piece of code: >>> >>> /* >>> * Aggrefs for partial aggregates are wrapped up in a PartialAggref, >>> * we need to look into the PartialAggref to find the Aggref within. >>> */ >>> foreach(lc, context->subplan_itlist->tlist) >>> { >>> PartialAggref *paggref; >>> tle = (TargetEntry *) lfirst(lc); >>> paggref = (PartialAggref *) tle->expr; >>> >>> if (IsA(paggref, PartialAggref) && >>> equal(paggref->aggref, aggref)) >>> break; >>> } >>> >>> if equals() compared the aggpartial then this code would fail to find >>> the Aggref in the subnode due to the aggpartial field being true on >>> one and false on the other Aggref. >> >> ...and why would one be true and the other false? > > One would be the combine aggregate (having aggpartial = false), and > the one in the subnode would be the partial aggregate (having > aggpartial = true) > Notice in create_grouping_paths() I build a partial aggregate version > of the PathTarget named partial_group_target, this one goes into the > partial agg node, and Gather node. In this case the aggpartial will be > set differently for the Aggrefs in each of the two PathTargets, so it > would not be possible in setrefs.c to find the correct target list > entry in the subnode by using equal(). It'll just end up triggering > the elog(ERROR, "Aggref not found in subplan target list"); error. OK, I get it now. I still don't like it very much. There's no ironclad requirement that we use equal() here as opposed to some bespoke comparison function with the exact semantics we need, and ISTM that getting rid of PartialAggref would shrink this patch down quite a bit. 
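Robert's "bespoke comparison function" suggestion can be sketched in a self-contained way. The struct below is a hypothetical, heavily simplified stand-in for Aggref (a real matcher would walk the args list and every other field); the point is only that a purpose-built comparison can deliberately skip aggpartial, where a full equal() cannot:

```c
#include <stdbool.h>

/*
 * Simplified model of the "bespoke comparison" idea: compare two
 * Aggref-like nodes field by field, deliberately ignoring aggpartial.
 * Field names here are a hypothetical subset of the real Aggref; this
 * is a sketch, not PostgreSQL code.
 */
typedef struct ModelAggref
{
    unsigned int aggfnoid;   /* pg_proc OID of the aggregate function */
    unsigned int aggtype;    /* final result type */
    int          nargs;      /* stands in for the args list */
    bool         aggpartial; /* partial-aggregation flag -- NOT compared */
} ModelAggref;

static bool
model_aggref_partial_match(const ModelAggref *a, const ModelAggref *b)
{
    /* aggpartial intentionally left out, unlike a full equal() walk */
    return a->aggfnoid == b->aggfnoid &&
           a->aggtype == b->aggtype &&
           a->nargs == b->nargs;
}
```

With this, a setrefs.c-style search can match a finalize-stage Aggref (aggpartial = false) against the partial-stage Aggref (aggpartial = true) in the subplan's target list.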
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 16 March 2016 at 12:58, Robert Haas wrote: > On Tue, Mar 15, 2016 at 6:55 PM, David Rowley > wrote: >> On 16 March 2016 at 11:00, Robert Haas wrote: >>> I don't see why we would need to leave aggpartial out of the equals() >>> check. I must be missing something. >> >> See fix_combine_agg_expr_mutator() >> >> This piece of code: >> >> /* >> * Aggrefs for partial aggregates are wrapped up in a PartialAggref, >> * we need to look into the PartialAggref to find the Aggref within. >> */ >> foreach(lc, context->subplan_itlist->tlist) >> { >> PartialAggref *paggref; >> tle = (TargetEntry *) lfirst(lc); >> paggref = (PartialAggref *) tle->expr; >> >> if (IsA(paggref, PartialAggref) && >> equal(paggref->aggref, aggref)) >> break; >> } >> >> if equals() compared the aggpartial then this code would fail to find >> the Aggref in the subnode due to the aggpartial field being true on >> one and false on the other Aggref. > > ...and why would one be true and the other false? One would be the combine aggregate (having aggpartial = false), and the one in the subnode would be the partial aggregate (having aggpartial = true) Notice in create_grouping_paths() I build a partial aggregate version of the PathTarget named partial_group_target, this one goes into the partial agg node, and Gather node. In this case the aggpartial will be set differently for the Aggrefs in each of the two PathTargets, so it would not be possible in setrefs.c to find the correct target list entry in the subnode by using equal(). It'll just end up triggering the elog(ERROR, "Aggref not found in subplan target list"); error. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 15, 2016 at 6:55 PM, David Rowley wrote: > On 16 March 2016 at 11:00, Robert Haas wrote: >> I don't see why we would need to leave aggpartial out of the equals() >> check. I must be missing something. > > See fix_combine_agg_expr_mutator() > > This piece of code: > > /* > * Aggrefs for partial aggregates are wrapped up in a PartialAggref, > * we need to look into the PartialAggref to find the Aggref within. > */ > foreach(lc, context->subplan_itlist->tlist) > { > PartialAggref *paggref; > tle = (TargetEntry *) lfirst(lc); > paggref = (PartialAggref *) tle->expr; > > if (IsA(paggref, PartialAggref) && > equal(paggref->aggref, aggref)) > break; > } > > if equals() compared the aggpartial then this code would fail to find > the Aggref in the subnode due to the aggpartial field being true on > one and false on the other Aggref. ...and why would one be true and the other false? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On 16 March 2016 at 11:00, Robert Haas wrote: > I don't see why we would need to leave aggpartial out of the equals() > check. I must be missing something. See fix_combine_agg_expr_mutator(). This piece of code:

/*
 * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
 * we need to look into the PartialAggref to find the Aggref within.
 */
foreach(lc, context->subplan_itlist->tlist)
{
    PartialAggref *paggref;

    tle = (TargetEntry *) lfirst(lc);
    paggref = (PartialAggref *) tle->expr;

    if (IsA(paggref, PartialAggref) &&
        equal(paggref->aggref, aggref))
        break;
}

If equals() compared the aggpartial field then this code would fail to find the Aggref in the subnode, due to the aggpartial field being true on one Aggref and false on the other. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 15, 2016 at 5:50 PM, David Rowley wrote: >> I still think that's solving the problem the wrong way. Why can't >> exprType(), when applied to the Aggref, do something like this? >> >> { >> Aggref *aref = (Aggref *) expr; >> if (aref->aggpartial) >> return aref->aggtranstype; >> else >> return aref->aggtype; >> } >> >> The obvious answer is "well, because those fields don't exist in >> Aggref". But shouldn't they? From here, it looks like PartialAggref >> is a cheap hack around not having whacked Aggref around hard for >> partial aggregation. > > We could do it that way if we left the aggpartial field out of the > equals() check, but I think we go at length to not do that. Just look > at what's done for all of the location fields. In any case if we did > that then it might not actually be what we want all of the time... > Perhaps in some cases we'd want equals() to return false when the > aggpartial does not match, and in other cases we'd want it to return > true. There's no way to control that behaviour, so to get around the > setrefs.c problem I created the wrapper node type, which I happen to > think is quite clean. Just see Tom's comments about Haribabu's temp > fix for the problem where he put some hacks into the equals for aggref > in [1]. I don't see why we would need to leave aggpartial out of the equals() check. I must be missing something. I don't see where this applies has_parallel_hazard or anything comparable to the aggregate functions. I think it needs to do that. >>> >>> Not sure what you mean here. >> >> If the aggregate isn't parallel-safe, you can't do this optimization. >> For example, imagine an aggregate function written in PL/pgsql that >> for some reason writes data to a side table. It's >> has_parallel_hazard's job to check the parallel-safety properties of >> the functions used in the query. > > Sorry, I do know what you mean by that. I might have been wrong to > assume that the parallelModeOK check did this. 
I will dig into how > that is set exactly. Hmm, sorry, I wasn't very accurate, there. The parallelModeOK check will handle indeed the case where there are parallel-unsafe functions, but it will not handle the case where there are parallel-restricted functions. In that latter case, the query can still use parallelism someplace, but the parallel-restricted functions cannot be executed beneath the Gather. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
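The distinction Robert draws can be modeled in miniature. In pg_proc, proparallel is 's' (safe), 'r' (restricted), or 'u' (unsafe); a parallelModeOK-style check only has to rule out 'u', while pushing an expression beneath a Gather additionally requires that nothing is merely 'r'. The two toy checks below are illustrative only, not the actual has_parallel_hazard() logic:

```c
#include <stdbool.h>

/* One proparallel marking per function in the expression: 's', 'r', 'u'. */
typedef char ProParallel;

/* May the query use parallelism anywhere at all? Only 'u' forbids it. */
static bool
model_parallel_mode_ok(const ProParallel *funcs, int nfuncs)
{
    for (int i = 0; i < nfuncs; i++)
        if (funcs[i] == 'u')
            return false;
    return true;
}

/*
 * May this expression run beneath a Gather, i.e. in a worker?
 * Parallel-restricted ('r') functions must stay in the leader.
 */
static bool
model_safe_below_gather(const ProParallel *funcs, int nfuncs)
{
    for (int i = 0; i < nfuncs; i++)
        if (funcs[i] != 's')
            return false;
    return true;
}
```

So a query containing a restricted function can still be parallelised somewhere, but the partial-aggregate step for that function cannot be pushed into the workers.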
Re: [HACKERS] Parallel Aggregate
On 16 March 2016 at 09:23, Robert Haas wrote: > On Mon, Mar 14, 2016 at 7:56 PM, David Rowley > wrote: >> A comment does explain this, but perhaps it's not good enough, so I've >> rewritten it to become. >> >> /* >> * PartialAggref >> * >> * When partial aggregation is required in a plan, the nodes from the partial >> * aggregate node, up until the finalize aggregate node must pass the >> partially >> * aggregated states up the plan tree. In regards to target list construction >> * in setrefs.c, this requires that exprType() returns the state's type >> rather >> * than the final aggregate value's type, and since exprType() for Aggref is >> * coded to return the aggtype, this is not correct for us. We can't fix this >> * by going around modifying the Aggref to change it's return type as >> setrefs.c >> * requires searching for that Aggref using equals() which compares all >> fields >> * in Aggref, and changing the aggtype would cause such a comparison to fail. >> * To get around this problem we wrap the Aggref up in a PartialAggref, this >> * allows exprType() to return the correct type and we can handle a >> * PartialAggref in setrefs.c by just peeking inside the PartialAggref to >> check >> * the underlying Aggref. The PartialAggref lives as long as executor >> start-up, >> * where it's removed and replaced with it's underlying Aggref. >> */ >> typedef struct PartialAggref >> >> does that help explain it better? > > I still think that's solving the problem the wrong way. Why can't > exprType(), when applied to the Aggref, do something like this? > > { > Aggref *aref = (Aggref *) expr; > if (aref->aggpartial) > return aref->aggtranstype; > else > return aref->aggtype; > } > > The obvious answer is "well, because those fields don't exist in > Aggref". But shouldn't they? From here, it looks like PartialAggref > is a cheap hack around not having whacked Aggref around hard for > partial aggregation. 
We could do it that way if we left the aggpartial field out of the equals() check, but I think we go at length to not do that. Just look at what's done for all of the location fields. In any case if we did that then it might not actually be what we want all of the time... Perhaps in some cases we'd want equals() to return false when the aggpartial does not match, and in other cases we'd want it to return true. There's no way to control that behaviour, so to get around the setrefs.c problem I created the wrapper node type, which I happen to think is quite clean. Just see Tom's comments about Haribabu's temp fix for the problem where he put some hacks into the equals for aggref in [1]. >>> I don't see where this applies has_parallel_hazard or anything >>> comparable to the aggregate functions. I think it needs to do that. >> >> Not sure what you mean here. > > If the aggregate isn't parallel-safe, you can't do this optimization. > For example, imagine an aggregate function written in PL/pgsql that > for some reason writes data to a side table. It's > has_parallel_hazard's job to check the parallel-safety properties of > the functions used in the query. Sorry, I do know what you mean by that. I might have been wrong to assume that the parallelModeOK check did this. I will dig into how that is set exactly. >>> + /* XXX +1 ? do we expect the main process to actually do real work? >>> */ >>> + numPartialGroups = Min(numGroups, subpath->rows) * >>> + (subpath->parallel_degree + >>> 1); >>> I'd leave out the + 1, but I don't think it matters much. >> >> Actually I meant to ask you about this. 
I see that subpath->rows is >> divided by the Path's parallel_degree, but it seems the main process >> does some work too, so this is why I added + 1, as during my tests >> using a query which produces 10 groups, and had 4 workers, I noticed >> that Gather was getting 50 groups back, rather than 40, I assumed this >> is because the main process is helping too, but from my reading of the >> parallel query threads I believe this is because the Gather, instead >> of sitting around idle tries to do a bit of work too, if it appears >> that nothing else is happening quickly enough. I should probably go >> read nodeGather.c to learn that though. > > Yes, the main process does do some work, but less and less as the > query gets more complicated. See comments in cost_seqscan(). Thanks [1] http://www.postgresql.org/message-id/10158.1457329...@sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 14, 2016 at 7:56 PM, David Rowley wrote: >> More generally, why are we inventing PartialAggref >> instead of reusing Aggref? The code comments seem to contain no >> indication as to why we shouldn't need all the same details for >> PartialAggref that we do for Aggref, instead of only a small subset of >> them. Hmm... actually, it looks like PartialAggref is intended to >> wrap Aggref, but that seems like a weird design. Why not make Aggref >> itself DTRT? There's not enough explanation in the patch of what is >> going on here and why. > > A comment does explain this, but perhaps it's not good enough, so I've > rewritten it to become. > > /* > * PartialAggref > * > * When partial aggregation is required in a plan, the nodes from the partial > * aggregate node, up until the finalize aggregate node must pass the > partially > * aggregated states up the plan tree. In regards to target list construction > * in setrefs.c, this requires that exprType() returns the state's type rather > * than the final aggregate value's type, and since exprType() for Aggref is > * coded to return the aggtype, this is not correct for us. We can't fix this > * by going around modifying the Aggref to change it's return type as > setrefs.c > * requires searching for that Aggref using equals() which compares all fields > * in Aggref, and changing the aggtype would cause such a comparison to fail. > * To get around this problem we wrap the Aggref up in a PartialAggref, this > * allows exprType() to return the correct type and we can handle a > * PartialAggref in setrefs.c by just peeking inside the PartialAggref to > check > * the underlying Aggref. The PartialAggref lives as long as executor > start-up, > * where it's removed and replaced with it's underlying Aggref. > */ > typedef struct PartialAggref > > does that help explain it better? I still think that's solving the problem the wrong way. Why can't exprType(), when applied to the Aggref, do something like this? 
{
    Aggref *aref = (Aggref *) expr;

    if (aref->aggpartial)
        return aref->aggtranstype;
    else
        return aref->aggtype;
}

The obvious answer is "well, because those fields don't exist in Aggref". But shouldn't they? From here, it looks like PartialAggref is a cheap hack around not having whacked Aggref around hard for partial aggregation. >> I don't see where this applies has_parallel_hazard or anything >> comparable to the aggregate functions. I think it needs to do that. > > Not sure what you mean here. If the aggregate isn't parallel-safe, you can't do this optimization. For example, imagine an aggregate function written in PL/pgsql that for some reason writes data to a side table. It's has_parallel_hazard's job to check the parallel-safety properties of the functions used in the query. >> + /* XXX +1 ? do we expect the main process to actually do real work? >> */ >> + numPartialGroups = Min(numGroups, subpath->rows) * >> + (subpath->parallel_degree + >> 1); >> I'd leave out the + 1, but I don't think it matters much. > > Actually I meant to ask you about this. I see that subpath->rows is > divided by the Path's parallel_degree, but it seems the main process > does some work too, so this is why I added + 1, as during my tests > using a query which produces 10 groups, and had 4 workers, I noticed > that Gather was getting 50 groups back, rather than 40, I assumed this > is because the main process is helping too, but from my reading of the > parallel query threads I believe this is because the Gather, instead > of sitting around idle tries to do a bit of work too, if it appears > that nothing else is happening quickly enough. I should probably go > read nodeGather.c to learn that though. Yes, the main process does do some work, but less and less as the query gets more complicated. See comments in cost_seqscan().
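As a self-contained illustration of the dispatch Robert sketches above: the field names mirror the hypothetical aggtype/aggtranstype/aggpartial fields under discussion, not any committed structure, and Oid is reduced to a plain integer:

```c
#include <stdbool.h>

typedef unsigned int Oid;

/*
 * Model of an Aggref carrying both result types, per Robert's suggestion:
 * while aggpartial is set, the node outputs (and so exprType() must
 * report) the transition-state type rather than the final result type.
 */
typedef struct AggrefSketch
{
    Oid  aggtype;      /* type of the finalized aggregate value */
    Oid  aggtranstype; /* type of the transition state */
    bool aggpartial;   /* true when this node emits the trans state */
} AggrefSketch;

static Oid
model_expr_type(const AggrefSketch *aref)
{
    return aref->aggpartial ? aref->aggtranstype : aref->aggtype;
}
```

This is the design that removes the need for a wrapper node: the partial and finalize stages share one node type and differ only in the aggpartial flag.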
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 15 March 2016 at 13:48, David Rowley wrote: > An updated patch will follow soon. I've attached an updated patch which addresses some of Robert's and Tomas' concerns. I've not done anything about the exprCollation() yet, as I'm still unsure of what it should do. I just don't see why returning the Aggref's collation is correct, and we have nothing else to return. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Allow-aggregation-to-happen-in-parallel_2016-03-16.patch Description: Binary data
Re: [HACKERS] Parallel Aggregate
On 15 March 2016 at 11:24, James Sewell wrote: > On Tuesday, 15 March 2016, Robert Haas wrote: >> >> > Does the cost of the aggregate function come into this calculation at >> > all? In PostGIS land, much smaller numbers of rows can generate loads >> > that would be effective to parallelize (worker time much >> than >> > startup cost). >> >> Unfortunately, no - only the table size. This is a problem, and needs >> to be fixed. However, it's probably not going to get fixed for 9.6. >> :-( > > > Any chance of getting a GUC (say min_parallel_degree) added to allow setting > the initial value of parallel_degree, then changing the small relation check > to also pass if parallel_degree > 1? > > That way you could set min_parallel_degree on a query by query basis if you > are running aggregates which you know will take a lot of CPU. > > I suppose it wouldn't make much sense at all to set globally though, so it > could just confuse matters. I agree that it would be nice to have more influence on this decision, but let's start a new thread for that. I don't want this one getting bloated with debates on that. It's not code I'm planning on going anywhere near for this patch. I'll start a thread... -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On 15 March 2016 at 11:39, Tomas Vondra wrote: > I've looked at this patch today. The patch seems quite solid, but I do have > a few minor comments (or perhaps questions, given that this is the first > time I looked at the patch). > > 1) exprCollation contains this bit: > --- > > case T_PartialAggref: > coll = InvalidOid; /* XXX is this correct? */ > break; > > I doubt this is the right thing to do. Can we actually get to this piece of > code? I haven't tried too hard, but regression tests don't seem to trigger > this piece of code. Thanks for looking at this. Yeah, it's there because it is being called from setrefs.c in makeVarFromTargetEntry() via fix_combine_agg_expr_mutator(), so it's required when building the new target list for Aggref->args to point to the underlying Aggref's Var. As for the collation, I'm still not convinced whether it's right or wrong. I know offlist you mentioned about string_agg() and sorting, but there's no code that'll sort the agg's state. The only possible thing that gets sorted there is the group by key. > Moreover, if we're handling PartialAggref in exprCollation(), maybe we > should handle it also in exprInputCollation and exprSetCollation? hmm, maybe, that I'm not sure about. I don't see where we'd call exprSetCollation() for this, but I think I need to look at exprInputCollation(). > And if we really need the collation there, why not to fetch the actual > collation from the nested Aggref? Why should it be InvalidOid? It seems quite random to me to do that. If the trans type is bytea, why would it be useful to inherit the collation from the aggregate? I'm not confident I'm right with InvalidOid... I just don't think we can pretend the collation is the same as the Aggref's. > 2) partial_aggregate_walker > --- > > I think this should follow the naming convention that clearly identifies the > purpose of the walker, not what kind of nodes it is supposed to walk. So it > should be: > > aggregates_allow_partial_walker Agreed and changed. 
> 3) create_parallelagg_path > -- > > I do agree with the logic that under-estimates are more dangerous than > over-estimates, so the current estimate is safer. But I think this would be > a good place to apply the formula I proposed a few days ago (or rather the > one Dean Rasheed proposed in response). > > That is, we do know that there are numGroups in total, and each parallel > worker sees subpath->rows then it's expected to see > > sel = (subpath->rows / rel->tuples); > perGroup = (rel->tuples / numGroups); > workerGroups = numGroups * (1 - powl(1 - s, perGroup)); > numPartialGroups = numWorkers * workerGroups > > It's probably better to see Dean's message from 13/3. I think what I have works well when there's a small number of groups, as there's a good chance that each worker will see at least 1 tuple from each group. However I understand that will become increasingly unlikely with a larger number of groups, which is why I capped it to the total input rows, but even in cases before that cap is reached I think it will still overestimate. I'd need to analyze the code above to understand it better, but my initial reaction is that, you're probably right, but I don't think I want to inherit the fight for this. Perhaps it's better to wait until GROUP BY estimate improvement patch gets in, and change this, or if this gets in first, then you can include this change in your patch. I'm not trying to brush off the work, I just would rather it didn't delay parallel aggregate. > 4) Is clauses.h the right place for PartialAggType? I'm not sure that it is to be honest. I just put it there because the patch never persisted the value of a PartialAggType in any struct field anywhere and checks it later in some other file. In all places where we use PartialAggType we're also calling aggregates_allow_partial(), which does require clauses.h. So that's why it ended up there... I think I'll leave it there until someone gives me a good reason to move it. 
An updated patch will follow soon. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On 15 March 2016 at 08:53, Robert Haas wrote: > I haven't fully studied every line of this yet, but here are a few comments: > > + case T_PartialAggref: > + coll = InvalidOid; /* XXX is this correct? */ > + break; > > I doubt it. Thanks for looking at this. Yeah, I wasn't so sure of the collation thing either, so stuck a reminder on there. The way I'm seeing it at the moment is that since the partial aggregate is never displayed to the user, and we never perform equality comparison on them (since HAVING is applied in the final aggregate stage), then my line of thought was that the collation should not matter. Over on the combine aggregate states thread I'm doing work to make the standard serialize functions use bytea, and bytea don't allow collate: # create table c (b bytea collate "en_NZ"); ERROR: collations are not supported by type bytea LINE 1: create table c (b bytea collate "en_NZ"); I previously did think of reusing the Aggref's collation, but I ended up leaning towards the more "does not matter" side of the argument. Of course, I may have completely failed to think of some important reason, which is why I left that comment, so it might provoke some thought with someone else with more collation knowledge. > More generally, why are we inventing PartialAggref > instead of reusing Aggref? The code comments seem to contain no > indication as to why we shouldn't need all the same details for > PartialAggref that we do for Aggref, instead of only a small subset of > them. Hmm... actually, it looks like PartialAggref is intended to > wrap Aggref, but that seems like a weird design. Why not make Aggref > itself DTRT? There's not enough explanation in the patch of what is > going on here and why. A comment does explain this, but perhaps it's not good enough, so I've rewritten it to become. 
/*
 * PartialAggref
 *
 * When partial aggregation is required in a plan, the nodes from the partial
 * aggregate node, up until the finalize aggregate node, must pass the
 * partially aggregated states up the plan tree. In regards to target list
 * construction in setrefs.c, this requires that exprType() returns the
 * state's type rather than the final aggregate value's type, and since
 * exprType() for Aggref is coded to return the aggtype, this is not correct
 * for us. We can't fix this by going around modifying the Aggref to change
 * its return type, as setrefs.c requires searching for that Aggref using
 * equals(), which compares all fields in Aggref, and changing the aggtype
 * would cause such a comparison to fail. To get around this problem we wrap
 * the Aggref up in a PartialAggref; this allows exprType() to return the
 * correct type, and we can handle a PartialAggref in setrefs.c by just
 * peeking inside the PartialAggref to check the underlying Aggref. The
 * PartialAggref lives as long as executor start-up, where it's removed and
 * replaced with its underlying Aggref.
 */
typedef struct PartialAggref

does that help explain it better? > } > + if (can_parallel) > + { > Seems like a blank line would be in order. Fixed. > I don't see where this applies has_parallel_hazard or anything > comparable to the aggregate functions. I think it needs to do that. Not sure what you mean here. > > + /* XXX +1 ? do we expect the main process to actually do real work? */ > + numPartialGroups = Min(numGroups, subpath->rows) * > + (subpath->parallel_degree + > 1); > I'd leave out the + 1, but I don't think it matters much. Actually I meant to ask you about this. 
I see that subpath->rows is divided by the Path's parallel_degree, but it seems the main process does some work too, so this is why I added + 1, as during my tests using a query which produces 10 groups, and had 4 workers, I noticed that Gather was getting 50 groups back, rather than 40, I assumed this is because the main process is helping too, but from my reading of the parallel query threads I believe this is because the Gather, instead of sitting around idle tries to do a bit of work too, if it appears that nothing else is happening quickly enough. I should probably go read nodeGather.c to learn that though. In the meantime I've removed the + 1, as it's not correct to do subpath->rows * (subpath->parallel_degree + 1), since it was divided by subpath->parallel_degree in the first place, we'd end up with an extra worker's worth of rows for queries which estimate a larger number of groups than partial path rows. > + aggstate->finalizeAggs == true) > > We usually just say if (a) not if (a == true) when it's a boolean. > Similarly !a rather than a == false. hmm, thanks. It appears that I've not been all that consistent in that area. I didn't know that was convention. I see that some of my way have crept into the explain.c changes already :/ I will
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 15, 2016 at 9:32 AM, Robert Haas wrote: > > I kind of doubt this would work well, but somebody could write a patch > for it and try it out. OK I'll give this a go today and report back. Would the eventual plan be to use pg_proc.procost for the functions from each aggregate concerned? If so I might have a peek at that too, although I imagine I won't get far. Cheers, -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Re: [HACKERS] Parallel Aggregate
Hi,

On 03/13/2016 10:44 PM, David Rowley wrote:
> On 12 March 2016 at 16:31, David Rowley wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.

I've looked at this patch today. The patch seems quite solid, but I do have a few minor comments (or perhaps questions, given that this is the first time I've looked at the patch).

1) exprCollation contains this bit:

    case T_PartialAggref:
        coll = InvalidOid; /* XXX is this correct? */
        break;

I doubt this is the right thing to do. Can we actually get to this piece of code? I haven't tried too hard, but the regression tests don't seem to trigger it. Moreover, if we're handling PartialAggref in exprCollation(), maybe we should handle it also in exprInputCollation() and exprSetCollation()? And if we really need the collation there, why not fetch the actual collation from the nested Aggref? Why should it be InvalidOid?

2) partial_aggregate_walker - I think this should follow the naming convention that clearly identifies the purpose of the walker, not what kind of nodes it is supposed to walk. So it should be: aggregates_allow_partial_walker.

3) create_parallelagg_path - I do agree with the logic that under-estimates are more dangerous than over-estimates, so the current estimate is safer. But I think this would be a good place to apply the formula I proposed a few days ago (or rather the one Dean Rasheed proposed in response). That is, we do know that there are numGroups in total, and each parallel worker sees subpath->rows rows, so it's expected to see

    sel = (subpath->rows / rel->tuples);
    perGroup = (rel->tuples / numGroups);
    workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
    numPartialGroups = numWorkers * workerGroups;

It's probably better to see Dean's message from 13/3.
4) Is clauses.h the right place for PartialAggType? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 14, 2016 at 6:24 PM, James Sewell wrote: > Any chance of getting a GUC (say min_parallel_degree) added to allow setting > the initial value of parallel_degree, then changing the small relation check > to also pass if parallel_degree > 1? > > That way you could set min_parallel_degree on a query by query basis if you > are running aggregates which you know will take a lot of CPU. > > I suppose it wouldn't make much sense at all to set globally though, so it > could just confuse matters. I kind of doubt this would work well, but somebody could write a patch for it and try it out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On Tuesday, 15 March 2016, Robert Haas wrote: > > > Does the cost of the aggregate function come into this calculation at > > all? In PostGIS land, much smaller numbers of rows can generate loads > > that would be effective to parallelize (worker time much >> than > > startup cost). > > Unfortunately, no - only the table size. This is a problem, and needs > to be fixed. However, it's probably not going to get fixed for 9.6. > :-( > Any chance of getting a GUC (say min_parallel_degree) added to allow setting the initial value of parallel_degree, then changing the small relation check to also pass if parallel_degree > 1? That way you could set min_parallel_degree on a query by query basis if you are running aggregates which you know will take a lot of CPU. I suppose it wouldn't make much sense at all to set globally though, so it could just confuse matters. Cheers,
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 14, 2016 at 3:56 PM, Paul Ramsey wrote: > On Sun, Mar 13, 2016 at 7:31 PM, David Rowley > wrote: >> On 14 March 2016 at 14:52, James Sewell wrote: >>> One question - how is the upper limit of workers chosen? >> >> See create_parallel_paths() in allpaths.c. Basically the bigger the >> relation (in pages) the more workers will be allocated, up until >> max_parallel_degree. > > Does the cost of the aggregate function come into this calculation at > all? In PostGIS land, much smaller numbers of rows can generate loads > that would be effective to parallelize (worker time much >> than > startup cost). Unfortunately, no - only the table size. This is a problem, and needs to be fixed. However, it's probably not going to get fixed for 9.6. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley wrote: > On 14 March 2016 at 14:52, James Sewell wrote: >> One question - how is the upper limit of workers chosen? > > See create_parallel_paths() in allpaths.c. Basically the bigger the > relation (in pages) the more workers will be allocated, up until > max_parallel_degree. Does the cost of the aggregate function come into this calculation at all? In PostGIS land, much smaller numbers of rows can generate loads that would be effective to parallelize (worker time much >> than startup cost). P
Re: [HACKERS] Parallel Aggregate
On Sun, Mar 13, 2016 at 5:44 PM, David Rowley wrote: > On 12 March 2016 at 16:31, David Rowley wrote: >> I've attached an updated patch which is based on commit 7087166, >> things are really changing fast in the grouping path area at the >> moment, but hopefully the dust is starting to settle now. > > The attached patch fixes a harmless compiler warning about a possible > uninitialised variable. I haven't fully studied every line of this yet, but here are a few comments: + case T_PartialAggref: + coll = InvalidOid; /* XXX is this correct? */ + break; I doubt it. More generally, why are we inventing PartialAggref instead of reusing Aggref? The code comments seem to contain no indication as to why we shouldn't need all the same details for PartialAggref that we do for Aggref, instead of only a small subset of them. Hmm... actually, it looks like PartialAggref is intended to wrap Aggref, but that seems like a weird design. Why not make Aggref itself DTRT? There's not enough explanation in the patch of what is going on here and why. } + if (can_parallel) + { Seems like a blank line would be in order. I don't see where this applies has_parallel_hazard or anything comparable to the aggregate functions. I think it needs to do that. + /* XXX +1 ? do we expect the main process to actually do real work? */ + numPartialGroups = Min(numGroups, subpath->rows) * + (subpath->parallel_degree + 1); I'd leave out the + 1, but I don't think it matters much. + aggstate->finalizeAggs == true) We usually just say if (a) not if (a == true) when it's a boolean. Similarly !a rather than a == false. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 14 March 2016 at 17:05, James Sewell wrote: > > Hi again, > > I've been playing around with inheritance combined with this patch. Currently > it looks like you are taking max(parallel_degree) from all the child tables > and using that for the number of workers. > > For large machines it makes much more sense to use sum(parallel_degree) - but > I've just seen this comment in the code: > > /* > * Decide what parallel degree to request for this append path. For > * now, we just use the maximum parallel degree of any member. It > * might be useful to use a higher number if the Append node were > * smart enough to spread out the workers, but it currently isn't. > */ > > Does this mean that even though we are aggregating in parallel, we are only > operating on one child table at a time currently? There is nothing in the planner yet, or any patch that I know of to push the Partial Aggregate node to below an Append node. That will most likely come in 9.7. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 14, 2016 at 3:05 PM, David Rowley wrote: > > Things to try: > 1. alter table a add column ts_date date; update a set ts_date = > date_trunc('DAY',ts); vacuum full analyze ts; > 2. or, create index on a (date_trunc('DAY',ts)); analyze a; > 3. or for testing, set the work_mem higher. > > Ah, that makes sense. Tried with a BTREE index, and it works perfectly, but the index is 428MB - which is a bit rough. Removed that and put on a BRIN index, same result for 48kB - perfect! Thanks for the help, James
Re: [HACKERS] Parallel Aggregate
Hi again, I've been playing around with inheritance combined with this patch. Currently it looks like you are taking max(parallel_degree) from all the child tables and using that for the number of workers. For large machines it makes much more sense to use sum(parallel_degree) - but I've just seen this comment in the code: /* * Decide what parallel degree to request for this append path. For * now, we just use the maximum parallel degree of any member. It * might be useful to use a higher number if the Append node were * smart enough to spread out the workers, but it currently isn't. */ Does this mean that even though we are aggregating in parallel, we are only operating on one child table at a time currently? Cheers, James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Mon, Mar 14, 2016 at 2:39 PM, James Sewell wrote: > Cool, > > I've been testing how this works with partitioning (which seems to be > strange, but I'll post separately about that) and something odd seems to be > going on now with the parallel triggering: > > postgres=# create table a as select * from base_p2015_11; > SELECT 2000 > > postgres=# select * from a limit 1; > ts | count | a | b | c | d | e > +---+-+--+--+--+--- > 2015-11-26 21:10:04.856828 | 860 | 946 | 1032 | 1118 | 1204 | > (1 row) > > postgres-# \d a > Table "datamart_owner.a" > Column |Type | Modifiers > +-+--- > ts | timestamp without time zone | > count | integer | > a | integer | > b | integer | > c | integer | > d | integer | > e | integer | > > postgres=# select pg_size_pretty(pg_relation_size('a')); > pg_size_pretty > > 1149 MB > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > > -- > Finalize GroupAggregate (cost=218242.96..218254.46 rows=200 width=16) >Group Key: (date_trunc('DAY'::text, ts)) >-> Sort (cost=218242.96..218245.96 rows=1200 width=16) > Sort Key: (date_trunc('DAY'::text, ts)) > -> 
Gather (cost=218059.08..218181.58 rows=1200 width=16) >Number of Workers: 5 >-> Partial HashAggregate (cost=217059.08..217061.58 > rows=200 width=16) > Group Key: date_trunc('DAY'::text, ts) > -> Parallel Seq Scan on a (cost=0.00..197059.06 > rows=405 width=12) > (9 rows) > > postgres=# analyze a; > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > -- > GroupAggregate (cost=3164211.55..3564212.03 rows=2024 width=16) >Group Key: (date_trunc('DAY'::text, ts)) >-> Sort (cost=3164211.55..3214211.61 rows=2024 width=12) > Sort Key: (date_trunc('DAY'::text, ts)) > -> Seq Scan on a (cost=0.00..397059.30 rows=2024 width=12) > (5 rows) > > Unsure what's happening here. > > > > James Sewell, > PostgreSQL Team Lead / Solutions Architect > __ > > > Level 2, 50 Queen St, Melbourne VIC 3000 > > *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 > > > On Mon, Mar 14, 2016 at 1:31 PM, David Rowley < > david.row...@2ndquadrant.com> wrote: > >> On 14 March 2016 at 14:52, James Sewell >> wrote: >> > One question - how is the upper limit of workers chosen? >> >> See create_parallel_paths() in allpaths.c. Basically the bigger the >> relation (in pages) the more workers will be allocated, up until >> max_parallel_degree. >> >> There is also a comment in that function which states: >> /* >> * Limit the degree of parallelism logarithmically based on the size of the >> * relation. This probably needs to be a good deal more sophisticated, >> but we >> * need something here for now. >> */ >> >> So this will likely see some revision at some point, after 9.6. >> >> -- >> David Rowley http://www.2ndQuadrant.com/ >> PostgreSQL Development, 24x7 Support, Training & Services >>
Re: [HACKERS] Parallel Aggregate
On 14 March 2016 at 16:39, James Sewell wrote: > > I've been testing how this works with partitioning (which seems to be > strange, but I'll post separately about that) and something odd seems to be > going on now with the parallel triggering: > > postgres=# create table a as select * from base_p2015_11; > SELECT 2000 > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > -- > Finalize GroupAggregate (cost=218242.96..218254.46 rows=200 width=16) >Group Key: (date_trunc('DAY'::text, ts)) >-> Sort (cost=218242.96..218245.96 rows=1200 width=16) > Sort Key: (date_trunc('DAY'::text, ts)) > -> Gather (cost=218059.08..218181.58 rows=1200 width=16) >Number of Workers: 5 >-> Partial HashAggregate (cost=217059.08..217061.58 rows=200 > width=16) > Group Key: date_trunc('DAY'::text, ts) > -> Parallel Seq Scan on a (cost=0.00..197059.06 > rows=405 width=12) > (9 rows) > > postgres=# analyze a; > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > -- > GroupAggregate (cost=3164211.55..3564212.03 rows=2024 width=16) >Group Key: (date_trunc('DAY'::text, ts)) >-> Sort (cost=3164211.55..3214211.61 rows=2024 width=12) > Sort Key: (date_trunc('DAY'::text, ts)) > -> Seq Scan on a (cost=0.00..397059.30 rows=2024 width=12) > (5 rows) > > Unsure what's happening here. This just comes down to the fact that PostgreSQL is quite poor at estimating the number of groups that will be produced by the expression date_trunc('DAY',ts). Due to lack of stats when you run the query before ANALYZE, PostgreSQL just uses a hardcoded guess of 200, which it thinks will fit quite nicely in the HashAggregate node's hash table. After you run ANALYZE this estimate goes up to 2024, and the grouping planner thinks that's a little too much to be storing in a hash table, based on the size of your work_mem setting, so it uses a Sort plan instead. Things to try: 1.
alter table a add column ts_date date; update a set ts_date = date_trunc('DAY',ts); vacuum full analyze ts; 2. or, create index on a (date_trunc('DAY',ts)); analyze a; 3. or for testing, set the work_mem higher. So, basically, it's no fault of this patch. It's just there's no real good way for the planner to go estimating something like date_trunc('DAY',ts) without either adding a column which explicitly stores that value (1), or collecting stats on the expression (2), or teaching the planner about the internals of that function, which is likely just never going to happen. (3) is just going to make the outlook of a hash plan look a little brighter, although you'll likely need a work_mem of over 1GB to make the plan change. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
Cool,

I've been testing how this works with partitioning (which seems to be strange, but I'll post separately about that) and something odd seems to be going on now with the parallel triggering:

postgres=# create table a as select * from base_p2015_11;
SELECT 2000

postgres=# select * from a limit 1;
             ts             | count |  a  |  b   |  c   |  d   | e
----------------------------+-------+-----+------+------+------+---
 2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
(1 row)

postgres-# \d a
            Table "datamart_owner.a"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 ts     | timestamp without time zone |
 count  | integer                     |
 a      | integer                     |
 b      | integer                     |
 c      | integer                     |
 d      | integer                     |
 e      | integer                     |

postgres=# select pg_size_pretty(pg_relation_size('a'));
 pg_size_pretty
----------------
 1149 MB

postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
         Sort Key: (date_trunc('DAY'::text, ts))
         ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
               Number of Workers: 5
               ->  Partial HashAggregate  (cost=217059.08..217061.58 rows=200 width=16)
                     Group Key: date_trunc('DAY'::text, ts)
                     ->  Parallel Seq Scan on a  (cost=0.00..197059.06 rows=405 width=12)
(9 rows)

postgres=# analyze a;

postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
                             QUERY PLAN
---------------------------------------------------------------------
 GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
         Sort Key: (date_trunc('DAY'::text, ts))
         ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
(5 rows)

Unsure what's happening here.

James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

On Mon, Mar 14, 2016 at 1:31 PM, David Rowley wrote: > On 14 March 2016 at 14:52, James Sewell wrote: > > One question - how is the upper limit of workers chosen?
> > See create_parallel_paths() in allpaths.c. Basically the bigger the > relation (in pages) the more workers will be allocated, up until > max_parallel_degree. > > There is also a comment in that function which states: > /* > * Limit the degree of parallelism logarithmically based on the size of the > * relation. This probably needs to be a good deal more sophisticated, but > we > * need something here for now. > */ > > So this will likely see some revision at some point, after 9.6. > > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] Parallel Aggregate
On 14 March 2016 at 14:52, James Sewell wrote: > One question - how is the upper limit of workers chosen? See create_parallel_paths() in allpaths.c. Basically the bigger the relation (in pages) the more workers will be allocated, up until max_parallel_degree. There is also a comment in that function which states: /* * Limit the degree of parallelism logarithmically based on the size of the * relation. This probably needs to be a good deal more sophisticated, but we * need something here for now. */ So this will likely see some revision at some point, after 9.6. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
Hi, Happy to test, really looking forward to seeing this stuff in core. The explain analyze is below: Finalize HashAggregate (cost=810142.42..810882.62 rows=59216 width=16) (actual time=2282.092..2282.202 rows=15 loops=1) Group Key: (date_trunc('DAY'::text, pageview_start_tstamp)) -> Gather (cost=765878.46..808069.86 rows=414512 width=16) (actual time=2281.749..2282.060 rows=105 loops=1) Number of Workers: 6 -> Partial HashAggregate (cost=764878.46..765618.66 rows=59216 width=16) (actual time=2276.879..2277.030 rows=15 loops=7) Group Key: date_trunc('DAY'::text, pageview_start_tstamp) -> Parallel Seq Scan on celebrus_fact_agg_1_p2015_12 (cost=0.00..743769.76 rows=4221741 width=12) (actual time=0.066..1631.650 rows=3618887 loops=7) One question - how is the upper limit of workers chosen? James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Mon, Mar 14, 2016 at 12:30 PM, David Rowley wrote: > On 14 March 2016 at 14:16, James Sewell wrote: > >> I've done some testing with one of my data sets in an 8VPU virtual >> environment and this is looking really, really good. >> >> My test query is: >> >> SELECT pageview, sum(pageview_count) >> FROM fact_agg_2015_12 >> GROUP BY date_trunc('DAY'::text, pageview); >> >> The query returns 15 rows. The fact_agg table is 5398MB and holds around >> 25 million records. >> >> Explain with a max_parallel_degree of 8 tells me that the query will >> only use 6 background workers. I have no indexes on the table currently.
>> >> Finalize HashAggregate (cost=810142.42..810882.62 rows=59216 width=16) >>Group Key: (date_trunc('DAY'::text, pageview)) >>-> Gather (cost=765878.46..808069.86 rows=414512 width=16) >> Number of Workers: 6 >> -> Partial HashAggregate (cost=764878.46..765618.66 rows=59216 >> width=16) >>Group Key: date_trunc('DAY'::text, pageview) >>-> Parallel Seq Scan on fact_agg_2015_12 >> (cost=0.00..743769.76 rows=4221741 width=12) >> > > Great! Thanks for testing this. > > If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual > number of Gather rows come out at 105? I'd just like to get an idea of > whether my cost estimates for the Gather are going to be accurate for > real-world data sets. > > > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] Parallel Aggregate
On 14 March 2016 at 14:16, James Sewell wrote: > I've done some testing with one of my data sets in an 8VPU virtual > environment and this is looking really, really good. > > My test query is: > > SELECT pageview, sum(pageview_count) > FROM fact_agg_2015_12 > GROUP BY date_trunc('DAY'::text, pageview); > > The query returns 15 rows. The fact_agg table is 5398MB and holds around > 25 million records. > > Explain with a max_parallel_degree of 8 tells me that the query will only > use 6 background workers. I have no indexes on the table currently. > > Finalize HashAggregate (cost=810142.42..810882.62 rows=59216 width=16) >Group Key: (date_trunc('DAY'::text, pageview)) >-> Gather (cost=765878.46..808069.86 rows=414512 width=16) > Number of Workers: 6 > -> Partial HashAggregate (cost=764878.46..765618.66 rows=59216 > width=16) >Group Key: date_trunc('DAY'::text, pageview) >-> Parallel Seq Scan on fact_agg_2015_12 > (cost=0.00..743769.76 rows=4221741 width=12) > Great! Thanks for testing this. If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual number of Gather rows come out at 105? I'd just like to get an idea of whether my cost estimates for the Gather are going to be accurate for real-world data sets. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
Hi, I've done some testing with one of my data sets in an 8VPU virtual environment and this is looking really, really good. My test query is: SELECT pageview, sum(pageview_count) FROM fact_agg_2015_12 GROUP BY date_trunc('DAY'::text, pageview); The query returns 15 rows. The fact_agg table is 5398MB and holds around 25 million records. Explain with a max_parallel_degree of 8 tells me that the query will only use 6 background workers. I have no indexes on the table currently. Finalize HashAggregate (cost=810142.42..810882.62 rows=59216 width=16) Group Key: (date_trunc('DAY'::text, pageview)) -> Gather (cost=765878.46..808069.86 rows=414512 width=16) Number of Workers: 6 -> Partial HashAggregate (cost=764878.46..765618.66 rows=59216 width=16) Group Key: date_trunc('DAY'::text, pageview) -> Parallel Seq Scan on fact_agg_2015_12 (cost=0.00..743769.76 rows=4221741 width=12) I am getting the following timings (everything was cached before I started testing). I didn't average the runtime, but I ran each one three times and took the middle value.

max_parallel_degree | runtime
--------------------+-------------
                  0 | 11693.537 ms
                  1 |  6387.937 ms
                  2 |  4328.629 ms
                  3 |  3292.376 ms
                  4 |  2743.148 ms
                  5 |  2278.449 ms
                  6 |  2000.599 ms

I'm pretty happy! Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Mon, Mar 14, 2016 at 8:44 AM, David Rowley wrote: > On 12 March 2016 at 16:31, David Rowley > wrote: > > I've attached an updated patch which is based on commit 7087166, > > things are really changing fast in the grouping path area at the > > moment, but hopefully the dust is starting to settle now. > > The attached patch fixes a harmless compiler warning about a possible > uninitialised variable.
> > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 14, 2016 at 8:44 AM, David Rowley wrote: > On 12 March 2016 at 16:31, David Rowley wrote: >> I've attached an updated patch which is based on commit 7087166, >> things are really changing fast in the grouping path area at the >> moment, but hopefully the dust is starting to settle now. > > The attached patch fixes a harmless compiler warning about a possible > uninitialised variable. The setrefs.c fix for updating the finalize-aggregate target list is nice. I tested all the float aggregates and they are working fine. Overall the patch is fine. I will do some tests and provide an update later. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Parallel Aggregate
On 12 March 2016 at 16:31, David Rowley wrote: > I've attached an updated patch which is based on commit 7087166, > things are really changing fast in the grouping path area at the > moment, but hopefully the dust is starting to settle now. The attached patch fixes a harmless compiler warning about a possible uninitialised variable. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_aggregation_015971a_2016-03-14.patch Description: Binary data
Re: [HACKERS] Parallel Aggregate
On 11 March 2016 at 03:39, David Rowley wrote: > A couple of things which I'm not 100% happy with. > > 1. make_partialgroup_input_target() doing lookups to the syscache. > Perhaps this job can be offloaded to a new function in a more suitable > location. Ideally the Aggref would already store the required > information, but I don't see a great place to be looking that up. I've made some changes and moved this work off to a new function in tlist.c. > 2. I don't really like how I had to add tlist to > create_grouping_paths(), but I didn't want to go to the trouble of > calculating the partial agg PathTarget if Parallel Aggregation is not > possible, as this does do syscache lookups, so it's not free, so I'd > rather only do it if we're actually going to add some parallel paths. This is now fixed. The solution was much easier after 49635d7b. > 3. Something about the force_parallel_mode GUC. I'll think about this > when I start to think about how to test this, as likely I'll need > that, else I'd have to create tables bigger than we'd want to in the > regression tests. On further analysis it seems that this GUC does not do what I thought it did, which will be why Robert said that I don't need to think about it here. The GUC just seems to add a Gather node at the base of the plan tree, when possible. Which leaves me a bit lost when it comes to how to write tests for this... It seems like I need to add at least 136k rows to a test table to get a Parallel Aggregate plan, which I think is a no-go for the regression test suite... that's with parallel_setup_cost=0; It would be nice if that GUC just, when enabled, preferred the cheapest parallel plan (when available), rather than hacking in a Gather node into the plan's root. This should have the same result in many cases anyway, and would allow me to test this without generating oversized tables in the regression tests. 
I've attached an updated patch which is based on commit 7087166; things are really changing fast in the grouping path area at the moment, but hopefully the dust is starting to settle now. This patch also fixes a couple of bugs: one in the cost estimates for the number of groups that will be produced by the final aggregate, and a missing copyObject() in the setrefs.c code which caused a "Var not found in targetlist" problem in setrefs.c for plans with more than 1 partial aggregate node... I had to modify the planner to get it to add an additional aggregate node to test this (a separate test patch for this is attached). Comments/Reviews/Testing all welcome. Thanks -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_aggregation_8a26089_2016-03-12.patch parallel_aggregation_extra_combine_node.patch
Re: [HACKERS] Parallel Aggregate
On 8 March 2016 at 11:15, David Rowley wrote: > The setrefs.c parts of the patch remain completely broken. I've not > had time to look at this again yet, sorry. Ok, so again, apologies for previously sending such a broken patch. I've since managed to free up a bit of time to work on this, which now consists of a good bit more than the sum total of my weekly lunch hours. The attached patch is a much more complete patch, and quite possibly now review-worthy. I set about solving the setrefs.c problem by inventing a PartialAggref node type which wraps up Aggrefs when they're in an Agg node which has finalizeAggs = false. This node type exists all the way until executor init time, at which point it's removed and replaced with the underlying Aggref. This seems to solve the targetlist return type issue. I'd really like some feedback on this method of solving that problem. I've also fixed numerous other bugs, including the HAVING clause problem. Things left to do: 1. Make a final pass of the patch not at 3am. 2. Write some tests. 3. I've probably missed a few places that should handle T_PartialAggref. A couple of things which I'm not 100% happy with. 1. make_partialgroup_input_target() doing lookups to the syscache. Perhaps this job can be offloaded to a new function in a more suitable location. Ideally the Aggref would already store the required information, but I don't see a great place to be looking that up. 2. I don't really like how I had to add tlist to create_grouping_paths(), but I didn't want to go to the trouble of calculating the partial agg PathTarget if Parallel Aggregation is not possible, as this does do syscache lookups, so it's not free; I'd rather only do it if we're actually going to add some parallel paths. 3. Something about the force_parallel_mode GUC. I'll think about this when I start to think about how to test this, as likely I'll need that, else I'd have to create tables bigger than we'd want to in the regression tests. 
I've also attached an .sql file with an aggregate function aimed to test that the new PartialAggref stuff works properly, as previously it seemed to work by accident with sum(int). Just some numbers to maybe make this more interesting:

create table t (num int not null, one int not null, ten int not null,
hundred int not null, thousand int not null, tenk int not null,
hundredk int not null, million int not null);
insert into t select x,1,x%10+1,x%100+1,x%1000+1,x%1+1,x%10+1,x%100
from generate_series(1,1000)x(x);

-- Serial Plan
# explain select sum(num) from t;
                          QUERY PLAN
---------------------------------------------------------------
 Aggregate  (cost=198530.00..198530.01 rows=1 width=8)
   ->  Seq Scan on t  (cost=0.00..173530.00 rows=1000 width=4)

# select sum(num) from t;
  sum
--------
 500500
(1 row)

Time: 1036.119 ms

# set max_parallel_degree=4;

-- Parallel Plan
# explain select sum(num) from t;
                               QUERY PLAN
--------------------------------------------------------------------------
 Finalize Aggregate  (cost=105780.52..105780.53 rows=1 width=8)
   ->  Gather  (cost=105780.00..105780.51 rows=5 width=8)
         Number of Workers: 4
         ->  Partial Aggregate  (cost=104780.00..104780.01 rows=1 width=8)
               ->  Parallel Seq Scan on t  (cost=0.00..98530.00 rows=250 width=4)
(5 rows)

# select sum(num) from t;
  sum
--------
 500500
(1 row)

Time: 379.117 ms

I'll try and throw a bit of parallel aggregate work to a 4 socket / 64 core server which I have access to... just for fun. Reviews are now welcome. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_aggregation_cc3763e_2016-03-11.patch parallel_aggregation_test.sql
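The PartialAggref idea described above can be pictured with a small sketch. This is hypothetical Python, not PostgreSQL source — the class and function names are invented for illustration only: the proxy reports the transition type while the partial Agg node's target list is being planned, and is unwrapped to the plain Aggref before execution.

```python
# Hypothetical sketch only -- not PostgreSQL source. Names are invented
# to illustrate the PartialAggref idea: a proxy node that reports the
# transition type during planning and is unwrapped at executor init.

class Aggref:
    def __init__(self, aggfnname, aggtype, aggtranstype):
        self.aggfnname = aggfnname
        self.aggtype = aggtype            # declared result type
        self.aggtranstype = aggtranstype  # transition state type

class PartialAggref:
    """Wraps an Aggref found in an Agg node with finalizeAggs = false."""
    def __init__(self, aggref):
        self.aggref = aggref

def expr_type(node):
    # A partial aggregate emits its transition state, not the final value.
    if isinstance(node, PartialAggref):
        return node.aggref.aggtranstype
    return node.aggtype

def executor_init(tlist):
    # Strip the proxies so the executor only ever sees plain Aggrefs.
    return [n.aggref if isinstance(n, PartialAggref) else n for n in tlist]

avg = Aggref("avg", aggtype="float8", aggtranstype="float8[]")
print(expr_type(PartialAggref(avg)))  # transition type during planning
print(expr_type(avg))                 # result type in the finalize stage
```

The point of the proxy is that exprType() can keep answering honestly at every plan level without setrefs.c having to rewrite Aggref types from the top of the tree down.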
Re: [HACKERS] Parallel Aggregate
On Thu, Mar 10, 2016 at 10:55 PM, Robert Haas wrote: > On Thu, Mar 10, 2016 at 6:42 AM, David Rowley > wrote: >> The one reason that I asked about force_parallel_mode was that I >> assumed there was some buildfarm member running somewhere that >> switches this on and runs the regression tests. I figured that if it >> exists for other parallel features, then it probably should for this >> too. Can you explain why you think this should be handled differently? > > Yeah, I think Noah set up such a buildfarm member, but I can't > remember the name of it off-hand. mandrill [1]? Thanks, Amit [1] http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=mandrill&br=HEAD
Re: [HACKERS] Parallel Aggregate
On Thu, Mar 10, 2016 at 6:42 AM, David Rowley wrote: > Hmm, it appears I only looked as far as the enum declaration, which I > expected to have something. Perhaps I'm just not used to looking up > the manual for things relating to code. I don't mind adding some comments there, it just didn't seem all that important to me. Feel free to propose something. > The one reason that I asked about force_parallel_mode was that I > assumed there was some buildfarm member running somewhere that > switches this on and runs the regression tests. I figured that if it > exists for other parallel features, then it probably should for this > too. Can you explain why you think this should be handled differently? Yeah, I think Noah set up such a buildfarm member, but I can't remember the name of it off-hand. I think running the tests with this patch and force_parallel_mode=regress, max_parallel_degree>0 is a good idea, but I don't expect it to turn up too much. That configuration is mostly designed to test whether the basic parallelism infrastructure works or breaks things. It's not intended to test whether your parallel query plans are any good - you have to write your own tests for that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 9 March 2016 at 04:06, Robert Haas wrote: > On Mon, Mar 7, 2016 at 5:15 PM, David Rowley > wrote: >> 3. Nothing in create_grouping_paths() looks at the force_parallel_mode >> GUC. I had a quick look at this GUC and was a bit surprised to see 3 >> possible states, but no explanation of what they do, so I've not added >> code which pays attention to this setting yet. I'd imagine this is >> just a matter of skipping serial path generation when parallel is >> possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea >> what FORCE_PARALLEL_REGRESS is for yet. > > The GUC is documented just like all the other GUCs are documented. > Maybe that's not enough, but I don't think "no explanation of what > they do" is accurate. But I don't see why this patch should need to > care about force_parallel_mode at all. force_parallel_mode is about > making queries that wouldn't choose to run in parallel do on their own > do so anyway, whereas this patch is about making more queries able to > do more work in parallel. Hmm, it appears I only looked as far as the enum declaration, which I expected to have something. Perhaps I'm just not used to looking up the manual for things relating to code. The one reason that I asked about force_parallel_mode was that I assumed there was some buildfarm member running somewhere that switches this on and runs the regression tests. I figured that if it exists for other parallel features, then it probably should for this too. Can you explain why you think this should be handled differently? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 7, 2016 at 4:39 PM, Tom Lane wrote: > Haribabu Kommi writes: >> 2. Temporary fix for float aggregate types in _equalAggref because of >> a change in aggtype to trans type, otherwise the parallel aggregation >> plan failure in set_plan_references. whenever the aggtype is not matched, >> it verifies with the trans type also. > > That is a completely unacceptable kluge. Quite aside from being ugly as > sin, it probably breaks more things than it fixes, first because it breaks > the fundamental semantics of equal() across the board, and second because > it puts catalog lookups into equal(), which *will* cause problems. You > can not expect that this will get committed, not even as a "temporary fix". I am not able to find a better solution to handle this problem, so I will describe the problem and why I made the change; if you can suggest where to look, that would be helpful. In parallel aggregate, the aggregate operation is divided into two steps: a partial aggregate and a finalize aggregate. The partial aggregate is executed in the worker and returns its transition data, which is of type aggtranstype. This can work fine even if we don't change the targetlist Aggref return type from aggtype to aggtranstype for aggregates whose aggtype is a variable-length data type: the output slot is generated with a variable-length type, so even if we send aggtranstype data, which is also variable length, this can work. But when it comes to the float aggregates, the aggtype is fixed length while the aggtranstype is a variable-length data type. So if we try to change the aggtype of an Aggref to aggtranstype in the set_plan_references function, only the partial aggregate targetlist gets changed, because set_plan_references works from the top of the plan. To avoid this problem, I changed the target list type during partial aggregate path generation itself, and that leads to the failure in the _equalAggref function in set_plan_references. 
That is why I put in the temporary fix. Do you have any suggestions for handling this problem? Regards, Hari Babu Fujitsu Australia
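To illustrate the type mismatch being discussed — this is not PostgreSQL code, and the function names are invented — consider averaging floats in two steps: the declared result is a single float, but the state each worker must ship back is a (count, sum) pair, a different and wider type, which is why the partial stage's target list has to carry the transition type rather than the aggregate's declared type.

```python
# Illustrative sketch, not PostgreSQL code: a two-step avg() showing why
# the partial stage's output type must be the transition type. The final
# result is one float, but each worker ships a (count, sum) state.

def partial_avg(values):
    # Worker side: accumulate into the transition state.
    n, s = 0, 0.0
    for v in values:
        n += 1
        s += v
    return (n, s)              # transition state, not the result type

def combine_avg(a, b):
    # Merge two workers' transition states.
    return (a[0] + b[0], a[1] + b[1])

def finalize_avg(state):
    # Leader side: only here does the declared result type appear.
    n, s = state
    return s / n

chunks = [[1.0, 2.0], [3.0, 4.0, 5.0]]   # one chunk per worker
state = (0, 0.0)
for chunk in chunks:
    state = combine_avg(state, partial_avg(chunk))
print(finalize_avg(state))  # 3.0
```

In PostgreSQL terms, the tuple returned by the worker is typed by aggtranstype (for float aggregates, a variable-length array), while finalize_avg corresponds to the final function that produces the fixed-length aggtype.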
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 8, 2016 at 4:26 PM, David Rowley wrote: >> The first one in the list will be the cheapest; why not just look at >> that? Sorted partial paths are interesting if some subsequent path >> construction step can make use of that sort ordering, but they're >> never interesting from the point of view of matching the query's >> pathkeys because of the fact that Gather is order-destroying. > > In this case a sorted partial path is useful as the partial agg node > sits below Gather. The sorted input is very interesting for the > partial agg node with a strategy of AGG_SORTED. In most cases with > parallel aggregate it's the partial stage that will take the most > time, so if we do get pre-sorted partial paths, this will be very good > indeed for parallel agg. OK. So then you probably want to consider the cheapest one, which will be first. And then, if there's one that has the pathkeys you want, also consider that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 9 March 2016 at 04:06, Robert Haas wrote: > On Mon, Mar 7, 2016 at 5:15 PM, David Rowley > wrote: >> My concerns are: >> 1. Since there's no cheapest_partial_path in RelOptInfo the code is >> currently considering every partial_path for parallel hash aggregate. >> With normal aggregation we only ever use the cheapest path, so this >> may not be future proof. As of today we do only have at most one >> partial path in the list, but there's no reason to code this with that >> assumption. I didn't put in much effort to improve this as I see code >> in generate_gather_paths() which also makes assumptions about there >> just being 1 partial path. Perhaps we should expand RelOptInfo to >> track the cheapest partial path? or maybe allpaths.c should have a >> function to fetch the cheapest out of the list? > > The first one in the list will be the cheapest; why not just look at > that? Sorted partial paths are interesting if some subsequent path > construction step can make use of that sort ordering, but they're > never interesting from the point of view of matching the query's > pathkeys because of the fact that Gather is order-destroying. In this case a sorted partial path is useful as the partial agg node sits below Gather. The sorted input is very interesting for the partial agg node with a strategy of AGG_SORTED. In most cases with parallel aggregate it's the partial stage that will take the most time, so if we do get pre-sorted partial paths, this will be very good indeed for parallel agg. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
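The shape being discussed above can be sketched roughly — invented names, plain Python, not planner code: each worker runs a sorted-strategy (AGG_SORTED) partial aggregate over its pre-sorted chunk, and because Gather destroys the ordering, the finalize stage combines the per-worker groups by key in an order-insensitive way.

```python
# Rough sketch (invented names, not PostgreSQL code) of a sorted partial
# aggregate below Gather: each worker groups its pre-sorted chunk with an
# AGG_SORTED-style pass; the finalize stage then combines groups by key,
# since Gather gives no ordering guarantee.

from itertools import groupby
from operator import itemgetter

def partial_sorted_agg(sorted_rows):
    # AGG_SORTED: relies on the input already being ordered by the key.
    return [(k, sum(v for _, v in grp))
            for k, grp in groupby(sorted_rows, key=itemgetter(0))]

def finalize_hash_agg(partial_groups):
    # Order-insensitive combine of the gathered partial groups.
    out = {}
    for k, s in partial_groups:
        out[k] = out.get(k, 0) + s
    return out

worker1 = [("a", 1), ("a", 2), ("b", 3)]   # each chunk sorted by key
worker2 = [("a", 4), ("b", 5)]
gathered = partial_sorted_agg(worker1) + partial_sorted_agg(worker2)
print(finalize_hash_agg(gathered))  # {'a': 7, 'b': 8}
```

This is why a pre-sorted partial path can still pay off even though Gather scrambles its output: the expensive per-row work happens in the sorted partial stage below Gather.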
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 7, 2016 at 5:15 PM, David Rowley wrote: > My concerns are: > 1. Since there's no cheapest_partial_path in RelOptInfo the code is > currently considering every partial_path for parallel hash aggregate. > With normal aggregation we only ever use the cheapest path, so this > may not be future proof. As of today we do only have at most one > partial path in the list, but there's no reason to code this with that > assumption. I didn't put in much effort to improve this as I see code > in generate_gather_paths() which also makes assumptions about there > just being 1 partial path. Perhaps we should expand RelOptInfo to > track the cheapest partial path? or maybe allpaths.c should have a > function to fetch the cheapest out of the list? The first one in the list will be the cheapest; why not just look at that? Sorted partial paths are interesting if some subsequent path construction step can make use of that sort ordering, but they're never interesting from the point of view of matching the query's pathkeys because of the fact that Gather is order-destroying. > 3. Nothing in create_grouping_paths() looks at the force_parallel_mode > GUC. I had a quick look at this GUC and was a bit surprised to see 3 > possible states, but no explanation of what they do, so I've not added > code which pays attention to this setting yet. I'd imagine this is > just a matter of skipping serial path generation when parallel is > possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea > what FORCE_PARALLEL_REGRESS is for yet. The GUC is documented just like all the other GUCs are documented. Maybe that's not enough, but I don't think "no explanation of what they do" is accurate. But I don't see why this patch should need to care about force_parallel_mode at all. force_parallel_mode is about making queries that wouldn't choose to run in parallel do on their own do so anyway, whereas this patch is about making more queries able to do more work in parallel. 
> The setrefs.c parts of the patch remain completely broken. I've not > had time to look at this again yet, sorry. I hope you get time soon. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 5 March 2016 at 07:25, Robert Haas wrote: > On Thu, Mar 3, 2016 at 11:00 PM, David Rowley >> 3. The code never attempts to mix and match Grouping Agg and Hash Agg >> plans. e.g it could be an idea to perform Partial Hash Aggregate -> >> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize >> stage. I just thought doing this is more complex than what's really >> needed, but if someone can think of a case where this would be a great >> win then I'll listen, but you have to remember we don't have any >> pre-sorted partial paths at this stage, so an explicit sort is >> required *always*. This might change if someone invented partial btree >> index scans... but until then... > > Actually, Rahila Syed is working on that. But it's not done yet, so > presumably will not go into 9.6. > > I don't really see the logic of this, though. Currently, Gather > destroys the input ordering, so it seems preferable for the > finalize-aggregates stage to use a hash aggregate whenever possible, > whatever the partial-aggregate stage did. Otherwise, we need an > explicit sort. Anyway, it seems like the two stages should be costed > and decided on their own merits - there's no reason to chain the two > decisions together. Thanks for looking at this. I've attached an updated patch which re-bases the whole patch on top of the upper planner changes which have just been committed. In this version create_grouping_paths() does now consider mixed strategies of hashed and sorted, although I have a few concerns with the code that I've written. I'm solely posting this early to minimise any duplicate work. My concerns are: 1. Since there's no cheapest_partial_path in RelOptInfo the code is currently considering every partial_path for parallel hash aggregate. With normal aggregation we only ever use the cheapest path, so this may not be future proof. As of today we do only have at most one partial path in the list, but there's no reason to code this with that assumption. 
I didn't put in much effort to improve this as I see code in generate_gather_paths() which also makes assumptions about there just being 1 partial path. Perhaps we should expand RelOptInfo to track the cheapest partial path? or maybe allpaths.c should have a function to fetch the cheapest out of the list? 2. In mixed parallel aggregate mode, when the query has no aggregate functions, the code currently will use a nodeAgg for AGG_SORTED strategy rather than a nodeGroup, as it would in serial agg mode. This probably needs to be changed. 3. Nothing in create_grouping_paths() looks at the force_parallel_mode GUC. I had a quick look at this GUC and was a bit surprised to see 3 possible states, but no explanation of what they do, so I've not added code which pays attention to this setting yet. I'd imagine this is just a matter of skipping serial path generation when parallel is possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea what FORCE_PARALLEL_REGRESS is for yet. The setrefs.c parts of the patch remain completely broken. I've not had time to look at this again yet, sorry. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_aggregation_cc75f61_2016-03-08.patch
Re: [HACKERS] Parallel Aggregate
On 7 March 2016 at 18:19, Haribabu Kommi wrote: > Here I attached update patch with further changes, > 1. Explain plan changes for parallel grouping Perhaps someone might disagree with me, but I'm not all that sure I really get the need for that. With nodeAgg.c we're doing something fundamentally different in Partial mode than in Finalize mode; that's why I originally wanted to give an indication of that in explain.c. For a query with no aggregate functions using nodeGroup.c, there is no special handling in the executor for the partial and final stages, so I really don't see why we need to give the impression in EXPLAIN that there is. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
Haribabu Kommi writes: > 2. Temporary fix for float aggregate types in _equalAggref because of > a change in aggtype to trans type, otherwise the parallel aggregation > plan failure in set_plan_references. whenever the aggtype is not matched, > it verifies with the trans type also. That is a completely unacceptable kluge. Quite aside from being ugly as sin, it probably breaks more things than it fixes, first because it breaks the fundamental semantics of equal() across the board, and second because it puts catalog lookups into equal(), which *will* cause problems. You can not expect that this will get committed, not even as a "temporary fix". regards, tom lane
Re: [HACKERS] Parallel Aggregate
On Sun, Mar 6, 2016 at 10:21 PM, Haribabu Kommi wrote: > > Pending: > 1. Explain plan needs to be corrected for parallel grouping similar like > parallel aggregate. Here I attached an updated patch with further changes: 1. Explain plan changes for parallel grouping. 2. A temporary fix for the float aggregate types in _equalAggref, needed because of the change of aggtype to the transition type; otherwise the parallel aggregation plan fails in set_plan_references. Whenever the aggtype does not match, it checks against the transition type as well. 3. Generates a parallel path for each partial path and adds it to the path list; the plan is then chosen based on the cheapest path. To apply this patch, first apply the patch in [1]. [1] - http://www.postgresql.org/message-id/14172.1457228...@sss.pgh.pa.us Regards, Hari Babu Fujitsu Australia parallelagg_v2.patch
Re: [HACKERS] Parallel Aggregate
On Fri, Mar 4, 2016 at 3:00 PM, David Rowley wrote: > On 17 February 2016 at 17:50, Haribabu Kommi wrote: >> Here I attached a draft patch based on previous discussions. It still needs >> better comments and optimization. > > Over in [1] Tom posted a large change to the grouping planner which > causes large conflict with the parallel aggregation patch. I've been > looking over Tom's patch and reading the related thread and I've > observed 3 things: > > 1. Parallel Aggregate will be much easier to write and less code to > base it up top of Tom's upper planner changes. The latest patch does > add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't > be required after Tom pushes the changes to the upper planner. > 2. If we apply parallel aggregate before Tom's upper planner changes > go in, then Tom needs to reinvent it again when rebasing his patch. > This seems senseless, so this is why I did this work. > 3. Based on the thread, most people are leaning towards getting Tom's > changes in early to allow a bit more settle time before beta, and > perhaps also to allow other patches to go in after (e.g this) > > So, I've done a bit of work and I've rewritten the parallel aggregate > code to base it on top of Tom's patch posted in [1]. There's a few > things that are left unsolved at this stage. > > 1. exprType() for Aggref still returns the aggtype, where it needs to > return the trans type for partial agg nodes, this need to return the > trans type rather than the aggtype. I had thought I might fix this by > adding a proxy node type that sits in the targetlist until setrefs.c > where it can be plucked out and replaced by the Aggref. I need to > investigate this further. > 2. There's an outstanding bug relating to HAVING clause not seeing the > right state of aggregation and returning wrong results. I've not had > much time to look into this yet, but I suspect its an existing bug > that's already in master from my combine aggregate patch. 
> I will investigate this on Sunday. Thanks for updating the patch. Here I attached an updated patch with the additional changes: 1. Parallel aggregation now works with expressions as well as with plain aggregate functions. 2. Aggref returns the transition type instead of the aggregate type; this change adds support for parallel aggregation of the float aggregates, though it still needs a fix in the _equalAggref function. Pending: 1. The explain plan needs to be corrected for parallel grouping, similar to parallel aggregate. To apply this patch, first apply the patch in [1]. [1] - http://www.postgresql.org/message-id/14172.1457228...@sss.pgh.pa.us Regards, Hari Babu Fujitsu Australia parallelagg_v1.patch
Re: [HACKERS] Parallel Aggregate
On Thu, Mar 3, 2016 at 11:00 PM, David Rowley wrote: > On 17 February 2016 at 17:50, Haribabu Kommi wrote: >> Here I attached a draft patch based on previous discussions. It still needs >> better comments and optimization. > > Over in [1] Tom posted a large change to the grouping planner which > causes large conflict with the parallel aggregation patch. I've been > looking over Tom's patch and reading the related thread and I've > observed 3 things: > > 1. Parallel Aggregate will be much easier to write and less code to > base it up top of Tom's upper planner changes. The latest patch does > add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't > be required after Tom pushes the changes to the upper planner. > 2. If we apply parallel aggregate before Tom's upper planner changes > go in, then Tom needs to reinvent it again when rebasing his patch. > This seems senseless, so this is why I did this work. > 3. Based on the thread, most people are leaning towards getting Tom's > changes in early to allow a bit more settle time before beta, and > perhaps also to allow other patches to go in after (e.g this) > > So, I've done a bit of work and I've rewritten the parallel aggregate > code to base it on top of Tom's patch posted in [1]. Great! > 3. The code never attempts to mix and match Grouping Agg and Hash Agg > plans. e.g it could be an idea to perform Partial Hash Aggregate -> > Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize > stage. I just thought doing this is more complex than what's really > needed, but if someone can think of a case where this would be a great > win then I'll listen, but you have to remember we don't have any > pre-sorted partial paths at this stage, so an explicit sort is > required *always*. This might change if someone invented partial btree > index scans... but until then... Actually, Rahila Syed is working on that. But it's not done yet, so presumably will not go into 9.6. 
I don't really see the logic of this, though. Currently, Gather destroys the input ordering, so it seems preferable for the finalize-aggregates stage to use a hash aggregate whenever possible, whatever the partial-aggregate stage did. Otherwise, we need an explicit sort. Anyway, it seems like the two stages should be costed and decided on their own merits - there's no reason to chain the two decisions together. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 17 February 2016 at 17:50, Haribabu Kommi wrote: > Here I attached a draft patch based on previous discussions. It still needs > better comments and optimization. Over in [1] Tom posted a large change to the grouping planner which causes a large conflict with the parallel aggregation patch. I've been looking over Tom's patch and reading the related thread and I've observed 3 things: 1. Parallel Aggregate will be much easier to write, and less code, if we base it on top of Tom's upper planner changes. The latest patch does add a bit of cruft (e.g. create_gather_plan_from_subplan()) which won't be required after Tom pushes the changes to the upper planner. 2. If we apply parallel aggregate before Tom's upper planner changes go in, then Tom needs to reinvent it again when rebasing his patch. This seems senseless, so this is why I did this work. 3. Based on the thread, most people are leaning towards getting Tom's changes in early to allow a bit more settling time before beta, and perhaps also to allow other patches to go in after (e.g. this). So, I've done a bit of work and I've rewritten the parallel aggregate code to base it on top of Tom's patch posted in [1]. There's a few things that are left unsolved at this stage. 1. exprType() for Aggref still returns the aggtype, where for partial agg nodes it needs to return the trans type rather than the aggtype. I had thought I might fix this by adding a proxy node type that sits in the targetlist until setrefs.c where it can be plucked out and replaced by the Aggref. I need to investigate this further. 2. There's an outstanding bug relating to the HAVING clause not seeing the right state of aggregation and returning wrong results. I've not had much time to look into this yet, but I suspect it's an existing bug that's already in master from my combine aggregate patch. I will investigate this on Sunday. In regards to the patch, there's a few things worth mentioning here: 1. 
I've had to add a parallel_degree parameter to create_group_path() and create_agg_path(). I think Tom is going to make changes to his patch so that the Path's parallel_degree is propagated to subnodes; this should allow me to remove this parameter and just use the parallel_degree from the subpath. 2. I had to add a new parameter to pass an optional row estimate to cost_gather(), as I don't have a RelOptInfo available to get a row estimate from which represents the state after partial aggregation. I thought this change was ok, but I'll listen to anyone who thinks of a better way to do it. 3. The code never attempts to mix and match Grouping Agg and Hash Agg plans. e.g. it could be an idea to perform Partial Hash Aggregate -> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize stage. I just thought doing this is more complex than what's really needed, but if someone can think of a case where this would be a great win then I'll listen; but you have to remember we don't have any pre-sorted partial paths at this stage, so an explicit sort is required *always*. This might change if someone invented partial btree index scans... but until then... Due to the existence of the outstanding issues above, I feel like I might be posting the patch a little early, but wanted to do so since this is quite a hot area in the code at the moment and I wanted to post for transparency. To apply the patch please apply [1] first. [1] http://www.postgresql.org/message-id/3795.1456689...@sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_aggregation_d6850a9f_2016-03-04.patch
Re: [HACKERS] Parallel Aggregate
On Sat, Feb 13, 2016 at 3:51 PM, Robert Haas wrote: > On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi > wrote: >> Here I attached updated patch with the corrections. > > So, what about the main patch, for parallel aggregation itself? I'm > reluctant to spend too much time massaging combine functions if we > don't have the infrastructure to use them. Here I attached a draft patch based on previous discussions. It still needs better comments and optimization. Overview: 1. Before creating the plan for the best path, verify whether a parallel aggregate plan is possible. If possible, check whether it is cheaper than the normal aggregate plan; if parallel is cheaper, replace the best path with the cheapest_partial_path. 2. While generating the parallel aggregate plan, first generate the targetlist of the partial aggregate from the bare aggregate references and GROUP BY expressions. 3. Replace aggref->aggtype with aggtranstype in the partial aggregate targetlist so that proper tuple data is returned from the workers. 4. Generate the partial aggregate node using the generated targetlist. 5. Add Gather and finalize aggregate nodes on top of the partial aggregate plan. To do: 1. Optimize the aggregate cost calculation mechanism; currently it is performed many times. 2. Better comments, etc. Please verify whether the patch is going in the right direction per your expectations. Regards, Hari Babu Fujitsu Australia parallelagg_poc_v7.patch
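The five steps above aim to produce a plan of roughly this shape (a sketch; node names follow the terminology used in this thread):

```
Finalize Aggregate            -- combines partial states, applies HAVING
  -> Gather                   -- collects partial-state tuples from workers
       -> Partial Aggregate   -- targetlist returns aggtranstype, not aggtype
            -> Parallel Seq Scan on t
```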
Re: [HACKERS] Parallel Aggregate
On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi wrote: > Here I attached updated patch with the corrections. So, what about the main patch, for parallel aggregation itself? I'm reluctant to spend too much time massaging combine functions if we don't have the infrastructure to use them. This patch removes the comment from float8_regr_sxx in pg_proc.h for no apparent reason. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On Mon, Feb 8, 2016 at 9:01 AM, Robert Haas wrote: > On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi > wrote: >> [ new patch ] > > This patch contains a number of irrelevant hunks that really ought not > to be here and make the patch harder to understand, like this: > > -* Generate appropriate target list for > scan/join subplan; may be > -* different from tlist if grouping or > aggregation is needed. > +* Generate appropriate target list for > subplan; may be different from > +* tlist if grouping or aggregation is needed. > > Please make a habit of getting rid of that sort of thing before submitting. Sure. I will take care of such things in future. > Generally, I'm not quite sure I understand the code here. It seems to > me that what we ought to be doing is that grouping_planner, right > after considering using a presorted path (that is, just after the if > (sorted_path) block between lines 1822-1850), ought to then consider > using a partial path. For the moment, it need not consider the > possibility that there may be a presorted partial path, because we > don't have any way to generate those yet. (I have plans to fix that, > but not in time for 9.6.) So it can just consider doing a Partial > Aggregate on the cheapest partial path using an explicit sort, or > hashing; then, above the Gather, it can finalize either by hashing or > by sorting and grouping. > > The trick is that there's no path representation of an aggregate, and > there won't be until Tom finishes his upper planner path-ification > work. But it seems to me we can work around that. Set best_path to > the cheapest partial path, add a partial aggregate rather than a > regular one around where it says "Insert AGG or GROUP node if needed, > plus an explicit sort step if necessary", and then push a Gather node > and a Finalize Aggregate onto the result. Thanks, I will update the patch accordingly. 
Along with those changes, I will try to calculate the cost involved in the normal aggregate without generating the plan, and compare it against the parallel plan cost before generating the actual plan, because with a small number of groups the normal aggregate performs better than the parallel aggregate in tests. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Parallel Aggregate
On Mon, Feb 8, 2016 at 2:00 AM, Robert Haas wrote: > On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi > wrote: >> On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi >> wrote: >>> Here I attached updated patch with additional combine function for >>> two stage aggregates also. >> >> A wrong combine function was added in pg_aggregate.h in the earlier >> patch that leading to >> initdb problem. Corrected one is attached. > > I'm not entirely sure I know what's going on here, but I'm pretty sure > that it makes no sense for the new float8_pl function to reject > non-aggregate callers at the beginning and then have a comment at the > end indicating what it does when not invoked as an aggregate. > Similarly for the other new function. > > It would be a lot more clear what this patch was trying to accomplish > if the new functions had header comments explaining their purpose - > not what they do, but why they exist. I added some header comments explaining the need for these functions and when they will be used. These combine functions are necessary for float4 and float8 parallel aggregation. > float8_regr_pl is labeled in pg_proc.h as an aggregate transition > function, but I'm wondering if it should say combine function. Corrected. > The changes to pg_aggregate.h include a large number of > whitespace-only changes which are unacceptable. Please change only > the lines that need to be changed. I tried to align the other rows with the new combine function addition, which led to the whitespace problem. I will take care of such things in future. Here I attached the updated patch with the corrections. Regards, Hari Babu Fujitsu Australia additional_combine_fns_v3.patch
Re: [HACKERS] Parallel Aggregate
On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi wrote: > [ new patch ] This patch contains a number of irrelevant hunks that really ought not to be here and make the patch harder to understand, like this: -* Generate appropriate target list for scan/join subplan; may be -* different from tlist if grouping or aggregation is needed. +* Generate appropriate target list for subplan; may be different from +* tlist if grouping or aggregation is needed. Please make a habit of getting rid of that sort of thing before submitting. Generally, I'm not quite sure I understand the code here. It seems to me that what we ought to be doing is that grouping_planner, right after considering using a presorted path (that is, just after the if (sorted_path) block between lines 1822-1850), ought to then consider using a partial path. For the moment, it need not consider the possibility that there may be a presorted partial path, because we don't have any way to generate those yet. (I have plans to fix that, but not in time for 9.6.) So it can just consider doing a Partial Aggregate on the cheapest partial path using an explicit sort, or hashing; then, above the Gather, it can finalize either by hashing or by sorting and grouping. The trick is that there's no path representation of an aggregate, and there won't be until Tom finishes his upper planner path-ification work. But it seems to me we can work around that. Set best_path to the cheapest partial path, add a partial aggregate rather than a regular one around where it says "Insert AGG or GROUP node if needed, plus an explicit sort step if necessary", and then push a Gather node and a Finalize Aggregate onto the result. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi wrote: > On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi > wrote: >> Here I attached updated patch with additional combine function for >> two stage aggregates also. > > A wrong combine function was added in pg_aggregate.h in the earlier > patch that leading to > initdb problem. Corrected one is attached. I'm not entirely sure I know what's going on here, but I'm pretty sure that it makes no sense for the new float8_pl function to reject non-aggregate callers at the beginning and then have a comment at the end indicating what it does when not invoked as an aggregate. Similarly for the other new function. It would be a lot more clear what this patch was trying to accomplish if the new functions had header comments explaining their purpose - not what they do, but why they exist. float8_regr_pl is labeled in pg_proc.h as an aggregate transition function, but I'm wondering if it should say combine function. The changes to pg_aggregate.h include a large number of whitespace-only changes which are unacceptable. Please change only the lines that need to be changed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi wrote: > > Here I attached updated patch with additional combine function for > two stage aggregates also. A wrong combine function was added in pg_aggregate.h in the earlier patch, leading to an initdb problem. A corrected one is attached. Regards, Hari Babu Fujitsu Australia additional_combine_fns_v2.patch
Re: [HACKERS] Parallel Aggregate
On Fri, Jan 22, 2016 at 10:13 PM, David Rowley wrote: > On 22 January 2016 at 17:25, Haribabu Kommi wrote: >> Along with these changes, I added a float8 combine function to see >> how it works under parallel aggregate, it is working fine for float4, but >> giving small data mismatch with float8 data type. >> >> postgres=# select avg(f3), avg(f4) from tbl; >>avg| avg >> --+-- >> 1.1002384186 | 100.12344879 >> (1 row) >> >> postgres=# set enable_parallelagg = true; >> SET >> postgres=# select avg(f3), avg(f4) from tbl; >>avg| avg >> --+-- >> 1.1002384186 | 100.12344918 >> (1 row) >> >> Column - f3 - float4 >> Column - f4 - float8 >> >> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp >> aggregates. Any special care is needed for float8 datatype? > > I'm not sure if this is what's going on here, as I don't really know > the range of numbers that you've used to populate f4 with. It would be > good to know, does "f4" contain negative values too? No negative values are present in the f4 column. Following are the SQL statements, create table tbl(f1 int, f2 char(100), f3 float4, f4 float8); insert into tbl values(generate_series(1,10), 'Fujitsu', 1.1, 100.12345); > It's not all that hard to demonstrate the instability of addition with > float8. Take the following example: > > create table d (d float8); > insert into d > values(1223123223412324.2231),(0.23),(-1223123223412324.2231); > > # select sum(d order by random()) from d; > sum > - >0 > (1 row) > > same query, once more. > > # select sum(d order by random()) from d; >sum > -- > 2.3e-013 > (1 row) > > Here the result just depends on the order which the numbers have been > added. You may need to execute a few more times to see the result > change. > > Perhaps a good test would be to perform a sum(f4 order by random()) in > serial mode, and see if you're getting a stable result from the > numbers that you have populated the table with. 
> > If that's the only problem at play here, then I for one am not worried > about it, as the instability already exists today depending on which > path is chosen to scan the relation. For example an index scan is > likely not to return rows in the same order as a seq scan. > > We do also warn about this in the manual: "Inexact means that some > values cannot be converted exactly to the internal format and are > stored as approximations, so that storing and retrieving a value might > show slight discrepancies. Managing these errors and how they > propagate through calculations is the subject of an entire branch of > mathematics and computer science and will not be discussed here, > except for the following points:" [1] > > > [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html > Thanks for the detailed explanation. Now I understand. Here I attached an updated patch with additional combine functions for two-stage aggregates also. Regards, Hari Babu Fujitsu Australia additional_combine_fns_v1.patch
Re: [HACKERS] Parallel Aggregate
On 22 January 2016 at 17:25, Haribabu Kommi wrote: > Along with these changes, I added a float8 combine function to see > how it works under parallel aggregate, it is working fine for float4, but > giving small data mismatch with float8 data type. > > postgres=# select avg(f3), avg(f4) from tbl; >avg| avg > --+-- > 1.1002384186 | 100.12344879 > (1 row) > > postgres=# set enable_parallelagg = true; > SET > postgres=# select avg(f3), avg(f4) from tbl; >avg| avg > --+-- > 1.1002384186 | 100.12344918 > (1 row) > > Column - f3 - float4 > Column - f4 - float8 > > similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp > aggregates. Any special care is needed for float8 datatype? I'm not sure if this is what's going on here, as I don't really know the range of numbers that you've used to populate f4 with. It would be good to know, does "f4" contain negative values too? It's not all that hard to demonstrate the instability of addition with float8. Take the following example: create table d (d float8); insert into d values(1223123223412324.2231),(0.23),(-1223123223412324.2231); # select sum(d order by random()) from d; sum - 0 (1 row) same query, once more. # select sum(d order by random()) from d; sum -- 2.3e-013 (1 row) Here the result just depends on the order which the numbers have been added. You may need to execute a few more times to see the result change. Perhaps a good test would be to perform a sum(f4 order by random()) in serial mode, and see if you're getting a stable result from the numbers that you have populated the table with. If that's the only problem at play here, then I for one am not worried about it, as the instability already exists today depending on which path is chosen to scan the relation. For example an index scan is likely not to return rows in the same order as a seq scan. 
We do also warn about this in the manual: "Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that storing and retrieving a value might show slight discrepancies. Managing these errors and how they propagate through calculations is the subject of an entire branch of mathematics and computer science and will not be discussed here, except for the following points:" [1] [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Fri, Jan 22, 2016 at 7:44 AM, David Rowley wrote: > On 21 January 2016 at 18:26, Haribabu Kommi wrote: >> Here I attached update parallel aggregate patch on top of recent commits >> of combine aggregate and parallel join patch. It still lacks of cost >> comparison >> code to compare both parallel and normal aggregates. > > Thanks for the updated patch. > > I'm just starting to look over this now. > > # create table t1 as select x from generate_series(1,100) x(x); > # vacuum ANALYZE t1; > # set max_parallel_degree =8; > # explain select sum(x) from t1; >QUERY PLAN > - > Aggregate (cost=9633.33..9633.34 rows=1 width=4) >-> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667 width=4) > (2 rows) > > I'm not quite sure what's happening here yet as I've not ran it > through my debugger, but how can we have a Parallel Seq Scan without a > Gather node? It appears to give correct results, so I can only assume > it's not actually a parallel scan at all. > > Let's check: > > # select relname,seq_scan from pg_stat_user_tables where relname ='t1'; > relname | seq_scan > -+-- > t1 |0 > (1 row) > > # explain analyze select sum(x) from t1; > QUERY PLAN > -- > Aggregate (cost=9633.33..9633.34 rows=1 width=4) (actual > time=161.820..161.821 rows=1 loops=1) >-> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667 > width=4) (actual time=0.051..85.348 rows=100 loops=1) > Planning time: 0.040 ms > Execution time: 161.861 ms > (4 rows) > > # select relname,seq_scan from pg_stat_user_tables where relname ='t1'; > relname | seq_scan > -+-- > t1 |1 > (1 row) > > Only 1 scan. 
> > > # explain analyze select * from t1 where x=1; >QUERY PLAN > > Gather (cost=1000.00..10633.43 rows=1 width=4) (actual > time=0.231..49.105 rows=1 loops=1) >Number of Workers: 2 >-> Parallel Seq Scan on t1 (cost=0.00..9633.33 rows=0 width=4) > (actual time=29.060..45.302 rows=0 loops=3) > Filter: (x = 1) > Rows Removed by Filter: 33 > Planning time: 0.049 ms > Execution time: 51.438 ms > (7 rows) > > # select relname,seq_scan from pg_stat_user_tables where relname ='t1'; > relname | seq_scan > -+-- > t1 |4 > (1 row) > > 3 more scans. This one seems to actually be parallel, and makes sense > based on "Number of Workers: 2" The problem was the gather path that is generated on partial path list is not getting added to path list, because of which, there is a mismatch in sorted path and cheapest_path, so it leads to a wrong plan. For temporarily, I marked the sorted_path and cheapest_path as same and it works fine. > Also looking at the patch: > > +bool > +aggregates_allow_partial(Node *clause) > +{ > > In the latest patch that I sent on the combine aggregates thread: > http://www.postgresql.org/message-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=rzq3kg6rnpn-fzzhddi...@mail.gmail.com > I made it so there's 3 possible return values from this function. As > your patch stands now, if I create an aggregate function with an > INTERNAL state with a combine function set, then this patch might try > to parallel aggregate that and pass around the pointer to the internal > state in the Tuple going from the worker to the main process, when the > main process dereferences this pointer we'll get a segmentation > violation. So I'd say you should maybe use a modified version of my > latest aggregates_allow_partial() and check for PAT_ANY, and only > parallelise the aggregate if you get that value. If the use of > partial aggregate was within a single process then you could be quite > content with PAT_INTERNAL_ONLY. 
You'll just need to pull out the logic > that checks for serial and deserial functions, since that's not in > yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates > are found which have combine functions set. > I took the suggested code changes from the combine aggregate patch and changed mine accordingly. Along with these changes, I added a float8 combine function to see how it works under parallel aggregate; it is working fine for float4, but giving a small data mismatch with the float8 data type. postgres=# select avg(f3), avg(f4) from tbl; avg| avg --+-- 1.1002384186 | 100.12344879 (1 row) postgres=# set enable_parallelagg = true; SET postgres=# select avg(f3), avg(f4) from tbl; avg| avg --+-- 1.1002384186 | 100.12344918 (1 row) Column - f3 - float4 Column - f4 - float8 Similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp aggregates. Any special care is needed for float8 datatype?
Re: [HACKERS] Parallel Aggregate
On 21 January 2016 at 18:26, Haribabu Kommi wrote: > Here I attached update parallel aggregate patch on top of recent commits > of combine aggregate and parallel join patch. It still lacks of cost > comparison > code to compare both parallel and normal aggregates. Thanks for the updated patch. I'm just starting to look over this now. # create table t1 as select x from generate_series(1,1000000) x(x); # vacuum ANALYZE t1; # set max_parallel_degree =8; # explain select sum(x) from t1; QUERY PLAN - Aggregate (cost=9633.33..9633.34 rows=1 width=4) -> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667 width=4) (2 rows) I'm not quite sure what's happening here yet as I've not run it through my debugger, but how can we have a Parallel Seq Scan without a Gather node? It appears to give correct results, so I can only assume it's not actually a parallel scan at all. Let's check: # select relname,seq_scan from pg_stat_user_tables where relname ='t1'; relname | seq_scan -+-- t1 |0 (1 row) # explain analyze select sum(x) from t1; QUERY PLAN -- Aggregate (cost=9633.33..9633.34 rows=1 width=4) (actual time=161.820..161.821 rows=1 loops=1) -> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.051..85.348 rows=1000000 loops=1) Planning time: 0.040 ms Execution time: 161.861 ms (4 rows) # select relname,seq_scan from pg_stat_user_tables where relname ='t1'; relname | seq_scan -+-- t1 |1 (1 row) Only 1 scan. # explain analyze select * from t1 where x=1; QUERY PLAN Gather (cost=1000.00..10633.43 rows=1 width=4) (actual time=0.231..49.105 rows=1 loops=1) Number of Workers: 2 -> Parallel Seq Scan on t1 (cost=0.00..9633.33 rows=0 width=4) (actual time=29.060..45.302 rows=0 loops=3) Filter: (x = 1) Rows Removed by Filter: 333333 Planning time: 0.049 ms Execution time: 51.438 ms (7 rows) # select relname,seq_scan from pg_stat_user_tables where relname ='t1'; relname | seq_scan -+-- t1 |4 (1 row) 3 more scans. 
This one seems to actually be parallel, and makes sense based on "Number of Workers: 2" Also looking at the patch: +bool +aggregates_allow_partial(Node *clause) +{ In the latest patch that I sent on the combine aggregates thread: http://www.postgresql.org/message-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=rzq3kg6rnpn-fzzhddi...@mail.gmail.com I made it so there's 3 possible return values from this function. As your patch stands now, if I create an aggregate function with an INTERNAL state with a combine function set, then this patch might try to parallel aggregate that and pass around the pointer to the internal state in the Tuple going from the worker to the main process, when the main process dereferences this pointer we'll get a segmentation violation. So I'd say you should maybe use a modified version of my latest aggregates_allow_partial() and check for PAT_ANY, and only parallelise the aggregate if you get that value. If the use of partial aggregate was within a single process then you could be quite content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic that checks for serial and deserial functions, since that's not in yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates are found which have combine functions set. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Thu, Dec 24, 2015 at 5:12 AM, Robert Haas wrote: > On Mon, Dec 21, 2015 at 6:38 PM, David Rowley > wrote: >> On 22 December 2015 at 04:16, Paul Ramsey wrote: >>> >>> Shouldn’t parallel aggregate come into play regardless of scan >>> selectivity? >> >> I'd say that the costing should take into account the estimated number of >> groups. >> >> The more tuples that make it into each group, the more attractive parallel >> grouping should seem. In the extreme case if there's 1 tuple per group, then >> it's not going to be of much use to use parallel agg, this would be similar >> to a scan with 100% selectivity. So perhaps the costings for it can be >> modeled around the parallel scan costing, but using the estimated groups >> instead of the estimated tuples. > > Generally, the way that parallel costing is supposed to work (with the > parallel join patch, anyway) is that you've got the same nodes costed > the same way you would otherwise, but the row counts are lower because > you're only processing 1/Nth of the rows. That's probably not exactly > the whole story here, but it's something to think about. Here I attached an updated parallel aggregate patch on top of recent commits of the combine aggregate and parallel join patches. It still lacks cost comparison code to compare parallel and normal aggregates. Regards, Hari Babu Fujitsu Australia parallelagg_poc_v5.patch
Re: [HACKERS] Parallel Aggregate
On Mon, Dec 21, 2015 at 6:38 PM, David Rowley wrote: > On 22 December 2015 at 04:16, Paul Ramsey wrote: >> >> Shouldn’t parallel aggregate come into play regardless of scan >> selectivity? > > I'd say that the costing should take into account the estimated number of > groups. > > The more tuples that make it into each group, the more attractive parallel > grouping should seem. In the extreme case if there's 1 tuple per group, then > it's not going to be of much use to use parallel agg, this would be similar > to a scan with 100% selectivity. So perhaps the costings for it can be > modeled around the parallel scan costing, but using the estimated groups > instead of the estimated tuples. Generally, the way that parallel costing is supposed to work (with the parallel join patch, anyway) is that you've got the same nodes costed the same way you would otherwise, but the row counts are lower because you're only processing 1/Nth of the rows. That's probably not exactly the whole story here, but it's something to think about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Aggregate
On 22 December 2015 at 04:16, Paul Ramsey wrote: > Shouldn’t parallel aggregate come into play regardless of scan selectivity? > I'd say that the costing should take into account the estimated number of groups. The more tuples that make it into each group, the more attractive parallel grouping should seem. In the extreme case if there's 1 tuple per group, then it's not going to be of much use to use parallel agg; this would be similar to a scan with 100% selectivity. So perhaps the costings for it can be modeled around the parallel scan costing, but using the estimated groups instead of the estimated tuples. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
On Tue, Dec 22, 2015 at 2:16 AM, Paul Ramsey wrote: > Shouldn’t parallel aggregate come into play regardless of scan selectivity? > I know in PostGIS land there’s a lot of stuff like: > > SELECT ST_Union(geom) FROM t GROUP BY areacode; > > Basically, in the BI case, there’s often no filter at all. Hoping that’s > considered a prime case for parallel agg :) Yes, the latest patch attached in the thread addresses this issue. But it still lacks proper cost calculation and comparison against the original aggregate cost. Parallel aggregation is currently selected only when the number of groups is less than 1/4 of the rows being selected. Otherwise, aggregating twice over a larger number of records leads to a performance drop compared to the original aggregate. Regards, Hari Babu Fujitsu Australia