On Tue, Nov 11, 2014 at 3:29 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> * only functions marked as "CONTAINS NO SQL"
> We don't really know what proisparallel is, but we do know what
> CONTAINS NO SQL means and can easily check for it.
> Plus I already have a patch for this, slightly bitrotted.
Interestingly, I have a fairly solid idea of what proisparallel is, but I have no clear idea what CONTAINS NO SQL is or why it's relevant. I would imagine that srandom() contains no SQL under any reasonable definition of what that means, but it ain't parallel-safe.

> * parallel_workers = 2 (or at least not make it user settable)
> By fixing the number of workers at 2 we avoid any problems caused by
> having N variable, such as how to vary N fairly amongst users and
> other such considerations. We get the main benefit of parallelism,
> without causing other issues across the server.

I think this is a fairly pointless restriction. The code simplification we'd get out of it appears to me to be quite minor, and we'd just end up putting the stuff back in anyway.

> * Fixed Plan: aggregate-scan
> To make everything simpler, allow only plans of a single type.
>   SELECT something, list of aggregates
>   FROM foo
>   WHERE filters
>   GROUP BY something
> because we know that passing large amounts of data from worker to
> master process will be slow, so focusing only on seq scan is not
> sensible; we should focus on plans that significantly reduce the
> number of rows passed upwards. We could just do this for very
> selective WHERE clauses, but that is not an important class of query.
> As soon as we include aggregates, we reduce data passing significantly
> AND we hit a very important subset of queries:

This is moving the goalposts in a way that I'm not at all comfortable with. Parallel sequential scan is pretty simple and may well be a win if there's a restrictive filter condition involved. Parallel aggregation requires introducing new infrastructure into the aggregate machinery to allow intermediate state values to be combined, and that would be a great project for someone to do at some time, but it seems like a distraction for me to do that right now.

> This plan type is widely used in reporting queries, so will hit the
> mainline of BI applications and many Mat View creations.
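Here is what "allowing intermediate state values to be combined" involves, as a minimal Python sketch of two-phase aggregation for avg(). It assumes a (sum, count) transition state. The function names (partial_aggregate, combine_states, finalize, two_phase_avg) are hypothetical and do not correspond to PostgreSQL's aggregate API. Each "worker" is just a list of (group key, value) rows, not a real process.

```python
from collections import defaultdict

# Hypothetical per-group transition state for avg(): (running sum, row count).

def partial_aggregate(rows):
    """Worker side: fold this worker's share of (key, value) rows into states."""
    states = defaultdict(lambda: (0, 0))
    for key, value in rows:
        s, c = states[key]
        states[key] = (s + value, c + 1)
    return dict(states)

def combine_states(a, b):
    """The extra step parallel aggregation needs: merge two partial states."""
    return (a[0] + b[0], a[1] + b[1])

def finalize(state):
    """Turn a fully combined (sum, count) state into the avg() result."""
    s, c = state
    return s / c

def two_phase_avg(worker_inputs):
    """Leader side: combine every worker's partial states, then finalize."""
    merged = {}
    for partial in map(partial_aggregate, worker_inputs):
        for key, state in partial.items():
            merged[key] = combine_states(merged[key], state) if key in merged else state
    return {key: finalize(state) for key, state in merged.items()}

rows_w1 = [("a", 1), ("b", 10), ("a", 3)]
rows_w2 = [("a", 5), ("b", 20)]
print(two_phase_avg([rows_w1, rows_w2]))  # {'a': 3.0, 'b': 15.0}
```

The leader cannot simply average the workers' finished results. For group "a", worker 1 alone gives 2.0 and worker 2 alone gives 5.0, and their mean (3.5) is not the correct answer (3.0). The combine step has to operate on the transition state, not on finalized values.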
> This will allow SELECT count(*) FROM foo to go faster also.
>
> The execution plan for that query type looks like this...
> Hash Aggregate
>   Gather From Workers
>     {Worker Nodes workers = 2
>       HashAggregate
>       PartialScan}

I'm going to aim for the simpler:

Hash Aggregate
  -> Parallel Seq Scan
       Workers: 4

Yeah, I know that won't perform as well as what you're proposing, but I'm fairly sure it's simpler.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers