On Tue, Jul 28, 2015 at 12:59 PM, David Rowley <david.row...@2ndquadrant.com> wrote:

> On 27 July 2015 at 21:09, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
>> Hello, can I ask some questions?
>>
>> I suppose we can take this as the analog of ParallelSeqScan. I
>> can't see much distinction between Append(ParallelSeqScan) and
>> ParallelAppend(SeqScan). What difference is there between them?
>>
>> If other nodes will have the same functionality, as you mention at
>> the end of this proposal, it might be better that some part of
>> this feature is implemented as a part of the existing executor
>> itself, not as a dedicated additional node, just as my
>> asynchronous fdw execution patch partially does. (Although it
>> lacks the planner part and bg worker launching.) If that is the
>> case, it might be better that ExecProcNode is modified so that it
>> supports both in-process and inter-bgworker cases by a single
>> API.
>>
>> What do you think about this?
>
> I have to say that I really like the thought of us having parallel
> enabled stuff in Postgres, but I also have to say that I don't think
> inventing all these special parallel node types is a good idea. If we
> think about everything that we can parallelise...
>
> Perhaps.... sort, hash join, seqscan, hash, bitmap heap scan, nested
> loop. I don't want to debate that, but perhaps there's more, perhaps
> less. Are we really going to duplicate all of the code and add in the
> parallel stuff as new node types?
>
> My other concern here is that I seldom hear people talk about the
> planner's architectural lack of ability to make a good choice about
> how many parallel workers to choose. Surely to properly calculate
> costs you need to know the exact number of parallel workers that will
> be available at execution time, but you need to know this at planning
> time!? I can't see how this works, apart from just being very
> conservative about parallel workers, which I think is really bad.
> Many databases have busy times in the day and also quiet times.
> Generally, quiet time is when large batch stuff gets done, and that's
> the time that parallel stuff is likely most useful. Remember queries
> are not always planned just before they're executed. We could have a
> PREPAREd query, or we could have better plan caching in the future.
> Or, if we build some intelligence into the planner to choose a good
> number of workers based on the current server load, then what's to
> say that the server will be under this load at exec time? If we plan
> during a quiet time, and exec in a busy time, all hell may break
> loose.
>
> I really do think that existing nodes should just be initialized in a
> parallel mode, and each node type can have a function to state if it
> supports parallelism or not.
>
> I'd really like to hear more opinions on the ideas I discussed here:
>
> http://www.postgresql.org/message-id/CAApHDvp2STf0=pqfpq+e7wa4qdympfm5qu_ytupe7r0jlnh...@mail.gmail.com
>
> This design makes use of the Funnel node that Amit has already made
> and allows more than 1 node to be executed in parallel at once.
>
> It appears that parallel enabling the executor node by node is
> fundamentally locked into just 1 node being executed in parallel,
> then perhaps a Funnel node gathering up the parallel worker buffers
> and streaming those back in serial mode. I believe that, by design,
> this does not permit a whole plan branch to execute in parallel, and
> I really feel like doing things this way is going to be very hard to
> undo and improve later. I might be too stupid to figure it out, but
> how would parallel hash join work if it can't gather tuples from the
> inner and outer nodes in parallel?
>
> Sorry for the rant, but I just feel like we're painting ourselves
> into a corner by parallel enabling the executor node by node.
> Apologies if I've completely misunderstood things.

+1, well articulated.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company