On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbal...@gmail.com>
> >> > wrote:
> >> >>
> >> >> So because of this high projection cost the seqpath and parallel path
> >> >> both have fuzzily same cost but seqpath is winning because it's
> >> >> parallel safe.
> >> >
> >> >
> >> > I think you are correct.  However, unless parallel_tuple_cost is set
> >> > very low, apply_projection_to_path never gets called with the Gather
> >> > path as an argument.  It gets ruled out at some earlier stage,
> >> > presumably because it assumes the projection step cannot make it win
> >> > if it is already behind by enough.
> >> >
> >>
> >> I think that is genuine because tuple communication cost is very high.
> >
> >
> > Sorry, I don't know which you think is genuine, the early pruning or my
> > complaint about the early pruning.
> >
>
> Early pruning.  See, currently, we don't have a way to maintain both
> parallel and non-parallel paths till later stage and then decide which
> one is better.  If we want to maintain both parallel and non-parallel
> paths, it can increase planning cost substantially in the case of
> joins.  Now, surely it can have benefit in many cases, so it is a
> worthwhile direction to pursue.
>

If I understand it correctly, we have a way, it just can lead to
exponential explosion problem, so we are afraid to use it, correct?  If I
just lobotomize the path domination code (make pathnode.c line 466 always
test false)

    if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

Then it keeps the parallel plan and later chooses to use it (after applying
your other patch in this thread) as the overall best plan.

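To make the domination test being discussed concrete, here is a rough,
heavily simplified sketch in C.  It is not the actual pathnode.c code: the
names SketchPath, sketch_compare_fuzzily, sketch_new_path_survives and
keep_all are made up for illustration, and pathkeys, row counts and
parameterization are ignored entirely.  The 1.01 fuzz factor is an
assumption about the planner's usual setting.  The keep_all flag plays the
role of the JJ_all_paths hack above: when set, the domination test is
skipped and the new path is always retained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real planner structures carry far more state. */
typedef enum
{
    COSTS_EQUAL,      /* fuzzily the same on both costs */
    COSTS_BETTER1,    /* first path fuzzily better */
    COSTS_BETTER2,    /* second path fuzzily better */
    COSTS_DIFFERENT   /* neither dominates: one wins on startup, one on total */
} CostCmp;

typedef struct SketchPath
{
    double startup_cost;
    double total_cost;
    bool   parallel_safe;
} SketchPath;

/* Compare two paths, treating costs within a factor of "fuzz" as equal. */
static CostCmp
sketch_compare_fuzzily(const SketchPath *p1, const SketchPath *p2, double fuzz)
{
    assert(fuzz >= 1.0);

    if (p1->total_cost > p2->total_cost * fuzz)
    {
        /* p1 is fuzzily worse on total cost; does it win on startup cost? */
        if (p2->startup_cost > p1->startup_cost * fuzz)
            return COSTS_DIFFERENT;
        return COSTS_BETTER2;
    }
    if (p2->total_cost > p1->total_cost * fuzz)
    {
        /* p2 is fuzzily worse on total cost; does it win on startup cost? */
        if (p1->startup_cost > p2->startup_cost * fuzz)
            return COSTS_DIFFERENT;
        return COSTS_BETTER1;
    }

    /* Total costs are fuzzily the same, so let startup cost decide. */
    if (p1->startup_cost > p2->startup_cost * fuzz)
        return COSTS_BETTER2;
    if (p2->startup_cost > p1->startup_cost * fuzz)
        return COSTS_BETTER1;
    return COSTS_EQUAL;
}

/*
 * Decide whether new_path is kept when compared against old_path.
 * With keep_all set, the domination test is bypassed (the "lobotomized"
 * behaviour).  On a fuzzy tie, this sketch keeps the new path only if it
 * is parallel safe and the old one is not; the real code has further
 * tie-breakers that are omitted here.
 */
static bool
sketch_new_path_survives(const SketchPath *new_path,
                         const SketchPath *old_path,
                         bool keep_all)
{
    CostCmp cmp = sketch_compare_fuzzily(new_path, old_path, 1.01);

    if (keep_all || cmp == COSTS_DIFFERENT)
        return true;

    switch (cmp)
    {
        case COSTS_BETTER1:
            return true;
        case COSTS_BETTER2:
            return false;
        default:
            /* COSTS_EQUAL: tie broken on parallel safety */
            return new_path->parallel_safe && !old_path->parallel_safe;
    }
}
```

With invented costs such as a parallel-safe sequential path at total cost
1000 and a non-parallel-safe Gather path at total cost 1005, the two
compare as fuzzily equal, so the Gather path is discarded unless keep_all
is set.
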
This hack doesn't even slow down "make installcheck-parallel" by very much,
which I guess just means the regression tests don't have a lot of complex
joins.

But what is an acceptable solution?  Is there a heuristic for when
retaining a parallel path could be helpful, the same way there is for
fast-start paths?  It seems like the best thing would be to include the
evaluation costs in the first place at this step.

Why is the path-cost domination code run before the cost of the function
evaluation is included?  Is that because the information needed to compute
it is not available at that point, or because it would be too slow to
include it at that point?  Or just because no one thought it important to
do?

Cheers,

Jeff