On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Mar 27, 2017 at 12:11 PM, David Rowley > <david.row...@2ndquadrant.com> wrote: > > On 28 March 2017 at 04:57, Robert Haas <robertmh...@gmail.com> wrote: > >> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia > >> <rushabh.lat...@gmail.com> wrote: > >>> About the original issue reported by Tomas, I did more debugging and > >>> found that - problem was gather_merge_clear_slots() was not returning > >>> the clear slot when nreader is zero (means nworkers_launched = 0). > >>> Due to the same scan was continue even all the tuple are exhausted, > >>> and then end up with server crash at gather_merge_getnext(). In the > patch > >>> I also added the Assert into gather_merge_getnext(), about the index > >>> should be less then the nreaders + 1 (leader). > >> > >> Well, you and David Rowley seem to disagree on what the fix is here. > >> His patches posted upthread do A, and yours do B, and from a quick > >> look those things are not just different ways of spelling the same > >> underlying fix, but actually directly conflicting ideas about what the > >> fix should be. Any chance you can review his patches, and maybe he > >> can review yours, and we could try to agree on a consensus position? > >> :-) > > > > When comparing Rushabh's to my minimal patch, both change > > gather_merge_clear_slots() to clear the leader's slot. My fix > > mistakenly changes things so it does ExecInitExtraTupleSlot() on the > > leader's slot, but seems that's not required since > > gather_merge_readnext() sets the leader's slot to the output of > > ExecProcNode(outerPlan). I'd ignore my minimal fix because of that > > mistake. Rushabh's patch sidesteps this by adding a conditional > > pfree() to not free slot that we didn't allocate in the first place. > > > > I do think the code could be improved a bit. I don't like the way > > GatherMergeState's nreaders and nworkers_launched are always the same. > > I think this all threw me off a bit and may have been the reason for > > the bug in the first place. > > Yeah, if those fields are always the same, then I think that they > should be merged. That seems undeniably a good idea. Hmm I agree that it's good idea, and I will work on that as separate patch. > Possibly we > should fix the crash bug first, though, and then do that afterwards. > What bugs me a little about Rushabh's fix is that it looks like magic. > You have to know that we're looping over two things and freeing them > up, but there's one more of one thing than the other thing. I think > that at least needs some comments or something. > > So in my second version of patch I change gather_merge_clear_slots() to just clear the slot for the worker and some other clean up. Also throwing NULL from gather_merge_getnext() when all the queues and heap are exhausted - which earlier gather_merge_clear_slots() was returning clear slot. This way we make sure that we don't run over freeing the slot for the leader and gather_merge_getnext() don't need to depend on that clear slot. Thanks, Rushabh Lathia www.EnterpriseDB.com