On Mon, Apr 6, 2020 at 7:31 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > > On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote: > >On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra > ><tomas.von...@2ndquadrant.com> wrote: > >> > >> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: > >> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra > >> ><tomas.von...@2ndquadrant.com> wrote: > >> >> > >> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: > >> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > >> >> >>On 2020-Apr-06, Tom Lane wrote: > >> >> >> > >> >> >>>Locally, things pass without force_parallel_mode, but turning it on > >> >> >>>produces failures that look similar to rhinoceros's (didn't examine > >> >> >>>other BF members). > >> >> >> > >> >> >>FWIW I looked at the eight failures there were about fifteen minutes > >> >> >>ago > >> >> >>and they were all identical. I can confirm that, in my laptop, the > >> >> >>tests work without that GUC, and fail in exactly that way with it. > >> >> >> > >> >> > > >> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns > >> >> >too > >> >> >soon. I'll push a fix in a minute. > >> >> > > >> >> > >> >> OK, I've pushed a fix - this should make the buildfarm happy again. > >> >> > >> >> It however seems to me a bit more needs to be done. The fix makes > >> >> show_incremental_sort_info closer to show_sort_info, but not entirely > >> >> because IncrementalSortState does not have sort_Done flag so it still > >> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that > >> >> before, but not having that flag seems a bit weird to me. > >> >> > >> >> It also seems possibly incorrect - we may end up with > >> >> > >> >> fullsortGroupInfo->groupCount == 0 > >> >> prefixsortGroupInfo->groupCount > 0 > >> >> > >> >> but we won't print anything. > >> > > >> >This shouldn't ever be possible, because the only way we get any > >> >prefix groups at all is if we've already sorted a full sort group > >> >during the mode transition. > >> > > >> >> James, any opinion on this? I'd say we should restore the sort_Done flag > >> >> and make it work as in plain Sort. Or some comment explaining why > >> >> depending on the counts is OK (assuming it is). > >> > > >> >There's previous email traffic on this thread about that (I can look > >> >it up later this evening), but the short of it is that I believe that > >> >relying on the group count is actually more correct than a sort_Done > >> >flag in the case of incremental sort (in contrast to regular sort). > >> > > >> > >> OK. Maybe we should add a comment to explain.c saying it's OK. > >> > >> I've pushed a fix for failures due to different planned workers (in the > >> test I added to show changes due to add_partial_path tweaks). > >> > >> It seems we're not out of the woods yet, though. rhinoceros and > >> sidewinder failed with something like this: > >> > >> Sort Method: quicksort Memory: NNkB > >> + Sort Method: unknown Disk: NNkB > >> > >> Would you mind investigating at it? > > > >I assume that means those build farm members run with very low > >work_mem? Is it an acceptable fix to adjust work_mem up a bit just for > >these tests? Or is that bad practice and these are to expose issues > >with changing into disk sort mode? > > > > I don't think so - I don't see any work_mem changes in the config - see > the extra_config at the beginning of the page with details: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2020-04-06%2023%3A00%3A16 > > Moreover, this seems to be in regular Sort, not Incremental Sort and it > very much seems like it gets confused to print a worker info because the > only way for Sort to print two "Sort Method" lines seems to be to enter > either both > > if (sortstate->sort_Done && sortstate->tuplesortstate != NULL) > { > ... print leader info ... > } > > and > > if (sortstate->shared_info != NULL) > { > for (n = 0; n < sortstate->shared_info->num_workers; n++) > { > ... print worker info ... > } > } > > or maybe there are two workers? It's strange ... > > > It doesn't seem to be particularly platform-specific, but I've been > unable to reproduce it so far. It seems on older gcc versions, though.
I haven't been able to reproduce it, but I'm 99% confident this will fix it: - if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS) + if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS + || sinstrument->sortMethod == NULL) continue; /* ignore any unfilled slots */ Earlier we'd had this discussion about why SORT_TYPE_STILL_IN_PROGRESS was explicitly set to 0 in the enum declaration. Since there was no comment, we changed that, but here I believe that show_sort_info was relying on that as an indicator that a worker didn't actually do any work (since the DSM for the sort node gets set to all zeros, this would work). I'm not sure if the SORT_TYPE_STILL_IN_PROGRESS case is actually still needed, though. I've attached both a fix for this issue and a comment for the full/prefix sort group if blocks. James
From 963cb6ed7d27cd7112c4fc4b4fb138b63edf087e Mon Sep 17 00:00:00 2001 From: James Coleman <jtc...@gmail.com> Date: Mon, 6 Apr 2020 20:45:57 -0400 Subject: [PATCH v1 2/2] Comment show_incremental_sort_info assumtions It's not immediately obvious when reading this code that ignoring the prefix group info is correct if there are no full groups, so add a comment explaining the rationale. --- src/backend/commands/explain.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index b1a20eba27..b8fd542c9b 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2884,6 +2884,15 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate, if (!es->analyze) return; + /* + * Since we never have any prefix groups unless we've first sorted a full + * groups and transitioned modes (copying the tuples into a prefix group), + * we don't need to do anything if there were 0 full groups. + * + * We still have to continue after this block if there are no full groups, + * though, since it's possible that we have workers that did real work even + * if the leader didn't participate. + */ if (fullsortGroupInfo->groupCount > 0) { show_incremental_sort_group_info(fullsortGroupInfo, "Full-sort", true, es); @@ -2915,6 +2924,13 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate, */ fullsortGroupInfo = &incsort_info->fullsortGroupInfo; prefixsortGroupInfo = &incsort_info->prefixsortGroupInfo; + + /* + * Since we never have any prefix groups unless we've first sorted + * a full groups and transitioned modes (copying the tuples into a + * prefix group), we don't need to do anything if there were 0 full + * groups. + */ if (fullsortGroupInfo->groupCount == 0 && prefixsortGroupInfo->groupCount == 0) continue; -- 2.17.1
From b1c0f842f46943de0035e4658e641a20522951fa Mon Sep 17 00:00:00 2001 From: James Coleman <jtc...@gmail.com> Date: Mon, 6 Apr 2020 20:38:18 -0400 Subject: [PATCH v1 1/2] Don't show worker info for sort node if no work done In d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da we modified the TuplesortMethod enum to be a bitmask. As such, SORT_TYPE_STILL_IN_PROGRESS is now set to one. It'd previously been explicitly set to 0, but with no comment, so that seemed reasonable. However it seems that explain.c's show_sort_info method was using that 0 as a sentinel that the worker hadn't done any work, and thus it shouldn't be printed (the worker's DSM is initialized to all zeros, so this worked properly). Instead it should explicitly check for NULL. --- src/backend/commands/explain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index cad10662bb..b1a20eba27 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2717,7 +2717,8 @@ show_sort_info(SortState *sortstate, ExplainState *es) long spaceUsed; sinstrument = &sortstate->shared_info->sinstrument[n]; - if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS) + if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS + || sinstrument->sortMethod == NULL) continue; /* ignore any unfilled slots */ sortMethod = tuplesort_method_name(sinstrument->sortMethod); spaceType = tuplesort_space_type_name(sinstrument->spaceType); -- 2.17.1