Hi,

On 2017-12-31 02:51:26 +1300, Thomas Munro wrote:
> You mentioned that prairiedog sees the problem about one time in
> thirty.  Would you mind checking if it goes away with this patch
> applied?
> 
> -- 
> Thomas Munro
> http://www.enterprisedb.com

> From cbed027275039cc5debf8db89342a133a831c9ca Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.mu...@enterprisedb.com>
> Date: Sun, 31 Dec 2017 02:03:07 +1300
> Subject: [PATCH] Fix EXPLAIN ANALYZE output for Parallel Hash.
> 
> In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size
> information.  Refactor so that participants report only on batches they worked
> on rather than trying to report on all of them, and teach explain.c to
> consider the HashInstrumentation object from all participants instead of
> picking the first one it can find.  This should fix an occasional build farm
> failure in the "join" regression test.

This seems buggy independent of whether it fixes the issue on prairiedog,
right? So I'm inclined to go ahead and just fix it...


> +     /*
> +      * Merge results from workers.  In the parallel-oblivious case, the
> +      * results from all participants should be identical, except where
> +      * participants didn't run the join at all so have no data.  In the
> +      * parallel-aware case, we need to aggregate the results.  Each worker may
> +      * have seen a different subset of batches and we want to report the peak
> +      * memory usage across all batches.
> +      */

It's not necessarily the peak though, right? The largest batches might
not be read in at the same time. I'm fine with approximating it as such,
just want to make sure I understand.


> +     if (hashstate->shared_info)
>       {
>               SharedHashInfo *shared_info = hashstate->shared_info;
>               int             i;
> 
> -             /* Find the first worker that built a hash table. */
>               for (i = 0; i < shared_info->num_workers; ++i)
>               {
> -                     if (shared_info->hinstrument[i].nbatch > 0)
> +                     HashInstrumentation *worker_hi = &shared_info->hinstrument[i];
> +
> +                     if (worker_hi->nbatch > 0)
>                       {
> -                             hinstrument = &shared_info->hinstrument[i];
> -                             break;
> +                             /*
> +                              * Every participant should agree on the buckets, so to be
> +                              * sure we have a value we'll just overwrite each time.
> +                              */
> +                             hinstrument.nbuckets = worker_hi->nbuckets;
> +                             hinstrument.nbuckets_original = worker_hi->nbuckets_original;
> +                             /*
> +                              * Normally every participant should agree on the number of
> +                              * batches too, but it's possible for a backend that started
> +                              * late and missed the whole join not to have the final nbatch
> +                              * number.  So we'll take the largest number.
> +                              */
> +                             hinstrument.nbatch = Max(hinstrument.nbatch, worker_hi->nbatch);
> +                             hinstrument.nbatch_original = worker_hi->nbatch_original;
> +                             /*
> +                              * In a parallel-aware hash join, for now we report the
> +                              * maximum peak memory reported by any worker.
> +                              */
> +                             hinstrument.space_peak =
> +                                     Max(hinstrument.space_peak, worker_hi->space_peak);

I bet pgindent will not like this layout.

Ho hum. Is this really, as you say above, a case of "aggregating the results"?
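
For reference, a standalone sketch of the merge behaviour the hunk
implements (struct trimmed to the fields used above, values invented):
the result is built from overwrites and maxima, not sums.

#include <stdio.h>

#define Max(a, b) ((a) > (b) ? (a) : (b))

/* Trimmed stand-in for the real HashInstrumentation struct. */
typedef struct HashInstrumentation
{
	int			nbuckets;
	int			nbuckets_original;
	int			nbatch;
	int			nbatch_original;
	long		space_peak;
} HashInstrumentation;

int
main(void)
{
	/* Invented reports: worker 1 started late and saw a stale nbatch. */
	HashInstrumentation workers[] = {
		{1024, 1024, 8, 4, 60L * 1024 * 1024},
		{1024, 1024, 4, 4, 50L * 1024 * 1024},
	};
	HashInstrumentation merged = {0, 0, 0, 0, 0};
	int			i;

	for (i = 0; i < 2; i++)
	{
		HashInstrumentation *worker_hi = &workers[i];

		if (worker_hi->nbatch > 0)
		{
			/* All participants should agree on buckets: overwrite. */
			merged.nbuckets = worker_hi->nbuckets;
			merged.nbuckets_original = worker_hi->nbuckets_original;
			/* A late starter may report a stale nbatch: take the max. */
			merged.nbatch = Max(merged.nbatch, worker_hi->nbatch);
			merged.nbatch_original = worker_hi->nbatch_original;
			/* Peak memory: the largest single per-worker report. */
			merged.space_peak = Max(merged.space_peak, worker_hi->space_peak);
		}
	}

	/* Prints nbatch=8 space_peak=62914560: maxima, not sums. */
	printf("nbatch=%d space_peak=%ld\n", merged.nbatch, merged.space_peak);
	return 0;
}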


Greetings,

Andres Freund
