Thank you for the comment. Yes, I agree with the alternative of using '(!parallel)',
so that there is no need to test the bit. Will someone submit a patch for it
accordingly?

-----Original Message-----
From: Thomas Munro <thomas.mu...@gmail.com> 
Sent: Thursday, January 9, 2020 6:04 PM
To: Deng, Gang <gang.d...@intel.com>
Cc: pgsql-hack...@postgresql.org
Subject: Re: [PATCH] Resolve Parallel Hash Join Performance Issue

On Thu, Jan 9, 2020 at 10:04 PM Deng, Gang <gang.d...@intel.com> wrote:
> Attached is a patch to resolve a parallel hash join performance issue. This is
> my first time contributing a patch to the PostgreSQL community; I used a
> previous thread as a template to report the issue and patch. Please let me know
> if you need more information about the problem or the patch.

Thank you very much for investigating this and for your report.

>         HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
>
>     changed to:
>
>         if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
>         {
>             HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
>         }
>
>     Compared with the original code, the modified code avoids an unnecessary
>     write to memory/cache.

Right, I see.  The funny thing is that the match bit is not even used in this 
query (it's used for right and full hash join, and those aren't supported for 
parallel joins yet).  Hmm.  So, instead of the test you proposed, an 
alternative would be to use if (!parallel).
That's a value that will be constant-folded, so that there will be no branch in 
the generated code (see the pg_attribute_always_inline trick).  If, in a future 
release, we need the match bit for parallel hash join because we add parallel 
right/full hash join support, we could do it the way you showed, but only if 
it's one of those join types, using another constant parameter.

> D. Result
>
> With the modified code, hash join performance scales better with the number
> of threads. Here are the results of query02 after the patch. For example,
> performance improved ~2.5x when running with 28 threads.
>
> Threads         1       4       8      16      28
> Time (s)    465.1   193.1    97.9    55.9      41
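To put the post-patch numbers in perspective, the scaling relative to the single-thread run can be computed directly from the table. The sketch below only re-derives speedup (T1 / Tn) and parallel efficiency (speedup / n) from the five quoted timings; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Thread counts and post-patch query02 timings, copied from the table. */
static const int threads[] = {1, 4, 8, 16, 28};
static const double seconds[] = {465.1, 193.1, 97.9, 55.9, 41.0};

#define N_RUNS (sizeof(threads) / sizeof(threads[0]))

/* Speedup of run i relative to the single-thread run. */
static double
speedup(size_t i)
{
    assert(i < N_RUNS);
    return seconds[0] / seconds[i];
}

/* Parallel efficiency: speedup divided by thread count (1.0 = linear). */
static double
efficiency(size_t i)
{
    return speedup(i) / threads[i];
}
```

From these timings, 28 threads run about 11.3x faster than one thread, which is roughly 40% parallel efficiency. That is scaling within the patched build; the ~2.5x figure above appears to be a separate before/after-patch comparison at 28 threads.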

Wow.  That is a very nice improvement.
