Re: Parallel Full Hash Join

Zhihong Yu Tue, 06 Apr 2021 13:53:36 -0700

On Tue, Apr 6, 2021 at 11:59 AM Melanie Plageman <melanieplage...@gmail.com>
wrote:


> On Fri, Apr 2, 2021 at 3:06 PM Zhihong Yu <z...@yugabyte.com> wrote:
> >
> > Hi,
> > For v6-0003-Parallel-Hash-Full-Right-Outer-Join.patch
> >
> > +    * current_chunk_idx: index in current HashMemoryChunk
> >
> > The above comment seems to be better fit for
> ExecScanHashTableForUnmatched(), instead of
> ExecParallelPrepHashTableForUnmatched.
> > I wonder where current_chunk_idx should belong (considering the above
> comment and what the code does).
> >
> > +       while (hashtable->current_chunk_idx <
> hashtable->current_chunk->used)
> > ...
> > +       next = hashtable->current_chunk->next.unshared;
> > +       hashtable->current_chunk = next;
> > +       hashtable->current_chunk_idx = 0;
> >
> > Each time we advance to the next chunk, current_chunk_idx is reset. It
> seems current_chunk_idx can be placed inside chunk.
> > Maybe the consideration is that, with the current formation we save
> space by putting current_chunk_idx field at a higher level.
> > If that is the case, a comment should be added.
> >
>
> Thank you for the review. I think that moving the current_chunk_idx into
> the HashMemoryChunk would probably take up too much space.
>
> Other places that we loop through the tuples in the chunk, we are able
> to just keep a local idx, like here in
> ExecParallelHashIncreaseNumBuckets():
>
> case PHJ_GROW_BUCKETS_REINSERTING:
> ...
>         while ((chunk = ExecParallelHashPopChunkQueue(hashtable,
> &chunk_s)))
>         {
>             size_t        idx = 0;
>
>             while (idx < chunk->used)
>
> but, since we cannot do that while also emitting tuples, I thought,
> let's just stash the index in the hashtable for use in serial hash join
> and the batch accessor for parallel hash join. A comment to this effect
> sounds good to me.
>

>From the way HashJoinTable is used, I don't have better idea w.r.t. the
location of current_chunk_idx.
It is not worth introducing another level of mapping between HashJoinTable
and the chunk index.

So the current formation is fine with additional comment
in ParallelHashJoinBatchAccessor (current comment doesn't explicitly
mention current_chunk_idx).

Cheers

Re: Parallel Full Hash Join

Reply via email to