On Tue, Apr 6, 2021 at 11:59 AM Melanie Plageman <melanieplage...@gmail.com> wrote:
> On Fri, Apr 2, 2021 at 3:06 PM Zhihong Yu <z...@yugabyte.com> wrote: > > > > Hi, > > For v6-0003-Parallel-Hash-Full-Right-Outer-Join.patch > > > > + * current_chunk_idx: index in current HashMemoryChunk > > > > The above comment seems to be better fit for > ExecScanHashTableForUnmatched(), instead of > ExecParallelPrepHashTableForUnmatched. > > I wonder where current_chunk_idx should belong (considering the above > comment and what the code does). > > > > + while (hashtable->current_chunk_idx < > hashtable->current_chunk->used) > > ... > > + next = hashtable->current_chunk->next.unshared; > > + hashtable->current_chunk = next; > > + hashtable->current_chunk_idx = 0; > > > > Each time we advance to the next chunk, current_chunk_idx is reset. It > seems current_chunk_idx can be placed inside chunk. > > Maybe the consideration is that, with the current formation we save > space by putting current_chunk_idx field at a higher level. > > If that is the case, a comment should be added. > > > > Thank you for the review. I think that moving the current_chunk_idx into > the HashMemoryChunk would probably take up too much space. > > Other places that we loop through the tuples in the chunk, we are able > to just keep a local idx, like here in > ExecParallelHashIncreaseNumBuckets(): > > case PHJ_GROW_BUCKETS_REINSERTING: > ... > while ((chunk = ExecParallelHashPopChunkQueue(hashtable, > &chunk_s))) > { > size_t idx = 0; > > while (idx < chunk->used) > > but, since we cannot do that while also emitting tuples, I thought, > let's just stash the index in the hashtable for use in serial hash join > and the batch accessor for parallel hash join. A comment to this effect > sounds good to me. > >From the way HashJoinTable is used, I don't have better idea w.r.t. the location of current_chunk_idx. It is not worth introducing another level of mapping between HashJoinTable and the chunk index. So the current formation is fine with additional comment in ParallelHashJoinBatchAccessor (current comment doesn't explicitly mention current_chunk_idx). Cheers