michalursa opened a new pull request, #13686:
URL: https://github.com/apache/arrow/pull/13686
The hash join implementation based on the HashJoinBasicImpl class was missing initialization when there are no batches on the build side. BuildHashTable_exec_task is the method that transforms the batches accumulated on the build side of the hash join into a hash table. It skipped the initialization of a few data structures, mainly the two RowEncoder instances that hold the key and payload columns of build-side rows.

Initializing a RowEncoder inserts a single special row containing null values in all columns. This special row is accessed when outputting probe-side rows that have no matches in left outer and full outer joins. In that case these joins must output nulls in place of all fields that would otherwise come from the build side.

Interestingly, the initialization was already present in a similar case, where batches were present on the build side but all of them contained zero rows. I modified the code to use the same code path for both of these logically equivalent cases:

a) zero build-side batches, and
b) a non-zero number of build-side batches, each with zero rows.
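To illustrate why a single initialization path matters, here is a minimal C++ sketch of a simplified build/probe pair for a left outer join. `RowStore`, `BuildRow`, `BuildSide`, `BuildHashTable`, and `ProbeLeftOuter` are hypothetical names for this illustration only; they do not mirror Arrow's actual `RowEncoder` or `HashJoinBasicImpl` APIs. `RowStore::Init()` plays the role described above: it inserts one all-null row that unmatched probe rows are mapped to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical simplified model, NOT Arrow's real classes.
using Row = std::vector<std::optional<int64_t>>;  // std::nullopt represents null

struct BuildRow {
  int64_t key;  // join key (non-null in this sketch)
  Row payload;  // build-side payload columns
};
using Batch = std::vector<BuildRow>;

// Hypothetical stand-in for a row store such as RowEncoder:
// Init() inserts one special all-null row at index kNullRowId.
class RowStore {
 public:
  static constexpr std::size_t kNullRowId = 0;

  void Init(std::size_t num_columns) {
    rows_.clear();
    rows_.push_back(Row(num_columns, std::nullopt));
  }
  std::size_t Append(const Row& row) {
    rows_.push_back(row);
    return rows_.size() - 1;
  }
  const Row& Get(std::size_t id) const { return rows_.at(id); }
  std::size_t num_rows() const { return rows_.size(); }

 private:
  std::vector<Row> rows_;
};

struct BuildSide {
  RowStore payload;
  std::unordered_multimap<int64_t, std::size_t> table;  // key -> payload row id
};

// Single code path: Init() runs unconditionally before the batch loop, so
// "no batches" and "batches with zero rows" end up in the same state.
BuildSide BuildHashTable(const std::vector<Batch>& batches,
                         std::size_t payload_columns) {
  BuildSide build;
  build.payload.Init(payload_columns);
  for (const Batch& batch : batches) {
    for (const BuildRow& row : batch) {
      assert(row.payload.size() == payload_columns);
      std::size_t id = build.payload.Append(row.payload);
      build.table.emplace(row.key, id);
    }
  }
  return build;
}

// Returns the build-side columns emitted for each probe key in a left outer
// join (probe-side columns are omitted for brevity). Unmatched probe rows
// get the special null row.
std::vector<Row> ProbeLeftOuter(const BuildSide& build,
                                const std::vector<int64_t>& probe_keys) {
  assert(build.payload.num_rows() >= 1);  // the null row must exist
  std::vector<Row> out;
  for (int64_t key : probe_keys) {
    auto [begin, end] = build.table.equal_range(key);
    if (begin == end) {
      out.push_back(build.payload.Get(RowStore::kNullRowId));
    } else {
      for (auto it = begin; it != end; ++it) {
        out.push_back(build.payload.Get(it->second));
      }
    }
  }
  return out;
}
```

Because `Init()` is called before iterating over batches, an empty batch vector and a vector of empty batches both leave the null row in place, so the probe step never has to special-case either situation.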
