leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-662931053
Update notes @cloud-fan @agrawaldevesh @maropu

1. Renamed isInputEmpty to isOriginInputEmpty.
2. Renamed anyNullKeyExists to allNullColumnKeyExistsInOriginInput.
3. Moved both attributes into LongHashedRelation and UnsafeHashedRelation; putting them in BytesToBytesMap and LongToUnsafeRowMap made less sense.
4. Added an isNullAware mode to HashedRelationBroadcastMode. With isNullAware = true, building the relation from an Iterator[InternalRow] no longer skips rows with null keys. This is a preparatory step for supporting multi-column keys.
5. Made BHJ (BroadcastHashJoinExec) the only implementation and deleted BroadcastNullAwareLeftAntiHashJoinExec.
6. Added an allNull method to UnsafeRow.
7. Extracted the pattern shared by single-column and multi-column NOT IN, as follows:
   * isOriginInputEmpty => return all rows
   * allNullColumnKeyExistsInOriginInput => reject all rows
   * streamedSideRow.allNull is true => drop the row
   * streamedSideRow.allNull is false and a match is found in NullAwareHashedRelation => drop the row
   * streamedSideRow.allNull is false and no match is found in NullAwareHashedRelation => return the row
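To make the decision steps in item 7 concrete, here is a minimal sketch of them outside Spark. It is not the PR's code: the names `NullAwareAntiJoinSketch`, `Key`, `notFalse`, and `antiJoin` are hypothetical, a key is modeled as `Seq[Option[Int]]` with `None` standing for SQL NULL, and a linear scan over the build keys stands in for the lookup in NullAwareHashedRelation. A streamed row counts as a "match" when, under three-valued logic, comparing it to some build row does not yield FALSE, since NOT IN then evaluates to FALSE or NULL and the row is filtered out.

```scala
// Hypothetical sketch of the null-aware NOT IN decision steps, not Spark APIs.
// A key is a Seq[Option[Int]]; None models SQL NULL. A linear scan over
// buildKeys stands in for the hashed-relation lookup.
object NullAwareAntiJoinSketch {
  type Key = Seq[Option[Int]]

  // Analogue of the allNull idea: every key column is NULL.
  def allNull(key: Key): Boolean = key.forall(_.isEmpty)

  // (s1, ..., sn) = (b1, ..., bn) is TRUE or NULL (i.e. not FALSE) unless some
  // column holds two non-null values that differ. Assumes equal arity.
  def notFalse(streamed: Key, build: Key): Boolean =
    streamed.zip(build).forall {
      case (Some(s), Some(b)) => s == b
      case _                  => true // a NULL on either side leaves the column unknown
    }

  // Returns the streamed keys that survive `streamed NOT IN build`.
  def antiJoin(streamedKeys: Seq[Key], buildKeys: Seq[Key]): Seq[Key] = {
    val isOriginInputEmpty = buildKeys.isEmpty
    val allNullColumnKeyExistsInOriginInput = buildKeys.exists(allNull)
    if (isOriginInputEmpty) streamedKeys                     // return all rows
    else if (allNullColumnKeyExistsInOriginInput) Seq.empty  // reject all rows
    else streamedKeys.filter { key =>
      if (allNull(key)) false                        // all-null streamed key: drop
      else !buildKeys.exists(b => notFalse(key, b))  // drop on match, keep otherwise
    }
  }
}
```

The two shortcuts agree with the generic check: an all-null build key is "not false" against every streamed row, and an all-null streamed key is "not false" against every build row, so the fast paths only skip work. For example, with build keys `(1, 2)` and `(3, NULL)`, the streamed key `(NULL, 7)` is dropped because comparing it to `(3, NULL)` yields NULL, while `(4, 5)` is kept.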