[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

GitBox Thu, 23 Jul 2020 03:26:13 -0700


leanken commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-662931053



   update notes @cloud-fan @agrawaldevesh @maropu 
   
   1. Renamed isInputEmpty to isOriginInputEmpty.
   2. Renamed anyNullKeyExists to allNullColumnKeyExistsInOriginInput.
   3. Moved these two attributes into LongHashedRelation and 
UnsafeHashedRelation; keeping them in BytesToBytesMap and LongToUnsafeRowMap 
made less sense.
   4. Added an isNullAware mode to HashedRelationBroadcastMode. With 
isNullAware = true, building the relation from an Iterator[InternalRow] does 
not skip rows with null keys. This is a preparatory step for supporting 
multi-column NOT IN.
   5. BHJ (broadcast hash join) is now the only implementation; 
BroadcastNullAwareLeftAntiHashJoinExec has been deleted.
   6. Added an allNull method to UnsafeRow.
   7. Extracted the common pattern shared by single-column and multi-column 
NOT IN, as follows:
   
   * isOriginInputEmpty is true => return all rows
   * allNullColumnKeyExistsInOriginInput is true => reject all rows
   * streamedSideRow.allNull is true => drop the row
   * streamedSideRow.allNull is false and a match is found in 
NullAwareHashedRelation => drop the row
   * streamedSideRow.allNull is false and no match is found in 
NullAwareHashedRelation => return the row
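
   The decision order above can be sketched in plain Scala. This is only an 
illustration, not the code in the PR: Row, BuildSide, build, findMatch and 
antiJoin are hypothetical stand-ins for InternalRow, the hashed relation and 
the join operator, and the null-tolerant matching rule inside findMatch is an 
assumption about how a null-aware lookup could behave.

```scala
// Simplified model of the null-aware anti join checks described above.
// All names here are hypothetical; none of this is Spark's actual API.
object NullAwareAntiJoinSketch {
  // A key row: one Option per key column, None standing in for SQL NULL.
  type Row = Seq[Option[Int]]

  // Stand-in for the hashed relation plus the two flags from the notes.
  final case class BuildSide(
      isOriginInputEmpty: Boolean,
      allNullColumnKeyExistsInOriginInput: Boolean,
      keys: Set[Row])

  // Mirrors the idea of UnsafeRow.allNull: every key column is null.
  def allNull(row: Row): Boolean = row.forall(_.isEmpty)

  // Build without skipping null-key rows (the isNullAware = true behaviour).
  def build(rows: Seq[Row]): BuildSide =
    BuildSide(rows.isEmpty, rows.exists(allNull), rows.toSet)

  // Assumed matching rule: two columns match if they are equal or if
  // either side is null. For a single column this reduces to equality
  // once the all-null cases have been handled by the caller.
  def findMatch(streamed: Row, relation: BuildSide): Boolean =
    relation.keys.exists { buildRow =>
      buildRow.zip(streamed).forall { case (b, s) =>
        b.isEmpty || s.isEmpty || b == s
      }
    }

  // The checks are applied in the same order as the list above.
  def antiJoin(streamed: Seq[Row], relation: BuildSide): Seq[Row] =
    if (relation.isOriginInputEmpty) streamed
    else if (relation.allNullColumnKeyExistsInOriginInput) Seq.empty
    else streamed.filter(row => !allNull(row) && !findMatch(row, relation))
}
```

   The two relation-level flags are checked once, before any streamed row is 
inspected, so the per-row work only happens when the build side is non-empty 
and contains no all-null key.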


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
