[GitHub] [spark] ulysses-you commented on pull request #35789: [SPARK-32268][SQL] Row-level Runtime Filtering

GitBox Wed, 09 Mar 2022 21:24:15 -0800


ulysses-you commented on pull request #35789:
URL: https://github.com/apache/spark/pull/35789#issuecomment-1063664600



   > I have a question: why do we need Semi-Join if we have Bloom Filter?
   
   I guess it is a trade-off between benefits and costs. A Bloom filter has 
false positives, and the rate gets worse as the data set grows. So if the 
creation side (from the design doc) is small enough to be broadcast, we can use 
a semi-join to get more benefit at less cost, since it is exact. It is 
similar to what DPP does.
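
   To make the false-positive point concrete, here is a rough sketch using the 
textbook approximation p ≈ (1 - e^(-k*n/m))^k for a filter of fixed size. The 
filter size and hash count below are made-up numbers for illustration, not 
values taken from this PR or from Spark's BloomFilter implementation.

```python
import math


def bloom_fpp(n_items: int, m_bits: int, k_hashes: int) -> float:
    """Textbook approximation of a Bloom filter's false positive probability.

    n_items: number of distinct keys inserted
    m_bits: size of the bit array
    k_hashes: number of hash functions
    """
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Hypothetical sizing (an assumption for illustration only).
M_BITS = 8 * 1024 * 1024
K_HASHES = 5

# With the filter size held fixed, the rate climbs as more keys are inserted.
for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:>10} keys -> fpp ~ {bloom_fpp(n, M_BITS, K_HASHES):.6f}")
```

   A semi-join against a broadcast build side has no such error rate, which is 
why it can be the better choice when that side is small.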
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



