[ https://issues.apache.org/jira/browse/HIVE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247985#comment-13247985 ]

alex gemini commented on HIVE-1721:
-----------------------------------

I'm wondering how we would apply a bloom filter to the big table. We use a
map-side join for small tables under 25M; if we build a bloom filter from the
small table, we might be able to raise the small-table limit to 200M. But in the
big table's map stage, we would need to read the bloom filter, write the
intermediate result back to disk, and then read that intermediate result again
to check it against the real small table, because we still can't hold the
actual small table in memory (correct the logic if I'm wrong). So we pay the
cost of writing an intermediate result that is very close to the final result.
In this case we can't increase the number of maps either, because that would
double the I/O penalty. I guess it would only help in a three-table join on the
same join key, with one small table and two big ones. In my opinion, other
database systems benefit from bloom filters because they can hold the
intermediate result in memory for further processing (like Oracle) or emit it
immediately (like HBase).
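To make the size argument concrete: a bloom filter's footprint depends only on the number of distinct keys and the target false-positive rate, not on row width, which is why the filter can fit in memory when the small table itself does not. The sketch below uses the standard optimal-sizing formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions (about 9.6 bits per key at a 1% false-positive rate). The key count and false-positive rate in `main` are hypothetical, and the class name `BloomSizing` is illustrative, not part of Hive. Note that the filter only answers "possibly present", so every positive still has to be checked against the real small table, which is exactly the cost raised above.

```java
public class BloomSizing {
    // Optimal number of bits m for n distinct keys at false-positive rate p:
    //   m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions k = (m / n) * ln 2, rounded, at least 1.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // hypothetical: 10 million distinct join keys
        double p = 0.01;      // hypothetical: 1% false-positive rate
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        // Report the filter size in bits and MiB plus the hash count.
        System.out.printf("bits=%d (~%.1f MiB), hashes=%d%n",
                m, m / 8.0 / (1024 * 1024), k);
    }
}
```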
                
> use bloom filters to improve the performance of joins
> -----------------------------------------------------
>
>                 Key: HIVE-1721
>                 URL: https://issues.apache.org/jira/browse/HIVE-1721
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012, optimization
>
> In case of map-joins, it is likely that the big table will not find many 
> matching rows from the small table.
> Currently, we perform a hash-map lookup for every row in the big table, which 
> can be pretty expensive.
> It might be useful to try out a bloom-filter containing all the elements in 
> the small table.
> Each element from the big table is first searched in the bloom filter, and 
> only in case of a positive match,
> the small table hash table is explored.
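The probe order proposed in the issue can be sketched as follows. This is a minimal illustration, not Hive's map-join operator: the `BloomMapJoin` class, the `BitSet`-based filter using double hashing, and the `String[]` row layout (join key in column 0) are all hypothetical choices made for the example. A negative from the filter is definitive, so the hash-map lookup is skipped; a positive may be a false positive, so the hash map is still consulted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BloomMapJoin {
    // Hypothetical minimal Bloom filter over String keys.
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Double hashing: index_i = h1 + i * h2 (mod numBits).
        private int index(String key, int i) {
            int h1 = key.hashCode();
            int h2 = mix(h1);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        // MurmurHash3 32-bit finalizer; forced odd so the step is never zero.
        private static int mix(int h) {
            h ^= h >>> 16;
            h *= 0x85ebca6b;
            h ^= h >>> 13;
            h *= 0xc2b2ae35;
            h ^= h >>> 16;
            return h | 1;
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
        }

        // False means "definitely absent"; true means "possibly present".
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) return false;
            }
            return true;
        }
    }

    // Build side: insert every small-table key into the filter.
    static BloomFilter buildFilter(Map<String, String> smallTable, int numBits, int numHashes) {
        BloomFilter filter = new BloomFilter(numBits, numHashes);
        for (String key : smallTable.keySet()) filter.add(key);
        return filter;
    }

    // Probe side: check the filter first, touch the hash map only on a positive.
    static List<String> join(List<String[]> bigRows, Map<String, String> smallTable, BloomFilter filter) {
        List<String> out = new ArrayList<>();
        for (String[] row : bigRows) {
            String key = row[0];
            if (!filter.mightContain(key)) continue; // definite miss: skip hash lookup
            String match = smallTable.get(key);      // may be null on a false positive
            if (match != null) out.add(row[1] + "," + match);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> small = new HashMap<>();
        small.put("k1", "a");
        small.put("k2", "b");
        BloomFilter filter = buildFilter(small, 1024, 7);
        List<String[]> big = List.of(
                new String[] {"k1", "x"}, new String[] {"k9", "y"}, new String[] {"k2", "z"});
        System.out.println(join(big, small, filter)); // prints [x,a, z,b]
    }
}
```

The benefit depends on the miss rate: when most big-table keys are absent from the small table, most rows stop at a few bit probes instead of a full hash-map lookup; when most keys match, the filter only adds work.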

