[ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423688#comment-13423688 ]
Namit Jain commented on HIVE-2845:
----------------------------------

Mahasa asked me the following:

In the following query:

    SELECT col_list FROM A JOIN B ON (A.col1 = B.col1)

1. Since all tables are buffered except for the last one, which is streamed, it is only table B that can make use of the index, am I right?

>>>>> No, indexes can be used for both tables if there are filters on any
>>>>> of the tables. It would be after the tablescan for either A or B.

2. In order to do this, in the mapper, the TS should be done on the index table rather than the base table; what about the reduce stage? Don't you need to have access to the base table/index table in the reduce phase too, for applying SEL?

>>>>> Yes, the TS would be on the index table for either A or B. There would
>>>>> be no change after that -- no change in the reduce phase.

3. As far as I know, filter pushdown and group by use indexes to accelerate the query. Filter pushdown recompiles the rewritten query, whereas GB only replaces the appropriate operators of the operator tree. Which one is the more suitable model for implementing HIVE-2845?

>>>>> HIVE-2845 requires new changes. Essentially, one of the tables, say A,
>>>>> would be read completely, and the other one, B, would be probed for each
>>>>> key of A, or vice versa.

4. May I ask that this ticket be assigned to me?

>>>>> Yes, I don't think anyone is working on it right now.

> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the
> corresponding row in A can be performed.
> This can be very useful for some use cases.
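
The index join described in the issue (scan B completely and, for each row, look up the matching rows of A through an index on the join key) can be sketched roughly as below. This is an illustrative in-memory Python sketch only, not Hive code: the names `build_index` and `index_join`, the dictionary-based index, and the sample rows are all assumptions made for the example. A real Hive index is a separate table, and the probe would run inside the mappers.

```python
from collections import defaultdict


def build_index(rows, key):
    """Hypothetical index: map each join-key value to the rows of A that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_join(scanned_rows, index, key):
    """Scan one table fully and probe the other table's index once per scanned row."""
    for row in scanned_rows:
        # .get() avoids creating empty entries for keys that have no match in A.
        for match in index.get(row[key], []):
            yield (match, row)


# Made-up sample data standing in for tables A and B.
table_a = [{"col1": 1, "name": "x"}, {"col1": 2, "name": "y"}]
table_b = [{"col1": 2, "val": 10}, {"col1": 3, "val": 20}, {"col1": 2, "val": 30}]

index_a = build_index(table_a, "col1")                 # index on A's join key
result = list(index_join(table_b, index_a, "col1"))    # B is scanned, A is probed
```

Only the rows of A whose key actually occurs in B are touched, which is where the benefit over reading both tables completely would come from when B is small relative to A.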