[ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423688#comment-13423688 ]
Namit Jain commented on HIVE-2845:
----------------------------------
Mahasa asked me the following:
In the following query:
SELECT col_list FROM A JOIN B ON (A.col1 = B.col1)
1. Since all tables are buffered except for the last one which is streamed, it
is only table B that can make use of the index, am I right?
>>>>> No, indexes can be used for both tables if there are filters on either
>>>>> of them. This would happen after the tablescan for either A or B.
2. In order to do this, in the mapper, the TS should be done on the index table
rather than the base table; what about the reduce stage? Don't you need to have
access to base table/index table in the reduce phase too? For applying SEL?
>>>>> Yes, the TS would be on the index table for either A or B. There would
>>>>> be no change after that -- no change in the reduce phase.
3. As far as I know, filter pushdown and group by use indexes to accelerate the
query. Filter pushdown recompiles the re-written query whereas GB only replaces
appropriate operators of the operator tree. Which of the two is the more
suitable model for implementing HIVE-2845?
>>>>> HIVE-2845 requires new changes. Essentially, one of the tables, say A,
>>>>> would be read completely, and the other one, B, would be probed for each
>>>>> key of A, or vice versa.
4. May I ask to assign this ticket to me?
>>>>> Yes, I don't think anyone is working on it right now.
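
To illustrate the approach described in answer 3 (read one table completely,
probe the other for each key), here is a minimal in-memory sketch. It is not
Hive code and does not reflect how Hive stores or reads its indexes: the names
build_index and index_join, the dict-based index, and the sample rows are all
assumptions made for illustration only.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each join-key value to the rows of the indexed table that carry it.

    Hypothetical stand-in for an index on the join key of one table.
    """
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_join(scanned_rows, index, key):
    """Read one table completely and probe the other table's index per row."""
    for scanned in scanned_rows:
        # .get() avoids inserting empty entries into the defaultdict on a miss.
        for probed in index.get(scanned[key], ()):
            yield scanned, probed


# Sample data (made up): A is scanned, B is probed through its index.
A = [{"col1": 1, "a": "x"}, {"col1": 2, "a": "y"}, {"col1": 3, "a": "z"}]
B = [{"col1": 2, "b": "p"}, {"col1": 3, "b": "q"}, {"col1": 3, "b": "r"}]

index_on_b = build_index(B, "col1")
joined = list(index_join(A, index_on_b, "col1"))
```

Swapping the roles of A and B gives the "vice versa" case; which table to scan
and which to probe would presumably depend on which one has the index and on
their relative sizes.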
> Add support for index joins in Hive
> -----------------------------------
>
> Key: HIVE-2845
> URL: https://issues.apache.org/jira/browse/HIVE-2845
> Project: Hive
> Issue Type: New Feature
> Components: Indexing, Query Processor
> Reporter: Namit Jain
> Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If two tables A and B are being joined, and an index exists on the join key
> of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the
> corresponding row in A can be performed.
> This can be very useful for some use cases.