[ https://issues.apache.org/jira/browse/HIVE-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-4926:
--------------------------

    Attachment: HIVE-4926-test.tgz

Simple self-contained test-case

> Queries which specify clustered-by keys as constants will still scan all buckets
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-4926
>                 URL: https://issues.apache.org/jira/browse/HIVE-4926
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.12.0
>            Reporter: Gopal V
>         Attachments: HIVE-4926-test.tgz
>
>
> When tables are CLUSTERED BY (key) into multiple buckets, a query which
> specifies a key in the query predicate will still scan all buckets in the
> directory.
> In the ideal scenario, only one bucket needs to be inspected for a given key,
> particularly if hive.enforce.bucketing is turned on.
> When a simple filter query like the following is run:
> {code}
> select * from store_sales where ss_item_sk = 1;
> {code}
> the log files contain:
> {code}
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000005_0
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000006_0
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000007_0
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000008_0
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000009_0
> {code}
> This reads 32x the amount of data, compared to the right approach of
> scanning only the bucket which can match the predicate.
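To make the "only one bucket" idea concrete, here is a minimal sketch of how the single bucket file could be picked for a constant key. It is not Hive's implementation and does not use Hive APIs. It assumes that the CLUSTERED BY column is an INT, that the data was written with Hive's legacy bucketing hash (bucket = (hash & Integer.MAX_VALUE) % numBuckets, where the hash of a single INT column is the value itself), that the table has 32 buckets (inferred from the "32x" figure), and that bucket i lives in a file named like 00000i_0. The class and method names (BucketPruningSketch, bucketFor, bucketFileName, filesToScan) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of bucket pruning for "column = constant".
 * ASSUMPTIONS: INT bucketing column, legacy Hive hash-mod bucketing,
 * and bucket files named by zero-padded bucket id plus "_0".
 */
final class BucketPruningSketch {

    /** Bucket that all rows with this INT key were written to (legacy hash assumed). */
    static int bucketFor(int key, int numBuckets) {
        int hash = key; // assumption: legacy hash of a single INT column is the value itself
        return (hash & Integer.MAX_VALUE) % numBuckets; // mask keeps negative hashes non-negative
    }

    /** Hypothetical file-name convention: bucket 1 -> "000001_0". */
    static String bucketFileName(int bucket) {
        return String.format("%06d_0", bucket);
    }

    /** Keeps only the files that can contain rows matching "column = key". */
    static List<String> filesToScan(List<String> allFiles, int key, int numBuckets) {
        String wanted = bucketFileName(bucketFor(key, numBuckets));
        List<String> result = new ArrayList<>();
        for (String file : allFiles) {
            if (file.equals(wanted)) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int numBuckets = 32; // assumed from the "32x" figure in the issue
        List<String> allFiles = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            allFiles.add(bucketFileName(b));
        }
        // For "where ss_item_sk = 1", only one of the 32 files needs to be read.
        System.out.println(filesToScan(allFiles, 1, numBuckets)); // prints [000001_0]
    }
}
```

Under these assumptions the reader touches 1 file instead of 32, which is the 32x difference described above. A real implementation would also have to confirm from table metadata that the files were actually written with bucketing enforced, since otherwise the file-to-bucket mapping cannot be trusted.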