Namit Jain created HIVE-3502:
--------------------------------
Summary: design efficient bucketing techniques
Key: HIVE-3502
URL: https://issues.apache.org/jira/browse/HIVE-3502
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Currently, bucketing is fairly expensive: the bucketing keys have to be the
same as the reduction keys, and bucketization requires a full map-reduce job.
It should be possible to perform map-side bucketization. The high-level idea
is to shard the data based on the number of buckets and create a sub-directory
for each bucket. The data from all the mappers (in the same sub-directory) can
then be merged.
So, instead of having 1 file per bucket, it would lead to 1 directory per
bucket.
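
As a rough illustration of the idea above, here is a minimal Python sketch, not
Hive code. It assumes each mapper hashes the bucketing key modulo the number of
buckets and writes its rows into its own file under that bucket's
sub-directory. The names bucket_for, map_side_bucketize, read_bucket, the
bucket_NNNNN / mapper_NNNNN layout, and the use of CRC32 as the hash are all
hypothetical choices made for this example.

```python
import os
import zlib


def bucket_for(key: str, num_buckets: int) -> int:
    # Assumption: CRC32 is a stand-in hash; Hive's real bucketing hash
    # is not reproduced here.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def map_side_bucketize(mapper_id, rows, num_buckets, table_dir):
    """Simulate one mapper: write each (key, value) row to this mapper's
    file inside the sub-directory of the row's bucket. No reducer runs."""
    handles = {}
    try:
        for key, value in rows:
            b = bucket_for(key, num_buckets)
            if b not in handles:
                # Hypothetical layout: <table_dir>/bucket_00003/mapper_00001
                bucket_dir = os.path.join(table_dir, f"bucket_{b:05d}")
                os.makedirs(bucket_dir, exist_ok=True)
                path = os.path.join(bucket_dir, f"mapper_{mapper_id:05d}")
                handles[b] = open(path, "w", encoding="utf-8")
            handles[b].write(f"{key}\t{value}\n")
    finally:
        for handle in handles.values():
            handle.close()


def read_bucket(table_dir, b):
    """Read one bucket by concatenating every mapper file in its
    sub-directory (the merge step, done lazily at read time)."""
    bucket_dir = os.path.join(table_dir, f"bucket_{b:05d}")
    if not os.path.isdir(bucket_dir):
        return []  # no mapper saw a key for this bucket
    lines = []
    for name in sorted(os.listdir(bucket_dir)):
        with open(os.path.join(bucket_dir, name), encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines
```

In this sketch a bucket is a directory holding one file per mapper that saw a
key for it, so a reader (or a later merge step) has to treat the whole
sub-directory as the bucket rather than a single file.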