[jira] [Created] (HIVE-3541) Allow keeping the bucket order while streaming bucketed table

Igor Kabiljo (JIRA) Fri, 05 Oct 2012 18:34:05 -0700

Igor Kabiljo created HIVE-3541:
----------------------------------

             Summary: Allow keeping the bucket order while streaming bucketed table
                 Key: HIVE-3541
                 URL: https://issues.apache.org/jira/browse/HIVE-3541
             Project: Hive
          Issue Type: Improvement
            Reporter: Igor Kabiljo
            Priority: Minor



If we have a bucketed table, for example table_a with columns col_key and 
col_value (bucketed on col_key), and we need to create a new derived bucketed 
table (for example, with SELECT col_key, col_value*2 FROM table_a), it would be 
fastest to do this in a single streaming, map-only job. 
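For concreteness, the tables in this example might be declared as shown below. This is only a sketch. The target table name table_b, the INT column types and the bucket count of 32 are assumptions and do not come from the report. The point taken from the report is that both tables are bucketed on col_key. They are also given the same bucket count, which a one-file-in, one-file-out map-only job relies on.

```sql
-- Hypothetical DDL: table_b, the INT types and 32 buckets are assumptions.
-- Both tables are bucketed on col_key with the same bucket count, so each
-- input bucket file should correspond to exactly one output bucket file.
CREATE TABLE table_a (col_key INT, col_value INT)
CLUSTERED BY (col_key) INTO 32 BUCKETS;

CREATE TABLE table_b (col_key INT, col_value INT)
CLUSTERED BY (col_key) INTO 32 BUCKETS;

-- The derived-table query from the description above.
INSERT OVERWRITE TABLE table_b
SELECT col_key, col_value * 2 FROM table_a;
```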

By specifying:
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
we can make sure that each input bucket is read by exactly one mapper, and 
that each mapper outputs exactly one file. With:
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;
we can make sure those files are inserted as-is into the output table. 
However, the bucket order is not preserved (the i-th output file does not 
necessarily hold the rows of the i-th input bucket), so the resulting table 
is not correctly bucketed.
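One way to observe the problem is to compare, for each data file, the bucket its rows should belong to. The query below is a sketch that assumes the hypothetical table_b with 32 buckets on an INT col_key. It uses Hive's INPUT__FILE__NAME virtual column and the built-in hash() and pmod() functions. For a single INT argument, hash(col_key) is the value itself. With 32 buckets, pmod(hash(col_key), 32) should therefore match the bucket number assigned by Hive's default bucketing hash.

```sql
-- Sketch: for each data file of the hypothetical table_b, list the bucket
-- numbers its rows hash to (32 buckets and an INT col_key assumed).
SELECT INPUT__FILE__NAME AS file_name,
       pmod(hash(col_key), 32) AS expected_bucket,
       count(*) AS num_rows
FROM table_b
GROUP BY INPUT__FILE__NAME, pmod(hash(col_key), 32)
ORDER BY file_name, expected_bucket;
```

If the table is correctly bucketed, each file should report a single expected_bucket equal to its position in the sorted file listing. If a file reports a different bucket, for example the fourth file reporting bucket 7, the bucket order was not kept.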

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira