Igor Kabiljo created HIVE-3541:
----------------------------------

             Summary: Allow keeping the bucket order while streaming bucketed table
                 Key: HIVE-3541
                 URL: https://issues.apache.org/jira/browse/HIVE-3541
             Project: Hive
          Issue Type: Improvement
            Reporter: Igor Kabiljo
            Priority: Minor

If we have a bucketed table, for example table_a with columns col_key and col_value (bucketed on col_key), and we need to create a new derived bucketed table from it (for example with SELECT col_key, col_value*2 FROM table_a), the fastest way would be a single streaming map-only job.

By specifying:

SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

we can make sure that each input bucket is read by exactly one mapper, and that each mapper writes exactly one output file.

With:

SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;

we can make sure that those files are inserted as is into the output table.

Even with these settings, however, the bucket order is not kept, so the resulting table is not bucketed correctly.
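
To illustrate why the order matters, here is a toy model in Python. It is not Hive code, only a sketch under stated assumptions: that a row belongs to bucket (hash of key) mod (number of buckets), that for integer keys the hash is the key itself, and that the i-th output file is treated as bucket i. The names bucket_of, write_bucketed, map_only_copy and mapper_order are hypothetical and exist only for this example.

```python
# Toy model of the problem described above. Not Hive code.
NUM_BUCKETS = 4


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Assumed bucketing rule: non-negative integer key modulo bucket count."""
    return (key & 0x7FFFFFFF) % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    """Build a correctly bucketed table: file i holds the rows of bucket i."""
    files = [[] for _ in range(num_buckets)]
    for key, value in rows:
        files[bucket_of(key, num_buckets)].append((key, value))
    return files


def map_only_copy(input_files, mapper_order):
    """Model a map-only job: the mapper that writes output file i reads
    input file mapper_order[i] and emits (col_key, col_value * 2)."""
    return [[(k, v * 2) for k, v in input_files[src]] for src in mapper_order]


def is_correctly_bucketed(files):
    """True if every row sits in the file whose index equals its bucket."""
    return all(
        bucket_of(k, len(files)) == i
        for i, rows in enumerate(files)
        for k, _ in rows
    )


table_a = write_bucketed([(k, k * 10) for k in range(8)])

# Output files come out in the same order as the input buckets.
in_order = map_only_copy(table_a, [0, 1, 2, 3])

# Same data, same one-file-per-bucket layout, but a different file order.
shuffled = map_only_copy(table_a, [2, 0, 3, 1])

print(is_correctly_bucketed(in_order))
print(is_correctly_bucketed(shuffled))
```

In this model each output file still contains exactly one input bucket in both cases. Only the mapping from file position to bucket number differs, which is the property this issue asks to be preserved.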