[
https://issues.apache.org/jira/browse/HIVE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Namit Jain updated HIVE-3244:
-----------------------------
Status: Open (was: Patch Available)
See my comments.
Can you refresh it once HIVE-3210 is in?
> Add table property which constraints sorting/bucketing for data loading
> -----------------------------------------------------------------------
>
> Key: HIVE-3244
> URL: https://issues.apache.org/jira/browse/HIVE-3244
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.10.0
> Environment: ubuntu 10.10
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
>
> This ticket is intended to implement "INSERT INTO" for bucketed tables.
> With the hive.enforce.bucketing option, a user can append data to a bucketed
> table. But the current implementation depends on the lexical order of file
> names to determine the bucket number of a file, and that order does not
> always match the actual bucket.
> So if the file name is suffixed with the bucket number when inserting (moving),
> the bucket number can be read back correctly when it is needed, such as in
> BucketMapJoinOptimizer.
> With simple prototype code, which will be attached after writing this, the
> test query
> {noformat}
> create table bucket_test (key int, value string) clustered by (key) sorted by
> (key) into 4 buckets TBLPROPERTIES
> ('FORCEDBUCKETING'='TRUE', 'FORCEDSORTING'='TRUE');
> set hive.optimize.bucketmapjoin = true;
> insert into table bucket_test select key, value from src1;
> explain extended select /*+MAPJOIN(b)*/ * from bucket_test a join bucket_test
> b on a.key=b.key;
> insert into table bucket_test select key, value from src1;
> explain extended select /*+MAPJOIN(b)*/ * from bucket_test a join bucket_test
> b on a.key=b.key;
> {noformat}
> produced the results below
> {noformat}
> 1. first plan
> b {000000_0_[0]=[000000_0_[0]], 000001_0_[1]=[000001_0_[1]],
> 000002_0_[2]=[000002_0_[2]], 000003_0_[3]=[000003_0_[3]]}
> 2. second plan
> b {000000_0_[0]=[000000_0_[0], 000000_0_copy_1_[0]],
> 000000_0_copy_1_[0]=[000000_0_[0], 000000_0_copy_1_[0]],
> 000001_0_[1]=[000001_0_[1], 000001_0_copy_1_[1]],
> 000001_0_copy_1_[1]=[000001_0_[1], 000001_0_copy_1_[1]],
> 000002_0_[2]=[000002_0_[2], 000002_0_copy_1_[2]],
> 000002_0_copy_1_[2]=[000002_0_[2], 000002_0_copy_1_[2]],
> 000003_0_[3]=[000003_0_[3], 000003_0_copy_1_[3]],
> 000003_0_copy_1_[3]=[000003_0_[3], 000003_0_copy_1_[3]]}
> {noformat}
> Currently, I've prevented direct loading via 'LOAD DATA' for forced-bucket
> tables. But with proper file name validation, that could be allowed.
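
As an illustration only (not the attached prototype, which is Java code inside Hive), the idea in the description can be sketched in Python. The sketch assumes the `_[N]` suffix format seen in the plans above; the function names and the position-based model of "lexical order" are hypothetical simplifications, not Hive APIs.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed suffix format, taken from the plan output above: "..._[<bucket>]".
BUCKET_SUFFIX = re.compile(r"_\[(\d+)\]$")


def bucket_from_name(file_name: str) -> Optional[int]:
    """Return the bucket number encoded in the suffix, or None if absent."""
    match = BUCKET_SUFFIX.search(file_name)
    return int(match.group(1)) if match else None


def bucket_by_lexical_order(file_names: Iterable[str]) -> Dict[str, int]:
    """Simplified model of order-based assignment: i-th sorted name -> bucket i."""
    return {name: i for i, name in enumerate(sorted(file_names))}


def is_valid_bucket_file(file_name: str, num_buckets: int) -> bool:
    """Hypothetical name check that a 'LOAD DATA' path could apply."""
    bucket = bucket_from_name(file_name)
    return bucket is not None and 0 <= bucket < num_buckets


# File names as they might look after a second "insert into" on two buckets.
files = [
    "000000_0_[0]",
    "000000_0_copy_1_[0]",
    "000001_0_[1]",
    "000001_0_copy_1_[1]",
]

by_order = bucket_by_lexical_order(files)
by_suffix = {name: bucket_from_name(name) for name in files}

for name in files:
    print(name, "order:", by_order[name], "suffix:", by_suffix[name])
```

Under this model, the order-based mapping drifts as soon as the `copy_1` files appear, while the suffix keeps both files of a bucket on the same number. The same parser could back the name validation mentioned for 'LOAD DATA'.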