[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

Yongzhi Chen (JIRA) Wed, 10 Jun 2015 09:35:02 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580759#comment-14580759 ]


Yongzhi Chen commented on HIVE-10880:
-------------------------------------

My build uses -Phadoop-2; the error is:
{noformat}
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.693s
[INFO] Finished at: Tue Jun 09 17:00:24 EDT 2015
[INFO] Final Memory: 26M/310M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hive-shims-common: Could not resolve dependencies for project org.apache.hive.shims:hive-shims-common:jar:1.2.0: Could not find artifact org.apache.hadoop:hadoop-core:jar:2.6.0 in datanucleus (http://www.datanucleus.org/downloads/maven2) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hive-shims-common

{noformat}
And at run time, it does call into functions in hadoop-core.jar. All my tests ran in a hadoop-2 environment. The problem is that, without my fix, after an INSERT OVERWRITE into a bucketed table or partition in local mode, the table cannot be used for bucketmapjoin.sortedmerge because of missing files (there is always 1 file instead of one file per bucket).
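A minimal sketch of the consistency check that trips here, assuming only what error 10141 says: a bucketed table is usable for a (sort-merge) bucket map join only if it has exactly one file per declared bucket. The function name `bucket_file_mismatch` and the file names are hypothetical; this is not Hive's actual code.

```python
from typing import Optional, Sequence


def bucket_file_mismatch(table: str, num_buckets: int,
                         files: Sequence[str]) -> Optional[str]:
    """Hypothetical check: return an error message when a bucketed table
    does not have exactly one file per declared bucket, else None."""
    if len(files) == num_buckets:
        return None
    return (f"The number of buckets for table {table} is {num_buckets}, "
            f"whereas the number of files is {len(files)}")


# Healthy table: two declared buckets, two files (illustrative file names).
print(bucket_file_mismatch("buckettestoutput1", 2, ["000000_0", "000001_0"]))
# State after the local-mode INSERT OVERWRITE without the fix: one file only.
print(bucket_file_mismatch("buckettestoutput1", 2, ["000000_0"]))
```

The first call prints `None`; the second prints the same mismatch wording that appears in the SemanticException from the reproduction.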

> The bucket number is not respected in insert overwrite.
> -------------------------------------------------------
>
>                 Key: HIVE-10880
>                 URL: https://issues.apache.org/jira/browse/HIVE-10880
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>            Priority: Blocker
>         Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, HIVE-10880.3.patch
>
>
> When hive.enforce.bucketing is true, the bucket number defined in the table 
> is no longer respected in current master and 1.2. This is a regression.
> Reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> ) CLUSTERED BY (data)
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> ) CLUSTERED BY (data)
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> Then I inserted the following data into the "buckettestinput" table:
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true;
> insert overwrite table buckettestoutput1 
> select * from buckettestinput where data like 'first%';
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
> Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141)
> {noformat}
> The debug output for the insert overwrite:
> {noformat}
> 0: jdbc:hive2://localhost:10000> insert overwrite table buckettestoutput1 select * from buckettestinput where data like 'first%';
> INFO  : Number of reduce tasks determined at compile time: 2
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapred.reduce.tasks=<number>
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
> INFO  : Ended Job = job_local107155352_0001
> INFO  : Loading data to table default.buckettestoutput1 from 
> file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-10000
> INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
> totalSize=52, rawDataSize=48]
> No rows affected (1.692 seconds)
> {noformat}
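The log shows 2 reduce tasks but numFiles=1. A rough sketch of why 2 files are expected: with bucketing enforced, each row goes to bucket `(hash & Integer.MAX_VALUE) % numBuckets`, and one output file per bucket is expected. Assumption: Java's `String.hashCode()` is used as a stand-in hash; Hive's real hash for a STRING column may differ, so the per-row bucket numbers here are illustrative only. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def string_hash(s: str) -> int:
    """Stand-in hash (assumption): Java String.hashCode() in 32-bit
    arithmetic, returned as an unsigned value."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_number(value: str, num_buckets: int) -> int:
    # Clear the sign bit, then take the remainder by the bucket count.
    return (string_hash(value) & INT_MAX) % num_buckets


def assign_buckets(rows: Iterable[str], num_buckets: int) -> Dict[int, List[str]]:
    """Group rows by bucket; every declared bucket gets an entry (one
    expected output file each), even if no row lands in it."""
    buckets: Dict[int, List[str]] = {b: [] for b in range(num_buckets)}
    for row in rows:
        buckets[bucket_number(row, num_buckets)].append(row)
    return buckets


rows = [f"firstinsert{i}" for i in range(1, 9)]
buckets = assign_buckets(rows, num_buckets=2)
expected_files = len(buckets)  # one file per declared bucket
observed_files = 1             # numFiles=1 in the stats line above
print(expected_files, observed_files)
```

Any mismatch between `expected_files` and the file count actually written is exactly what the bucket map join check rejects.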



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
