[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-07 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661854#comment-14661854
 ] 

Yongzhi Chen commented on HIVE-10880:
-

Thanks [~xuefuz] for reviewing the code.

 The bucket number is not respected in insert overwrite.
 ---

 Key: HIVE-10880
 URL: https://issues.apache.org/jira/browse/HIVE-10880
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Critical
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, 
 HIVE-10880.3.patch, HIVE-10880.4.patch


 When hive.enforce.bucketing is true, the bucket number defined in the table 
 is no longer respected in current master and 1.2. 
 Reproduce:
 {code:sql}
 CREATE TABLE IF NOT EXISTS buckettestinput( 
 data string 
 ) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 {code}
 Then I inserted the following data into the buckettestinput table:
 {noformat}
 firstinsert1 
 firstinsert2 
 firstinsert3 
 firstinsert4 
 firstinsert5 
 firstinsert6 
 firstinsert7 
 firstinsert8 
 secondinsert1 
 secondinsert2 
 secondinsert3 
 secondinsert4 
 secondinsert5 
 secondinsert6 
 secondinsert7 
 secondinsert8
 {noformat}
 {code:sql}
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert overwrite table buckettestoutput1 
 select * from buckettestinput where data like 'first%';
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
 {code}
 {noformat}
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 
 (state=42000,code=10141)
 {noformat}
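The check behind this error can be sketched as follows (a hypothetical simplification of Hive's join-time validation, not its actual code): a sorted bucket map join requires the number of data files on disk to match the bucket count declared in the table metadata.

```python
# Hypothetical sketch of the validation behind error 10141; the
# function name and file names are illustrative, not Hive internals.

def bucket_metadata_ok(num_buckets, data_files):
    # A bucketed insert is expected to leave exactly one file per bucket.
    return len(data_files) == num_buckets

# The buggy insert left one file for the 2-bucket table, so the
# join-time check fails and compilation aborts with error 10141.
print(bucket_metadata_ok(2, ["000000_0"]))              # False
print(bucket_metadata_ok(2, ["000000_0", "000001_0"]))  # True
```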
 The debug information related to the insert overwrite:
 {noformat}
 0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1 
 select * from buckettestinput where data like 'first%'insert overwrite table 
 buckettestoutput1 
 0: jdbc:hive2://localhost:1 ;
 select * from buckettestinput where data like ' 
 first%';
 INFO  : Number of reduce tasks determined at compile time: 2
 INFO  : In order to change the average load for a reducer (in bytes):
 INFO  :   set hive.exec.reducers.bytes.per.reducer=number
 INFO  : In order to limit the maximum number of reducers:
 INFO  :   set hive.exec.reducers.max=number
 INFO  : In order to set a constant number of reducers:
 INFO  :   set mapred.reduce.tasks=number
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
 INFO  : Ended Job = job_local107155352_0001
 INFO  : Loading data to table default.buckettestoutput1 from 
 file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
 INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
 totalSize=52, rawDataSize=48]
 No rows affected (1.692 seconds)
 {noformat}
 Insert using dynamic partitions does not have this issue. 
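For context, here is a rough model of why two output files are expected for this data (assuming Hive's documented scheme of hashing the CLUSTERED BY column modulo the bucket count; the Java-style string hash below approximates Hive's behavior for strings and is not its actual implementation):

```python
# Sketch of Hive-style bucket assignment: bucket = hash(key) mod numBuckets.
# java_string_hashcode mimics Java's String.hashCode (h = 31*h + ch).

def java_string_hashcode(s):
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h  # signed 32-bit

def bucket_for(value, num_buckets):
    # clear the sign bit so the bucket index is non-negative
    return (java_string_hashcode(value) & 0x7FFFFFFF) % num_buckets

rows = ["firstinsert%d" % i for i in range(1, 9)]
buckets = {bucket_for(r, 2) for r in rows}
print(sorted(buckets))  # [0, 1]: both buckets receive rows
```

Since both buckets receive rows, a correct insert overwrite should write two files, yet the stats above report numFiles=1.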



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-05 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655152#comment-14655152
 ] 

Yongzhi Chen commented on HIVE-10880:
-

The test failure is unrelated (it is a known issue):
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
Error Message:
Table/View 'TXNS' already exists in Schema 'APP'.
[~xuefuz], could you review the change? Thanks

The patch fixes the following issue:
In local mode, when enforce.bucketing is true, an insert overwrite into a
bucketed table or a static partition does not respect the bucket number.
Because only the dynamic partition path works correctly, this fix applies the
same approach used to handle the dynamic partition scenario.
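The intended behavior the patch restores can be illustrated with a toy simulation (the reducer mechanics here are a simplified guess at the approach, not Hive's actual code): each row is routed to the reducer for its bucket, and each reducer writes one file, so the table ends up with numFiles equal to numBuckets.

```python
# Toy simulation of a bucketed insert: one output file per bucket.
# The stand-in hash (byte sum) is illustrative, not Hive's hash.
from collections import defaultdict

def simulate_bucketed_insert(rows, num_buckets):
    files = defaultdict(list)
    for row in rows:
        bucket = sum(row.encode("utf-8")) % num_buckets
        files["%06d_0" % bucket].append(row)  # reducer file for this bucket
    return dict(files)

out = simulate_bucketed_insert(["firstinsert%d" % i for i in range(1, 9)], 2)
print(len(out))  # 2 files, matching the declared bucket count
```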

It seems that HIVE-11360 has a similar issue. 






[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658528#comment-14658528
 ] 

Xuefu Zhang commented on HIVE-10880:


Okay. I will take a look shortly.



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659360#comment-14659360
 ] 

Xuefu Zhang commented on HIVE-10880:


+1



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-04 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653693#comment-14653693
 ] 

Yongzhi Chen commented on HIVE-10880:
-

Ran the Spark tests on my local machine; they passed. Re-attaching the patch. 



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653235#comment-14653235
 ] 

Hive QA commented on HIVE-10880:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748527/HIVE-10880.4.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9320 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_bigdata
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_nullgroup2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_lazy
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4813/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4813/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4813/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748527 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654681#comment-14654681
 ] 

Hive QA commented on HIVE-10880:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748670/HIVE-10880.4.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9323 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4826/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4826/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4826/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748670 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-08-03 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652398#comment-14652398
 ] 

Yongzhi Chen commented on HIVE-10880:
-

The patch fixes the following issue:
In local mode, when enforce.bucketing is true, an insert overwrite into a
bucketed table or a static partition does not respect the bucket number.

Because only the dynamic partition path works correctly, this fix applies the
same approach used to handle the dynamic partition scenario.

Attaching patch 4 after rebase. 





 The bucket number is not respected in insert overwrite.
 ---

 Key: HIVE-10880
 URL: https://issues.apache.org/jira/browse/HIVE-10880
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Critical
 Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, 
 HIVE-10880.3.patch


 When hive.enforce.bucketing is true, the bucket number defined in the table 
 is no longer respected in current master and 1.2. 
 Reproduce:
 {code:sql}
 CREATE TABLE IF NOT EXISTS buckettestinput( 
 data string 
 ) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 {code}
 Then I inserted the following data into the buckettestinput table:
 {noformat}
 firstinsert1 
 firstinsert2 
 firstinsert3 
 firstinsert4 
 firstinsert5 
 firstinsert6 
 firstinsert7 
 firstinsert8 
 secondinsert1 
 secondinsert2 
 secondinsert3 
 secondinsert4 
 secondinsert5 
 secondinsert6 
 secondinsert7 
 secondinsert8
 {noformat}
 {code:sql}
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert overwrite table buckettestoutput1 
 select * from buckettestinput where data like 'first%';
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
 {code}
 {noformat}
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 
 (state=42000,code=10141)
 {noformat}
 The related debug information related to insert overwrite:
 {noformat}
 0: jdbc:hive2://localhost:1> insert overwrite table buckettestoutput1 
 select * from buckettestinput where data like 'first%';
 INFO  : Number of reduce tasks determined at compile time: 2
 INFO  : In order to change the average load for a reducer (in bytes):
 INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
 INFO  : In order to limit the maximum number of reducers:
 INFO  :   set hive.exec.reducers.max=<number>
 INFO  : In order to set a constant number of reducers:
 INFO  :   set mapred.reduce.tasks=<number>
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
 INFO  : Ended Job = job_local107155352_0001
 INFO  : Loading data to table default.buckettestoutput1 from 
 file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
 INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
 totalSize=52, rawDataSize=48]
 No rows affected (1.692 seconds)
 {noformat}
 Insert using dynamic partitions does not have this issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-10 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580759#comment-14580759
 ] 

Yongzhi Chen commented on HIVE-10880:
-

My build uses -Phadoop-2; the error is:
{noformat}
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.693s
[INFO] Finished at: Tue Jun 09 17:00:24 EDT 2015
[INFO] Final Memory: 26M/310M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hive-shims-common: Could not resolve 
dependencies for project org.apache.hive.shims:hive-shims-common:jar:1.2.0: 
Could not find artifact org.apache.hadoop:hadoop-core:jar:2.6.0 in datanucleus 
(http://www.datanucleus.org/downloads/maven2) - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hive-shims-common

{noformat}
And at run time, it does call into functions in hadoop-core.jar. All my tests are in a 
hadoop-2 environment. The problem is that, without my fix, after Hive insert overwrites a 
bucketed table or partition in local mode, the table cannot be used for 
bucketmapjoin.sortedmerge because of missing bucket files (always 1 file vs. the bucket number).
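To make the "missing files" symptom concrete, here is a minimal, self-contained sketch of how rows in a table declared CLUSTERED BY (data) INTO 2 BUCKETS are expected to be assigned to buckets. The class and method names are hypothetical, not Hive's actual implementation; the point is only that a non-negative hash modulo the bucket count should spread rows over both bucket files.

```java
// Hypothetical sketch of bucket assignment for CLUSTERED BY (data) INTO N BUCKETS.
// Not Hive's actual code; it illustrates hash-modulo bucket routing.
class BucketSketch {
    // Non-negative hash of the clustering key, modulo the declared bucket count.
    static int bucketFor(String clusterKey, int numBuckets) {
        return (clusterKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        String[] rows = {"firstinsert1", "firstinsert2", "firstinsert3", "firstinsert4"};
        for (String row : rows) {
            System.out.println(row + " -> bucket " + bucketFor(row, 2));
        }
    }
}
```

If every row ends up in bucket 0 regardless of its key, the hash step is effectively bypassed, which matches the "number of buckets ... is 2, whereas the number of files is 1" error above.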


[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-09 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579119#comment-14579119
 ] 

Yongzhi Chen commented on HIVE-10880:
-

[~xuefuz], I agree with you; there is something more serious than the missing files. I 
think the bucket algorithm is broken. I just tried an insert overwrite from a very big 
table, and all the data goes to one bucket too. It seems the hash mapping is no longer 
working. I will try to figure out why. 



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-09 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578871#comment-14578871
 ] 

Yongzhi Chen commented on HIVE-10880:
-

[~xuefuz], when I debugged the issue, I noticed that the right number of reducers is 
used. I also noticed that dynamic partition insert works fine because it adds the 
missing files. I think we should treat static partitions and ordinary tables the same 
way, so I fixed the issue by adding the missing buckets. The following is the code for 
the dynamic partition part:

{noformat}
taskIDToFile = removeTempOrDuplicateFiles(items, fs);
// if the table is bucketed and enforce bucketing, we should check and generate all buckets
if (dpCtx.getNumBuckets() > 0 && taskIDToFile != null) {
  // refresh the file list
  items = fs.listStatus(parts[i].getPath());
  // get the missing buckets and generate empty buckets
  String taskID1 = taskIDToFile.keySet().iterator().next();
  Path bucketPath = taskIDToFile.values().iterator().next().getPath();
  for (int j = 0; j < dpCtx.getNumBuckets(); ++j) {
    String taskID2 = replaceTaskId(taskID1, j);
    if (!taskIDToFile.containsKey(taskID2)) {
      // create empty bucket, file name should be derived from taskID2
      String path2 = replaceTaskIdFromFilename(bucketPath.toUri().getPath().toString(), j);
      result.add(path2);
    }
  }
}

{noformat}
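The effect of that dynamic-partition loop can be exercised in isolation. The following standalone sketch (hypothetical names, not Hive's classes) takes the bucket file names the reducers actually produced and computes which zero-padded names would need to be created as empty buckets:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "generate empty buckets" step: given the files the
// reducers wrote (task-id names like "000000_0") and the declared bucket count,
// list the bucket file names that are missing and should be created empty.
class EmptyBucketSketch {
    static List<String> missingBucketFiles(Set<String> existing, int numBuckets) {
        List<String> missing = new ArrayList<>();
        for (int j = 0; j < numBuckets; j++) {
            String name = String.format("%06d_0", j); // zero-padded task-id convention
            if (!existing.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Only one reducer produced output, but the table declares two buckets.
        Set<String> produced = new HashSet<>(Arrays.asList("000000_0"));
        System.out.println(missingBucketFiles(produced, 2)); // [000001_0]
    }
}
```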



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578293#comment-14578293
 ] 

Xuefu Zhang commented on HIVE-10880:


[~ychena], thanks for working on this. Looking at the patch, I am not confident that I 
understand the root cause of the problem or how your patch addresses it. From the 
problem description, I originally thought it was a problem of setting the right number 
of reducers. However, your patch does not seem to go in that direction. Instead, it 
appears to add the missing buckets by creating empty files. I am not sure this fixes 
the root cause. In general, the rows should be relatively evenly distributed among the 
buckets, so missing or empty bucket files should be rare rather than normal.

Could you please share your thoughts on this?



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-05 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574432#comment-14574432
 ] 

Yongzhi Chen commented on HIVE-10880:
-

The failures are not related. 
The following two tests have been failing for more than 10 builds: 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autogen_colalias also failed 
in build 4179 (the build after this one).

For the Spark failure, I tested locally and all tests pass. My code change only takes 
effect when hive.enforce.bucketing is true, and the Spark tests never set that value, 
so the failure is not related. 

---
 T E S T S
---

Running org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 72.272 sec - in 
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

Could anyone review the code? Thanks



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-04 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572804#comment-14572804
 ] 

Yongzhi Chen commented on HIVE-10880:
-

The implementation of the method private static String replaceTaskId(String taskId, int 
bucketNum) does not look right. Since the code has been in the source for a while, I am 
not very confident about that. 
The attached patch 3 fixes that issue too. If the tests pass, we should use patch 3; 
otherwise keep patch 2. 
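For reference, the contract such a replaceTaskId method needs to honor is: substitute the bucket number into the numeric task-id prefix of a file name (e.g. 000001_0) while preserving the zero-padded width. A minimal standalone sketch of that contract (not Hive's actual implementation):

```java
// Standalone sketch of the expected replaceTaskId(String, int) behavior:
// swap the numeric prefix of a name like "000001_0" for the bucket number,
// keeping the original zero-padded width. Not the actual Hive code.
class TaskIdSketch {
    static String replaceTaskId(String taskId, int bucketNum) {
        int sep = taskId.indexOf('_');
        String prefix = (sep < 0) ? taskId : taskId.substring(0, sep);
        String suffix = (sep < 0) ? "" : taskId.substring(sep);
        String bucket = String.valueOf(bucketNum);
        StringBuilder padded = new StringBuilder();
        for (int i = bucket.length(); i < prefix.length(); i++) {
            padded.append('0'); // keep the original width
        }
        padded.append(bucket);
        return padded + suffix;
    }

    public static void main(String[] args) {
        System.out.println(replaceTaskId("000001_0", 3)); // 000003_0
    }
}
```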



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573423#comment-14573423
 ] 

Hive QA commented on HIVE-10880:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12737564/HIVE-10880.3.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8999 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autogen_colalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_nondeterministic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4178/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4178/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4178/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12737564 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-04 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572661#comment-14572661
 ] 

Yongzhi Chen commented on HIVE-10880:
-

[~xuefuz], [~szehon], [~ctang.ma], [~csun], could you review the code? Thanks



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-04 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572658#comment-14572658
 ] 

Yongzhi Chen commented on HIVE-10880:
-

All these test failures are unrelated; their ages are 11 or more. 



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-03 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571767#comment-14571767
 ] 

Yongzhi Chen commented on HIVE-10880:
-

Attached a second patch to fix the test failures. 



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-02 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569750#comment-14569750
 ] 

Yongzhi Chen commented on HIVE-10880:
-

The insert overwrite problem happens for table inserts and static partition 
inserts; it works fine for dynamic partition inserts. So the fix makes the code 
change similar to what is done on the dynamic partition path. 
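The dynamic-partition path keeps one open writer per target file, which is why it produces the right file count. The routing it relies on can be sketched roughly like this (a hypothetical illustration; the function and names below are not Hive's actual FileSinkOperator API):

```python
from collections import defaultdict

def write_bucketed(rows, num_buckets, bucket_of):
    """Route each row to its bucket 'file'; returns bucket -> list of rows."""
    files = defaultdict(list)  # stands in for one open writer per bucket
    for row in rows:
        files[bucket_of(row) % num_buckets].append(row)
    # Even an empty bucket must yield a (zero-length) file so the file count
    # matches the declared bucket count.
    return {b: files.get(b, []) for b in range(num_buckets)}

# Illustrative bucketing function: route by string length.
out = write_bucketed(["a", "bb", "ccc"], 2, len)
print(len(out))  # prints 2: one output file per declared bucket
```

The key property is that the number of output files is fixed by the declared bucket count, not by how many reducers happened to emit data.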



[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570078#comment-14570078
 ] 

Hive QA commented on HIVE-10880:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12737007/HIVE-10880.1.patch

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 8991 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_no_match
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_no_match
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_nullsafe
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4147/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4147/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4147/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12737007 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.

2015-06-01 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567438#comment-14567438
 ] 

Mostafa Mokhtar commented on HIVE-10880:


[~ekoifman]
