[jira] [Commented] (HIVE-10866) Give a warning when client try to insert into bucketed table

2015-06-11 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581827#comment-14581827
 ] 

Yongzhi Chen commented on HIVE-10866:
-

The 3 failures are not related to the patch. Each has an age of 3 or more, so they were already failing in earlier builds. 

 Give a warning when client try to insert into bucketed table
 

 Key: HIVE-10866
 URL: https://issues.apache.org/jira/browse/HIVE-10866
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch, 
 HIVE-10866.3.patch, HIVE-10866.4.patch


 Currently, Hive does not support appending (insert into) to a bucketed table; 
 see the open JIRA HIVE-3608. After an insert into such a table, the data is 
 corrupted and no longer fit for sort merge bucket mapjoin. 
 We need to find a way to prevent clients from inserting into such a table, or 
 at least give a warning.
 Reproduce:
 {noformat}
 CREATE TABLE IF NOT EXISTS buckettestoutput1( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 CREATE TABLE IF NOT EXISTS buckettestoutput2( 
 data string 
 )CLUSTERED BY(data) 
 INTO 2 BUCKETS 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 set hive.enforce.bucketing = true; 
 set hive.enforce.sorting=true;
 insert into table buckettestoutput1 select code from sample_07 where 
 total_emp  134354250 limit 10;
 After this first insert, I did:
 set hive.auto.convert.sortmerge.join=true; 
 set hive.optimize.bucketmapjoin = true; 
 set hive.optimize.bucketmapjoin.sortedmerge = true; 
 set hive.auto.convert.sortmerge.join.noconditionaltask=true;
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 +---+---+
 | data  | data  |
 +---+---+
 +---+---+
 So select works fine. 
 Second insert:
 0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select 
 code from sample_07 where total_emp = 134354250 limit 10;
 No rows affected (61.235 seconds)
 Then select:
 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join 
 buckettestoutput2 b on (a.data=b.data);
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
 bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
 of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
 (state=42000,code=10141)
 0: jdbc:hive2://localhost:1
 {noformat}
 Inserting into an empty table or partition is fine, but after inserting into 
 a non-empty one (the second insert in the reproduce), the bucketmapjoin 
 throws an error. We should not let the second insert succeed. 
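
The failure in the reproduce can be illustrated with a toy Python model. This is not Hive's code. It assumes, as the error message above suggests, that each insert into adds a fresh set of files (one per bucket) instead of merging into the existing ones, and that the bucket map join check compares the file count with the declared bucket count. The names `BucketedTable`, `insert_into`, and `check_bucket_metadata` are hypothetical, and `crc32` is only a placeholder for Hive's bucket hash.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class BucketedTable:
    """Toy stand-in for a table created with CLUSTERED BY ... INTO n BUCKETS."""
    name: str
    num_buckets: int
    files: list = field(default_factory=list)  # each entry holds one file's rows

    def insert_into(self, rows):
        # Assumption: an append writes num_buckets new files rather than
        # merging the rows into the files that already exist.
        new_files = [[] for _ in range(self.num_buckets)]
        for row in rows:
            # crc32 is a deterministic placeholder, not Hive's real hash.
            bucket = zlib.crc32(row.encode("utf-8")) % self.num_buckets
            new_files[bucket].append(row)
        self.files.extend(new_files)


def check_bucket_metadata(table):
    """Mimic the file-count check behind error 10141 in the reproduce."""
    # Assumption: an empty table passes, as buckettestoutput2 does above.
    if table.files and len(table.files) != table.num_buckets:
        raise ValueError(
            f"The number of buckets for table {table.name} is "
            f"{table.num_buckets}, whereas the number of files is "
            f"{len(table.files)}"
        )


table = BucketedTable("buckettestoutput1", num_buckets=2)
table.insert_into(["a", "b", "c"])
check_bucket_metadata(table)      # first insert: 2 files, check passes

table.insert_into(["d", "e"])
try:
    check_bucket_metadata(table)  # second insert: 4 files, check fails
except ValueError as err:
    print(err)
```

In this model the first insert is harmless because the table starts empty, which matches the observation that only inserts into a non-empty table or partition cause trouble.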



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10866) Give a warning when client try to insert into bucketed table

2015-06-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581519#comment-14581519
 ] 

Hive QA commented on HIVE-10866:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738983/HIVE-10866.4.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9008 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4245/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4245/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4245/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738983 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-10866) Give a warning when client try to insert into bucketed table

2015-06-10 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580792#comment-14580792
 ] 

Yongzhi Chen commented on HIVE-10866:
-

The failures are not related. But I feel that throwing an error is too harsh: 
some clients may use bucketed tables for other operations, not just for 
sort merge joins. So we should not be so strict. I will change the code to 
just give a warning. 
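
The difference between the two policies discussed here (reject the insert versus only warn) can be sketched in Python. This is an illustration, not the patch itself: `check_insert_into`, its parameters, and the message text are all hypothetical.

```python
import warnings


def check_insert_into(num_buckets, existing_file_count, strict=False):
    """Decide what to do before an insert into (hypothetical helper).

    num_buckets: declared bucket count, 0 for a non-bucketed table.
    existing_file_count: files already in the target table or partition.
    strict: if True, reject the insert; if False, only warn.
    """
    if num_buckets == 0 or existing_file_count == 0:
        return True  # non-bucketed or empty target: nothing to report
    message = (
        "Inserting into a non-empty bucketed table; the result may not be "
        "usable for sort merge bucket mapjoin."
    )
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return True  # the insert is still allowed to proceed


# Lenient mode: the insert goes ahead and the client sees a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    allowed = check_insert_into(num_buckets=2, existing_file_count=2)
    print(allowed, len(caught))
```

With `strict=True` the same call raises instead, which corresponds to the stricter behavior that is considered too harsh for clients that use bucketed tables for other purposes.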

 Give a warning when client try to insert into bucketed table
 

 Key: HIVE-10866
 URL: https://issues.apache.org/jira/browse/HIVE-10866
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch




[jira] [Commented] (HIVE-10866) Give a warning when client try to insert into bucketed table

2015-06-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581270#comment-14581270
 ] 

Hive QA commented on HIVE-10866:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738917/HIVE-10866.3.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9003 tests executed
*Failed tests:*
{noformat}
TestContribNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insertinto_nonemptybucket
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4240/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4240/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4240/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738917 - PreCommit-HIVE-TRUNK-Build

 Give a warning when client try to insert into bucketed table
 

 Key: HIVE-10866
 URL: https://issues.apache.org/jira/browse/HIVE-10866
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch, 
 HIVE-10866.3.patch

