[jira] [Commented] (HIVE-12244) Refactoring code for avoiding of comparison of Strings and do comparison on Path
[ https://issues.apache.org/jira/browse/HIVE-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380566#comment-15380566 ]

Zoltan Haindrich commented on HIVE-12244:
-----------------------------------------

[~ashutoshc] sure... I've created my first review request: https://reviews.apache.org/r/50104/

> Refactoring code for avoiding of comparison of Strings and do comparison on Path
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-12244
>                 URL: https://issues.apache.org/jira/browse/HIVE-12244
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.1
>            Reporter: Alina Abramova
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.2.1
>
>         Attachments: HIVE-12244.1.patch, HIVE-12244.10.patch, HIVE-12244.11.patch, HIVE-12244.12.patch, HIVE-12244.2.patch, HIVE-12244.3.patch, HIVE-12244.4.patch, HIVE-12244.5.patch, HIVE-12244.6.patch, HIVE-12244.7.patch, HIVE-12244.8.patch, HIVE-12244.8.patch, HIVE-12244.9.patch
>
> In Hive, a String is often used to represent a path, and this keeps causing new issues. Paths need to be compared with equals(), but comparing the Strings is often not correct when comparing paths. If we use Path from org.apache.hadoop.fs, we will avoid such problems in the future.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
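The pitfall described in the issue can be illustrated with a minimal sketch. To stay self-contained, this uses java.nio.file.Path from the JDK as a stand-in for org.apache.hadoop.fs.Path (an assumption for illustration; the real patch targets the Hadoop class): both normalize while parsing, so equality is semantic rather than character-by-character.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathVsString {
    public static void main(String[] args) {
        // Two strings naming the same directory compare as unequal:
        // a trailing separator or a doubled slash changes the characters
        // but not the location being named.
        String a = "/user/hive/warehouse/t1";
        String b = "/user/hive/warehouse//t1/";
        System.out.println(a.equals(b));    // false

        // A Path type collapses redundant separators during parsing,
        // so equals() compares the path, not its spelling.
        Path pa = Paths.get(a);
        Path pb = Paths.get(b);
        System.out.println(pa.equals(pb));  // true
    }
}
```

The same reasoning motivates switching Hive's internal comparisons from String.equals() to Path.equals().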
[jira] [Commented] (HIVE-14067) Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.
[ https://issues.apache.org/jira/browse/HIVE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380547#comment-15380547 ]

zhihai xu commented on HIVE-14067:
----------------------------------

Thanks for the review [~thejas]! The old patch can't be applied. I attached a new patch, HIVE-14067.001.patch, based on the latest code.

> Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-14067
>                 URL: https://issues.apache.org/jira/browse/HIVE-14067
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Trivial
>         Attachments: HIVE-14067.000.patch, HIVE-14067.000.patch, HIVE-14067.001.patch
>
> Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14067) Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.
[ https://issues.apache.org/jira/browse/HIVE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated HIVE-14067:
-----------------------------
    Attachment: HIVE-14067.001.patch

> Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-14067
>                 URL: https://issues.apache.org/jira/browse/HIVE-14067
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Trivial
>         Attachments: HIVE-14067.000.patch, HIVE-14067.000.patch, HIVE-14067.001.patch
>
> Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14236) CTAS with UNION ALL puts the wrong stats in Tez
[ https://issues.apache.org/jira/browse/HIVE-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-14236:
-----------------------------------
    Status: Open  (was: Patch Available)

> CTAS with UNION ALL puts the wrong stats in Tez
> -----------------------------------------------
>
>                 Key: HIVE-14236
>                 URL: https://issues.apache.org/jira/browse/HIVE-14236
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14236.01.patch, HIVE-14236.02.patch
>
> To repro, in Tez: create table t as select * from src union all select * from src;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14236) CTAS with UNION ALL puts the wrong stats in Tez
[ https://issues.apache.org/jira/browse/HIVE-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-14236:
-----------------------------------
    Attachment: HIVE-14236.02.patch

> CTAS with UNION ALL puts the wrong stats in Tez
> -----------------------------------------------
>
>                 Key: HIVE-14236
>                 URL: https://issues.apache.org/jira/browse/HIVE-14236
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14236.01.patch, HIVE-14236.02.patch
>
> To repro, in Tez: create table t as select * from src union all select * from src;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14236) CTAS with UNION ALL puts the wrong stats in Tez
[ https://issues.apache.org/jira/browse/HIVE-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-14236:
-----------------------------------
    Status: Patch Available  (was: Open)

> CTAS with UNION ALL puts the wrong stats in Tez
> -----------------------------------------------
>
>                 Key: HIVE-14236
>                 URL: https://issues.apache.org/jira/browse/HIVE-14236
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14236.01.patch, HIVE-14236.02.patch
>
> To repro, in Tez: create table t as select * from src union all select * from src;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14236) CTAS with UNION ALL puts the wrong stats in Tez
[ https://issues.apache.org/jira/browse/HIVE-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-14236:
-----------------------------------
    Summary: CTAS with UNION ALL puts the wrong stats in Tez  (was: CTAS with UNION ALL puts the wrong stats + count(*) = 0 in Tez)

> CTAS with UNION ALL puts the wrong stats in Tez
> -----------------------------------------------
>
>                 Key: HIVE-14236
>                 URL: https://issues.apache.org/jira/browse/HIVE-14236
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14236.01.patch
>
> To repro, in Tez: create table t as select * from src union all select * from src;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14258) Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress
[ https://issues.apache.org/jira/browse/HIVE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380510#comment-15380510 ]

zhihai xu commented on HIVE-14258:
----------------------------------

I attached a patch, HIVE-14258.patch, which reports progress to the AM in genUniqueJoinObject to avoid the timeout.

> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14258
>                 URL: https://issues.apache.org/jira/browse/HIVE-14258
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.1.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HIVE-14258.patch
>
> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress.
> This timeout happened when reducer.close() is called in ReduceTask.java. CommonJoinOperator.genUniqueJoinObject(), called by reducer.close(), loops over every row in the AbstractRowContainer. This can take a long time if there are a large number of rows, and during this time it does not report progress. If this runs for longer than "mapreduce.task.timeout", the ApplicationMaster will kill the task for failing to report progress.
> We configured "mapreduce.task.timeout" as 10 minutes. I captured the stack trace in the 10 minutes before the AM killed the reduce task at 2016-07-15 07:19:11.
> The following three stack traces can prove it:
> at 2016-07-15 07:09:42:
> {code}
> "main" prio=10 tid=0x7f90ec017000 nid=0xd193 runnable [0x7f90f62e5000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FileInputStream.readBytes(Native Method)
>         at java.io.FileInputStream.read(FileInputStream.java:272)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:154)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         - locked <0x0007deecefb0> (a org.apache.hadoop.fs.BufferedFSInputStream)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:252)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:214)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:232)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
>         - locked <0x0007deecb978> (a org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2359)
>         - locked <0x0007deec8f70> (a org.apache.hadoop.io.SequenceFile$Reader)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2491)
>         - locked <0x0007deec8f70> (a org.apache.hadoop.io.SequenceFile$Reader)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
>         - locked <0x0007deec82f0> (a org.apache.hadoop.mapred.SequenceFileRecordReader)
>         at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>         at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:267)
>         at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:74)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:750)
>         at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>         at java.security.AccessController.doPrivileged(Native Method)
> {code}
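The idea behind the attached patch — heartbeating while draining a large row container so the AM's task timeout is never hit — can be sketched as follows. This is a minimal, self-contained sketch: the `Reporter` interface here is a hypothetical stand-in for org.apache.hadoop.mapred.Reporter, and the loop stands in for genUniqueJoinObject(); it is not the actual patch code.

```java
import java.util.Collections;
import java.util.List;

public class ProgressLoop {
    // Hypothetical stand-in for org.apache.hadoop.mapred.Reporter.
    interface Reporter { void progress(); }

    // Drain all rows, reporting progress once every `interval` rows
    // rather than per row (a heartbeat, not a throughput cost).
    static int drain(List<Object> rows, Reporter reporter, int interval) {
        int processed = 0;
        for (Object row : rows) {
            // ... per-row join-object generation work would happen here ...
            processed++;
            if (processed % interval == 0) {
                reporter.progress();  // heartbeat to the ApplicationMaster
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        final int[] beats = {0};
        int n = drain(Collections.nCopies(10_000, new Object()),
                      () -> beats[0]++, 1_000);
        // 10_000 rows with interval 1_000 -> 10 progress calls
        System.out.println(n + " rows, " + beats[0] + " progress calls");
    }
}
```

Reporting every N rows keeps the heartbeat overhead negligible while guaranteeing the AM sees activity well within any reasonable mapreduce.task.timeout.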
[jira] [Updated] (HIVE-14258) Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress
[ https://issues.apache.org/jira/browse/HIVE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated HIVE-14258:
-----------------------------
    Status: Patch Available  (was: Open)

> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14258
>                 URL: https://issues.apache.org/jira/browse/HIVE-14258
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.1.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HIVE-14258.patch
>
> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress.
> This timeout happened when reducer.close() is called in ReduceTask.java. CommonJoinOperator.genUniqueJoinObject(), called by reducer.close(), loops over every row in the AbstractRowContainer. This can take a long time if there are a large number of rows, and during this time it does not report progress. If this runs for longer than "mapreduce.task.timeout", the ApplicationMaster will kill the task for failing to report progress.
> We configured "mapreduce.task.timeout" as 10 minutes. I captured the stack trace in the 10 minutes before the AM killed the reduce task at 2016-07-15 07:19:11.
[jira] [Updated] (HIVE-14258) Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress
[ https://issues.apache.org/jira/browse/HIVE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated HIVE-14258:
-----------------------------
    Attachment: HIVE-14258.patch

> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14258
>                 URL: https://issues.apache.org/jira/browse/HIVE-14258
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.1.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HIVE-14258.patch
>
> Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress.
> This timeout happened when reducer.close() is called in ReduceTask.java. CommonJoinOperator.genUniqueJoinObject(), called by reducer.close(), loops over every row in the AbstractRowContainer. This can take a long time if there are a large number of rows, and during this time it does not report progress. If this runs for longer than "mapreduce.task.timeout", the ApplicationMaster will kill the task for failing to report progress.
> We configured "mapreduce.task.timeout" as 10 minutes. I captured the stack trace in the 10 minutes before the AM killed the reduce task at 2016-07-15 07:19:11.
[jira] [Commented] (HIVE-12244) Refactoring code for avoiding of comparison of Strings and do comparison on Path
[ https://issues.apache.org/jira/browse/HIVE-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380506#comment-15380506 ]

Ashutosh Chauhan commented on HIVE-12244:
-----------------------------------------

[~kgyrtkirk] Can you create a RB for your patch?

> Refactoring code for avoiding of comparison of Strings and do comparison on Path
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-12244
>                 URL: https://issues.apache.org/jira/browse/HIVE-12244
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.1
>            Reporter: Alina Abramova
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.2.1
>
>         Attachments: HIVE-12244.1.patch, HIVE-12244.10.patch, HIVE-12244.11.patch, HIVE-12244.12.patch, HIVE-12244.2.patch, HIVE-12244.3.patch, HIVE-12244.4.patch, HIVE-12244.5.patch, HIVE-12244.6.patch, HIVE-12244.7.patch, HIVE-12244.8.patch, HIVE-12244.8.patch, HIVE-12244.9.patch
>
> In Hive, a String is often used to represent a path, and this keeps causing new issues. Paths need to be compared with equals(), but comparing the Strings is often not correct when comparing paths. If we use Path from org.apache.hadoop.fs, we will avoid such problems in the future.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-12244) Refactoring code for avoiding of comparison of Strings and do comparison on Path
[ https://issues.apache.org/jira/browse/HIVE-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380492#comment-15380492 ]

Hive QA commented on HIVE-12244:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818107/HIVE-12244.12.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10326 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/532/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/532/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-532/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818107 - PreCommit-HIVE-MASTER-Build

> Refactoring code for avoiding of comparison of Strings and do comparison on Path
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-12244
>                 URL: https://issues.apache.org/jira/browse/HIVE-12244
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.1
>            Reporter: Alina Abramova
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.2.1
>
>         Attachments: HIVE-12244.1.patch, HIVE-12244.10.patch, HIVE-12244.11.patch, HIVE-12244.12.patch, HIVE-12244.2.patch, HIVE-12244.3.patch, HIVE-12244.4.patch, HIVE-12244.5.patch, HIVE-12244.6.patch, HIVE-12244.7.patch, HIVE-12244.8.patch, HIVE-12244.8.patch, HIVE-12244.9.patch
>
> In Hive, a String is often used to represent a path, and this keeps causing new issues. Paths need to be compared with equals(), but comparing the Strings is often not correct when comparing paths. If we use Path from org.apache.hadoop.fs, we will avoid such problems in the future.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380479#comment-15380479 ]

Lefty Leverenz commented on HIVE-14213:
---------------------------------------

Okay, thanks.

> Add timeouts for various components in llap status check
> ---------------------------------------------------------
>
>                 Key: HIVE-14213
>                 URL: https://issues.apache.org/jira/browse/HIVE-14213
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>             Fix For: 2.1.1
>
>         Attachments: HIVE-14213.01.patch, HIVE-14213.02.patch
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down, the command can take a long time to exit.
> NO PRECOMMIT TESTS

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
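The general technique behind this improvement — bounding a status probe that may hang on a down component — can be sketched with a standard Future timeout. This is a hypothetical sketch, not the patch's actual code: `probeWithTimeout` and the probe strings are illustrative names, and the real llapstatus tool talks to YARN, HDFS/Slider, and ZooKeeper rather than to lambdas.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedCheck {
    // Run a (possibly hanging) status probe with an upper bound on wait time.
    public static String probeWithTimeout(Callable<String> probe, long millis)
            throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = ex.submit(probe);
            try {
                return f.get(millis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                f.cancel(true);       // interrupt the stuck probe
                return "TIMED_OUT";
            }
        } finally {
            ex.shutdownNow();         // don't leak the worker thread
        }
    }

    public static void main(String[] args) throws Exception {
        // A responsive component answers normally...
        System.out.println(probeWithTimeout(() -> "ZooKeeper: OK", 1_000));
        // ...while a hung one is cut off after the timeout.
        System.out.println(probeWithTimeout(() -> {
            Thread.sleep(60_000);
            return "never";
        }, 200));
    }
}
```

With a per-component bound like this, the overall status command exits in roughly the sum of the timeouts even when every component is unreachable.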
[jira] [Commented] (HIVE-12656) Turn hive.compute.query.using.stats on by default
[ https://issues.apache.org/jira/browse/HIVE-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380450#comment-15380450 ]

Hive QA commented on HIVE-12656:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818104/HIVE-12656.03.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 61 failed/errored test(s), 10326 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_coltype_literals
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_ppd_basic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_ppd_basic
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_insert_into6
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lockneg_query_tbl_in_locked_db
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_noscan_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenCheck
{noformat}
[jira] [Updated] (HIVE-9700) HiveStorageHandler implementation throws org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
[ https://issues.apache.org/jira/browse/HIVE-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Hsu updated HIVE-9700:
------------------------------
    Description: 
I have a HiveStorageHandler. If I do {{select * from myTable}}, it returns all the rows in the underlying storage. When I do something like {{select col1 from myTable}}, the underlying mapreduce job throws an exception:
{noformat}
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:119)
        ... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:526)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:90)
        ... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:520)
        ... 23 more
2015-02-12 15:45:51,881 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{noformat}

  was:
I have a HiveStorageHandler. If I do {{select * from myTable}}, it returns all the rows in the underlying storage.
When I do something like {{select col1 from myTable}}, the underlying mapreduce job throws an exception:
{code}
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.Refle
{code}
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Attachment: HIVE-14221.03.patch > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Status: Open (was: Patch Available) > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Status: Patch Available (was: Open) > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14256) CBO: Rewrite aggregate + distinct as 3-stage DAG
[ https://issues.apache.org/jira/browse/HIVE-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380402#comment-15380402 ] Gopal V commented on HIVE-14256: Both the bugs I filed today manage to miss the optimizations already built. {code} hive> set hive.optimize.distinct.rewrite; hive.optimize.distinct.rewrite=true hive> explain select sum(ss_net_profit), count(distinct ss_customer_sk) from store_sales; OK Plan optimized by CBO. Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 2 llap File Output Operator [FS_6] Group By Operator [GBY_4] (rows=1 width=24) Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"] <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_9] Group By Operator [GBY_8] (rows=547946325 width=168) Output:["_col0","_col1","_col2"],aggregations:["sum(ss_net_profit)","count(DISTINCT ss_customer_sk)"],keys:ss_customer_sk Select Operator [SEL_7] (rows=547946325 width=168) Output:["ss_customer_sk","ss_net_profit"] TableScan [TS_0] (rows=547946325 width=168) tpcds_bin_partitioned_orc_200@store_sales,store_sales,Tbl:PARTIAL,Col:NONE,Output:["ss_customer_sk","ss_net_profit"] Time taken: 3.32 seconds, Fetched: 22 row(s) hive> {code} Single shuffle, no map-side reduction because of the distinct. > CBO: Rewrite aggregate + distinct as 3-stage DAG > > > Key: HIVE-14256 > URL: https://issues.apache.org/jira/browse/HIVE-14256 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 2.2.0 >Reporter: Gopal V > > {code} > select sum(ss_net_profit), count(distinct ss_customer_sk) from store_sales; > {code} > is very slow, while manually sub-aggregating this makes it much faster. > {code} > select sum(v), count(c) from > ( select sum(ss_net_profit) as v, ss_customer_sk as k from store_sales group > by ss_customer_sk); > {code} > Query28 in TPC-DS would be an example of whether this would be valuable. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14256) CBO: Rewrite aggregate + distinct as 3-stage DAG
[ https://issues.apache.org/jira/browse/HIVE-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380401#comment-15380401 ] Ashutosh Chauhan commented on HIVE-14256: - set hive.optimize.distinct.rewrite=true ? > CBO: Rewrite aggregate + distinct as 3-stage DAG > > > Key: HIVE-14256 > URL: https://issues.apache.org/jira/browse/HIVE-14256 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 2.2.0 >Reporter: Gopal V > > {code} > select sum(ss_net_profit), count(distinct ss_customer_sk) from store_sales; > {code} > is very slow, while manually sub-aggregating this makes it much faster. > {code} > select sum(v), count(c) from > ( select sum(ss_net_profit) as v, ss_customer_sk as k from store_sales group > by ss_customer_sk); > {code} > Query28 in TPC-DS would be an example of whether this would be valuable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
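The sub-aggregation rewrite discussed above can be illustrated outside Hive. Below is a minimal Python sketch (not Hive code) that simulates the two query shapes from the ticket over in-memory rows: stage 1 partially aggregates per distinct key (ss_customer_sk), stage 2 combines the per-key partials. The sample rows are invented for illustration.

```python
# Simulate sum(ss_net_profit), count(distinct ss_customer_sk) both directly
# and via the group-by-key rewrite from HIVE-14256, and check they agree.
from collections import defaultdict

rows = [  # (ss_customer_sk, ss_net_profit) -- made-up sample data
    (1, 10.0), (1, 5.0), (2, 7.0), (3, -2.0), (2, 1.0),
]

# Direct form: one reducer sees every row.
direct = (sum(p for _, p in rows), len({k for k, _ in rows}))

# Rewritten form: group by ss_customer_sk first, then aggregate the groups.
per_key = defaultdict(float)
for k, p in rows:
    per_key[k] += p                  # stage 1: partial sum per customer key
rewritten = (sum(per_key.values()),  # stage 2: sum of partial sums
             len(per_key))           #          count of distinct keys

assert direct == rewritten
print(direct)  # (21.0, 3)
```

The rewrite helps because stage 1 can run map-side and shrink the shuffle to one row per distinct key before the final aggregation.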
[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380393#comment-15380393 ] Siddharth Seth commented on HIVE-14224: --- Have to check whether any of the other log4j2 config files need updating after moving to log4j 2.6.2 > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380390#comment-15380390 ] Prasanth Jayachandran commented on HIVE-14224: -- Left some comments in RB > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14245) NoClassDefFoundError when starting LLAP daemon
[ https://issues.apache.org/jira/browse/HIVE-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380384#comment-15380384 ] Hive QA commented on HIVE-14245: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818083/HIVE-14245.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10326 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_filters org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/530/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/530/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-530/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818083 - PreCommit-HIVE-MASTER-Build > NoClassDefFoundError when starting LLAP daemon > -- > > Key: HIVE-14245 > URL: https://issues.apache.org/jira/browse/HIVE-14245 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-14245.1.patch > > > Env: hive master branch > {noformat} > 2016-07-14T20:40:00,646 WARN [main[]] conf.Configuration: hive-site.xml:an > attempt to override final parameter: > hive.server2.tez.sessions.per.default.queue; Ignoring. > 2016-07-14T20:40:00,652 WARN [main[]] impl.LlapDaemon: Failed to start LLAP > Daemon with exception > java.lang.NoClassDefFoundError: > org/apache/hadoop/registry/client/binding/RegistryUtils$ServiceRecordMarshal > at > org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.(LlapZookeeperRegistryImpl.java:134) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.serviceInit(LlapRegistryService.java:84) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ~[hadoop-common-2.7.1.jar:?] > at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceStart(LlapDaemon.java:369) > ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ~[hadoop-common-2.7.1.jar:?] 
> at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:460) > [hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.registry.client.binding.RegistryUtils$ServiceRecordMarshal > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > ~[?:1.8.0_65] > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_65] > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > ~[?:1.8.0_65] > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_65] > ... 6 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14257) CBO: Push Join through Groupby to trigger shuffle reductions
[ https://issues.apache.org/jira/browse/HIVE-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380365#comment-15380365 ] Gopal V edited comment on HIVE-14257 at 7/16/16 12:32 AM: -- The {{SyntheticJoinPredicate}} is actually in place to indicate these relationships. However those are not leveraged for anything other than the DPP runtime. RedundantDynamicPruningConditionsRemoval removes them for all non-partitioned columns. was (Author: gopalv): The {{SyntheticJoinPredicate}} is actually in place to indicate these relationships. However those are not leveraged for anything other than the DPP runtime. > CBO: Push Join through Groupby to trigger shuffle reductions > > > Key: HIVE-14257 > URL: https://issues.apache.org/jira/browse/HIVE-14257 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Gopal V > > Similar to the optimizations in hive, already which push aggregates through a > join (hive.transpose.aggr.join=true). > {code} > select count(v) from (select d_year, count(ss_item_sk) as v from store_sales, > date_dim where ss_sold_date_sk=d_Date_sk group by d_year) w, date_dim d where > d.d_year = w.d_year and d_date_sk = 1; > {code} > currently produces an entire aggregate of all years before discarding all of > that (because obviously, there's no data for d_date_sk=1; > This particular example is a simplified version of TPC-DS Query59's join > condition, which can have a reduction in scans by applying the d_month_seq > between 1185 and 1185 + 11 into the wss alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
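The win described in this ticket — filtering before the group-by instead of aggregating all years and discarding most groups — can be sketched in plain Python. This is an illustrative simulation, not planner code, and the real safety conditions (tracked via {{SyntheticJoinPredicate}}) are more involved; data and names are invented.

```python
# Show that filtering on the grouping key (d_year) before aggregating gives
# the same groups as aggregating everything and discarding afterwards, while
# touching far fewer rows -- the essence of pushing the join below the GBY.
from collections import Counter

store_sales = [(2001, 'i1'), (2001, 'i2'), (2002, 'i3'), (2003, 'i4')]
surviving_years = {2002}  # years that survive the join with filtered date_dim

# Unpushed plan: aggregate every year, then discard non-matching groups.
full_agg = Counter(year for year, _ in store_sales)
unpushed = {y: c for y, c in full_agg.items() if y in surviving_years}

# Pushed plan: filter first, aggregate only the surviving rows.
pushed = dict(Counter(y for y, _ in store_sales if y in surviving_years))

assert unpushed == pushed  # both yield {2002: 1}
```

The equivalence holds because the join predicate restricts only the grouping key, so discarded groups never contribute to the surviving output.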
[jira] [Updated] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-14254: -- Affects Version/s: 2.1.0 Status: Patch Available (was: Open) > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > Hive 2.1.0-SNAPSHOT > Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14257) CBO: Push Join through Groupby to trigger shuffle reductions
[ https://issues.apache.org/jira/browse/HIVE-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380365#comment-15380365 ] Gopal V commented on HIVE-14257: The {{SyntheticJoinPredicate}} is actually in place to indicate these relationships. However those are not leveraged for anything other than the DPP runtime. > CBO: Push Join through Groupby to trigger shuffle reductions > > > Key: HIVE-14257 > URL: https://issues.apache.org/jira/browse/HIVE-14257 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Gopal V > > Similar to the optimizations in hive, already which push aggregates through a > join (hive.transpose.aggr.join=true). > {code} > select count(v) from (select d_year, count(ss_item_sk) as v from store_sales, > date_dim where ss_sold_date_sk=d_Date_sk group by d_year) w, date_dim d where > d.d_year = w.d_year and d_date_sk = 1; > {code} > currently produces an entire aggregate of all years before discarding all of > that (because obviously, there's no data for d_date_sk=1; > This particular example is a simplified version of TPC-DS Query59's join > condition, which can have a reduction in scans by applying the d_month_seq > between 1185 and 1185 + 11 into the wss alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query
[ https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380352#comment-15380352 ] Mithun Radhakrishnan commented on HIVE-13756: - IMHO, the qtest failures here are irrelevant. This is a fix in HCat. > Map failure attempts to delete reducer _temporary directory on multi-query > pig query > > > Key: HIVE-13756 > URL: https://issues.apache.org/jira/browse/HIVE-13756 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, > HIVE-13756.1.patch, HIVE-13756.patch > > > A pig script, executed with multi-query enabled, that reads the source data > and writes it as-is into TABLE_A as well as performing a group-by operation > on the data which is written into TABLE_B can produce erroneous results if > any map fails. This results in a single MR job that writes the map output to > a scratch directory relative to TABLE_A and the reducer output to a scratch > directory relative to TABLE_B. > If one or more maps fail it will delete the attempt data relative to TABLE_A, > but it also deletes the _temporary directory relative to TABLE_B. This has > the unintended side-effect of preventing subsequent maps from committing > their data. This means that any maps which successfully completed before the > first map failure will have its data committed as expected, other maps not, > resulting in an incomplete result set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14256) CBO: Rewrite aggregate + distinct as 3-stage DAG
[ https://issues.apache.org/jira/browse/HIVE-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-14256: --- Description: {code} select sum(ss_net_profit), count(distinct ss_customer_sk) from store_sales; {code} is very slow, while manually sub-aggregating this makes it much faster. {code} select sum(v), count(c) from ( select sum(ss_net_profit) as v, ss_customer_sk as k from store_sales group by ss_customer_sk); {code} Query28 in TPC-DS would be an example of whether this would be valuable. was: {code} select sum(ss_net_profit), count(distinct ss_customer_sk) from store_sales; {code} is very slow, while manually sub-aggregating this makes it much faster. {code} select sum(v), count(c) from ( select sum(ss_net_profit) as v, ss_customer_sk as k from store_sales group by ss_customer_sk); {code} > CBO: Rewrite aggregate + distinct as 3-stage DAG > > > Key: HIVE-14256 > URL: https://issues.apache.org/jira/browse/HIVE-14256 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 2.2.0 >Reporter: Gopal V > > {code} > select sum(ss_net_profit), count(distinct ss_customer_sk) from store_sales; > {code} > is very slow, while manually sub-aggregating this makes it much faster. > {code} > select sum(v), count(c) from > ( select sum(ss_net_profit) as v, ss_customer_sk as k from store_sales group > by ss_customer_sk); > {code} > Query28 in TPC-DS would be an example of whether this would be valuable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-14254: -- Attachment: HIVE-14254.1.patch First iteration of bug fix > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > Hive 2.1.0-SNAPSHOT > Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14198) Refactor aux jar related code to make them more consistent
[ https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380336#comment-15380336 ] Mohit Sabharwal commented on HIVE-14198: +1 > Refactor aux jar related code to make them more consistent > -- > > Key: HIVE-14198 > URL: https://issues.apache.org/jira/browse/HIVE-14198 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14198.1.patch, HIVE-14198.2.patch > > > There are some redundancy and inconsistency between hive.aux.jar.paths and > hive.reloadable.aux.jar.paths and also between MR and spark. > Refactor the code to reuse the same code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc
[ https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380331#comment-15380331 ] Eugene Koifman commented on HIVE-14211: --- Suggestion from [~owen.omalley]: add a configurable limit on how long a write transaction can be open for, and then make the Cleaner not clean anything that may be needed by any open txn. This will slow down GC but won't penalize the readers. > AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files > etc > - > > Key: HIVE-14211 > URL: https://issues.apache.org/jira/browse/HIVE-14211 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct, but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and locks in a snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the files for table A are as > follows, created in the order listed. > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. > The reader can't use base_17 because it has the result of txn16. So it should choose > base_5 as "TxnBase bestBase" in _getChildState()_. 
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot > for such a reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manager state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first, and right at this moment the read request > with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by > some multi-stmt txn that started some time ago). It acquires locks after the > Cleaner checks LM state and calls getAcidState(). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus, for the duration of the > transaction, nothing will be deleted, so it's safe to use base_5] > This is a subtle race condition, but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList, throwing > an exception if there is an exception <=x. > 2. A better option is to keep 2 exception lists, aborted and open, and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns, so there is no harm in using a base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning any delta with an id range that includes > this open txn id for any txn that is still running. 
This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with the multi-statement txn work. > Side note: if 2 deltas have overlapping ID ranges, then one must be a subset of > the other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
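The base-selection behavior walked through in the concrete example (base_17 is unusable because txn 16 is excluded, so the reader falls back to base_5) can be sketched compactly. This is a minimal Python sketch, not the actual AcidUtils implementation — directory-name parsing is simplified, and option 1 above would throw rather than fall back to an older base.

```python
# Choose the newest base_x whose id range (all txns <= x) contains no
# excluded (e.g. still-open) transaction, mirroring the HIVE-14211 example.
def choose_base(dirs, exceptions):
    bases = sorted(
        (int(d.split('_')[1]) for d in dirs if d.startswith('base_')),
        reverse=True)
    for x in bases:
        # base_x folds in every txn <= x, so any excluded txn in that
        # range makes this base unusable for the reader's snapshot
        if not any(e <= x for e in exceptions):
            return 'base_%d' % x
    raise ValueError('no base is valid for this snapshot')

dirs = ['delta_4_4', 'delta_5_5', 'delta_4_5', 'base_5',
        'delta_16_16', 'delta_17_17', 'base_17']
# ValidTxnList(20:16): HWM=20, txn 16 excluded -> base_17 covers txn 16
print(choose_base(dirs, exceptions=[16]))  # base_5
```

With an empty exception list the same function returns base_17, matching the intuition that the freshest base is preferred whenever the snapshot allows it.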
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Status: Patch Available (was: Open) > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Attachment: HIVE-14224.02.patch Updated patch. This changes the query-router to log using the queryId and dagId. This is 1) to separate files for multi-stage queries, and 2) to make it easy to identify the dagId associated with a queryId (eventually Hive will hopefully make this available via HS2). Also updated the HistoryLogger to include a time setup - otherwise log4j2 2.6.x complains about no date pattern despite using a Time+Size policy. The bit about not overwriting files for ext queries needs to be fixed. I'll take care of that in HIVE-14225 or a jira after that which updates the log links on the UI. [~prasanth_j] - could you please review? > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention to ValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13369: -- Attachment: HIVE-13369.4.patch patch 4 addressing [~owen.omalley]'s comments > AcidUtils.getAcidState() is not paying attention to ValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch, HIVE-13369.4.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct, but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and locks in a snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the files for table A are as > follows, created in the order listed. > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. > The reader can't use base_17 because it has the result of txn16. So it should choose > base_5 as "TxnBase bestBase" in _getChildState()_. 
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot > for such a reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manager state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first, and right at this moment the read request > with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by > some multi-stmt txn that started some time ago). It acquires locks after the > Cleaner checks LM state and calls getAcidState(). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus, for the duration of the > transaction, nothing will be deleted, so it's safe to use base_5] > This is a subtle race condition, but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList, throwing > an exception if there is an exception <=x. > 2. A better option is to keep 2 exception lists, aborted and open, and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns, so there is no harm in using a base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning any delta with an id range that includes > this open txn id for any txn that is still running. 
This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with multi-statement txn work. > Side note: if 2 deltas have overlapping ID range, then 1 must be a subset of > the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
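Option 1 above ("use the latest base_x and throw if the snapshot excludes some txn <= x") can be sketched in a few lines. This is a simplified illustration only — the class and method names below are hypothetical and not Hive's actual AcidUtils API; the snapshot is modeled as just a high watermark plus an exception list:

```java
import java.util.Arrays;

// Hypothetical sketch of option 1: pick the newest base_x visible under the
// snapshot's high watermark, then refuse it if any excluded txn id is <= x,
// since a compacted base cannot be "un-merged" to drop that txn's data.
class BestBaseCheck {
    static long chooseBase(long[] baseTxnIds, long hwm, long[] exceptions) {
        long best = Arrays.stream(baseTxnIds)
                          .filter(b -> b <= hwm)
                          .max()
                          .orElseThrow(() -> new IllegalStateException("no usable base"));
        for (long e : exceptions) {
            if (e <= best) {
                // base_best may include data from an excluded txn -> unsafe to read
                throw new IllegalStateException(
                    "base_" + best + " includes excluded txn " + e);
            }
        }
        return best;
    }
}
```

With the example in the description (bases 5 and 17, ValidTxnList(20:16)) this would throw for base_17, which is exactly the conservative behavior option 1 proposes instead of silently falling back to base_5.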
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380321#comment-15380321 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-13995: -- I tried to modify this query by adding PARTNAME to the group by columns and doing a join between PART_COL_STATS and PARTITIONS, but it turns out that the query will be semantically incorrect. One of the ways to do this in a single query, which I am trying as of now, will be along these lines: {code} WITH sq1 as SELECT 'a', sq1.* from sq1 UNION ALL select 'b', from PART_COLS where PartId in (SELECT partid from sq1) group by COLNAME, COLTYPE {code} This way we can use the tags 'a' and 'b' to distinguish between the rows coming out of the UNION ALL query. > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not have a filter on the partition column, metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing a large IN clause, and even when a good index > plan is chosen, comparing to 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the hive > table (adding/dropping partitions). > For eg: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','
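The range idea above — bounding an ordered partition-name list by its endpoints instead of enumerating all 1800+ names — can be sketched like this. This is a hedged illustration, not the actual metastore direct-SQL code; `PartitionPredicate` and its methods are hypothetical, and the range form is only equivalent when every partition in the sorted range is selected:

```java
import java.util.List;

// Hypothetical helper contrasting the two predicate shapes from the comment:
// a full IN list versus a two-endpoint range over a sorted name list.
class PartitionPredicate {
    // Builds the large IN clause the metastore generates today.
    static String inClause(List<String> names) {
        StringBuilder sb = new StringBuilder("\"PARTITION_NAME\" in (");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('\'').append(names.get(i)).append('\'');
        }
        return sb.append(')').toString();
    }

    // Assumes names is already sorted (as the comment says Hive provides):
    // bound the range instead of enumerating every member.
    static String rangeClause(List<String> sortedNames) {
        return "\"PARTITION_NAME\" >= '" + sortedNames.get(0)
             + "' and \"PARTITION_NAME\" <= '" + sortedNames.get(sortedNames.size() - 1) + "'";
    }
}
```

For 1836 partitions the IN form compares against 1836 strings per row while the range form needs only two comparisons, which matches the ~4x speedup shown in the MySQL log above.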
[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380280#comment-15380280 ] Owen O'Malley commented on HIVE-13369: -- Comments: * Your lines are way too long. Please limit them to 100 chars (80 is better!). * You have some trailing spaces that should be removed * You removed the line breaks from: {code} if (txnList.isTxnRangeValid(delta.minTransaction, delta.maxTransaction) != ValidTxnList.RangeResponse.NONE) { {code} * I think that if you are in the case where none of the bases are sufficient and there were compacted bases that include too much, it should always be an exception. Basically, you are almost never going to get complete coverage of all deltas. Other than that, it looks good. > AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but and there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and lock in snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. 
Let's say the file for table A are as > follows and created in the order listed. > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. > Reader can't use base_17 because it has result of txn16. So it should chose > base_5 "TxnBase bestBase" in _getChildState()_. > Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in _Directory_ object. This would represent acceptable snapshot > for such reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manger state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first and right at this moment the read request > with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by > some multi-stmt txn that started some time ago. It acquires locks after the > Cleaner checks LM state and calls getAcidState(). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus for duration of the > transaction, nothing will be deleted so it's safe to use base_5] > This is a subtle race condition but possible. > 1. 
So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList and throw > an exception if there is an exception <=x. > 2. A better option is to keep 2 exception lists: aborted and open and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns and thus there is no harm using base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning anything delta with id range that includes > this open txn id for any txn that is still running. This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now a
[jira] [Updated] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9756: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. Thanks [~sseth] for the reviews! > LLAP: use log4j 2 for llap (log to separate files, etc.) > > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Fix For: 2.2.0 > > Attachments: HIVE-9756.1.patch, HIVE-9756.10.patch, > HIVE-9756.2.patch, HIVE-9756.3.patch, HIVE-9756.4.patch, HIVE-9756.4.patch, > HIVE-9756.5.patch, HIVE-9756.6.patch, HIVE-9756.7.patch, HIVE-9756.8.patch, > HIVE-9756.9.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-14255) Backport part of HIVE-12950 to fix "Can not create Path from an emtpy string" exception
[ https://issues.apache.org/jira/browse/HIVE-14255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-14255. -- Resolution: Fixed Fix Version/s: 1.3.0 Committed to branch-1 > Backport part of HIVE-12950 to fix "Can not create Path from an emtpy string" > exception > --- > > Key: HIVE-14255 > URL: https://issues.apache.org/jira/browse/HIVE-14255 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0 > > Attachments: HIVE-14255-branch-1.patch > > > HIVE-12950 already fixes the issue but that patch many unrelated changes that > are not in branch-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380250#comment-15380250 ] Hive QA commented on HIVE-13974: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818074/HIVE-13974.094.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/529/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/529/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-529/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.8.0_25 ]] + export JAVA_HOME=/usr/java/jdk1.8.0_25 + JAVA_HOME=/usr/java/jdk1.8.0_25 + export PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-529/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive a1d6b2d..8c11d37 master -> origin/master 2bda91a..f2b5564 branch-1 -> origin/branch-1 a177473..6ba22f4 branch-2.1 -> origin/branch-2.1 + git reset --hard HEAD HEAD is now at a1d6b2d HIVE-14241 Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> none (Eugene Koifman, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at 8c11d37 HIVE-14253: Fix MinimrCliDriver test failure (Prasanth Jayachandran reviewed by Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818074 - PreCommit-HIVE-MASTER-Build > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch, HIVE-13974.092.patch, > HIVE-13974.093.patch, HIVE-13974.094.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380247#comment-15380247 ] Siddharth Seth commented on HIVE-9756: -- +1. Looks good. Don't think we need to wait for .10 to go through precommit. > LLAP: use log4j 2 for llap (log to separate files, etc.) > > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Attachments: HIVE-9756.1.patch, HIVE-9756.10.patch, > HIVE-9756.2.patch, HIVE-9756.3.patch, HIVE-9756.4.patch, HIVE-9756.4.patch, > HIVE-9756.5.patch, HIVE-9756.6.patch, HIVE-9756.7.patch, HIVE-9756.8.patch, > HIVE-9756.9.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14255) Backport part of HIVE-12950 to fix "Can not create Path from an emtpy string" exception
[ https://issues.apache.org/jira/browse/HIVE-14255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380244#comment-15380244 ] Gunther Hagleitner commented on HIVE-14255: --- +1 > Backport part of HIVE-12950 to fix "Can not create Path from an emtpy string" > exception > --- > > Key: HIVE-14255 > URL: https://issues.apache.org/jira/browse/HIVE-14255 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14255-branch-1.patch > > > HIVE-12950 already fixes the issue but that patch many unrelated changes that > are not in branch-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380242#comment-15380242 ] Hive QA commented on HIVE-9756: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818129/HIVE-9756.9.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10322 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/528/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/528/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-528/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12818129 - PreCommit-HIVE-MASTER-Build > LLAP: use log4j 2 for llap (log to separate files, etc.) 
> > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Attachments: HIVE-9756.1.patch, HIVE-9756.10.patch, > HIVE-9756.2.patch, HIVE-9756.3.patch, HIVE-9756.4.patch, HIVE-9756.4.patch, > HIVE-9756.5.patch, HIVE-9756.6.patch, HIVE-9756.7.patch, HIVE-9756.8.patch, > HIVE-9756.9.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380239#comment-15380239 ] Eugene Koifman commented on HIVE-13369: --- uhm, that's not true. ValidCompactorTxnList ensures there are no open txns in the range being compacted. HIVE-8966 > AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but and there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and lock in snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the file for table A are as > follows and created in the order listed. > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. > Reader can't use base_17 because it has result of txn16. So it should chose > base_5 "TxnBase bestBase" in _getChildState()_. 
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in _Directory_ object. This would represent acceptable snapshot > for such reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manger state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first and right at this moment the read request > with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by > some multi-stmt txn that started some time ago. It acquires locks after the > Cleaner checks LM state and calls getAcidState(). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus for duration of the > transaction, nothing will be deleted so it's safe to use base_5] > This is a subtle race condition but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList and throw > an exception if there is an exception <=x. > 2. A better option is to keep 2 exception lists: aborted and open and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns and thus there is no harm using base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning anything delta with id range that includes > this open txn id for any txn that is still running. 
This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with multi-statement txn work. > Side note: if 2 deltas have overlapping ID range, then 1 must be a subset of > the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
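The point in the comment above — that ValidCompactorTxnList (HIVE-8966) ensures there are no open txns in the range being compacted — amounts to capping the compactor's effective high watermark below the lowest open transaction. A minimal sketch of that idea, assuming a simplified snapshot model (not Hive's actual ValidCompactorTxnList class):

```java
import java.util.Arrays;

// Illustrative only: a compactor must not merge any txn at or above the
// lowest open txn, so its usable high watermark is min(openTxns) - 1,
// never more than the reader's own high watermark.
class CompactorHwm {
    static long effectiveHwm(long hwm, long[] openTxns) {
        long minOpen = Arrays.stream(openTxns).min().orElse(Long.MAX_VALUE);
        return Math.min(hwm, minOpen - 1);
    }
}
```

Under this model a compaction run with HWM=20 and open txn 16 would only compact through txn 15, so the compacted base can never swallow an open transaction's range.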
[jira] [Updated] (HIVE-14253) Fix MinimrCliDriver test failure
[ https://issues.apache.org/jira/browse/HIVE-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14253: - Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Committed to branch-2.1 and master > Fix MinimrCliDriver test failure > > > Key: HIVE-14253 > URL: https://issues.apache.org/jira/browse/HIVE-14253 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14253.1.patch > > > MinimrCliDriver test is failing with the following exception for > bucket_num_reducers2.q test case > {code} > junit.framework.AssertionFailedError: Number of MapReduce jobs is incorrect > expected:<1> but was:<0> > at junit.framework.Assert.fail(Assert.java:57) > at junit.framework.Assert.failNotEquals(Assert.java:329) > at junit.framework.Assert.assertEquals(Assert.java:78) > at junit.framework.Assert.assertEquals(Assert.java:234) > at > org.apache.hadoop.hive.ql.hooks.VerifyNumReducersHook.run(VerifyNumReducersHook.java:46) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1664) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1070) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:849) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:826) > at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:488) > at > org.apache.hadoop.hive.cli.TestMinimrCliDriver.shutdown(TestMinimrCliDriver.java:89) > {code} -- This 
message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380235#comment-15380235 ] Wei Zheng commented on HIVE-13934: -- Created TEZ-3353 for followup. > Configure Tez to make nocondiional task size memory available for the > Processor > --- > > Key: HIVE-13934 > URL: https://issues.apache.org/jira/browse/HIVE-13934 > Project: Hive > Issue Type: Bug >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, > HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, > HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch > > > Currently, noconditionaltasksize is not validated against the container size, > the reservations made in the container by Tez for Inputs / Outputs etc. > Check this at compile time to see if enough memory is available, or set up > the vertex to reserve additional memory for the Processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380230#comment-15380230 ] Alan Gates commented on HIVE-13369: --- bq. When the major compaction runs, does it ensure that there are no open transactions in the range it is compacting? Not quite. It assures there are no locks for a partition/table to be cleaned before cleaning up after a compaction. This won't work going forward when we move to multi-statement transaction, but it should be ok now in the auto-commit world. > AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but and there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and lock in snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the file for table A are as > follows and created in the order listed. 
> delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. > Reader can't use base_17 because it has result of txn16. So it should chose > base_5 "TxnBase bestBase" in _getChildState()_. > Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in _Directory_ object. This would represent acceptable snapshot > for such reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manger state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first and right at this moment the read request > with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by > some multi-stmt txn that started some time ago. It acquires locks after the > Cleaner checks LM state and calls getAcidState(). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus for duration of the > transaction, nothing will be deleted so it's safe to use base_5] > This is a subtle race condition but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList and throw > an exception if there is an exception <=x. > 2. 
A better option is to keep 2 exception lists, aborted and open, and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns, so there is no harm in using a base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and > prevent the Cleaner from cleaning any delta with an id range that includes > this open txn id, for any txn that is still running. This has the drawback of > potentially delaying GC of old files for arbitrarily long periods, so this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with the multi-statement txn work. > Side note: if 2 deltas have overlapping id ranges, then 1 must be a subset of > the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
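Option 1 above can be sketched in a few lines. This is a hypothetical, simplified model, not Hive's real AcidUtils code: the class, method names, and the (hwm, exceptions) snapshot encoding are illustrative. It picks the latest base_x visible to the snapshot, then rejects the snapshot if the exception list contains a txn id <= x, since a major-compacted base cannot exclude individual transactions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of option 1; not Hive's real API.
public class AcidBaseChooser {

    /**
     * Picks the base_x with the highest x the snapshot can see (x <= HWM),
     * then throws if the snapshot's exception list contains a txn id <= x.
     */
    static String chooseBase(List<String> files, long hwm, Set<Long> exceptions) {
        String best = null;
        long bestTxn = -1;
        for (String f : files) {
            if (!f.startsWith("base_")) {
                continue; // deltas are handled separately in the real logic
            }
            long txn = Long.parseLong(f.substring("base_".length()));
            if (txn <= hwm && txn > bestTxn) {
                bestTxn = txn;
                best = f;
            }
        }
        if (best == null) {
            return null;
        }
        for (long excluded : exceptions) {
            if (excluded <= bestTxn) {
                throw new IllegalStateException(
                    best + " contains excluded txn " + excluded);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "delta_4_4", "delta_5_5", "delta_4_5", "base_5",
            "delta_16_16", "delta_17_17", "base_17");
        // ValidTxnList(20:16): HWM=20, exception list = <16>.
        try {
            chooseBase(files, 20, new HashSet<>(Arrays.asList(16L)));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // With no excluded txns, base_17 is safe to use.
        System.out.println(chooseBase(files, 20, Collections.<Long>emptySet()));
    }
}
```

In this model the reader that must exclude txn 16 fails fast instead of silently falling back to base_5 and racing the Cleaner.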
[jira] [Updated] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9756: Attachment: HIVE-9756.10.patch RFA rolls by size and time now. Fixed ref in properties file. > LLAP: use log4j 2 for llap (log to separate files, etc.) > > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Attachments: HIVE-9756.1.patch, HIVE-9756.10.patch, > HIVE-9756.2.patch, HIVE-9756.3.patch, HIVE-9756.4.patch, HIVE-9756.4.patch, > HIVE-9756.5.patch, HIVE-9756.6.patch, HIVE-9756.7.patch, HIVE-9756.8.patch, > HIVE-9756.9.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13934: - Status: Open (was: Patch Available) > Configure Tez to make nocondiional task size memory available for the > Processor > --- > > Key: HIVE-13934 > URL: https://issues.apache.org/jira/browse/HIVE-13934 > Project: Hive > Issue Type: Bug >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, > HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, > HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch > > > Currently, noconditionaltasksize is not validated against the container size, > the reservations made in the container by Tez for Inputs / Outputs etc. > Check this at compile time to see if enough memory is available, or set up > the vertex to reserve additional memory for the Processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13934: - Attachment: HIVE-13934.9.patch patch 9 uses DagUtils methods to get container size and hive Xmx. I removed the part that retrieves the default Tez Xmx since that's meaningless if we cannot get the actual settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13934: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380220#comment-15380220 ] Eugene Koifman commented on HIVE-13369: --- yes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380216#comment-15380216 ] Owen O'Malley commented on HIVE-13369: -- Sorry, I was looking at the code with your patch instead of prior to your patch. When the major compaction runs, does it ensure that there are no open transactions in the range it is compacting? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported
[ https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380215#comment-15380215 ] Chris Drome commented on HIVE-13990: Fair comment [~thejas]. Let me review the patch to see where this can be cached. In Hadoop 2.6 (when we first encountered this), I was informed that there was no way of asking the NN for this information. > Client should not check dfs.namenode.acls.enabled to determine if extended > ACLs are supported > - > > Key: HIVE-13990 > URL: https://issues.apache.org/jira/browse/HIVE-13990 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13990-branch-1.patch, HIVE-13990.1-branch-1.patch, > HIVE-13990.1.patch > > > dfs.namenode.acls.enabled is a server-side configuration, and the client > should not presume to know how the server is configured. Barring a method for > querying the NN about whether ACLs are supported, the client should try and catch > the appropriate exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14255) Backport part of HIVE-12950 to fix "Can not create Path from an emtpy string" exception
[ https://issues.apache.org/jira/browse/HIVE-14255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14255: - Attachment: HIVE-14255-branch-1.patch [~hagleitn]/[~ashutoshc] Can someone please take a look? > Backport part of HIVE-12950 to fix "Can not create Path from an emtpy string" > exception > --- > > Key: HIVE-14255 > URL: https://issues.apache.org/jira/browse/HIVE-14255 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14255-branch-1.patch > > > HIVE-12950 already fixes the issue, but that patch has many unrelated changes that > are not in branch-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Affects Version/s: 2.0.0 Status: Patch Available (was: Open) Attached patch 1: separated out a new version of ImplicitConvertible() for union, distinct from the one for comparison, since for comparison a string should convert to double, while for union a double should convert to string. > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
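The direction of the conversion is the crux of the bug. The sketch below is a hypothetical model, not Hive's FunctionRegistry code: it shows why a UNION ALL over {date, int, double}, resolved pairwise, should land on string rather than double, since a date cannot be losslessly represented as a double.

```java
// Hypothetical sketch of why UNION needs its own conversion table: for
// comparisons a string may widen to double, but for UNION ALL mixing type
// families must fall back to string to avoid losing data (e.g. dates).
public class UnionTypeResolver {

    /** Pairwise common type for UNION ALL in this simplified model. */
    static String commonTypeForUnion(String a, String b) {
        if (a.equals(b)) {
            return a;
        }
        if (isNumeric(a) && isNumeric(b)) {
            return "double"; // numeric widening is safe within one family
        }
        return "string";     // date vs numeric etc.: string is the safe choice
    }

    static boolean isNumeric(String t) {
        return t.equals("int") || t.equals("double");
    }

    public static void main(String[] args) {
        // c1 date, c2 int, c3 double, as in the repro above.
        String t = commonTypeForUnion(commonTypeForUnion("date", "int"), "double");
        System.out.println(t); // string, not double, so c1 survives as text
    }
}
```

Resolving to double instead, as the pre-patch behavior effectively did, leaves no valid numeric value for the date column, hence the NULLs.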
[jira] [Commented] (HIVE-14244) bucketmap right outer join query throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-14244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380201#comment-15380201 ] Gunther Hagleitner commented on HIVE-14244: --- I see. [~aplusplus] you're saying that in this case there were no splits for certain buckets and thus routingtable entries < numBuckets? This looks good to me. Let's see what the tests say. +1. > bucketmap right outer join query throws ArrayIndexOutOfBoundsException > -- > > Key: HIVE-14244 > URL: https://issues.apache.org/jira/browse/HIVE-14244 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.1.0 >Reporter: Jagruti Varia >Assignee: Zhiyuan Yang > Attachments: HIVE-14244.1.patch > > > bucketmap right outer join on partitioned bucketed table throws this error: > {noformat} > Vertex failed, vertexName=Map 1, vertexId=vertex_1466710232033_0539_6_00, > diagnostics=[Task failed, taskId=task_1466710232033_0539_6_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1466710232033_0539_6_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:393) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) > ... 14 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:850) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) > ... 17 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ArrayIndexOutOfBoundsException: -1 > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416) > at > org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:104) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762) > ... 
18 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at > org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:314) > at > org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:257) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:253) > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:552) > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:398) > ... 22 more > ], TaskAtte
[jira] [Commented] (HIVE-14135) beeline output not formatted correctly for large column widths
[ https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380197#comment-15380197 ] Sergio Peña commented on HIVE-14135: The new patch looks good. +1 > beeline output not formatted correctly for large column widths > -- > > Key: HIVE-14135 > URL: https://issues.apache.org/jira/browse/HIVE-14135 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, > HIVE-14135.3.patch, csv.txt, csv2.txt, dsv.txt, longKeyValues.txt, > output_after.txt, output_before.txt, table.txt, tsv.txt, tsv2.txt, > vertical.txt > > > If the column width is too large, then beeline uses the maximum column width > when normalizing all the column widths. To reproduce the issue, run > set -v; > One of the configuration variables is classpath, which can have an extremely large > width (41k characters in my environment). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13873) Column pruning for nested fields
[ https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380194#comment-15380194 ] Xuefu Zhang commented on HIVE-13873: Thanks, [~Ferd]. Regarding #3, for array, there is probably nothing to do about it. A map is probably encoded as an array of structs of key and value, so there might be nothing to do there either (Hive has no way to get all keys or values in a map). Thus, we are probably good on that. While you're doing this work, it would be great to check whether this has any performance gain. Similar work done for Presto saw a few-times speedup in highly selective projections. > Column pruning for nested fields > > > Key: HIVE-13873 > URL: https://issues.apache.org/jira/browse/HIVE-13873 > Project: Hive > Issue Type: New Feature > Components: Logical Optimizer >Reporter: Xuefu Zhang >Assignee: Ferdinand Xu > Attachments: HIVE-13873.wip.patch > > > Some columnar file formats such as Parquet store fields of struct type, too, > column by column, using the encoding described in the Google Dremel paper. It's very > common in big data for data to be stored in structs while queries need only > a subset of the fields in the structs. However, presently Hive still > needs to read the whole struct regardless of whether all fields are selected. > Therefore, pruning unwanted sub-fields in structs or nested fields at > file-reading time would be a big performance boost for such scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Attachment: HIVE-14251.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14187) JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called
[ https://issues.apache.org/jira/browse/HIVE-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14187: --- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) > JDOPersistenceManager objects remain cached if MetaStoreClient#close is not > called > -- > > Key: HIVE-14187 > URL: https://issues.apache.org/jira/browse/HIVE-14187 > Project: Hive > Issue Type: Bug >Reporter: Mohit Sabharwal >Assignee: Mohit Sabharwal > Fix For: 2.2.0 > > Attachments: HIVE-14187.1.patch, HIVE-14187.2.patch, > HIVE-14187.patch, HIVE-14187.patch > > > JDOPersistenceManager objects are cached in JDOPersistenceManagerFactory by > DataNucleus. > A new JDOPersistenceManager object gets created for every HMS thread since > ObjectStore is a thread-local. > In non-embedded metastore mode, the JDOPersistenceManager associated with a > thread only gets cleaned up if IMetaStoreClient#close is called by the client > (which calls ObjectStore#shutdown, which calls JDOPersistenceManager#close, > which in turn removes the object from the cache in > JDOPersistenceManagerFactory#releasePersistenceManager > https://github.com/datanucleus/datanucleus-api-jdo/blob/master/src/main/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L1271), > i.e. the object will remain cached if the client does not call close. > For example: if one interrupts out of the Hive CLI shell (instead of using > the 'exit;' command), SessionState#close does not get called, and hence > IMetaStoreClient#close does not get called. > Instead of relying on the client to call close, it's cleaner to automatically > perform RawStore-related cleanup at the server end via deleteContext(), which > gets called when the server detects a lost/closed connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14187) JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called
[ https://issues.apache.org/jira/browse/HIVE-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380179#comment-15380179 ] Sergio Peña commented on HIVE-14187: thanks [~mohitsabharwal] I committed this to 2.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14246) Tez: disable auto-reducer parallelism when CUSTOM_EDGE is in place
[ https://issues.apache.org/jira/browse/HIVE-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380178#comment-15380178 ] Gunther Hagleitner commented on HIVE-14246: --- +1 > Tez: disable auto-reducer parallelism when CUSTOM_EDGE is in place > -- > > Key: HIVE-14246 > URL: https://issues.apache.org/jira/browse/HIVE-14246 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14246.1.patch > > > The CUSTOM_SIMPLE_EDGE impl has differences between the size constraints of > either edge which cannot be represented by the ShuffleVertexManager presently. > Reducing the width based on the hashtable build side vs the streaming probe > side have different consequences since there is no order of runtime between > them. > Until the two parent vertices of the shuffle hash-join are related, this > feature causes massive inconsistency of performance across runs. > For inner & semi joins, the hashtable side should have a higher priority than > the streaming side and for left outer joins, the streaming side can over-take > the hashtable side, being the more dominant factor in the final row-counts. > Until such priorities can be bubbled up into ShuffleVertexManager, this > feature can be disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported
[ https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380167#comment-15380167 ] Thejas M Nair commented on HIVE-13990: -- At the very least, I think we should cache the information that ACL is not supported after the UnsupportedException is thrown. Otherwise, it could end up being very expensive to deal with that exception for every file check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
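The caching suggested above can be sketched roughly as follows. This is a hypothetical illustration, not the HIVE-13990 patch: the AclProbe interface stands in for a real call such as FileSystem#getAclStatus, and the exception type is illustrative. The idea is to probe once per filesystem, remember a failure, and never pay the exception cost again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: probe ACL support once per filesystem URI and cache
// the answer, so an unsupported-ACLs exception is raised at most once.
public class AclSupportCache {

    interface AclProbe {
        // Stand-in for a real FileSystem#getAclStatus-style call.
        void getAclStatus() throws UnsupportedOperationException;
    }

    private final Map<String, Boolean> supported = new ConcurrentHashMap<>();

    boolean aclsSupported(String fsUri, AclProbe probe) {
        return supported.computeIfAbsent(fsUri, uri -> {
            try {
                probe.getAclStatus(); // one real check per filesystem
                return true;
            } catch (UnsupportedOperationException e) {
                return false;          // cached: no repeated exception cost
            }
        });
    }

    public static void main(String[] args) {
        AclSupportCache cache = new AclSupportCache();
        AclProbe aclsDisabled = () -> {
            throw new UnsupportedOperationException("ACLs disabled on NN");
        };
        System.out.println(cache.aclsSupported("hdfs://nn1", aclsDisabled));
        // Second call never invokes the probe; the cached answer is returned.
        System.out.println(cache.aclsSupported("hdfs://nn1", () -> { }));
    }
}
```

Keying by filesystem URI matters because a job may touch several filesystems, each with its own ACL setting.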
[jira] [Commented] (HIVE-14187) JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called
[ https://issues.apache.org/jira/browse/HIVE-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380165#comment-15380165 ] Sergio Peña commented on HIVE-14187: Thanks, +1 > JDOPersistenceManager objects remain cached if MetaStoreClient#close is not > called > -- > > Key: HIVE-14187 > URL: https://issues.apache.org/jira/browse/HIVE-14187 > Project: Hive > Issue Type: Bug >Reporter: Mohit Sabharwal >Assignee: Mohit Sabharwal > Attachments: HIVE-14187.1.patch, HIVE-14187.2.patch, > HIVE-14187.patch, HIVE-14187.patch > > > JDOPersistenceManager objects are cached in JDOPersistenceManagerFactory by > DataNucleus. > A new JDOPersistenceManager object gets created for every HMS thread since > ObjectStore is a thread local. > In non-embedded metastore mode, JDOPersistenceManager associated with a > thread only gets cleaned up if IMetaStoreClient#close is called by the client > (which calls ObjectStore#shutdown which calls JDOPersistenceManager#close > which in turn removes the object from cache in > JDOPersistenceManagerFactory#releasePersistenceManager > https://github.com/datanucleus/datanucleus-api-jdo/blob/master/src/main/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L1271), > i.e. the object will remain cached if the client does not call close. > For example: If one interrupts out of the hive CLI shell (instead of using > the 'exit;' command), SessionState#close does not get called, and hence > IMetaStoreClient#close does not get called. > Instead of relying on the client to call close, it's cleaner to automatically > perform RawStore related cleanup at the server end via deleteContext() which > gets called when the server detects a lost/closed connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
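The server-side cleanup direction described in HIVE-14187 above can be sketched as follows: the server tracks the per-connection persistence resource it handed out and releases it itself when the transport layer reports the connection closed, instead of trusting the client to call close(). The classes below are toy stand-ins, not Hive's actual RawStore/ObjectStore types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of deleteContext()-style cleanup: the server, not the client,
// releases the cached per-connection resource on connection loss.
public class ConnectionCleanup {
    /** Stand-in for a cached, per-connection persistence resource. */
    static class PseudoRawStore {
        boolean closed = false;
        void shutdown() { closed = true; }
    }

    private final Map<Integer, PseudoRawStore> perConnection = new ConcurrentHashMap<>();

    PseudoRawStore open(int connectionId) {
        return perConnection.computeIfAbsent(connectionId, id -> new PseudoRawStore());
    }

    /** Called by the server when it detects a lost/closed connection. */
    void deleteContext(int connectionId) {
        PseudoRawStore store = perConnection.remove(connectionId);
        if (store != null) {
            store.shutdown();  // releases the cached persistence manager
        }
    }

    int cachedCount() { return perConnection.size(); }

    public static void main(String[] args) {
        ConnectionCleanup server = new ConnectionCleanup();
        PseudoRawStore s = server.open(1);
        // Client vanishes without calling close(); the server-side hook still cleans up.
        server.deleteContext(1);
        System.out.println(s.closed + " " + server.cachedCount());  // true 0
    }
}
```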
[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported
[ https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380158#comment-15380158 ] Thejas M Nair commented on HIVE-13990: -- [~cnauroth] Are there any new apis that might help in checking if ACL is enabled in the FS ? > Client should not check dfs.namenode.acls.enabled to determine if extended > ACLs are supported > - > > Key: HIVE-13990 > URL: https://issues.apache.org/jira/browse/HIVE-13990 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13990-branch-1.patch, HIVE-13990.1-branch-1.patch, > HIVE-13990.1.patch > > > dfs.namenode.acls.enabled is a server side configuration and the client > should not presume to know how the server is configured. Barring a method for > querying the NN whether ACLs are supported the client should try and catch > the appropriate exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380148#comment-15380148 ] Siddharth Seth commented on HIVE-14213: --- bq. So no documentation is needed, right? No. I don't think we should document these settings. > Add timeouts for various components in llap status check > > > Key: HIVE-14213 > URL: https://issues.apache.org/jira/browse/HIVE-14213 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.1.1 > > Attachments: HIVE-14213.01.patch, HIVE-14213.02.patch > > > The llapstatus check connects to various components - YARN, HDFS via Slider, > ZooKeeper. If any of these components is down, the command can take a > long time to exit. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
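The timeout idea in HIVE-14213 above boils down to bounding each external probe (YARN, HDFS via Slider, ZooKeeper) so a dead component cannot block the whole status command. A minimal sketch, assuming made-up timeout values and a generic `Callable` probe rather than the actual llapstatus internals:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: run each component probe on an executor and give up after a
// timeout instead of blocking indefinitely when the component is down.
public class BoundedCheck {
    public static String checkWithTimeout(Callable<String> probe, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = pool.submit(probe);
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMED_OUT";  // component is down/slow; report and move on
        } catch (Exception e) {
            return "FAILED: " + e.getMessage();
        } finally {
            pool.shutdownNow();  // don't leave the probe thread hanging
        }
    }

    public static void main(String[] args) {
        String fast = checkWithTimeout(() -> "RUNNING", 1000);
        String slow = checkWithTimeout(() -> { Thread.sleep(5000); return "RUNNING"; }, 100);
        System.out.println(fast + " " + slow);  // RUNNING TIMED_OUT
    }
}
```

`Future.get(timeout, unit)` is what converts "component never answers" into a bounded failure the command can report.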
[jira] [Commented] (HIVE-14236) CTAS with UNION ALL puts the wrong stats + count(*) = 0 in Tez
[ https://issues.apache.org/jira/browse/HIVE-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380090#comment-15380090 ] Ashutosh Chauhan commented on HIVE-14236: - +1 > CTAS with UNION ALL puts the wrong stats + count(*) = 0 in Tez > -- > > Key: HIVE-14236 > URL: https://issues.apache.org/jira/browse/HIVE-14236 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14236.01.patch > > > to repo. in Tez, create table t as select * from src union all select * from > src; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14253) Fix MinimrCliDriver test failure
[ https://issues.apache.org/jira/browse/HIVE-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14253: - Attachment: HIVE-14253.1.patch PostHook gets executed for drop table queries as well during shutdown which was causing the assertion error. > Fix MinimrCliDriver test failure > > > Key: HIVE-14253 > URL: https://issues.apache.org/jira/browse/HIVE-14253 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14253.1.patch > > > MinimrCliDriver test is failing with the following exception for > bucket_num_reducers2.q test case > {code} > junit.framework.AssertionFailedError: Number of MapReduce jobs is incorrect > expected:<1> but was:<0> > at junit.framework.Assert.fail(Assert.java:57) > at junit.framework.Assert.failNotEquals(Assert.java:329) > at junit.framework.Assert.assertEquals(Assert.java:78) > at junit.framework.Assert.assertEquals(Assert.java:234) > at > org.apache.hadoop.hive.ql.hooks.VerifyNumReducersHook.run(VerifyNumReducersHook.java:46) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1664) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1070) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:849) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:826) > at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:488) > at > org.apache.hadoop.hive.cli.TestMinimrCliDriver.shutdown(TestMinimrCliDriver.java:89) > {code} -- This message 
was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14253) Fix MinimrCliDriver test failure
[ https://issues.apache.org/jira/browse/HIVE-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380075#comment-15380075 ] Ashutosh Chauhan commented on HIVE-14253: - +1 > Fix MinimrCliDriver test failure > > > Key: HIVE-14253 > URL: https://issues.apache.org/jira/browse/HIVE-14253 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14253.1.patch > > > MinimrCliDriver test is failing with the following exception for > bucket_num_reducers2.q test case > {code} > junit.framework.AssertionFailedError: Number of MapReduce jobs is incorrect > expected:<1> but was:<0> > at junit.framework.Assert.fail(Assert.java:57) > at junit.framework.Assert.failNotEquals(Assert.java:329) > at junit.framework.Assert.assertEquals(Assert.java:78) > at junit.framework.Assert.assertEquals(Assert.java:234) > at > org.apache.hadoop.hive.ql.hooks.VerifyNumReducersHook.run(VerifyNumReducersHook.java:46) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1664) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1070) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:849) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:826) > at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:488) > at > org.apache.hadoop.hive.cli.TestMinimrCliDriver.shutdown(TestMinimrCliDriver.java:89) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14135) beeline output not formatted correctly for large column widths
[ https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380070#comment-15380070 ] Hive QA commented on HIVE-14135: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818055/HIVE-14135.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10323 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/527/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/527/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-527/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818055 - PreCommit-HIVE-MASTER-Build > beeline output not formatted correctly for large column widths > -- > > Key: HIVE-14135 > URL: https://issues.apache.org/jira/browse/HIVE-14135 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, > HIVE-14135.3.patch, csv.txt, csv2.txt, dsv.txt, longKeyValues.txt, > output_after.txt, output_before.txt, table.txt, tsv.txt, tsv2.txt, > vertical.txt > > > If the column width is too large then beeline uses the maximum column width > when normalizing all the column widths. In order to reproduce the issue, run > set -v; > One of the configuration variables is classpath, which can be extremely wide > (41k characters in my environment). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
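The failure mode in HIVE-14135 above is that normalization keys off the widest observed value, so one huge column (such as a 41k-character classpath from `set -v`) drags the layout up with it. One simple guard is clamping each column at a maximum width; this sketch is illustrative only (the cap value is arbitrary, and the method is not beeline's actual normalization code):

```java
// Sketch: clamp per-column widths so a single outlier column can't blow up
// the whole table layout. Not beeline's real algorithm; the 80-char cap is
// an arbitrary stand-in for the terminal width.
public class ColumnWidths {
    public static int[] normalize(int[] observedWidths, int maxColWidth) {
        int[] out = new int[observedWidths.length];
        for (int i = 0; i < observedWidths.length; i++) {
            out[i] = Math.min(observedWidths[i], maxColWidth);  // clamp outliers
        }
        return out;
    }

    public static void main(String[] args) {
        int[] widths = normalize(new int[] {4, 12, 41000}, 80);
        System.out.println(java.util.Arrays.toString(widths));  // [4, 12, 80]
    }
}
```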
[jira] [Updated] (HIVE-14253) Fix MinimrCliDriver test failure
[ https://issues.apache.org/jira/browse/HIVE-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14253: - Status: Patch Available (was: Open) > Fix MinimrCliDriver test failure > > > Key: HIVE-14253 > URL: https://issues.apache.org/jira/browse/HIVE-14253 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14253.1.patch > > > MinimrCliDriver test is failing with the following exception for > bucket_num_reducers2.q test case > {code} > junit.framework.AssertionFailedError: Number of MapReduce jobs is incorrect > expected:<1> but was:<0> > at junit.framework.Assert.fail(Assert.java:57) > at junit.framework.Assert.failNotEquals(Assert.java:329) > at junit.framework.Assert.assertEquals(Assert.java:78) > at junit.framework.Assert.assertEquals(Assert.java:234) > at > org.apache.hadoop.hive.ql.hooks.VerifyNumReducersHook.run(VerifyNumReducersHook.java:46) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1664) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1070) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:849) > at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:826) > at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:488) > at > org.apache.hadoop.hive.cli.TestMinimrCliDriver.shutdown(TestMinimrCliDriver.java:89) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14252) Sum on Decimal rounding provides incorrect result
[ https://issues.apache.org/jira/browse/HIVE-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Gullipalli updated HIVE-14252: Description: hive> select sum(round(amt, 2)), sum(round(txn_amt, 2)) from table1 where DT ='20160517' union all select sum(round(amt,2)), sum(round(txn_amt,2)) from table2 where DT ='20160517' 99773.577 77408.35 3336.16996 2582.35013 was: hive> select sum(round(amt, 2)), sum(round(txn_amt, 2)) from table1 where DT ='20160517' union all select sum(round(amt,2)), sum(round(txn_amt,2)) from table 2 where DT ='20160517' 99773.577 77408.35 3336.16996 2582.35013 > Sum on Decimal rounding provides incorrect result > - > > Key: HIVE-14252 > URL: https://issues.apache.org/jira/browse/HIVE-14252 > Project: Hive > Issue Type: Bug >Reporter: Varun Gullipalli > > hive> select sum(round(amt, 2)), sum(round(txn_amt, 2)) from table1 where DT > ='20160517' > union all > select sum(round(amt,2)), sum(round(txn_amt,2)) from table2 > where DT ='20160517' > 99773.577 77408.35 > 3336.16996 2582.35013 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
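HIVE-14252 above doesn't include the table data, but the kind of drift it describes is easy to reproduce: accumulating round(x, 2) values as binary doubles picks up representation error, while decimal arithmetic keeps the scale exact. A self-contained illustration (the amounts are made up, not taken from the report):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;

public class DecimalSum {
    // Summing two-decimal amounts as doubles accumulates binary error,
    // because values like 0.10 have no exact binary representation.
    public static double sumAsDouble(String[] amounts) {
        double total = 0.0;
        for (String a : amounts) {
            total += Math.round(Double.parseDouble(a) * 100.0) / 100.0;
        }
        return total;
    }

    // Decimal arithmetic keeps the scale exact, which is what a DECIMAL
    // column type gives you.
    public static BigDecimal sumAsDecimal(String[] amounts) {
        BigDecimal total = BigDecimal.ZERO;
        for (String a : amounts) {
            total = total.add(new BigDecimal(a).setScale(2, RoundingMode.HALF_UP));
        }
        return total;
    }

    public static void main(String[] args) {
        String[] amounts = new String[1000];
        Arrays.fill(amounts, "0.10");
        System.out.println(sumAsDouble(amounts));   // close to, but not exactly, 100.0
        System.out.println(sumAsDecimal(amounts));  // exactly 100.00
    }
}
```

Whether the reported numbers come from this exact mechanism depends on the column types involved, which the report doesn't state.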
[jira] [Updated] (HIVE-14241) Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none"
[ https://issues.apache.org/jira/browse/HIVE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14241: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none" > > > Key: HIVE-14241 > URL: https://issues.apache.org/jira/browse/HIVE-14241 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14241.patch > > > Some queries are optimized so as not to create an MR job. This somehow causes > the Configuration object in FetchOperator to be passed to the operator before > Driver.recordValidTxns() is called. So then to this op it looks like there > are no valid txns and it returns nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications
[ https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379947#comment-15379947 ] Rahul Sharma commented on HIVE-13966: - Hi [~sushanth], Can you take a look at the patch. -Thanks > DbNotificationListener: can loose DDL operation notifications > - > > Key: HIVE-13966 > URL: https://issues.apache.org/jira/browse/HIVE-13966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Nachiket Vaidya >Assignee: Rahul Sharma >Priority: Critical > Attachments: HIVE-13966.1.patch, HIVE-13966.2.patch > > > The code for each API in HiveMetaStore.java is like this: > 1. openTransaction() > 2. -- operation-- > 3. commit() or rollback() based on result of the operation. > 4. add entry to notification log (unconditionally) > If the operation is failed (in step 2), we still add entry to notification > log. Found this issue in testing. > It is still ok as this is the case of false positive. > If the operation is successful and adding to notification log failed, the > user will get an MetaException. It will not rollback the operation, as it is > already committed. We need to handle this case so that we will not have false > negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
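The fix direction HIVE-13966 above points at is making the notification-log write part of the same transaction as the DDL operation, so a failed operation never logs a notification (no false positives) and a failed log write rolls the operation back (no false negatives). The transaction machinery below is a toy stand-in, not the metastore's actual openTransaction/commit path:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: the DDL operation and its notification entry commit
// atomically, or neither does.
public class TxnNotification {
    static class ToyTxn {
        List<String> committedOps = new ArrayList<>();
        List<String> committedNotifications = new ArrayList<>();

        void runDdl(String op, boolean opFails, boolean logFails) {
            List<String> pendingOps = new ArrayList<>();
            List<String> pendingLog = new ArrayList<>();
            try {
                if (opFails) throw new RuntimeException("operation failed");
                pendingOps.add(op);
                if (logFails) throw new RuntimeException("notification write failed");
                pendingLog.add("event:" + op);
                // commit: both become visible together
                committedOps.addAll(pendingOps);
                committedNotifications.addAll(pendingLog);
            } catch (RuntimeException e) {
                // rollback: neither the op nor its notification survives
            }
        }
    }

    public static void main(String[] args) {
        ToyTxn t = new ToyTxn();
        t.runDdl("CREATE TABLE a", false, false);  // both committed
        t.runDdl("CREATE TABLE b", true, false);   // nothing committed
        t.runDdl("CREATE TABLE c", false, true);   // nothing committed
        System.out.println(t.committedOps + " " + t.committedNotifications);
        // [CREATE TABLE a] [event:CREATE TABLE a]
    }
}
```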
[jira] [Updated] (HIVE-14198) Refactor aux jar related code to make them more consistent
[ https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14198: Attachment: HIVE-14198.2.patch Attached patch-2: to address comments. > Refactor aux jar related code to make them more consistent > -- > > Key: HIVE-14198 > URL: https://issues.apache.org/jira/browse/HIVE-14198 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14198.1.patch, HIVE-14198.2.patch > > > There is some redundancy and inconsistency between hive.aux.jar.paths and > hive.reloadable.aux.jar.paths, and also between MR and Spark. > Refactor the code so both reuse the same logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14236) CTAS with UNION ALL puts the wrong stats + count(*) = 0 in Tez
[ https://issues.apache.org/jira/browse/HIVE-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379895#comment-15379895 ] Ashutosh Chauhan commented on HIVE-14236: - Couple of comments on RB. Also, failures need to be looked at. > CTAS with UNION ALL puts the wrong stats + count(*) = 0 in Tez > -- > > Key: HIVE-14236 > URL: https://issues.apache.org/jira/browse/HIVE-14236 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14236.01.patch > > > to repo. in Tez, create table t as select * from src union all select * from > src; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379865#comment-15379865 ] Eugene Koifman commented on HIVE-13369: --- What do you mean "making it through"? It only sets "bestBase.status" if isValidBase() is true... > AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and locks in a snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the files for table A are as > follows and created in the order listed. > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. > Reader can't use base_17 because it has the result of txn 16. So it should choose > base_5 "TxnBase bestBase" in _getChildState()_. 
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot > for such a reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manager state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first and right at this moment the read request > with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by > some multi-stmt txn that started some time ago. It acquires locks after the > Cleaner checks LM state and calls getAcidState(). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus for the duration of the > transaction, nothing will be deleted so it's safe to use base_5] > This is a subtle race condition but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList and throw > an exception if there is an exception <=x. > 2. A better option is to keep 2 exception lists: aborted and open and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns and thus there is no harm using a base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning any delta with an id range that includes > this open txn id for any txn that is still running. 
This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with multi-statement txn work. > Side note: if 2 deltas have overlapping ID range, then 1 must be a subset of > the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
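Option 1 from the HIVE-13369 discussion above amounts to a small predicate over a base's txn id and the snapshot's exception list. A toy version of that check, using simplified types rather than Hive's actual ValidTxnList API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch of "option 1": a base_x is usable only if x is at or below the
// snapshot's high-water mark and no excluded txn id falls inside its range.
public class BestBase {
    public static boolean isValidBase(long baseTxn, long highWaterMark, Set<Long> exceptions) {
        if (baseTxn > highWaterMark) return false;
        for (long e : exceptions) {
            if (e <= baseTxn) return false;  // base folded in a txn we must exclude
        }
        return true;
    }

    /** Picks the highest valid base, or -1 if none qualifies. */
    public static long chooseBase(List<Long> bases, long hwm, Set<Long> exceptions) {
        long best = -1;
        for (long b : bases) {
            if (b > best && isValidBase(b, hwm, exceptions)) best = b;
        }
        return best;
    }

    public static void main(String[] args) {
        // The example from the discussion: bases 5 and 17, snapshot
        // ValidTxnList(20:16), i.e. HWM=20 with txn 16 excluded.
        // base_17 contains txn 16, so base_5 wins.
        List<Long> bases = Arrays.asList(5L, 17L);
        Set<Long> exceptions = new HashSet<>(Collections.singletonList(16L));
        System.out.println(chooseBase(bases, 20, exceptions));  // 5
    }
}
```

With an empty exception list the same reader would pick base_17, matching the "latest base" behavior the comment describes.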
[jira] [Comment Edited] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379865#comment-15379865 ] Eugene Koifman edited comment on HIVE-13369 at 7/15/16 6:34 PM: What do you mean "making it through"? getChildState() only sets "bestBase.status" if isValidBase() is true... was (Author: ekoifman): What do you mean "making it through"? It only sets "bestBase.status" if isValidBase() is true... > AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but and there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and lock in snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the file for table A are as > follows and created in the order listed. > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. 
> Reader can't use base_17 because it has the result of txn 16. So it should choose > base_5 "TxnBase bestBase" in _getChildState()_. > Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot > for such a reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manager state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first and right at this moment the read request > with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by > some multi-stmt txn that started some time ago. It acquires locks after the > Cleaner checks LM state and calls getAcidState(). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus for the duration of the > transaction, nothing will be deleted so it's safe to use base_5] > This is a subtle race condition but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList and throw > an exception if there is an exception <=x. > 2. A better option is to keep 2 exception lists: aborted and open and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns and thus there is no harm using a base with aborted txns in its range. > 3. 
You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning anything delta with id range that includes > this open txn id for any txn that is still running. This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with multi-statement txn work. > Side note: if 2 deltas have overlapping ID range, then 1 must be a subset of > the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14236) CTAS with UNION ALL puts the wrong stats + count(*) = 0 in Tez
[ https://issues.apache.org/jira/browse/HIVE-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379843#comment-15379843 ] Hive QA commented on HIVE-14236: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818057/HIVE-14236.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10323 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_stats org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union_dynamic_partition org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_dynamic_partition org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union4 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union6 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union_fast_stats org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union_stats org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/526/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/526/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-526/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12818057 - PreCommit-HIVE-MASTER-Build > CTAS with UNION ALL puts the wrong stats + count(*) = 0 in Tez > -- > > Key: HIVE-14236 > URL: https://issues.apache.org/jira/browse/HIVE-14236 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14236.01.patch > > > to repo. in Tez, create table t as select * from src union all select * from > src; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-14167: - Status: Patch Available (was: Open) > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379837#comment-15379837 ] Wei Zheng commented on HIVE-14167: -- Tried a couple failed tests locally and they all passed. Will trigger another run to make sure. > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-14167: - Attachment: HIVE-14167.3.patch patch 3 is the same as patch 2, just to trigger QA run > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379834#comment-15379834 ] Owen O'Malley commented on HIVE-13369: -- Ok, I'm missing something fundamental. It looks like AcidUtils.getChildState is checking AcidUtils.isValidBase, which should reject any bases that have an open transaction included. Why is the problematic base making it through the isValidBase check? > AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and locks in a snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the files for table A are as > follows and created in the order listed. > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has the result of txn 16. So it should choose > base_5 "TxnBase bestBase" in _getChildState()_. > Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot > for such a reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manager state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first and right at this moment the read request > with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by > some multi-stmt txn that started some time ago; it acquires locks after the > Cleaner checks LM state and calls getAcidState()). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus for the duration of the > transaction, nothing will be deleted so it's safe to use base_5] > This is a subtle race condition but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList and throw > an exception if there is an exception <=x. > 2. A better option is to keep 2 exception lists: aborted and open and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns and thus there is no harm using a base with aborted txns in its range. > 3. 
You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning any delta with an id range that includes > this open txn id for any txn that is still running. This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with multi-statement txn work. > Side note: if 2 deltas have overlapping ID ranges, then 1 must be a subset of > the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
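Option 1 above (use the latest base_x, then fail if the exception list contains a txn id <= x) can be sketched in a few lines. This is a hypothetical illustration only — the method name `chooseBase` and its signature are invented for this sketch and do not match Hive's real AcidUtils code:

```java
import java.util.List;

// Sketch of "option 1": pick the newest base_x at or below the reader's
// high-water mark, then throw if ValidTxnList's exception list contains
// any txn id <= x (major compaction would have folded that txn's result
// into the base, so the base is unusable for this reader).
public class BaseSelectionSketch {
    static long chooseBase(List<Long> baseIds, long hwm, List<Long> exceptions) {
        long best = -1;
        for (long b : baseIds) {
            if (b <= hwm && b > best) {
                best = b;   // latest base visible to this snapshot
            }
        }
        for (long excluded : exceptions) {
            if (excluded <= best) {
                throw new IllegalStateException(
                    "base_" + best + " may contain excluded txn " + excluded);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // ValidTxnList(20:16): HWM = 20, exception list = {16}.
        // With only base_5 on disk the reader is fine...
        System.out.println(chooseBase(List.of(5L), 20L, List.of(16L)));
        // ...but base_17 must be rejected because txn 16 <= 17.
        try {
            chooseBase(List.of(5L, 17L), 20L, List.of(16L));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the delta_16_16/delta_17_17 files of the example, throwing here (rather than silently falling back to base_5) is what closes the Cleaner race described above.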
[jira] [Updated] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-14167: - Status: Open (was: Patch Available) > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14241) Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none"
[ https://issues.apache.org/jira/browse/HIVE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379808#comment-15379808 ] Eugene Koifman edited comment on HIVE-14241 at 7/15/16 6:22 PM: Filed HIVE-14250 for followup. Reran org.apache.hive.minikdc.TestJdbcWithDBTokenStore - all passed list_bucket_dml_13 fails on and off (e.g. https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/513/#showFailuresLink) TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation is a timeout failure and passes on rerun Thanks Ashutosh for the review was (Author: ekoifman): Filed HIVE-14250 for followup. > Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none" > > > Key: HIVE-14241 > URL: https://issues.apache.org/jira/browse/HIVE-14241 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14241.patch > > > Some queries are optimized so as not to create an MR job. This somehow causes > the Configuration object in FetchOperator to be passed to the operator before > Driver.recordValidTxns() is called. So then to this op it looks like there > are no valid txns and it returns nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14241) Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none"
[ https://issues.apache.org/jira/browse/HIVE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379808#comment-15379808 ] Eugene Koifman commented on HIVE-14241: --- Filed HIVE-14250 for followup. > Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none" > > > Key: HIVE-14241 > URL: https://issues.apache.org/jira/browse/HIVE-14241 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14241.patch > > > Some queries are optimized so as not to create an MR job. This somehow causes > the Configuration object in FetchOperator to be passed to the operator before > Driver.recordValidTxns() is called. So then to this op it looks like there > are no valid txns and it returns nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14241) Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none"
[ https://issues.apache.org/jira/browse/HIVE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379774#comment-15379774 ] Ashutosh Chauhan commented on HIVE-14241: - I see. +1 for the patch. Let's do the follow-up of moving FetchTask::initialize() in compileInternal() before execute(). It doesn't seem useful to do that in compile(). > Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none" > > > Key: HIVE-14241 > URL: https://issues.apache.org/jira/browse/HIVE-14241 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14241.patch > > > Some queries are optimized so as not to create an MR job. This somehow causes > the Configuration object in FetchOperator to be passed to the operator before > Driver.recordValidTxns() is called. So then to this op it looks like there > are no valid txns and it returns nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14249) Add simple materialized views with manual rebuilds
[ https://issues.apache.org/jira/browse/HIVE-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14249: --- Assignee: Alan Gates (was: Jesus Camacho Rodriguez) > Add simple materialized views with manual rebuilds > -- > > Key: HIVE-14249 > URL: https://issues.apache.org/jira/browse/HIVE-14249 > Project: Hive > Issue Type: Sub-task > Components: Parser, Views >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10459.2.patch > > > This patch is a start at implementing simple views. It doesn't have enough > testing yet (e.g. there's no negative testing). And I know it fails in the > partitioned case. I suspect things like security and locking don't work > properly yet either. But I'm posting it as a starting point. > In this initial patch I'm just handling simple materialized views with manual > rebuilds. In later JIRAs we can add features such as allowing the optimizer > to rewrite queries to use materialized views rather than tables named in the > queries, giving the optimizer the ability to determine when a materialized > view is stale, etc. > Also, I didn't rebase this patch against trunk after the migration from > svn->git so it may not apply cleanly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14249) Add simple materialized views with manual rebuilds
[ https://issues.apache.org/jira/browse/HIVE-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14249: --- Attachment: HIVE-10459.2.patch > Add simple materialized views with manual rebuilds > -- > > Key: HIVE-14249 > URL: https://issues.apache.org/jira/browse/HIVE-14249 > Project: Hive > Issue Type: Sub-task > Components: Parser, Views >Reporter: Alan Gates >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-10459.2.patch > > > This patch is a start at implementing simple views. It doesn't have enough > testing yet (e.g. there's no negative testing). And I know it fails in the > partitioned case. I suspect things like security and locking don't work > properly yet either. But I'm posting it as a starting point. > In this initial patch I'm just handling simple materialized views with manual > rebuilds. In later JIRAs we can add features such as allowing the optimizer > to rewrite queries to use materialized views rather than tables named in the > queries, giving the optimizer the ability to determine when a materialized > view is stale, etc. > Also, I didn't rebase this patch against trunk after the migration from > svn->git so it may not apply cleanly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14249) Add simple materialized views with manual rebuilds
[ https://issues.apache.org/jira/browse/HIVE-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-14249: -- Assignee: Jesus Camacho Rodriguez (was: Alan Gates) > Add simple materialized views with manual rebuilds > -- > > Key: HIVE-14249 > URL: https://issues.apache.org/jira/browse/HIVE-14249 > Project: Hive > Issue Type: Sub-task > Components: Parser, Views >Reporter: Alan Gates >Assignee: Jesus Camacho Rodriguez > > This patch is a start at implementing simple views. It doesn't have enough > testing yet (e.g. there's no negative testing). And I know it fails in the > partitioned case. I suspect things like security and locking don't work > properly yet either. But I'm posting it as a starting point. > In this initial patch I'm just handling simple materialized views with manual > rebuilds. In later JIRAs we can add features such as allowing the optimizer > to rewrite queries to use materialized views rather than tables named in the > queries, giving the optimizer the ability to determine when a materialized > view is stale, etc. > Also, I didn't rebase this patch against trunk after the migration from > svn->git so it may not apply cleanly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10459) Add materialized views to Hive
[ https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10459: --- Attachment: (was: HIVE-10459.patch) > Add materialized views to Hive > -- > > Key: HIVE-10459 > URL: https://issues.apache.org/jira/browse/HIVE-10459 > Project: Hive > Issue Type: New Feature > Components: Views >Reporter: Alan Gates >Assignee: Julian Hyde > > Materialized views are useful as ways to store either alternate versions of > data (e.g. same data, different sort order) or derivatives of data sets (e.g. > commonly used aggregates). It is useful to store these as materialized views > rather than as tables because it can give the optimizer the ability to > understand how data sets are related. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10459) Add materialized views to Hive
[ https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10459: --- Attachment: (was: HIVE-10459.2.patch) > Add materialized views to Hive > -- > > Key: HIVE-10459 > URL: https://issues.apache.org/jira/browse/HIVE-10459 > Project: Hive > Issue Type: New Feature > Components: Views >Reporter: Alan Gates >Assignee: Julian Hyde > Attachments: HIVE-10459.patch > > > Materialized views are useful as ways to store either alternate versions of > data (e.g. same data, different sort order) or derivatives of data sets (e.g. > commonly used aggregates). It is useful to store these as materialized views > rather than as tables because it can give the optimizer the ability to > understand how data sets are related. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10459) Add materialized views to Hive
[ https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10459: --- Issue Type: New Feature (was: Improvement) > Add materialized views to Hive > -- > > Key: HIVE-10459 > URL: https://issues.apache.org/jira/browse/HIVE-10459 > Project: Hive > Issue Type: New Feature > Components: Views >Reporter: Alan Gates >Assignee: Julian Hyde > Attachments: HIVE-10459.patch > > > Materialized views are useful as ways to store either alternate versions of > data (e.g. same data, different sort order) or derivatives of data sets (e.g. > commonly used aggregates). It is useful to store these as materialized views > rather than as tables because it can give the optimizer the ability to > understand how data sets are related. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10459) Add materialized views to Hive
[ https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379734#comment-15379734 ] Jesus Camacho Rodriguez commented on HIVE-10459: [~alangates], I will move the initial work to HIVE-14249; we can follow-up there with anything related to the patch. As there are a few people watching this issue, I will use this as the umbrella JIRA for all the work that we will do around materialized views implementation. > Add materialized views to Hive > -- > > Key: HIVE-10459 > URL: https://issues.apache.org/jira/browse/HIVE-10459 > Project: Hive > Issue Type: Improvement > Components: Views >Reporter: Alan Gates >Assignee: Julian Hyde > Attachments: HIVE-10459.2.patch, HIVE-10459.patch > > > Materialized views are useful as ways to store either alternate versions of > data (e.g. same data, different sort order) or derivatives of data sets (e.g. > commonly used aggregates). It is useful to store these as materialized views > rather than as tables because it can give the optimizer the ability to > understand how data sets are related. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375820#comment-15375820 ] Eugene Koifman edited comment on HIVE-13369 at 7/15/16 5:18 PM: failed tests have age > 2 except: list_bucket_dml_12 fails on and off (e.g. https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/499/testReport/) reran auto_sortmerge_join_2 - passes was (Author: ekoifman): most failed tests have age > 2 list_bucket_dml_12 fails on and off (e.g. https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/499/testReport/) todo: check auto_sortmerge_join_2 > AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing > the "best" base file > -- > > Key: HIVE-13369 > URL: https://issues.apache.org/jira/browse/HIVE-13369 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, > HIVE-13369.3.patch > > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't >preserve the history, we can't use a base directory that includes a >transaction id that we must exclude." > which is correct but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but there are > deltas that contain X, we'll have to abort the txn. This can't (reasonably) > happen with auto commit mode, but with multi statement txns it's possible. > Suppose some long running txn starts and locks in a snapshot at 17 (HWM). An > hour later it decides to access some partition for which all txns < 20 (for > example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the files for table A are as > follows and created in the order listed. 
> delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example user ran major compaction) > let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 > and ExceptionList=<16> > Assume that all txns <= 20 commit. > Reader can't use base_17 because it has the result of txn 16. So it should choose > base_5 "TxnBase bestBase" in _getChildState()_. > Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and > delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot > for such a reader. > The issue is if at the same time the Cleaner process is running. It will see > everything with txnid<17 as obsolete. Then it will check lock manager state > and decide to delete (as there may not be any locks in LM for table A). The > order in which the files are deleted is undefined right now. It may delete > delta_16_16 and delta_17_17 first and right at this moment the read request > with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by > some multi-stmt txn that started some time ago; it acquires locks after the > Cleaner checks LM state and calls getAcidState()). This request will choose > base_5 but it won't see delta_16_16 and delta_17_17 and thus return the > snapshot w/o modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The > reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) > locks in the snapshot. The cleaner won't delete anything for a given > compaction (partition) if there are locks on it. Thus for the duration of the > transaction, nothing will be deleted so it's safe to use base_5] > This is a subtle race condition but possible. > 1. So the safest thing to do to ensure correctness is to use the latest > base_x as the "best" and check against exceptions in ValidTxnList and throw > an exception if there is an exception <=x. > 2. 
A better option is to keep 2 exception lists: aborted and open and only > throw if there is an open txn <=x. Compaction throws away data from aborted > txns and thus there is no harm using a base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and > prevent the cleaner from cleaning any delta with an id range that includes > this open txn id for any txn that is still running. This has a drawback of > potentially delaying GC of old files for arbitrarily long periods. So this > should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with multi-statement txn work. > Side note: if 2 deltas have overlapping ID ranges, then 1 must be a subset of > the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14241) Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none"
[ https://issues.apache.org/jira/browse/HIVE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379659#comment-15379659 ] Eugene Koifman commented on HIVE-14241: --- The issue is not HiveConf - that is share. The issue is that HiveConf is passed to JobConf which copies all values from HiveConf. So modifying HiveConf after that has no effect on JobConf. > Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none" > > > Key: HIVE-14241 > URL: https://issues.apache.org/jira/browse/HIVE-14241 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14241.patch > > > Some queries are optimized so as not to create an MR job. This somehow causes > the Configuration object in FetchOperator to be passed to the operator before > Driver.recordValidTxns() is called. So then to this op it looks like there > are no valid txns and it returns nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14241) Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none"
[ https://issues.apache.org/jira/browse/HIVE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379659#comment-15379659 ] Eugene Koifman edited comment on HIVE-14241 at 7/15/16 4:49 PM: The issue is not HiveConf - that is shared. The issue is that HiveConf is passed to JobConf which copies all values from HiveConf. So modifying HiveConf after that has no effect on JobConf. was (Author: ekoifman): The issue is not HiveConf - that is share. The issue is that HiveConf is passed to JobConf which copies all values from HiveConf. So modifying HiveConf after that has no effect on JobConf. > Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none" > > > Key: HIVE-14241 > URL: https://issues.apache.org/jira/browse/HIVE-14241 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14241.patch > > > Some queries are optimized so as not to create an MR job. This somehow causes > the Configuration object in FetchOperator to be passed to the operator before > Driver.recordValidTxns() is called. So then to this op it looks like there > are no valid txns and it returns nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
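The copy-on-construction behavior described in this comment can be illustrated without Hadoop on the classpath. This is a sketch only: plain HashMaps stand in for HiveConf and JobConf, and the property names are illustrative, not a claim about which keys Hive actually sets:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the comment above: constructing a JobConf from a HiveConf
// behaves like taking a snapshot, so writes to the HiveConf made afterwards
// (e.g. the valid-txn list recorded by Driver.recordValidTxns()) never reach
// the copy that the FetchOperator's operator tree ends up reading.
public class ConfCopySketch {
    static boolean copySeesLaterWrites() {
        Map<String, String> hiveConf = new HashMap<>();
        hiveConf.put("hive.fetch.task.conversion", "more");

        // stands in for: JobConf jobConf = new JobConf(hiveConf);
        Map<String, String> jobConf = new HashMap<>(hiveConf);

        // stands in for Driver.recordValidTxns() running after the copy
        hiveConf.put("hive.txn.valid.txns", "20:16");

        // the copy still has no valid-txn list
        return jobConf.containsKey("hive.txn.valid.txns");
    }

    public static void main(String[] args) {
        System.out.println("copy sees later writes: " + copySeesLaterWrites());
    }
}
```

This is why the operator sees "no valid txns" and returns nothing: the snapshot must be recorded before the Configuration is handed to the FetchOperator, which is what the patch arranges.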
[jira] [Resolved] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
[ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman resolved HIVE-14004. --- Resolution: Fixed Fix Version/s: 1.3.0 ported to branch-1 as well > Minor compaction produces ArrayIndexOutOfBoundsException: 7 in > SchemaEvolution.getFileType > -- > > Key: HIVE-14004 > URL: https://issues.apache.org/jira/browse/HIVE-14004 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, > HIVE-14004.03.patch, HIVE-14004.patch > > > Easiest way to repro is to add TestTxnCommands2 > {noformat} > @Test > public void testCompactWithDelete() throws Exception { > int[][] tableData = {{1,2},{3,4}}; > runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + > makeValuesClause(tableData)); > runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'"); > Worker t = new Worker(); > t.setThreadId((int) t.getId()); > t.setHiveConf(hiveConf); > AtomicBoolean stop = new AtomicBoolean(); > AtomicBoolean looped = new AtomicBoolean(); > stop.set(true); > t.init(stop, looped); > t.run(); > runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4"); > runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = > 2"); > runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'"); > t.run(); > } > {noformat} > to TestTxnCommands2 and run it. > Test won't fail but if you look > in target/tmp/log/hive.log for the following exception (from Minor > compaction). 
> {noformat} > 2016-06-09T18:36:39,071 WARN [Thread-190[]]: mapred.LocalJobRunner > (LocalJobRunner.java:run(560)) - job_local1233973168_0005 > java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) > ~[hadoop-mapreduce-client-common-2.6.1.jar:?] > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) > [hadoop-mapreduce-client-common-2.6.1.jar:?] > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.RecordReaderImpl.(RecordReaderImpl.java:208) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:63) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:207) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:508) > ~[classes/:?] 
> at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) > ~[classes/:?] > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > ~[hadoop-mapreduce-client-core-2.6.1.jar:?] > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) > ~[hadoop-mapreduce-client-core-2.6.1.jar:?] > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > ~[hadoop-mapreduce-client-core-2.6.1.jar:?] > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) > ~[hadoop-mapreduce-client-common-2.6.1.jar:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[?:1.7.0_71] >
[jira] [Commented] (HIVE-14241) Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none"
[ https://issues.apache.org/jira/browse/HIVE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379653#comment-15379653 ] Ashutosh Chauhan commented on HIVE-14241: - Patch looks good to me for immediate problem. Though I wonder if there is a better solution which makes sure same HiveConf instance is used in fetch task or MR job or Tez job. > Acid clashes with ConfVars.HIVEFETCHTASKCONVERSION <> "none" > > > Key: HIVE-14241 > URL: https://issues.apache.org/jira/browse/HIVE-14241 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14241.patch > > > Some queries are optimized so as not to create an MR job. This somehow causes > the Configuration object in FetchOperator to be passed to the operator before > Driver.recordValidTxns() is called. So then to this op it looks like there > are no valid txns and it returns nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
[ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14004:
--
Affects Version/s: (was: 2.2.0)
                   1.3.0
                   2.0.0

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
> --
>
> Key: HIVE-14004
> URL: https://issues.apache.org/jira/browse/HIVE-14004
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 1.3.0, 2.1.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, HIVE-14004.03.patch, HIVE-14004.patch
>
> The easiest way to repro is to add the following test to TestTxnCommands2 and run it:
> {noformat}
> @Test
> public void testCompactWithDelete() throws Exception {
>   int[][] tableData = {{1,2},{3,4}};
>   runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
>   runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>   Worker t = new Worker();
>   t.setThreadId((int) t.getId());
>   t.setHiveConf(hiveConf);
>   AtomicBoolean stop = new AtomicBoolean();
>   AtomicBoolean looped = new AtomicBoolean();
>   stop.set(true);
>   t.init(stop, looped);
>   t.run();
>   runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
>   runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
>   runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MINOR'");
>   t.run();
> }
> {noformat}
> The test won't fail, but target/tmp/log/hive.log will contain the following exception (from the minor compaction):
> {noformat}
> 2016-06-09T18:36:39,071 WARN [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
> at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63) ~[classes/:?]
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) ~[classes/:?]
> at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207) ~[classes/:?]
> at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508) ~[classes/:?]
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) ~[classes/:?]
> at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) ~[classes/:?]
> at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) ~[classes/:?]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0
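For context on the exception above, here is a minimal hedged sketch of the failure mode. This is a hypothetical stand-in, not the real org.apache.orc.impl.SchemaEvolution: it only models a file-type lookup table sized by the file's schema being indexed with a reader-schema column id (7) that the file being compacted does not have:

```java
// Hypothetical sketch of the failure mode, not the real ORC SchemaEvolution:
// getFileType(id) indexes an array with one entry per column in the *file*
// schema, using a column id from the *reader* schema. If the reader schema has
// more columns than the file (here, id 7 against a 6-column file), the lookup
// throws ArrayIndexOutOfBoundsException: 7.
public class SchemaEvolutionSketch {
    private final String[] fileTypes; // one entry per column id in the file schema

    SchemaEvolutionSketch(String[] fileTypes) { this.fileTypes = fileTypes; }

    String getFileType(int readerColumnId) {
        return fileTypes[readerColumnId]; // no bounds check against the file schema
    }

    public static void main(String[] args) {
        // A file with 6 columns (the column type names are illustrative):
        SchemaEvolutionSketch se = new SchemaEvolutionSketch(
                new String[]{"int", "long", "int", "long", "long", "struct"});
        try {
            se.getFileType(7); // reader-schema column 7 does not exist in this file
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught ArrayIndexOutOfBoundsException");
        }
    }
}
```

In the repro above, the delete issued between the major and minor compactions leaves a delta whose schema no longer lines up with what the reader expects, which is how an out-of-range reader column id reaches the lookup.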