[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown
> (SARG enabled), split generation fails. This bug is usually exposed only when
> working with data at scale, because in most other cases the BI split strategy
> is chosen. My guess is that this happens because the correct readerSchema is
> not picked up when we try to extract the SARG column names.
> The quickest way to reproduce it is to add the following unit test to
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     // Force the ETL split strategy and enable predicate pushdown (SARG).
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     // Request a major compaction and run the worker that performs it.
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     // The predicate on column a is what gets pushed down as a SARG.
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NegativeArraySizeException
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
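
The reporter's guess is that the wrong readerSchema is used when the SARG column names are extracted. The following is only a sketch of how that kind of mix-up can end in a NegativeArraySizeException; it is not Hive's code. The class SargSchemaSketch, the helper userColumnSlots, and the flattened field lists are all hypothetical names made up for illustration. The one fact it relies on is that an ACID ORC file stores five metadata fields (operation, originalTransaction, bucket, rowId, currentTransaction) ahead of the user row.

```java
import java.util.Arrays;
import java.util.List;

final class SargSchemaSketch {
    // ACID ORC files store five metadata fields ahead of the user row:
    // operation, originalTransaction, bucket, rowId, currentTransaction.
    static final int ACID_META_FIELDS = 5;

    // Hypothetical helper (not a Hive API): sizes a per-user-column array by
    // subtracting the ACID metadata fields from the schema's field count.
    static boolean[] userColumnSlots(List<String> schemaFields) {
        int userColumns = schemaFields.size() - ACID_META_FIELDS;
        // Array creation throws NegativeArraySizeException when the size is negative.
        return new boolean[userColumns];
    }

    public static void main(String[] args) {
        // Simplified, flattened view of an ACID file schema for a table with columns a and b.
        List<String> acidFileSchema = Arrays.asList(
            "operation", "originalTransaction", "bucket", "rowId",
            "currentTransaction", "a", "b");
        System.out.println(userColumnSlots(acidFileSchema).length);

        // Passing the logical table schema instead gives 2 - 5 = -3.
        List<String> tableSchema = Arrays.asList("a", "b");
        try {
            userColumnSlots(tableSchema);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

With the ACID-shaped schema the helper allocates two slots. With the plain table schema the computed size is negative and array creation fails, which is the same exception class reported in the back-trace above. Whether this is the actual mechanism in OrcInputFormat is not established by this thread.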

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: Patch Available  (was: In Progress)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Fix Version/s: 2.1.1
   2.2.0

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: In Progress  (was: Patch Available)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Attachment: HIVE-14448.04.patch

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14448:
--
Status: Open  (was: Patch Available)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-11 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: In Progress  (was: Patch Available)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.patch
>
>
> When ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will be usually 
> exposed when working with data at scale, because in most otherwise cases only 
> BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NegativeArraySizeException
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at sun
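
For readers who want to try the same scenario outside the unit test, the steps in the quoted test can be approximated as a HiveQL session. This is only a sketch under assumptions: the table name `acidtbl` is hypothetical and stands in for an existing transactional ORC table with an integer column `a`, and the two `SET` lines use the property names behind `HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY` and `HiveConf.ConfVars.HIVEOPTINDEXFILTER`. Whether the failure appears will depend on the Hive build in use.

```sql
-- Hypothetical session mirroring the unit test; acidtbl is an assumed,
-- already-created transactional (ACID) ORC table with an integer column a.
SET hive.exec.orc.split.strategy=ETL;   -- force the ETL split strategy
SET hive.optimize.index.filter=true;    -- enable predicate (SARG) pushdown

INSERT INTO acidtbl VALUES (1, 2);
ALTER TABLE acidtbl COMPACT 'major';    -- only queues the compaction request
SHOW COMPACTIONS;                       -- re-run until the request has finished

SELECT * FROM acidtbl WHERE a = 1;      -- the predicate is what exercises the SARG path
```

In the unit test, `runWorker(hiveConf)` runs the compaction synchronously; in a live session the compaction is asynchronous, which is why the sketch polls `SHOW COMPACTIONS` before issuing the query.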

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-11 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Attachment: HIVE-14448.03.patch

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.patch

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-11 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: Patch Available  (was: In Progress)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.patch

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-11 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: Patch Available  (was: Open)

Let's wait until a successful run before more code review.

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.patch

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-11 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Attachment: HIVE-14448.02.patch

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.patch

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: Patch Available  (was: In Progress)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Attachment: HIVE-14448.01.patch

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: In Progress  (was: Patch Available)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14448:
--
Priority: Critical  (was: Major)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14448:

Status: Patch Available  (was: Open)

Forgot to submit patch... grr

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14448:

Attachment: HIVE-14448.patch

Let's see if this works... as per comment in getSargColumnNames that I added 
some time ago after understanding how it works, that code is brittle... oh well.

[~prasanth_j] can you take a look?

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-05 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14448:
-
Description: 
When the ETL split strategy is applied to ACID tables with predicate pushdown 
(SARG enabled), split generation fails. This bug is usually exposed only when 
working with data at scale, because in most other cases the BI split strategy 
is chosen. My guess is that this happens because the correct readerSchema is 
not picked up when we try to extract the SARG column names.
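
To make the guess above concrete, here is a minimal, purely illustrative sketch of how a schema mix-up could produce a NegativeArraySizeException. It is not Hive's code. The class, the helper, and the offset constant are hypothetical, and the offset of 6 rests on the assumption that an ACID ORC file stores the user row as the sixth field of a wrapper struct, after five transaction metadata fields. If an array is sized as (number of types minus that offset) while the type list comes from the plain table schema, the size goes negative.

```java
public class SargColumnSizingSketch {
    // Assumption: index of the user "row" struct in an ACID file's flattened type list
    // (root struct at 0, five metadata fields at 1..5, row struct at 6).
    static final int ASSUMED_ACID_ROW_OFFSET = 6;

    // Hypothetical helper: allocates one slot per type at or below the root column.
    // isOriginal = true means the file has no ACID wrapper struct.
    static String[] allocateSargColumnNames(int typeCount, boolean isOriginal) {
        int rootColumn = isOriginal ? 0 : ASSUMED_ACID_ROW_OFFSET;
        return new String[typeCount - rootColumn];
    }

    public static void main(String[] args) {
        // Plain schema for a table (a int, b int): root struct + a + b = 3 types.
        System.out.println(allocateSargColumnNames(3, true).length);

        // ACID file schema for the same table: 7 wrapper types + a + b = 9 types.
        System.out.println(allocateSargColumnNames(9, false).length);

        // Mismatch: plain type list combined with the ACID offset gives 3 - 6 = -3.
        try {
            allocateSargColumnNames(3, false);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

Under these assumptions, only the third call fails, which is consistent with the failure appearing only when the ACID path and the SARG column extraction disagree about which schema is in use.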

The quickest way to reproduce this is to add the following unit test to 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testETLSplitStrategyForACID() throws Exception {
    // Force the ETL split strategy and enable predicate pushdown (SARG).
    hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
    hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
    runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
    // Request a major compaction, then run the compactor worker to perform it.
    runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
    runWorker(hiveConf);
    // This filtered select is the statement that fails during split generation.
    List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
    int[][] resultData = new int[][] {{1,2}};
    Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}

Back-trace for this failed test is as follows:
{code}
exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.NegativeArraySizeException)'
java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NegativeArraySizeException
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
    at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
    at org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.r

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14448:
--
Target Version/s: 2.2.0

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)