[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092275#comment-16092275 ]
Eugene Koifman commented on HIVE-16077:
---------------------------------------

bucket_num_reducers.q and bucket_num_reducers2.q test the non-acid code path. The Acid code path doesn't work - repro below:

{noformat}
@Test
public void testMoreBucketsThanReducers2() throws Exception {
  //see bucket_num_reducers.q bucket_num_reducers2.q
  d.destroy();
  HiveConf hc = new HiveConf(hiveConf);
  hc.setIntVar(HiveConf.ConfVars.MAXREDUCERS, 1);
  //this is used in multiple places, SemanticAnalyzer.getBucketingSortingDest() among others
  hc.setIntVar(HiveConf.ConfVars.HADOOPNUMREDUCERS, 1);
  hc.setBoolVar(HiveConf.ConfVars.HIVE_EXPLAIN_USER, false);
  d = new Driver(hc);
  d.setMaxRows(10000);
  runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,1)"); //txn X writes to bucket1
  runStatementOnDriver("insert into " + Table.ACIDTBL + " values(0,0),(3,3)"); //txn X + 1 writes to bucket0 + bucket1
  /* So now the FileSinkOperator for this update should have totalFiles=2, numFiles=2 and multiFileSpray=true.
   * FileSinkOperator.process() has "if (fpaths.acidLastBucket != bucketNum) {" - this assumes that the rows
   * seen by process() are grouped by bucketNum when numBuckets > numReducers. Nothing guarantees this.
   * This test demonstrates it: ReduceSinkOperator sorts by ROW_ID, so the single FileSinkOperator here
   * gets (1,1),(0,0),(3,3) in process(), i.e. rows from b1,b0,b1, and hits ArrayIndexOutOfBoundsException:
   *
   * 2017-07-18T14:48:58,771 ERROR [pool-23-thread-1] ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":12,"bucketid":536936448,"rowid":0}},"value":{"_col0":3}}
   *   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:243)
   *   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   *   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   *   at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:346)
   *   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   *   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   *   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   *   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   *   at java.lang.Thread.run(Thread.java:745)
   * Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
   *   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:779)
   *   at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:952)
   *   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:900)
   *   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:891)
   *   at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
   *   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:234)
   *   ... 8 more
   */
  CommandProcessorResponse cpr = runStatementOnDriverNegative("update " + Table.ACIDTBL + " set b = -1");
  Assert.assertEquals("", 2, cpr.getResponseCode());
  /* This exception is not the only possible outcome: we could also silently corrupt the data.
   * Say we have a single FS that should write 4 buckets and it sees rows in this order: b1,b0,b3,b1.
   * The 2nd row for b1 will cause "++fpaths.acidFileOffset", so a 2nd writer for b1 will be created
   * in fpaths.updaters[3], but with the same file name as updaters[0]. I don't know what will happen
   * when file names collide - maybe we get bucket0 and bucket0_copy1, maybe the file will be clobbered. */
}
{noformat}
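To make both failure modes above concrete, here is a minimal stand-alone simulation of the bucket-switch bookkeeping described in the repro comments. This is not the actual FileSinkOperator code - the class name, method, and file-name strings are invented for illustration - it only models the quoted "acidLastBucket"/"acidFileOffset" logic:

{noformat}
import java.util.Arrays;

// Hypothetical sketch, not Hive code: simulates allocating one writer slot
// per bucket switch against a fixed-size updaters array.
public class BucketSpraySimulation {

  /** One writer slot per expected output file, as in fpaths.updaters. */
  static void spray(int totalFiles, int... bucketNumsInArrivalOrder) {
    String[] updaters = new String[totalFiles]; // stand-in for fpaths.updaters
    int acidFileOffset = -1;                    // stand-in for fpaths.acidFileOffset
    int acidLastBucket = -1;                    // stand-in for fpaths.acidLastBucket

    for (int bucketNum : bucketNumsInArrivalOrder) {
      if (acidLastBucket != bucketNum) {        // the check quoted in the repro
        acidLastBucket = bucketNum;
        // A new "writer" is allocated on every bucket switch. Input grouped by
        // bucket switches at most totalFiles - 1 times; interleaved input can
        // switch more often and walk off the end of the array.
        updaters[++acidFileOffset] = "bucket" + bucketNum;
      }
    }
    System.out.println(Arrays.toString(updaters));
  }

  public static void main(String[] args) {
    // 4 buckets, rows arriving b1,b0,b3,b1: no exception, but slots 0 and 3
    // both hold a writer for "bucket1" - the file-name collision case.
    spray(4, 1, 0, 3, 1); // prints [bucket1, bucket0, bucket3, bucket1]

    // 2 buckets, rows arriving b1,b0,b1 as in the repro: the third bucket
    // switch indexes updaters[2] in a length-2 array and throws
    // java.lang.ArrayIndexOutOfBoundsException, matching the trace above.
    spray(2, 1, 0, 1);
  }
}
{noformat}

If the rows arrived grouped by bucket (e.g. b0,b0,b1,b1), the offset would never pass totalFiles - 1, which is exactly the assumption process() makes and the repro violates.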
> Tests where number of buckets is > number of reducers for Acid
> --------------------------------------------------------------
>
>                 Key: HIVE-16077
>                 URL: https://issues.apache.org/jira/browse/HIVE-16077
>             Project: Hive
>          Issue Type: Test
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> I don't think we have such tests for the Acid path.
> Check if they exist for the non-acid path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)