[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092275#comment-16092275 ]

Eugene Koifman commented on HIVE-16077:
---------------------------------------

bucket_num_reducers.q and bucket_num_reducers2.q test the non-acid code path.
The Acid code path doesn't work - repro below:
{noformat}
  @Test
  public void testMoreBucketsThanReducers2() throws Exception {
    //see bucket_num_reducers.q bucket_num_reducers2.q
    d.destroy();
    HiveConf hc = new HiveConf(hiveConf);
    hc.setIntVar(HiveConf.ConfVars.MAXREDUCERS, 1);
    //this is used in multiple places, SemanticAnalyzer.getBucketingSortingDest() among others
    hc.setIntVar(HiveConf.ConfVars.HADOOPNUMREDUCERS, 1);
    hc.setBoolVar(HiveConf.ConfVars.HIVE_EXPLAIN_USER, false);
    d = new Driver(hc);
    d.setMaxRows(10000);
    runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,1)");//txn X write to bucket1
    runStatementOnDriver("insert into " + Table.ACIDTBL + " values(0,0),(3,3)");//txn X + 1 write to bucket0 + bucket1

    /*so now FileSinkOperator for this update should have totalFiles=2, numFiles=2 and multiFileSpray=2
     FileSinkOperator.process() has "if (fpaths.acidLastBucket != bucketNum) {" - this assumes that
     rows seen by process() are grouped by bucketNum when numBuckets > numReducers.  There is nothing
     that guarantees this.  This demonstrates it - ReduceSinkOperator sorts by ROW_ID, thus the
     1 FileSinkOperator here in process() should get (1,1),(0,0),(3,3) i.e. rows from b1,b0,b1 and get
     ArrayIndexOutOfBoundsException
     2017-07-18T14:48:58,771 ERROR [pool-23-thread-1] ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":12,"bucketid":536936448,"rowid":0}},"value":{"_col0":3}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:243)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:346)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:779)
        at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:952)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:900)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:891)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:234)
        ... 8 more
     */
    CommandProcessorResponse cpr = runStatementOnDriverNegative("update " + Table.ACIDTBL + " set b = -1");
    Assert.assertEquals("", 2, cpr.getResponseCode());
    /*this error is not the only possible error: we could just corrupt the data:
    * say we have a single FS that should write 4 buckets and we see rows in this order: b1,b0,b3,b1
    * The 2nd row for b1 will cause "++fpaths.acidFileOffset" and a 2nd writer for b1 will be created
    * in fpaths.updaters[3] (but same file name as updaters[0]) - I don't know what will happen when
    * file names collide - maybe we get bucket0 and bucket0_copy1 - maybe it will be clobbered*/
  }

{noformat}
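The index overrun described above can be sketched outside of Hive. The following is a minimal, hypothetical simulation (class {{AcidOffsetSketch}} is not Hive code) of the acidLastBucket/acidFileOffset bookkeeping that FileSinkOperator.process() appears to do, assuming a fixed-size updaters array: when rows arrive grouped by bucket the offset stays in bounds, but an interleaved sequence like b1,b0,b1 increments the offset once per bucket *run* and walks past the end of the array.

```java
// Hypothetical sketch, not Hive code: models the "++fpaths.acidFileOffset"
// logic under the assumption that rows are grouped by bucketNum.
public class AcidOffsetSketch {

    // Returns the final file offset; throws ArrayIndexOutOfBoundsException
    // when the number of bucket *runs* exceeds numWriters.
    static int processAll(int[] bucketSeq, int numWriters) {
        Object[] updaters = new Object[numWriters]; // analog of fpaths.updaters
        int acidLastBucket = -1;
        int acidFileOffset = -1;
        for (int bucketNum : bucketSeq) {
            if (acidLastBucket != bucketNum) {   // new run of buckets seen
                acidLastBucket = bucketNum;
                ++acidFileOffset;                // mirrors ++fpaths.acidFileOffset
            }
            // AIOOBE here when offset >= numWriters
            updaters[acidFileOffset] = "writer for bucket " + bucketNum;
        }
        return acidFileOffset;
    }

    public static void main(String[] args) {
        // Grouped input b0,b1 with 2 writers: offsets 0 and 1, stays in bounds.
        processAll(new int[]{0, 1}, 2);
        // ROW_ID-ordered interleaving b1,b0,b1: third run pushes the offset
        // to 2 on a length-2 array.
        try {
            processAll(new int[]{1, 0, 1}, 2);
            System.out.println("no exception");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE caught");
        }
    }
}
```

This also illustrates the corruption scenario in the second comment: with b1,b0,b3,b1 and 4 writers, the second b1 run lands in a different slot than the first, so two writers exist for the same bucket.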

> Tests where number of buckets is > number of reducers for Acid
> ---------------------------------------------------------------
>
>                 Key: HIVE-16077
>                 URL: https://issues.apache.org/jira/browse/HIVE-16077
>             Project: Hive
>          Issue Type: Test
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> I don't think we have such tests for the Acid path.
> Check if they exist for the non-acid path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
