[ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373699#comment-15373699 ]

Owen O'Malley commented on HIVE-14004:
--------------------------------------

I should give more details. The problem was that OrcInputFormat was modifying 
the passed-in Options object while ACID was reusing the same Options object 
across the deltas. Thus, when some of the delta files had fewer columns, the 
include array wasn't long enough.
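
To make the aliasing concrete, here is a minimal, self-contained sketch. Note 
that Options, openDelta, and isIncluded below are made-up stand-ins for the 
real ORC/Hive classes, not the actual API: resizing the shared include array 
for a narrower delta leaves it too short when the same object is reused for a 
file whose schema still has column 7.

{noformat}
// Illustrative only: Options/openDelta/isIncluded are stand-ins, not ORC/Hive APIs.
import java.util.Arrays;

public class SharedOptionsSketch {

  /** Stand-in for the reader options: holds a mutable column-include array. */
  static class Options {
    boolean[] include;
  }

  /** Stand-in for the input format: resizes include to this file's column
      count, mutating the caller's object instead of a private copy. */
  static void openDelta(Options opts, int fileColumns) {
    opts.include = Arrays.copyOf(opts.include, fileColumns);
  }

  /** Stand-in for the schema-evolution lookup that indexes the include array. */
  static boolean isIncluded(Options opts, int column) {
    return opts.include[column];
  }

  public static void main(String[] args) {
    Options shared = new Options();
    shared.include = new boolean[8];   // table schema: columns 0..7
    Arrays.fill(shared.include, true);

    openDelta(shared, 4);              // a delta with fewer columns shrinks the shared array
    isIncluded(shared, 7);             // reuse for the next delta: ArrayIndexOutOfBoundsException: 7

    // Avoiding the bug: give each reader its own copy so the mutation stays
    // local, e.g.
    //   Options perFile = new Options();
    //   perFile.include = shared.include.clone();
  }
}
{noformat}

One way to avoid this, as sketched in the trailing comment, is to work on a 
per-file copy of the options rather than mutating the shared object.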

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14004
>                 URL: https://issues.apache.org/jira/browse/HIVE-14004
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Owen O'Malley
>         Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, HIVE-14004.03.patch, HIVE-14004.patch
>
>
> Easiest way to repro is to add the following test
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
>     int[][] tableData = {{1,2},{3,4}};
>     runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     Worker t = new Worker();
>     t.setThreadId((int) t.getId());
>     t.setHiveConf(hiveConf);
>     AtomicBoolean stop = new AtomicBoolean();
>     AtomicBoolean looped = new AtomicBoolean();
>     stop.set(true); // pre-set stop so the Worker exits after a single pass
>     t.init(stop, looped);
>     t.run(); // runs the MAJOR compaction queued above
>     runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
>     runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MINOR'");
>     t.run(); // runs the MINOR compaction that triggers the exception
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but target/tmp/log/hive.log will contain the following 
> exception (from the minor compaction):
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) ~[classes/:?]
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_71]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
> {noformat}
> I observed the same on a real cluster.
> Based on my observations, running a major compaction instead of a minor one 
> works fine.
> Replacing the DELETE operation with an UPDATE makes both major and minor 
> compaction run fine.
> The issue itself should be addressed by HIVE-13974, but we need to make sure 
> to add the test.



