[ https://issues.apache.org/jira/browse/ORC-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249369#comment-17249369 ]

Owen O'Malley commented on ORC-697:
-----------------------------------


Running the extended scan tool on the corrupted file produces output like this:
{noformat}
 Processing data file bad.orc [length: 451155826]
 Unable to read batch at row 20735000 in stripe 17 (rows 20460000-21380000), recovery at row 20740000 in stripe 17 (rows 20460000-21380000)
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.orc.impl.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:200)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:70)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
 at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:696)
 at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2463)
 at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:42)
 at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:72)
 at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1236)
 at org.apache.orc.tools.ScanData.main(ScanData.java:175)
 at org.apache.orc.tools.Driver.main(Driver.java:126)
 Column 5 failed at row 20735968
 Unable to read batch at row 20756000 in stripe 17 (rows 20460000-21380000), recovery at row 20760000 in stripe 17 (rows 20460000-21380000)
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.orc.impl.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:200)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:70)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
 at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:696)
 at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2463)
 at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:42)
 at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:72)
 at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1236)
 at org.apache.orc.tools.ScanData.main(ScanData.java:175)
 at org.apache.orc.tools.Driver.main(Driver.java:126)
 Column 5 failed at row 20756960
 Unable to read batch at row 20785000 in stripe 17 (rows 20460000-21380000), recovery at row 20790000 in stripe 17 (rows 20460000-21380000)
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.orc.impl.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:200)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:70)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
 at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
 at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:696)
 at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2463)
 at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:42)
 at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:72)
 at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1236)
 at org.apache.orc.tools.ScanData.main(ScanData.java:175)
 at org.apache.orc.tools.Driver.main(Driver.java:126)
 Column 5 failed at row 20785632
 File: bad.orc, bad batches: 3, rows: 33323696/33337696
{noformat}
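The numbers in this output are internally consistent: the three failures skip 5000, 4000 and 5000 rows (recovery row minus the row where the failed batch started), and 14000 is exactly the gap in the summary's "rows: 33323696/33337696". The sketch below cross-checks that mechanically. The class and method names (ScanReportTally, skippedPerFailure, summaryGap) are hypothetical; only the message formats are copied from the output above, and reading "rows: a/b" as "rows read / total rows" is an assumption that merely fits these numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper that cross-checks the scan tool's failure reports
 * against its summary line. Only the message formats come from the output above.
 */
public class ScanReportTally {
  // "Unable to read batch at row A in stripe S (rows X-Y), recovery at row B";
  // \\s* also absorbs any line break introduced by email wrapping.
  private static final Pattern FAILURE = Pattern.compile(
      "Unable to read batch at row (\\d+) in stripe \\d+ \\(rows \\d+-\\d+\\),\\s*"
          + "recovery at row (\\d+)");
  // "File: bad.orc, bad batches: 3, rows: 33323696/33337696"
  private static final Pattern SUMMARY = Pattern.compile("rows: (\\d+)/(\\d+)");

  /** Rows skipped by each failure: recovery row minus the row where the batch started. */
  public static List<Long> skippedPerFailure(String output) {
    List<Long> skipped = new ArrayList<>();
    Matcher m = FAILURE.matcher(output);
    while (m.find()) {
      skipped.add(Long.parseLong(m.group(2)) - Long.parseLong(m.group(1)));
    }
    return skipped;
  }

  /** Second number minus first number in the summary's "rows: a/b", or -1 if absent. */
  public static long summaryGap(String output) {
    Matcher m = SUMMARY.matcher(output);
    return m.find() ? Long.parseLong(m.group(2)) - Long.parseLong(m.group(1)) : -1;
  }

  public static void main(String[] args) {
    // The three failure headers and the summary line, copied from the output above.
    String output = String.join("\n",
        " Unable to read batch at row 20735000 in stripe 17 (rows 20460000-21380000), ",
        "recovery at row 20740000 in stripe 17 (rows 20460000-21380000)",
        " Unable to read batch at row 20756000 in stripe 17 (rows 20460000-21380000), ",
        "recovery at row 20760000 in stripe 17 (rows 20460000-21380000)",
        " Unable to read batch at row 20785000 in stripe 17 (rows 20460000-21380000), ",
        "recovery at row 20790000 in stripe 17 (rows 20460000-21380000)",
        " File: bad.orc, bad batches: 3, rows: 33323696/33337696");
    List<Long> skipped = skippedPerFailure(output);
    long sum = 0;
    for (long s : skipped) {
      sum += s;
    }
    System.out.println("skipped per failure: " + skipped); // skipped per failure: [5000, 4000, 5000]
    System.out.println("sum: " + sum + ", summary gap: " + summaryGap(output)); // sum: 14000, summary gap: 14000
  }
}
```

If the per-failure sum and the summary gap ever disagreed, that would suggest rows lost somewhere the per-batch reports do not cover.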

> Improve Scan tool to report where files are corrupted.
> ------------------------------------------------------
>
>                 Key: ORC-697
>                 URL: https://issues.apache.org/jira/browse/ORC-697
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>
> We recently had a case where a bad machine was causing corruption in ORC 
> files. In the process of debugging that, I extended the scan tool to report 
> where the corruption was.
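The recovery rows in the output (20740000, 20760000, 20790000) all fall on multiples of 10,000 from the start of stripe 17, which is consistent with the scan skipping ahead to the next row group (10,000 rows is ORC's default row index stride) rather than aborting. The following is a self-contained model of that report-and-resume loop, not the ScanData code and not the ORC reader API: a "corrupt row" set stands in for the decoder exception, and the names (RecoveringScan, recoveryRow, scanStripe), the 10,000-row stride and the 1,000-row batch size are assumptions chosen to match the numbers above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical, self-contained model of a scan that reports corrupt batches and
 * resumes at the next row group instead of aborting. It does not use the ORC API.
 */
public class RecoveringScan {
  /** First row of the row group after the one containing `row`, clamped to the stripe end. */
  static long recoveryRow(long row, long stripeStart, long stripeEnd, long stride) {
    long next = stripeStart + ((row - stripeStart) / stride + 1) * stride;
    return Math.min(next, stripeEnd);
  }

  /**
   * Scans rows [stripeStart, stripeEnd) in batches. A batch "fails" if it contains
   * a row from corruptRows (standing in for a decoder exception). Returns the number
   * of rows read successfully and appends one line per failure to report.
   */
  static long scanStripe(long stripeStart, long stripeEnd, int batchSize, long stride,
                         TreeSet<Long> corruptRows, List<String> report) {
    long goodRows = 0;
    long row = stripeStart;
    while (row < stripeEnd) {
      long end = Math.min(row + batchSize, stripeEnd);
      Long bad = corruptRows.ceiling(row); // first corrupt row at or after `row`
      if (bad == null || bad >= end) {
        goodRows += end - row;             // whole batch read cleanly
        row = end;
      } else {
        long resume = recoveryRow(row, stripeStart, stripeEnd, stride);
        report.add("Unable to read batch at row " + row
            + ", recovery at row " + resume + ", failed at row " + bad);
        row = resume;                      // always > row, so the loop makes progress
      }
    }
    return goodRows;
  }

  public static void main(String[] args) {
    // Stripe bounds and failing rows taken from the output above;
    // the stride and batch size are assumptions.
    TreeSet<Long> corrupt = new TreeSet<>(Arrays.asList(20735968L, 20756960L, 20785632L));
    List<String> report = new ArrayList<>();
    long good = scanStripe(20460000L, 21380000L, 1000, 10000L, corrupt, report);
    report.forEach(System.out::println);
    System.out.println("rows read: " + good + "/" + (21380000L - 20460000L));
  }
}
```

Under these assumptions the model reproduces the three recovery rows from the output and reads 906000 of the stripe's 920000 rows, the same 14000-row shortfall as the real summary. Resuming at a row-group boundary is a natural choice because it is the finest point at which an ORC reader has index positions to restart its streams.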



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
