[ https://issues.apache.org/jira/browse/ORC-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249369#comment-17249369 ]
Owen O'Malley commented on ORC-697:
-----------------------------------
The output looks like:
{noformat}
Processing data file bad.orc [length: 451155826]
Unable to read batch at row 20735000 in stripe 17 (rows 20460000-21380000),
recovery at row 20740000 in stripe 17 (rows 20460000-21380000)
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.orc.impl.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:200)
at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:70)
at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:696)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2463)
at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:42)
at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:72)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1236)
at org.apache.orc.tools.ScanData.main(ScanData.java:175)
at org.apache.orc.tools.Driver.main(Driver.java:126)
Column 5 failed at row 20735968
Unable to read batch at row 20756000 in stripe 17 (rows 20460000-21380000),
recovery at row 20760000 in stripe 17 (rows 20460000-21380000)
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.orc.impl.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:200)
at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:70)
at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:696)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2463)
at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:42)
at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:72)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1236)
at org.apache.orc.tools.ScanData.main(ScanData.java:175)
at org.apache.orc.tools.Driver.main(Driver.java:126)
Column 5 failed at row 20756960
Unable to read batch at row 20785000 in stripe 17 (rows 20460000-21380000),
recovery at row 20790000 in stripe 17 (rows 20460000-21380000)
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.orc.impl.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:200)
at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:70)
at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:696)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2463)
at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:42)
at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:72)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1236)
at org.apache.orc.tools.ScanData.main(ScanData.java:175)
at org.apache.orc.tools.Driver.main(Driver.java:126)
Column 5 failed at row 20785632
File: bad.orc, bad batches: 3, rows: 33323696/33337696
{noformat}
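As a rough cross-check of the output above, the rows skipped by each recovery can be summed and compared with the final summary line. The sketch below is not part of the scan tool. The helper names `skipped_rows` and `summary` are hypothetical, and reading the `rows:` field as "rows read / total rows" is an inference from the numbers rather than something the comment states.

```python
import re

# Patterns for the three kinds of lines that matter in the scan output.
BATCH = re.compile(r"Unable to read batch at row (\d+)")
RECOVERY = re.compile(r"recovery at row (\d+)")
SUMMARY = re.compile(r"bad batches: (\d+), rows: (\d+)/(\d+)")


def skipped_rows(log: str) -> int:
    """Sum, over all failed batches, the gap between batch start and recovery row."""
    starts = [int(n) for n in BATCH.findall(log)]
    recoveries = [int(n) for n in RECOVERY.findall(log)]
    return sum(rec - start for start, rec in zip(starts, recoveries))


def summary(log: str) -> tuple:
    """Return (bad batches, rows read, total rows) from the final summary line.

    Assumption: the two numbers after 'rows:' are rows read and total rows.
    """
    match = SUMMARY.search(log)
    if match is None:
        raise ValueError("no summary line found")
    return tuple(int(g) for g in match.groups())


# The relevant lines from the output above, with the stack traces left out.
LOG = """\
Unable to read batch at row 20735000 in stripe 17 (rows 20460000-21380000),
recovery at row 20740000 in stripe 17 (rows 20460000-21380000)
Unable to read batch at row 20756000 in stripe 17 (rows 20460000-21380000),
recovery at row 20760000 in stripe 17 (rows 20460000-21380000)
Unable to read batch at row 20785000 in stripe 17 (rows 20460000-21380000),
recovery at row 20790000 in stripe 17 (rows 20460000-21380000)
File: bad.orc, bad batches: 3, rows: 33323696/33337696
"""

if __name__ == "__main__":
    bad, read, total = summary(LOG)
    print(f"bad batches: {bad}")
    print(f"skipped per recovery lines: {skipped_rows(LOG)}")
    print(f"skipped per summary line:   {total - read}")
```

For this file the two figures agree: the three recoveries skip 5000 + 4000 + 5000 = 14000 rows, which matches 33337696 - 33323696 = 14000 from the summary line.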
> Improve Scan tool to report where files are corrupted.
> ------------------------------------------------------
>
> Key: ORC-697
> URL: https://issues.apache.org/jira/browse/ORC-697
> Project: ORC
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Major
>
> We recently had a case where a bad machine was causing corruption in ORC
> files. In the process of debugging that, I extended the scan tool to report
> where the corruption was.