[ https://issues.apache.org/jira/browse/ORC-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078442#comment-17078442 ]
Ivan Dyptan commented on ORC-435:
---------------------------------

[~prasanth_j] [~omalley] Please consider re-testing this case, as we are hitting a similar error on ORC 1.5.5:

{code:java}
java -Xmx8g -jar /tmp/orc-tools-1.5.10-SNAPSHOT-uber.jar data /tmp/largestripe.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file /tmp/largestripe.orc [length: 2415919549]
Unable to dump data for file: /tmp/largestripe.orc
java.lang.NegativeArraySizeException
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1553)
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1575)
	at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1673)
	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1517)
	at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:2245)
	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2059)
	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1315)
	at org.apache.orc.tools.PrintData.printJsonData(PrintData.java:203)
	at org.apache.orc.tools.PrintData.main(PrintData.java:241)
	at org.apache.orc.tools.Driver.main(Driver.java:110)
{code}

> Ability to read stripes that are greater than 2GB
> -------------------------------------------------
>
>                 Key: ORC-435
>                 URL: https://issues.apache.org/jira/browse/ORC-435
>             Project: ORC
>          Issue Type: Bug
>          Components: Reader
>    Affects Versions: 1.3.4, 1.4.4, 1.5.3, 1.6.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Major
>             Fix For: 1.5.4, 1.6.0
>
>
> The ORC reader fails with NegativeArraySizeException if the stripe size is >2GB.
> Even though the default stripe size is 64MB, there are cases where the stripe size will exceed 2GB before the memory manager can kick in to check the memory size. Say we are inserting 500KB strings (mostly unique): by the time we reach 5000 rows, the stripe size is already over 2GB. The reader will have to chunk the disk range reads for such cases instead of reading the stripe as one whole blob.
> Exception thrown when reading such files:
> {code:java}
> 2018-10-12 21:43:58,833 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:272)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:1007)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:835)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1029)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1062)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1085)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
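For context, the failure mode in the reports above can be reproduced in isolation. This is a minimal, hypothetical sketch of the int-narrowing overflow (the class and method names are illustrative, not the ORC reader's actual buffer-sizing code): a byte count held in a long is narrowed to int when sizing a `byte[]`, and any length above Integer.MAX_VALUE (~2GB) wraps to a negative value, which `new byte[...]` rejects with NegativeArraySizeException.

```java
// Illustrative sketch only -- not ORC source code.
public class StripeOverflowSketch {

    // Narrowing a long byte count to int, as a naive buffer allocation would.
    static int narrowToInt(long stripeBytes) {
        return (int) stripeBytes;
    }

    public static void main(String[] args) {
        long stripeBytes = 2_415_919_549L; // file length from the report above
        int bufferSize = narrowToInt(stripeBytes);
        System.out.println(bufferSize); // wraps to a negative value

        try {
            byte[] buffer = new byte[bufferSize]; // throws: size is negative
            System.out.println(buffer.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```

Avoiding this requires either chunking reads so no single allocation exceeds Integer.MAX_VALUE, or an explicit bounds check before narrowing (e.g. `java.lang.Math.toIntExact`, which throws ArithmeticException instead of silently wrapping).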