dzcxzl created ORC-1167:
---------------------------

             Summary: Support orc.row.batch.size configuration
                 Key: ORC-1167
                 URL: https://issues.apache.org/jira/browse/ORC-1167
             Project: ORC
          Issue Type: Improvement
            Reporter: dzcxzl

Currently, OrcMapreduceRecordReader is created with a default batch size of 1024, and there is no way to change it. We could support configuring it through Reader.Options.

If we read a batch of 1024 relatively large strings, we may get a NegativeArraySizeException, but there is no configuration available to reduce the batch size.

{code:java}
java.lang.NegativeArraySizeException
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1544)
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1566)
	at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1662)
	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1508)
	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:84)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.nextKeyValue(OrcMapreduceRecordReader.java:102)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
{code}
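
To illustrate why a smaller batch size would help, here is a minimal, self-contained sketch. It assumes that the combined byte length of all strings in a batch is accumulated in an `int` before a single buffer is allocated; the trace above only shows that the exception comes from `commonReadByteArrays`, so that cause is an assumption. The class name `BatchSizeSketch`, the method `totalLengthAsInt`, and the 2 MiB row size are hypothetical and are not part of ORC.

```java
public class BatchSizeSketch {

    // Adds up the byte length of every row in a batch using int arithmetic.
    // Assumption: this mimics a reader that sizes one shared buffer per batch.
    public static int totalLengthAsInt(int rows, int bytesPerRow) {
        int total = 0;
        for (int i = 0; i < rows; i++) {
            total += bytesPerRow; // wraps around silently past Integer.MAX_VALUE
        }
        return total;
    }

    public static void main(String[] args) {
        int bytesPerRow = 2 * 1024 * 1024; // hypothetical 2 MiB strings

        // 1024 rows * 2 MiB = 2^31 bytes, which does not fit in an int.
        int total = totalLengthAsInt(1024, bytesPerRow);
        System.out.println("total for 1024 rows = " + total);
        try {
            byte[] buffer = new byte[total];
            System.out.println("allocated " + buffer.length + " bytes");
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException for a 1024-row batch");
        }

        // Halving the batch keeps the sum within int range.
        System.out.println("total for 512 rows = " + totalLengthAsInt(512, bytesPerRow));
    }
}
```

Under these assumptions, a 1024-row batch wraps to a negative length, while a 512-row batch yields a valid one, which is the kind of reduction the proposed `orc.row.batch.size` setting would allow.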