dzcxzl created ORC-1167:
---------------------------
Summary: Support orc.row.batch.size configuration
Key: ORC-1167
URL: https://issues.apache.org/jira/browse/ORC-1167
Project: ORC
Issue Type: Improvement
Reporter: dzcxzl
Currently, OrcMapreduceRecordReader is created with a fixed batch size of 1024 rows.
We could make the batch size configurable through Reader.Options.
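A minimal sketch of the proposed behavior, assuming the value is read from a plain key/value configuration using the `orc.row.batch.size` key from the summary and falls back to today's default of 1024. The class `BatchSizeConfigSketch` and its method `resolveRowBatchSize` are hypothetical illustrations, not ORC's API. In ORC, the resolved value would then be handed to the reader (for example via Reader.Options) in place of the hard-coded default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not ORC's API): resolve the row batch size from a
// string-keyed configuration, falling back to the current default of 1024.
class BatchSizeConfigSketch {
    // Key name taken from the issue summary; the lookup logic is an assumption.
    static final String ROW_BATCH_SIZE_KEY = "orc.row.batch.size";
    static final int DEFAULT_ROW_BATCH_SIZE = 1024;

    static int resolveRowBatchSize(Map<String, String> conf) {
        String value = conf.get(ROW_BATCH_SIZE_KEY);
        if (value == null) {
            return DEFAULT_ROW_BATCH_SIZE; // unset: keep today's behavior
        }
        int size = Integer.parseInt(value.trim());
        if (size <= 0) {
            throw new IllegalArgumentException(
                ROW_BATCH_SIZE_KEY + " must be positive, got " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveRowBatchSize(conf)); // 1024
        conf.put(ROW_BATCH_SIZE_KEY, "128");           // smaller batches for wide strings
        System.out.println(resolveRowBatchSize(conf)); // 128
    }
}
```

Rejecting non-positive values up front keeps a bad setting from surfacing later as a confusing allocation error inside the reader.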
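When a batch of 1024 rows contains relatively large strings, reading it can fail with
the NegativeArraySizeException below (likely because the total byte length of the strings in the batch overflows an int), and there is no configuration to reduce the batch size.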
{code:java}
java.lang.NegativeArraySizeException
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1544)
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1566)
	at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1662)
	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1508)
	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:84)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.nextKeyValue(OrcMapreduceRecordReader.java:102)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
{code}
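To illustrate the likely failure mode, the sketch below assumes (as an illustration, not ORC's actual code) that the reader sums the byte lengths of every string in a batch into an `int` and allocates one `byte[]` of that size. The class and method names are hypothetical. With 3 MiB strings, 1024 rows wrap the `int` total to a negative value, while 256 rows stay in range, which is why a smaller configurable batch size would avoid the exception.

```java
// Hypothetical sketch: why a large batch of long strings can trigger
// NegativeArraySizeException. Assumption: per-row lengths are summed into an
// int and a single byte[] of that total is allocated for the whole batch.
class BatchOverflowSketch {
    // Sums per-row lengths with int arithmetic, as an int-based reader would.
    static int totalLengthInt(int rows, int bytesPerRow) {
        int total = 0;
        for (int i = 0; i < rows; i++) {
            total += bytesPerRow; // silently wraps past Integer.MAX_VALUE
        }
        return total;
    }

    // Upper bound on rows per batch before the int total overflows
    // (real JVMs cap arrays slightly below Integer.MAX_VALUE).
    static int maxSafeBatchSize(int bytesPerRow) {
        return Integer.MAX_VALUE / bytesPerRow;
    }

    public static void main(String[] args) {
        int bytesPerRow = 3 * 1024 * 1024; // 3 MiB per string value
        System.out.println(totalLengthInt(1024, bytesPerRow)); // -1073741824
        System.out.println(totalLengthInt(256, bytesPerRow));  // 805306368
        System.out.println(maxSafeBatchSize(bytesPerRow));     // 682
        try {
            byte[] buffer = new byte[totalLengthInt(1024, bytesPerRow)];
            System.out.println(buffer.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException, as in the stack trace");
        }
    }
}
```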
--
This message was sent by Atlassian Jira
(v8.20.7#820007)