[jira] [Created] (ORC-1167) Support orc.row.batch.size configuration

dzcxzl (Jira) Sat, 07 May 2022 05:04:07 -0700

dzcxzl created ORC-1167:
---------------------------

             Summary: Support orc.row.batch.size configuration
                 Key: ORC-1167
                 URL: https://issues.apache.org/jira/browse/ORC-1167
             Project: ORC
          Issue Type: Improvement
            Reporter: dzcxzl



Currently, OrcMapreduceRecordReader is created with a default batch size of 
1024. We could make this value configurable through Reader.Options.

 

If a batch of 1024 rows contains relatively large strings, the read can fail 
with NegativeArraySizeException, and there is currently no configuration for 
reducing the batch size.

 
{code:java}
java.lang.NegativeArraySizeException
        at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1544)
        at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1566)
        at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1662)
        at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1508)
        at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
        at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
        at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:84)
        at org.apache.orc.mapreduce.OrcMapreduceRecordReader.nextKeyValue(OrcMapreduceRecordReader.java:102)
        at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
 {code}
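
A likely explanation for the trace above, sketched under assumptions: if the reader sums the byte lengths of every string in a batch into an int before allocating one backing array, then 1024 large values can push that sum past Integer.MAX_VALUE, where it wraps to a negative number. The class below is illustrative only and is not ORC code; BatchSizeSketch, totalLengthAsInt, maxSafeRows, and the 3 MiB row size are all hypothetical names and values chosen for the example.

```java
public class BatchSizeSketch {

    // Sums per-row byte lengths into an int, the way a reader that allocates
    // a single byte[] per batch might do (assumption, not copied from ORC).
    static int totalLengthAsInt(int rows, int bytesPerRow) {
        int total = 0;
        for (int i = 0; i < rows; i++) {
            total += bytesPerRow; // wraps silently past Integer.MAX_VALUE
        }
        return total;
    }

    // Largest row count whose combined length still fits in an int.
    static int maxSafeRows(int bytesPerRow) {
        return Integer.MAX_VALUE / bytesPerRow;
    }

    // Returns true if allocating an array of this length throws
    // NegativeArraySizeException. Non-negative lengths are not allocated,
    // to keep the sketch cheap to run.
    static boolean allocationFails(int totalLength) {
        if (totalLength >= 0) {
            return false;
        }
        try {
            byte[] unused = new byte[totalLength];
            return unused.length < 0; // unreachable for a negative length
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int bytesPerRow = 3 * 1024 * 1024; // hypothetical 3 MiB strings

        int full = totalLengthAsInt(1024, bytesPerRow);
        int half = totalLengthAsInt(512, bytesPerRow);

        System.out.println("1024 rows -> total " + full
                + ", allocation fails: " + allocationFails(full));
        System.out.println("512 rows  -> total " + half
                + ", allocation fails: " + allocationFails(half));
        System.out.println("max safe rows at this size: " + maxSafeRows(bytesPerRow));
    }
}
```

Under these assumptions, a smaller batch size keeps the summed length within int range, which is why a configurable batch size would give users a workaround.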
 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
