[ https://issues.apache.org/jira/browse/ARROW-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rong Ma updated ARROW-8803:
---------------------------
    Summary: [Java] Row count should be set before loading buffers in VectorLoader  (was: [Java] Row count should be set before loading buffers In VectorLoader)

> [Java] Row count should be set before loading buffers in VectorLoader
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8803
>                 URL: https://issues.apache.org/jira/browse/ARROW-8803
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Rong Ma
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Hi guys! I'm new to the community, and I've been using Arrow for some time.
> In my use case, I need to read a RecordBatch with *compressed* underlying
> buffers using Java's IPC API, and I'm blocked by the VectorLoader's "load"
> method. In this method,
> {quote}{{root.setRowCount(recordBatch.getLength());}}
> {quote}
> This call not only sets the rowCount for the root, but also sets the
> valueCount for the vectors the root holds, *which has already been set once
> when the buffers were loaded.*
> It's not a bug... I know. But if I try to load some compressed buffers, I
> get the following exception:
> {quote}java.lang.IndexOutOfBoundsException: index: 0, length: 512 (expected: range(0, 504))
> at io.netty.buffer.ArrowBuf.checkIndex(ArrowBuf.java:718)
> at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:965)
> at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:439)
> at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:708)
> at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:226)
> at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:61)
> at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:205)
> at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:122)
> {quote}
> So I've started to wonder whether it would make more sense to call
> root.setRowCount before loadBuffers.
> root.setRowCount also calls each vector's setValueCount, which I think is
> unnecessary here, since the vectors are already fully formed after
> loadBuffers has been called.
> Another existing piece of code upstream is similar to this change.
> [link|https://github.com/apache/arrow/blob/ed1f771dccdde623ce85e212eccb2b573185c461/java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java#L170-L178]

--
This message was sent by Atlassian Jira (v8.3.4#803005)
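
The ordering problem described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyVector, loadBuffer, loadThenSetCount and setCountThenLoad are hypothetical names and not part of Arrow, and the 8-byte type width, the 64 rows and the 504-byte buffer are chosen only to echo the 512 vs. 504 figures in the stack trace. The model assumes that setting the value count expects room for rows * typeWidth uncompressed bytes, which a loaded compressed buffer may not provide.

```java
final class RowCountOrderDemo {

    /** Hypothetical stand-in for a fixed-width vector; this is not an Arrow class. */
    static final class ToyVector {
        static final int TYPE_WIDTH = 8;          // assumed bytes per value
        private byte[] buffer = new byte[0];
        private boolean loaded = false;

        /** Adopts an incoming buffer as-is, the way a loader hands over a batch buffer. */
        void loadBuffer(byte[] incoming) {
            buffer = incoming;
            loaded = true;
        }

        /** Sets the value count, assuming the buffer stores uncompressed values. */
        void setValueCount(int rows) {
            int needed = rows * TYPE_WIDTH;
            if (needed <= buffer.length) {
                return;                           // capacity is already sufficient
            }
            if (loaded) {
                // Toy equivalent of the failed reallocation seen in the stack trace.
                throw new IndexOutOfBoundsException(
                        "index: 0, length: " + needed
                        + " (expected: range(0, " + buffer.length + "))");
            }
            buffer = new byte[needed];            // nothing loaded yet: plain allocation
        }

        int bufferLength() {
            return buffer.length;
        }
    }

    /** Order reported as failing: load buffers first, then set the count. */
    static int loadThenSetCount(byte[] compressed, int rows) {
        ToyVector v = new ToyVector();
        v.loadBuffer(compressed);
        v.setValueCount(rows);
        return v.bufferLength();
    }

    /** Order proposed in the report: set the count first, then load buffers. */
    static int setCountThenLoad(byte[] compressed, int rows) {
        ToyVector v = new ToyVector();
        v.setValueCount(rows);
        v.loadBuffer(compressed);
        return v.bufferLength();
    }

    public static void main(String[] args) {
        byte[] compressed = new byte[504];        // smaller than 64 * 8 = 512 bytes
        System.out.println("count then load: "
                + setCountThenLoad(compressed, 64) + " bytes kept");
        try {
            loadThenSetCount(compressed, 64);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("load then count: " + e.getMessage());
        }
    }
}
```

In this model the loaded buffer survives untouched when the count is set first, while the reverse order trips the capacity check. Whether the real VectorLoader behaves the same way for every vector type is not something this sketch establishes.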