[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956607#comment-16956607
 ] 

Ji Liu commented on ARROW-6896:
-------------------------------

I used to be a little confused about VectorSchemaRoot (why not just call it 
RecordBatch?), and while writing documentation recently I found that it differs 
from other implementations. For example, for IPC, the Java reader always holds 
the same VectorSchemaRoot and updates it on every call to loadNextBatch, 
whereas on the Python side the writer/reader use a different batch each time. 
From this perspective, I think VectorSchemaRoot != a record batch is 
reasonable. I just wonder why the implementation is different in Java.

The addColumn/removeColumn API was introduced by my recent PR (after 0.15), in 
which I regarded VectorSchemaRoot as a 'record batch'. If we finally reach 
consensus and want to fix it, I think it is better to include the fix in 
0.15.1, so the mistake would not be exposed to users.

> [Java] Vector schema root should not share vectors
> --------------------------------------------------
>
>                 Key: ARROW-6896
>                 URL: https://issues.apache.org/jira/browse/ARROW-6896
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 
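
The four steps above can be sketched with a toy model. This is not Arrow code: 
ToyVector, ToyRoot and SharedVectorDemo are hypothetical stand-ins invented for 
illustration, and the only assumption carried over from the issue is that 
closing a root releases every vector it holds. The addVector name simply 
mirrors the call in step 2 and makes no claim about the real signature.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an Arrow vector: it only tracks whether it was released.
final class ToyVector {
    private final String name;
    private boolean released = false;

    ToyVector(String name) {
        this.name = name;
    }

    void release() {
        released = true;
    }

    // Any access after release fails, mimicking a use-after-free error.
    int valueCount() {
        if (released) {
            throw new IllegalStateException("vector '" + name + "' was already released");
        }
        return 0;
    }
}

// Toy stand-in for VectorSchemaRoot: closing it releases all of its vectors.
final class ToyRoot implements AutoCloseable {
    private final List<ToyVector> vectors;

    ToyRoot(List<ToyVector> vectors) {
        this.vectors = vectors;
    }

    // Returns a new root that SHARES the existing vectors with this root.
    ToyRoot addVector(ToyVector extra) {
        List<ToyVector> combined = new ArrayList<>(vectors);
        combined.add(extra);
        return new ToyRoot(combined);
    }

    int totalValueCount() {
        int total = 0;
        for (ToyVector v : vectors) {
            total += v.valueCount();
        }
        return total;
    }

    @Override
    public void close() {
        for (ToyVector v : vectors) {
            v.release();
        }
    }
}

class SharedVectorDemo {
    // Replays steps 1-4; returns true if using newBatch fails after oldBatch.close().
    static boolean useAfterCloseFails() {
        List<ToyVector> initial = new ArrayList<>();
        initial.add(new ToyVector("a"));
        ToyRoot oldBatch = new ToyRoot(initial);                   // step 1
        ToyRoot newBatch = oldBatch.addVector(new ToyVector("b")); // step 2
        oldBatch.close();                                          // step 3
        try {
            newBatch.totalValueCount();                            // step 4
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("use after close fails: " + useAfterCloseFails());
    }
}
```

In this model the failure comes from vector "a", which both roots reference but 
which oldBatch released on close. Vector "b" is untouched, which is why the 
problem can go unnoticed until a shared column is read.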



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
