[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952475#comment-16952475
 ] 

Jacques Nadeau commented on ARROW-6896:
---------------------------------------

I disagree with the issue here. We should probably add a better description of 
reference count semantics, but having the container close its children makes 
sense. We depend on this functionality quite a bit.

Generally speaking, Vectors are things that shouldn't be handed around; they 
should be transferred, which has clear reference management semantics. The 
design was based on the [AttributeSource design 
pattern|https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/util/AttributeSource.html] 
in Lucene, where you create an object once and then pass many separate pieces 
of data through it to minimize heap churn and pointer/reference management. I 
think if you're hitting the problem you describe, you're misunderstanding the 
goals of the codebase.

> [Java] Vector schema root should not share vectors
> --------------------------------------------------
>
>                 Key: ARROW-6896
>                 URL: https://issues.apache.org/jira/browse/ARROW-6896
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 
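
The scenario in the quoted issue, and the transfer-based alternative that the 
comment recommends, can be sketched with a toy model. This is only an 
illustration under stated assumptions: ToyVector, ToyRoot, transferAndAdd and 
SharedRootDemo are hypothetical names invented here, not Arrow classes or 
methods, and the real reference counting in Arrow is more involved than a 
single boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a vector that owns releasable memory (hypothetical, not Arrow's API). */
class ToyVector implements AutoCloseable {
    private final String name;
    private boolean released = false;

    ToyVector(String name) { this.name = name; }

    /** Reads the data; fails if the memory has already been released. */
    String read() {
        if (released) throw new IllegalStateException(name + " has been released");
        return name;
    }

    /** Moves the data into a fresh vector and leaves this one empty. */
    ToyVector transfer() {
        read();            // refuse to transfer data that is already gone
        released = true;   // the source no longer owns anything
        return new ToyVector(name);
    }

    @Override public void close() { released = true; }
}

/** Toy container that closes its children when it is closed. */
class ToyRoot implements AutoCloseable {
    private final List<ToyVector> vectors;

    ToyRoot(List<ToyVector> vectors) { this.vectors = new ArrayList<>(vectors); }

    /** New root that SHARES the existing vectors and appends one more. */
    ToyRoot addVector(ToyVector extra) {
        List<ToyVector> shared = new ArrayList<>(vectors);
        shared.add(extra);
        return new ToyRoot(shared);
    }

    /** New root that takes ownership of the existing vectors by transfer. */
    ToyRoot transferAndAdd(ToyVector extra) {
        List<ToyVector> moved = new ArrayList<>();
        for (ToyVector v : vectors) moved.add(v.transfer());
        moved.add(extra);
        return new ToyRoot(moved);
    }

    ToyVector get(int index) { return vectors.get(index); }

    @Override public void close() { for (ToyVector v : vectors) v.close(); }
}

class SharedRootDemo {
    /** Steps 1-4 of the issue: returns true if reading from the new batch fails. */
    static boolean sharedReadFails() {
        ToyRoot oldBatch = new ToyRoot(List.of(new ToyVector("a")));
        ToyRoot newBatch = oldBatch.addVector(new ToyVector("b"));
        oldBatch.close(); // also releases "a", which newBatch still references
        try {
            newBatch.get(0).read();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Same steps, but ownership is transferred before the old batch is closed. */
    static String transferredRead() {
        ToyRoot oldBatch = new ToyRoot(List.of(new ToyVector("a")));
        ToyRoot newBatch = oldBatch.transferAndAdd(new ToyVector("b"));
        oldBatch.close(); // old batch holds only emptied vectors now
        return newBatch.get(0).read();
    }

    public static void main(String[] args) {
        System.out.println("shared read fails: " + sharedReadFails());
        System.out.println("transferred read: " + transferredRead());
    }
}
```

In this sketch the sharing path fails because two containers each believe they 
own the same child, while the transfer path leaves exactly one owner at every 
point, which is the property the comment attributes to transferring.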



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
