[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-10-26 Thread Ethan Levine (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220392#comment-16220392 ]

Ethan Levine commented on ARROW-1710:

[~jnadeau]: I'd be interested to learn more about that consolidation. It sounds
like the validity bits will be stored inline with the data? If so, it seems
like eliding them for vectors that contain no nulls could be difficult.

Our use case involves data that's mostly non-nullable, and we take care to
ensure that it's declared that way. If the costs (in memory and computation)
of writing only non-null values to a nullable array become insignificant, then
it makes sense to include only nullable arrays. But if that's not the case,
then I think it makes sense to keep both.
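To put a rough number on the memory side of that cost: the Arrow validity bitmap uses one bit per value, so for a fixed-width type it adds about 1/(8 x width in bytes) on top of the data buffer. This is a minimal sketch; `ValidityOverhead` is a hypothetical helper, not an Arrow class, and it ignores Arrow's buffer padding and alignment as well as the per-write computation cost.

```java
import java.util.Locale;

// Hypothetical helper (not part of Arrow): estimates buffer sizes for a
// fixed-width vector, ignoring Arrow's buffer padding/alignment rules.
final class ValidityOverhead {
    // One validity bit per value, rounded up to whole bytes.
    static long validityBytes(long valueCount) {
        return (valueCount + 7) / 8;
    }

    // Data buffer size for a fixed-width type (e.g. 4 bytes for a 32-bit int).
    static long dataBytes(long valueCount, int typeWidthBytes) {
        return valueCount * typeWidthBytes;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        long data = dataBytes(n, 4);       // 32-bit int values
        long validity = validityBytes(n);  // validity bits, packed 8 per byte
        // Locale.ROOT keeps the decimal separator as '.' regardless of locale.
        System.out.printf(Locale.ROOT, "data=%d validity=%d overhead=%.3f%%%n",
                data, validity, 100.0 * validity / data);
    }
}
```

Running `main` prints `data=4000000 validity=125000 overhead=3.125%`, i.e. about 3% extra memory for 32-bit ints; for 1-byte values the same bitmap is 12.5% of the data size.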

> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: Java - Vectors
> Reporter: Li Jin
> Fix For: 0.8.0
>
>
> So far the consensus seems to be to remove all non-nullable vectors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


2017-10-24 Thread Ethan Levine (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217560#comment-16217560 ]

Ethan Levine commented on ARROW-1710:

The BitVector is an extra object that has to be allocated (both its backing
data and the Java objects involved). You'd also need to perform bit masking on
the BitVector's data with every write, which could involve a cache miss if that
data isn't neatly colocated with the actual data for the nullable vector.
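The cost described above can be sketched with a simplified nullable vector that keeps values and validity bits in two separately allocated arrays. `SketchNullableIntVector` is hypothetical and is not the Arrow Java API; plain Java arrays stand in for Arrow's off-heap buffers, so this only shows the shape of the work: a second allocation, and a read-modify-write on a validity byte for every `set`.

```java
// Hypothetical, simplified nullable int vector: NOT the Arrow Java API.
// Values and validity bits live in two separately allocated arrays,
// mirroring a data buffer plus a separate validity (bit) buffer.
final class SketchNullableIntVector {
    private final int[] values;
    private final byte[] validity; // 1 bit per slot, LSB-first; 1 = non-null

    SketchNullableIntVector(int capacity) {
        this.values = new int[capacity];
        this.validity = new byte[(capacity + 7) / 8]; // the extra allocation
    }

    void set(int index, int value) {
        values[index] = value;
        // Read-modify-write on the validity byte: a second memory location
        // touched on every write, not necessarily near the values array.
        validity[index >> 3] |= (byte) (1 << (index & 7));
    }

    void setNull(int index) {
        // Clear the slot's validity bit; the value slot is left untouched.
        validity[index >> 3] &= (byte) ~(1 << (index & 7));
    }

    boolean isNull(int index) {
        return (validity[index >> 3] & (1 << (index & 7))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("null at index " + index);
        }
        return values[index];
    }
}
```

A non-nullable vector's `set` would be just the `values[index] = value` line; the extra `|=` on a separate array is the per-write overhead, and whether it costs a cache miss depends on where the two buffers sit in memory.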

Perhaps a tracking flag could be added to the nullable vectors, though. It
would start out "false" and be set to "true" if you ever write a null value.
That way you could avoid the extra allocation and computation involved in
tracking the validity of each value when there are no null values.
This seems like it would be more complicated than just keeping non-nullable
vectors around, however.
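The tracking-flag idea might look roughly like the following minimal sketch. `LazyValidityIntVector` is hypothetical, not an Arrow class: the validity bitmap is not allocated until the first null is written, and until then every slot is treated as non-null. (The Arrow columnar format does allow an array with a null count of 0 to omit its validity bitmap, which is the state this sketch starts in.)

```java
import java.util.Arrays;

// Hypothetical sketch of the "tracking flag" idea; not an Arrow API.
// The validity bitmap is allocated only when the first null is written.
final class LazyValidityIntVector {
    private final int[] values;
    private byte[] validity;  // stays null until a null value is written
    private boolean hasNulls; // the tracking flag: starts out false

    LazyValidityIntVector(int capacity) {
        this.values = new int[capacity];
    }

    void set(int index, int value) {
        values[index] = value;
        if (hasNulls) {
            // Only pay for bit masking once the vector has seen a null.
            validity[index >> 3] |= (byte) (1 << (index & 7));
        }
    }

    void setNull(int index) {
        if (!hasNulls) {
            // First null: allocate the bitmap and mark every slot valid so
            // values written so far keep their non-null status. (Simplifying
            // assumption: unwritten slots are also marked valid.)
            validity = new byte[(values.length + 7) / 8];
            Arrays.fill(validity, (byte) 0xFF);
            hasNulls = true;
        }
        validity[index >> 3] &= (byte) ~(1 << (index & 7));
    }

    boolean isNull(int index) {
        return hasNulls && (validity[index >> 3] & (1 << (index & 7))) == 0;
    }

    boolean hasNulls() {
        return hasNulls;
    }
}
```

The extra complexity is visible here: every `set` checks the flag, and the first `setNull` has to back-fill the bitmap so earlier values stay valid. That is the trade-off against simply keeping separate non-nullable vectors.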
