[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220392#comment-16220392 ]

Ethan Levine commented on ARROW-1710:
-------------------------------------

[~jnadeau]: I'd be interested to learn more about that consolidation. It sounds like the validity bits will be stored inline with the data? It seems like eliding that could be difficult.

Our use case involves data that's mostly not nullable, and we take care to ensure that it's declared that way. If the costs of writing only non-null values to a nullable array (in terms of memory and computation) become insignificant, then it makes sense to only include nullable arrays. But if that's not the case then I think it makes sense to keep both.

> [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-1710
>                 URL: https://issues.apache.org/jira/browse/ARROW-1710
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: Java - Vectors
>            Reporter: Li Jin
>             Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217560#comment-16217560 ]

Ethan Levine commented on ARROW-1710:
-------------------------------------

The BitVector is an extra object that has to be allocated (both in terms of the backing data and in terms of the Java objects involved). You'd also need to perform bit masking of the underlying data with every write, which could involve a cache miss if the data for the BitVector isn't neatly colocated with the actual data for the nullable vector.

Perhaps a tracking flag could be added to the nullable vectors, though. It would start out "false", and get set to "true" if you ever write a null value. That way you could avoid the extra allocation and computation involved with tracking the validity of each value in the case where there are no null values. This seems like it would be more complicated than just keeping non-nullable vectors around, however.
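
For illustration only, the tracking-flag idea from the comment above might look roughly like the sketch below. Everything in it is hypothetical: LazyValidityIntVector is not an Arrow class, and it uses a plain int[] and java.util.BitSet in place of Arrow buffers and BitVector. To keep the lazy allocation simple, it also marks the null slots rather than the valid ones, which is the reverse of a validity bitmap.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch, not part of Arrow: a fixed-size int vector that only
 * allocates per-value null tracking once a null is actually written.
 */
final class LazyValidityIntVector {
    private final int[] values;
    // The tracking flag: starts out false, flips to true on the first null write.
    private boolean hasNulls = false;
    // Marks which slots are null; stays unallocated while hasNulls is false.
    private BitSet nullBits;

    LazyValidityIntVector(int capacity) {
        this.values = new int[capacity];
    }

    /** Writes a non-null value; touches the bitmap only if a null was ever written. */
    void set(int index, int value) {
        values[index] = value;
        if (hasNulls) {
            nullBits.clear(index);
        }
    }

    /** Writes a null, allocating the bitmap on first use. */
    void setNull(int index) {
        if (index < 0 || index >= values.length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        if (!hasNulls) {
            nullBits = new BitSet(values.length);
            hasNulls = true;
        }
        nullBits.set(index);
    }

    boolean isNull(int index) {
        return hasNulls && nullBits.get(index);
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("value at " + index + " is null");
        }
        return values[index];
    }

    boolean hasNulls() {
        return hasNulls;
    }
}
```

In this sketch, a vector that never sees a null pays one boolean check per write and never allocates the bitmap. The extra branch in every accessor is the kind of added complexity the comment weighs against simply keeping non-nullable vectors.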