[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219836#comment-16219836
 ] 

Jacques Nadeau commented on ARROW-1710:
---------------------------------------

I'm one of the voices strongly arguing for dropping the additional class 
objects. (I also was the one who originally introduced the two separate sets 
when the code was first developed.) My experience has been the following:

* The extra complexity of managing two different runtime classes is very 
expensive (maintenance, coercing between them, managing runtime code 
generation, etc.)
* Most source data is actually declared as nullable but rarely has nulls

As such, having an adaptive interaction where you look at cells 64 values at a 
time and adapt your behavior based on actual nullability (as opposed to 
declared nullability) provides a much better performance lift in real world use 
cases than having specialized code for declared non-nullable situations.
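
As a rough illustration of the adaptive idea above (not the actual Arrow or 
Dremio code), here is a minimal Java sketch. It assumes a validity bitmap held 
as a long[] in which bit (i % 64) of word (i / 64) is 1 when value i is 
non-null. The class name AdaptiveNullSum and the method sum are hypothetical 
names chosen for this example.

```java
final class AdaptiveNullSum {

    /**
     * Sums the non-null entries of values[0..count).
     * Assumed layout: bit (i % 64) of validity[i / 64] is 1 when values[i] is valid.
     */
    static long sum(int[] values, long[] validity, int count) {
        long total = 0;
        for (int base = 0; base < count; base += 64) {
            int n = Math.min(64, count - base);
            // Ignore bits that lie past the end of the data in the last word.
            long mask = (n == 64) ? -1L : (1L << n) - 1;
            long word = validity[base >>> 6] & mask;

            if (word == mask) {
                // Fast path: this batch has no nulls, so skip per-value checks.
                for (int i = 0; i < n; i++) {
                    total += values[base + i];
                }
            } else if (word != 0) {
                // Mixed batch: visit only the positions whose validity bit is set.
                while (word != 0) {
                    int bit = Long.numberOfTrailingZeros(word);
                    total += values[base + bit];
                    word &= word - 1; // clear the lowest set bit
                }
            }
            // word == 0: every value in the batch is null, nothing to add.
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = new int[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        // Declared nullable, but only index 70 (bit 6 of the second word) is null.
        long[] validity = { -1L, ((1L << 36) - 1) & ~(1L << 6) };
        System.out.println(sum(values, validity, values.length));
    }
}
```

The branch is chosen per 64-value batch from the actual bits, so data that is 
declared nullable but contains no nulls takes the same tight loop a 
non-nullable vector would.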

FYI: [~e.levine], the updated approach with vectors is moving to a situation 
where we don't have a bit vector and ultimately also consolidate the bits and 
the fixed bytes into the same buffer. In that case, there is no 
heap memory overhead and the direct memory overhead is 1 bit per value, far 
less than necessary.

Also note that in reality, most people focused on super high performance Java 
implementations interact directly with the memory. You can see an example of 
how we do this here: 
https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java#L89

If, in the future, people need the vector classes to have an additional set 
of methods such as: 
allocateNewNoNull()
setSafeIgnoreNull(int index, int value) 

let's just add those when someone's use case requires it. No need to have an 
extra set of vectors for that purpose.


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-1710
>                 URL: https://issues.apache.org/jira/browse/ARROW-1710
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: Java - Vectors
>            Reporter: Li Jin
>             Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
