[ https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214895#comment-16214895 ]
ASF GitHub Bot commented on ARROW-1474: --------------------------------------- siddharthteotia commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor (Implementation Phase 2) URL: https://github.com/apache/arrow/pull/1203#issuecomment-338598989 **W.r.t introduction of some static interfaces for JsonFileWriter/Reader** The introduction of couple of static interfaces is not absolutely necessary. They are written for better readability in JsonFileReader's gigantic switch block when it parses Json and writes to the vector (and its inner vectors). Since now we no longer have inner vectors, we obviously couldn't leverage the same code. The JsonFileReader had to be changed to specifically write to different buffers (TYPE, VALIDITY, OFFSET, DATA) for a particular vector. Also it has to allocate the buffer and appropriately set writer index before calling loadFieldBuffers. This is something that was needed for every case in switch block here. Once I did this, the code looked pretty messy and ugly. So I moved all the logic private to vectors and made them as part of static interfaces. On the other hand, in JsonFileWriter we were reading from vector (and its inner vectors) and writing out Json data. Again, since there are no inner vectors, all operation had to be transformed to work at the buffer level -- for writing the contents of each inner buffer. Also, the old code of JsonFileWriter stated a TODO that it was not handling each type. The new code handles all types. If the general preference is to not introduce static interfaces in vector APIs, I am fine with removing them and moving all logic into JSon code itself. The javadocs already indicate that external use of these APIs is not recommended. **W.r.t introduction of some new APIs in ValueVector** Note that top level interface is still ValueVector even though hierarchy underneath has changed. So there are still non-nullable vectors extending ValueVector, implementing mutator/accessor interfaces etc. So I introduced APIs like getNullCount(), getValueCount(), setValueCount(), getObject() for the new nullable vectors. Once we remove non-nullable vectors and remove mutator/accessor from ValueVector, we can get rid of these APIs too. User is free to call such methods on vectors since internally they delegate the call to corresponding mutator/accessor operation for non-nullable vectors and for nullable vectors we already have the new implementation. For legacy vectors, it doesn't really matter since each operation is just a pass-through to new code. Each vector (nullable or non-nullable) has a concrete implementation of all such interfaces prescribed by ValueVector. Correctness is not affected anywhere. We should be able to do the simple cleanup once we remove non-nullable vectors. If we are not planning to remove non-nullable vectors then we should just remove mutator/accessor interfaces from them and expose all the get/set APIs directly just like we have done for other nullable and complex vectors. That will also allow us to do simple cleanup. Whatever we decide to do with non-nullable scalar vectors, we should do soon to make the entire java Vector code under ValueVector hierarchy consistent. Right now the nullable scalars and complex types are consistent -- none of them have inner vectors and none of them support mutator/accessor based access. Either we should do the same thing with non nullable vectors or remove them all together. The latter is preferable. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] ValueVector hierarchy (Implementation Phase 2) > ----------------------------------------------------- > > Key: ARROW-1474 > URL: https://issues.apache.org/jira/browse/ARROW-1474 > Project: Apache Arrow > Issue Type: Sub-task > Reporter: Jacques Nadeau > Assignee: Siddharth Teotia > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v6.4.14#64029)