Hi All,

I wanted to update everyone on the state of this mini-project:
- The requirements document and initial design proposal were sent out to the community for review, and we have received some good feedback. All required docs are attached to the corresponding JIRAs.
- The initial prototype is in a reasonable state (code-complete). You can see the PR here: https://github.com/apache/arrow/pull/1164
- The prototype has code changes for the new hierarchy, abstract interfaces for fixed width and variable width vectors, and concrete implementations of NullableIntVector and NullableVarCharVector.

Plan for testing and integrating into existing infrastructure:

- My initial thought is that this particular patch will require a lot of testing and review, since the foundation of the rest of the implementation more or less depends on how the APIs are fleshed out here.
- So the goal is to get this properly tested and merged into master first.
- The idea is to slowly deprecate and remove the existing vectors in stages. In this patch itself, we change the existing NullableValueVectors.java template to generate LegacyNullableIntVector and LegacyNullableVarCharVector. Each operation on these vectors will delegate to the corresponding newly implemented NullableIntVector and NullableVarCharVector.
- This achieves two goals w.r.t. testing:
  - Firstly, our existing Java unit tests will automatically exercise the newly written code and its APIs (API names have not changed) for the NullableInt and NullableVarChar vectors.
  - Secondly, if we rebase Dremio on top of Arrow master and replace all references to NullableIntVector and NullableVarCharVector with their Legacy counterparts, things should still work.
- After this patch gets merged, we can do the following work in multiple patches:
  - Write concrete implementations for the rest of the nullable types: FLOAT4, FLOAT8, BIGINT, VARBINARY etc.
  - Write additional tests (definitely needed, but the first goal is to make sure existing tests are not broken).
  - Ensure the NullableValueVectors template generates Legacy vectors and each operation is merely a delegation to the API in the new implementation.
- In the next Arrow release, remove all Legacy vectors and the NullableValueVectors template, since we will have an implementation for each type that passes the existing tests.
- I am currently inspecting the newly written code and changing the template to generate Legacy vector types for Nullable Int and Nullable VarChar that delegate their operations. The changes should be available in the PR in a couple of hours.

I am wondering if there are any other ideas around testing, merging etc. Please feel free to reply here or comment on the PR.

I would appreciate it if people could take the time to review the code in the PR, especially the abstract classes BaseNullableFixedWidth and BaseNullableVariableWidth. Writing concrete implementations for the other types will be much less hassle if these abstract classes are solid.

Thanks,
Siddharth
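
For readers who want a concrete picture of the delegation idea described in the plan above, here is a minimal sketch. It is not the code in the PR: the class names (SketchNullableIntVector, SketchLegacyNullableIntVector) and the method signatures are hypothetical, and plain Java arrays stand in for the real vector buffers. It only shows the shape of the approach, where the Legacy class keeps the method names that callers already use and forwards every call to the new implementation.

```java
// Hypothetical sketch only: names and signatures are illustrative,
// not the actual Arrow classes or the code in the PR.
class DelegationSketch {
    public static void main(String[] args) {
        SketchLegacyNullableIntVector legacy = new SketchLegacyNullableIntVector(4);
        legacy.set(0, 42);   // forwarded to the new implementation
        legacy.setNull(1);
        System.out.println("slot 0 null? " + legacy.isNull(0));
        System.out.println("slot 0 value: " + legacy.get(0));
        System.out.println("slot 1 null? " + legacy.isNull(1));
    }
}

/** Stand-in for a newly written vector: one value and one validity flag per slot. */
class SketchNullableIntVector {
    private final int[] values;
    private final boolean[] valid; // assumption: arrays replace real buffers here

    SketchNullableIntVector(int capacity) {
        this.values = new int[capacity];
        this.valid = new boolean[capacity]; // all slots start as null
    }

    void set(int index, int value) {
        values[index] = value;
        valid[index] = true;
    }

    void setNull(int index) {
        valid[index] = false;
    }

    boolean isNull(int index) {
        return !valid[index];
    }

    int get(int index) {
        if (!valid[index]) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return values[index];
    }
}

/** Stand-in for a generated Legacy vector: same method names, no logic of its own. */
class SketchLegacyNullableIntVector {
    private final SketchNullableIntVector delegate;

    SketchLegacyNullableIntVector(int capacity) {
        this.delegate = new SketchNullableIntVector(capacity);
    }

    // Every operation is a one-line forward to the new implementation.
    void set(int index, int value) { delegate.set(index, value); }
    void setNull(int index)        { delegate.setNull(index); }
    boolean isNull(int index)      { return delegate.isNull(index); }
    int get(int index)             { return delegate.get(index); }
}
```

Because the Legacy wrapper has no behavior of its own, any existing test written against it ends up exercising the new class, which is the testing benefit the plan relies on.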