Hi All,

I wanted to update everyone on the state of this mini-project:


   - The requirements document and initial design proposal were sent out to
   the community for review, and we have received some good feedback. All
   required docs are attached to the corresponding JIRAs.


   - The initial prototype is in a reasonable state (code-complete). You
   can see the PR here - https://github.com/apache/arrow/pull/1164


   - The prototype has code changes for the new hierarchy, abstract
   interfaces for fixed-width and variable-width vectors, and concrete
   implementations of NullableIntVector and NullableVarCharVector.


Plan for testing and integrating into existing infrastructure:


   - My initial thought is that this particular patch will require a lot of
   testing, reviews, etc., since the foundation of the rest of the
   implementation more or less depends on how the APIs are fleshed out here.


   - So the goal is to get this properly tested and merged into master
   first.


   - The idea is to slowly deprecate and remove the existing vectors in
   stages. In this patch itself, we change the existing
   NullableValueVectors.java template to generate LegacyNullableIntVector and
   LegacyNullableVarCharVector. Each operation on these vectors will delegate
   to the corresponding NullableIntVector and NullableVarCharVector that are
   newly implemented.


   - This achieves two goals w.r.t. testing:
      - Firstly, our existing Java unit tests will automatically exercise
      the newly written code and its APIs (API names have not changed) for
      NullableInt and NullableVarChar vectors.
      - Secondly, if we rebase Dremio on top of Arrow master and replace all
      references to NullableIntVector and NullableVarCharVector with their
      Legacy counterparts, things should still work.


   - After this patch gets merged, we can do the following work in multiple
   patches:
      - Write concrete implementations for the rest of the nullable types:
      FLOAT4, FLOAT8, BIGINT, VARBINARY, etc.
      - Write additional tests (definitely needed, but the first goal is to
      make sure existing tests are not broken).
      - Ensure the NullableValueVectors template generates Legacy vectors
      and each operation is merely a delegation to the API in the new
      implementation.
      - In the next Arrow release, remove all Legacy vectors and the
      NullableValueVectors template, since we will have an implementation
      for each type that passes the existing tests.


   - I am currently inspecting the newly written code and making changes to
   the template to generate Legacy vector types for Nullable Int and
   Nullable VarChar and delegate the operations. The changes should be
   available in the PR in a couple of hours.
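
To make the delegation idea in the plan above concrete, here is a minimal
sketch. It is not the code in the PR: NewIntVector and LegacyIntVector are
hypothetical, heavily simplified stand-ins (plain arrays instead of Arrow
buffers), and the method names are assumptions chosen only for illustration.
The point is that the Legacy class keeps the old API surface but holds no
logic of its own, so any test written against it exercises the new code.

```java
import java.util.BitSet;

public class DelegationSketch {

    /** Hypothetical stand-in for a newly written vector (not the Arrow class). */
    static class NewIntVector {
        private final int[] values;
        private final BitSet validity; // bit set = slot holds a non-null value

        NewIntVector(int capacity) {
            values = new int[capacity];
            validity = new BitSet(capacity);
        }

        void set(int index, int value) {
            values[index] = value;
            validity.set(index);
        }

        void setNull(int index) {
            validity.clear(index);
        }

        boolean isNull(int index) {
            return !validity.get(index);
        }

        int get(int index) {
            if (isNull(index)) {
                throw new IllegalStateException("value at index " + index + " is null");
            }
            return values[index];
        }
    }

    /** Hypothetical stand-in for a template-generated Legacy vector. */
    static class LegacyIntVector {
        private final NewIntVector delegate;

        LegacyIntVector(int capacity) {
            delegate = new NewIntVector(capacity);
        }

        // Every operation simply forwards to the new implementation.
        void set(int index, int value) { delegate.set(index, value); }
        void setNull(int index) { delegate.setNull(index); }
        boolean isNull(int index) { return delegate.isNull(index); }
        int get(int index) { return delegate.get(index); }
    }

    public static void main(String[] args) {
        LegacyIntVector v = new LegacyIntVector(4);
        v.set(0, 42);
        v.setNull(1);
        System.out.println(v.get(0));    // prints 42
        System.out.println(v.isNull(1)); // prints true
    }
}
```

With this shape, callers that are switched to the Legacy type keep compiling
against the same method names while the behaviour comes entirely from the
new class.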


I am wondering if there are any other ideas around testing, merging etc.
Please feel free to reply here or comment on the PR.

I would appreciate it if people could take the time to review the code in
the PR, especially the abstract classes BaseNullableFixedWidth and
BaseNullableVariableWidth. Writing concrete implementations for other types
will be much less hassle if these abstract classes are sound.
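
As a rough illustration of why the base classes matter, the sketch below
shows how little a concrete type needs to add once a base class owns the
validity tracking and the data buffer. Everything here is an assumption for
illustration: FixedWidthBase, IntVector and BigIntVector are hypothetical
names, a java.nio.ByteBuffer stands in for Arrow's buffers, and none of this
mirrors the actual BaseNullableFixedWidth code in the PR.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

public class BaseVectorSketch {

    /** Hypothetical base: owns null tracking and the fixed-width data buffer. */
    abstract static class FixedWidthBase {
        protected final ByteBuffer data;
        private final BitSet validity; // bit set = slot holds a non-null value
        private final int typeWidth;   // bytes per element

        FixedWidthBase(int capacity, int typeWidth) {
            this.typeWidth = typeWidth;
            this.data = ByteBuffer.allocate(capacity * typeWidth);
            this.validity = new BitSet(capacity);
        }

        boolean isNull(int index) { return !validity.get(index); }

        void setNull(int index) { validity.clear(index); }

        protected int offsetOf(int index) { return index * typeWidth; }

        protected void markSet(int index) { validity.set(index); }

        protected void checkNotNull(int index) {
            if (isNull(index)) {
                throw new IllegalStateException("value at index " + index + " is null");
            }
        }
    }

    /** A concrete type only adds its typed accessors. */
    static class IntVector extends FixedWidthBase {
        IntVector(int capacity) { super(capacity, Integer.BYTES); }

        void set(int index, int value) {
            data.putInt(offsetOf(index), value);
            markSet(index);
        }

        int get(int index) {
            checkNotNull(index);
            return data.getInt(offsetOf(index));
        }
    }

    /** Adding another width (e.g. BIGINT) is the same few lines again. */
    static class BigIntVector extends FixedWidthBase {
        BigIntVector(int capacity) { super(capacity, Long.BYTES); }

        void set(int index, long value) {
            data.putLong(offsetOf(index), value);
            markSet(index);
        }

        long get(int index) {
            checkNotNull(index);
            return data.getLong(offsetOf(index));
        }
    }
}
```

Any bug in the shared base would show up in every concrete type built on it,
which is why review attention is best spent there first.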

Thanks,
Siddharth
