Hi Siddharth,

Thanks for the update. This looks good.

A few thoughts:

*Compatibility:*
It sounds like we will introduce some back-compatibility with the new
Vector class. At this point I think our main Java users should be Spark and
Dremio, is this right?


   - For Spark:

It seems fine since Spark uses just the basic functionality of Vector
classes and the existing code should work with the new Vector classes,
maybe even without any code change on the Spark side.


   - For Dremio:

Sounds like you are already taking care of this by introducing the
LegacyVector classes.


*Testing:*

   - Spark Integration Tests:

Bryan and I can help with integration test with Spark. I think the target
timeline for Spark 2.3 release is some time in mid Nov (Bryan please
correct me if I am wrong).

I will take a look at the PR today.






On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia <siddha...@dremio.com>
wrote:

> Hi All,
>
> I wanted to update everyone on state of this mini-project:
>
>
>    - Requirements document and initial design proposal were sent out to the
>    community for review and we have received some good feedback. All
> required
>    docs are attached with corresponding JIRAs.
>
>
>    - The initial prototype is in a reasonable state (code-complete). You
>    can see the PR here - https://github.com/apache/arrow/pull/1164
>
>
>    - The prototype has code changes for the new hierarchy, abstract
>    interfaces for fixed width and variable width vectors and concrete
>    implementation of NullableIntVector and NullableVarCharVector.
>
>
> Plan for testing and integrating into existing infrastructure:
>
>
>    - My initial thoughts are that this particular patch will require a lot
>    of testing, reviews etc since the foundation of rest of the
> implementation
>    more or less depends on how the APIs are flushed out here.
>
>
>    - So the goal is to get this properly tested and merged into master
>    first.
>
>
>    - The idea is to slowly deprecate and remove the existing vectors in
>    stages. In this patch itself, we change the existing
>    NullableValueVectors.java template to generate LegacyNullableIntVector
> and
>    LegacyNullableVarCharVector. Each operation on these vectors will
> delegate
>    to the corresponding NullableIntVector and NullableVarCharVector that
> are
>    newly implemented.
>
>
>    - This achieves two goals w.r.t testing:
>
>
>    - Firstly, our existing JAVA unit tests will automatically exercise the
>       newly written code and its APIs (API names have not changed) for
>       NullableInt and NullableVarChar vectors.
>
>
>    - Secondly, let's say we rebase Dremio on top of Arrow master and
>       replace all references to NullableIntVector and
> NullableVarCharVector with
>       their Legacy counterparts, things should still work.
>
>
>    - After this patch gets merged, we can do the following work in multiple
>    patches:
>       - Write concrete implementations for rest of the nullable types --
>       FLOAT4, FLOAT8, BIGINT, VARBINARY etc
>
>
>    - Write additional tests (definitely needed but the first goal is to
>       make sure existing tests are not broken).
>
>
>    - Ensure NullableValueVectors template generates Legacy vectors and each
>       operation is merely a delegation to the API in new implementation.
>
>
>    - In the next Arrow release, remove all Legacy vectors and
>       NullableValueVectors template since we will have the implementation
> for
>       each type that passes existing tests.
>
>
>    - I am currently inspecting the newly written code and making changes to
>       the template to generate Legacy vector types for Nullable Int
> and Nullable
>       VarChar and delegating the operations. The changes should be
> available in
>       the PR in a couple of hours.
>
>
> I am wondering if there are any other ideas around testing, merging etc.
> Please feel free to reply here or comment on the PR.
>
> I would appreciate if people can take time to review the code in PR --
> especially the abstract classes BaseNullableFixedWidth and
> BaseNullableVariableWidth. Writing concrete implementations for other types
> will be much less hassle if these abstract classes have proper code.
>
> Thanks,
> Siddharth
>

Reply via email to