To make clear, I think it's fine to have Legacy Vectors in 0.8 as a
deprecated API.

On Thu, Oct 12, 2017 at 2:19 PM, Li Jin <ice.xell...@gmail.com> wrote:

> Siddharth,
>
> For working off a branch, Wes has created https://github.com/apache/
> arrow/tree/java-vector-refactor that we can submit PR to.
>
> For Legacy vectors, I think it's fine because it's really just a migration
> path to help Dremio to migrate to the new vectors. I don't think other
> users, i.e., Spark will use the Legacy vector class. Bryan and I will just
> migrate Spark to new vectors directly because Spark's use of Arrow is very
> simple.
>
>
>
> On Thu, Oct 12, 2017 at 2:08 PM, Siddharth Teotia <siddha...@dremio.com>
> wrote:
>
>> Thanks Bryan and Li.
>>
>> Yes, the goal is to get this (and the subsequent patches) merged to the
>> new
>> branch. Once it is stabilized from different aspects, we can move to
>> master. I am not sure of the exact mechanics when we work off a different
>> project branch and not master.
>>
>> Does that sound good?
>>
>> Regarding compatibility, are we suggesting that let's not create Legacy
>> Nullable vectors at all? The initial thoughts were to generate Legacy
>> vectors from NullableValueVectors template and these vectors are
>> mutator/accessor based (in today's world). Internally each operation will
>> be delegated to new vectors (non code generated).
>>
>> On Thu, Oct 12, 2017 at 10:58 AM, Bryan Cutler <cutl...@gmail.com> wrote:
>>
>> > Thanks for the update Siddharth.  From the Spark side of this, I
>> definitely
>> > want to try to upgrade to the latest Arrow before the Spark 2.3 release
>> but
>> > if it the refactor is too disruptive then others might get squeamish
>> about
>> > upgrading.  On the other hand, I don't think we should hold back on
>> > refactoring for compatibility sake and the way it's looking now trying
>> to
>> > be backwards-compatible will be too much of a pain.  I will try to
>> figure
>> > out the timeline for Spark 2.3 and what the feeling is for upgrading
>> > Arrow.  Can we hold off on merging this to master for now and just work
>> out
>> > of the separate branch until we can get a better feeling for the impact?
>> >
>> > Bryan
>> >
>> > On Wed, Oct 11, 2017 at 8:19 AM, Li Jin <ice.xell...@gmail.com> wrote:
>> >
>> > > Hi Siddharth,
>> > >
>> > > Thanks for the update. This looks good.
>> > >
>> > > A few thoughts:
>> > >
>> > > *Compatibility:*
>> > > It sounds like we will introduce some back-compatibility with the new
>> > > Vector class. At this point I think our main Java users should be
>> Spark
>> > and
>> > > Dremio, is this right?
>> > >
>> > >
>> > >    - For Spark:
>> > >
>> > > It seems fine since Spark uses just the basic functionality of Vector
>> > > classes and the existing code should work with the new Vector classes,
>> > > maybe even without any code change on the Spark side.
>> > >
>> > >
>> > >    - For Dremio:
>> > >
>> > > Sounds like you are already taking care of this by introducing the
>> > > LegacyVector classes.
>> > >
>> > >
>> > > *Testing:*
>> > >
>> > >    - Spark Integration Tests:
>> > >
>> > > Bryan and I can help with integration test with Spark. I think the
>> target
>> > > timeline for Spark 2.3 release is some time in mid Nov (Bryan please
>> > > correct me if I am wrong).
>> > >
>> > > I will take a look at the PR today.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia <
>> siddha...@dremio.com>
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I wanted to update everyone on state of this mini-project:
>> > > >
>> > > >
>> > > >    - Requirements document and initial design proposal were sent
>> out to
>> > > the
>> > > >    community for review and we have received some good feedback. All
>> > > > required
>> > > >    docs are attached with corresponding JIRAs.
>> > > >
>> > > >
>> > > >    - The initial prototype is in a reasonable state (code-complete).
>> > You
>> > > >    can see the PR here - https://github.com/apache/arrow/pull/1164
>> > > >
>> > > >
>> > > >    - The prototype has code changes for the new hierarchy, abstract
>> > > >    interfaces for fixed width and variable width vectors and
>> concrete
>> > > >    implementation of NullableIntVector and NullableVarCharVector.
>> > > >
>> > > >
>> > > > Plan for testing and integrating into existing infrastructure:
>> > > >
>> > > >
>> > > >    - My initial thoughts are that this particular patch will
>> require a
>> > > lot
>> > > >    of testing, reviews etc since the foundation of rest of the
>> > > > implementation
>> > > >    more or less depends on how the APIs are flushed out here.
>> > > >
>> > > >
>> > > >    - So the goal is to get this properly tested and merged into
>> master
>> > > >    first.
>> > > >
>> > > >
>> > > >    - The idea is to slowly deprecate and remove the existing
>> vectors in
>> > > >    stages. In this patch itself, we change the existing
>> > > >    NullableValueVectors.java template to generate
>> > LegacyNullableIntVector
>> > > > and
>> > > >    LegacyNullableVarCharVector. Each operation on these vectors will
>> > > > delegate
>> > > >    to the corresponding NullableIntVector and NullableVarCharVector
>> > that
>> > > > are
>> > > >    newly implemented.
>> > > >
>> > > >
>> > > >    - This achieves two goals w.r.t testing:
>> > > >
>> > > >
>> > > >    - Firstly, our existing JAVA unit tests will automatically
>> exercise
>> > > the
>> > > >       newly written code and its APIs (API names have not changed)
>> for
>> > > >       NullableInt and NullableVarChar vectors.
>> > > >
>> > > >
>> > > >    - Secondly, let's say we rebase Dremio on top of Arrow master and
>> > > >       replace all references to NullableIntVector and
>> > > > NullableVarCharVector with
>> > > >       their Legacy counterparts, things should still work.
>> > > >
>> > > >
>> > > >    - After this patch gets merged, we can do the following work in
>> > > multiple
>> > > >    patches:
>> > > >       - Write concrete implementations for rest of the nullable
>> types
>> > --
>> > > >       FLOAT4, FLOAT8, BIGINT, VARBINARY etc
>> > > >
>> > > >
>> > > >    - Write additional tests (definitely needed but the first goal
>> is to
>> > > >       make sure existing tests are not broken).
>> > > >
>> > > >
>> > > >    - Ensure NullableValueVectors template generates Legacy vectors
>> and
>> > > each
>> > > >       operation is merely a delegation to the API in new
>> > implementation.
>> > > >
>> > > >
>> > > >    - In the next Arrow release, remove all Legacy vectors and
>> > > >       NullableValueVectors template since we will have the
>> > implementation
>> > > > for
>> > > >       each type that passes existing tests.
>> > > >
>> > > >
>> > > >    - I am currently inspecting the newly written code and making
>> > changes
>> > > to
>> > > >       the template to generate Legacy vector types for Nullable Int
>> > > > and Nullable
>> > > >       VarChar and delegating the operations. The changes should be
>> > > > available in
>> > > >       the PR in a couple of hours.
>> > > >
>> > > >
>> > > > I am wondering if there are any other ideas around testing, merging
>> > etc.
>> > > > Please feel free to reply here or comment on the PR.
>> > > >
>> > > > I would appreciate if people can take time to review the code in PR
>> --
>> > > > especially the abstract classes BaseNullableFixedWidth and
>> > > > BaseNullableVariableWidth. Writing concrete implementations for
>> other
>> > > types
>> > > > will be much less hassle if these abstract classes have proper code.
>> > > >
>> > > > Thanks,
>> > > > Siddharth
>> > > >
>> > >
>> >
>>
>
>

Reply via email to