Yes, that is the intention. Good that we all are on the same page. I will
move the PR https://github.com/apache/arrow/pull/1164 to new branch.

On Thu, Oct 12, 2017 at 11:20 AM, Li Jin <[email protected]> wrote:

> To make clear, I think it's fine to have Legacy Vectors in 0.8 as a
> deprecated API.
>
> On Thu, Oct 12, 2017 at 2:19 PM, Li Jin <[email protected]> wrote:
>
> > Siddharth,
> >
> > For working off a branch, Wes has created https://github.com/apache/
> > arrow/tree/java-vector-refactor that we can submit PR to.
> >
> > For Legacy vectors, I think it's fine because it's really just a
> migration
> > path to help Dremio to migrate to the new vectors. I don't think other
> > users, i.e., Spark will use the Legacy vector class. Bryan and I will
> just
> > migrate Spark to new vectors directly because Spark's use of Arrow is
> very
> > simple.
> >
> >
> >
> > On Thu, Oct 12, 2017 at 2:08 PM, Siddharth Teotia <[email protected]>
> > wrote:
> >
> >> Thanks Bryan and Li.
> >>
> >> Yes, the goal is to get this (and the subsequent patches) merged to the
> >> new
> >> branch. Once it is stabilized from different aspects, we can move to
> >> master. I am not sure of the exact mechanics when we work off a
> different
> >> project branch and not master.
> >>
> >> Does that sound good?
> >>
> >> Regarding compatibility, are we suggesting that let's not create Legacy
> >> Nullable vectors at all? The initial thoughts were to generate Legacy
> >> vectors from NullableValueVectors template and these vectors are
> >> mutator/accessor based (in today's world). Internally each operation
> will
> >> be delegated to new vectors (non code generated).
> >>
> >> On Thu, Oct 12, 2017 at 10:58 AM, Bryan Cutler <[email protected]>
> wrote:
> >>
> >> > Thanks for the update Siddharth.  From the Spark side of this, I
> >> definitely
> >> > want to try to upgrade to the latest Arrow before the Spark 2.3
> release
> >> but
> >> > if it the refactor is too disruptive then others might get squeamish
> >> about
> >> > upgrading.  On the other hand, I don't think we should hold back on
> >> > refactoring for compatibility sake and the way it's looking now trying
> >> to
> >> > be backwards-compatible will be too much of a pain.  I will try to
> >> figure
> >> > out the timeline for Spark 2.3 and what the feeling is for upgrading
> >> > Arrow.  Can we hold off on merging this to master for now and just
> work
> >> out
> >> > of the separate branch until we can get a better feeling for the
> impact?
> >> >
> >> > Bryan
> >> >
> >> > On Wed, Oct 11, 2017 at 8:19 AM, Li Jin <[email protected]>
> wrote:
> >> >
> >> > > Hi Siddharth,
> >> > >
> >> > > Thanks for the update. This looks good.
> >> > >
> >> > > A few thoughts:
> >> > >
> >> > > *Compatibility:*
> >> > > It sounds like we will introduce some back-compatibility with the
> new
> >> > > Vector class. At this point I think our main Java users should be
> >> Spark
> >> > and
> >> > > Dremio, is this right?
> >> > >
> >> > >
> >> > >    - For Spark:
> >> > >
> >> > > It seems fine since Spark uses just the basic functionality of
> Vector
> >> > > classes and the existing code should work with the new Vector
> classes,
> >> > > maybe even without any code change on the Spark side.
> >> > >
> >> > >
> >> > >    - For Dremio:
> >> > >
> >> > > Sounds like you are already taking care of this by introducing the
> >> > > LegacyVector classes.
> >> > >
> >> > >
> >> > > *Testing:*
> >> > >
> >> > >    - Spark Integration Tests:
> >> > >
> >> > > Bryan and I can help with integration test with Spark. I think the
> >> target
> >> > > timeline for Spark 2.3 release is some time in mid Nov (Bryan please
> >> > > correct me if I am wrong).
> >> > >
> >> > > I will take a look at the PR today.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia <
> >> [email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > I wanted to update everyone on state of this mini-project:
> >> > > >
> >> > > >
> >> > > >    - Requirements document and initial design proposal were sent
> >> out to
> >> > > the
> >> > > >    community for review and we have received some good feedback.
> All
> >> > > > required
> >> > > >    docs are attached with corresponding JIRAs.
> >> > > >
> >> > > >
> >> > > >    - The initial prototype is in a reasonable state
> (code-complete).
> >> > You
> >> > > >    can see the PR here - https://github.com/apache/
> arrow/pull/1164
> >> > > >
> >> > > >
> >> > > >    - The prototype has code changes for the new hierarchy,
> abstract
> >> > > >    interfaces for fixed width and variable width vectors and
> >> concrete
> >> > > >    implementation of NullableIntVector and NullableVarCharVector.
> >> > > >
> >> > > >
> >> > > > Plan for testing and integrating into existing infrastructure:
> >> > > >
> >> > > >
> >> > > >    - My initial thoughts are that this particular patch will
> >> require a
> >> > > lot
> >> > > >    of testing, reviews etc since the foundation of rest of the
> >> > > > implementation
> >> > > >    more or less depends on how the APIs are flushed out here.
> >> > > >
> >> > > >
> >> > > >    - So the goal is to get this properly tested and merged into
> >> master
> >> > > >    first.
> >> > > >
> >> > > >
> >> > > >    - The idea is to slowly deprecate and remove the existing
> >> vectors in
> >> > > >    stages. In this patch itself, we change the existing
> >> > > >    NullableValueVectors.java template to generate
> >> > LegacyNullableIntVector
> >> > > > and
> >> > > >    LegacyNullableVarCharVector. Each operation on these vectors
> will
> >> > > > delegate
> >> > > >    to the corresponding NullableIntVector and
> NullableVarCharVector
> >> > that
> >> > > > are
> >> > > >    newly implemented.
> >> > > >
> >> > > >
> >> > > >    - This achieves two goals w.r.t testing:
> >> > > >
> >> > > >
> >> > > >    - Firstly, our existing JAVA unit tests will automatically
> >> exercise
> >> > > the
> >> > > >       newly written code and its APIs (API names have not changed)
> >> for
> >> > > >       NullableInt and NullableVarChar vectors.
> >> > > >
> >> > > >
> >> > > >    - Secondly, let's say we rebase Dremio on top of Arrow master
> and
> >> > > >       replace all references to NullableIntVector and
> >> > > > NullableVarCharVector with
> >> > > >       their Legacy counterparts, things should still work.
> >> > > >
> >> > > >
> >> > > >    - After this patch gets merged, we can do the following work in
> >> > > multiple
> >> > > >    patches:
> >> > > >       - Write concrete implementations for rest of the nullable
> >> types
> >> > --
> >> > > >       FLOAT4, FLOAT8, BIGINT, VARBINARY etc
> >> > > >
> >> > > >
> >> > > >    - Write additional tests (definitely needed but the first goal
> >> is to
> >> > > >       make sure existing tests are not broken).
> >> > > >
> >> > > >
> >> > > >    - Ensure NullableValueVectors template generates Legacy vectors
> >> and
> >> > > each
> >> > > >       operation is merely a delegation to the API in new
> >> > implementation.
> >> > > >
> >> > > >
> >> > > >    - In the next Arrow release, remove all Legacy vectors and
> >> > > >       NullableValueVectors template since we will have the
> >> > implementation
> >> > > > for
> >> > > >       each type that passes existing tests.
> >> > > >
> >> > > >
> >> > > >    - I am currently inspecting the newly written code and making
> >> > changes
> >> > > to
> >> > > >       the template to generate Legacy vector types for Nullable
> Int
> >> > > > and Nullable
> >> > > >       VarChar and delegating the operations. The changes should be
> >> > > > available in
> >> > > >       the PR in a couple of hours.
> >> > > >
> >> > > >
> >> > > > I am wondering if there are any other ideas around testing,
> merging
> >> > etc.
> >> > > > Please feel free to reply here or comment on the PR.
> >> > > >
> >> > > > I would appreciate if people can take time to review the code in
> PR
> >> --
> >> > > > especially the abstract classes BaseNullableFixedWidth and
> >> > > > BaseNullableVariableWidth. Writing concrete implementations for
> >> other
> >> > > types
> >> > > > will be much less hassle if these abstract classes have proper
> code.
> >> > > >
> >> > > > Thanks,
> >> > > > Siddharth
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Reply via email to