Yes, that is the intention. Good that we all are on the same page. I will move the PR https://github.com/apache/arrow/pull/1164 to new branch.
On Thu, Oct 12, 2017 at 11:20 AM, Li Jin <[email protected]> wrote: > To make clear, I think it's fine to have Legacy Vectors in 0.8 as a > deprecated API. > > On Thu, Oct 12, 2017 at 2:19 PM, Li Jin <[email protected]> wrote: > > > Siddharth, > > > > For working off a branch, Wes has created https://github.com/apache/ > > arrow/tree/java-vector-refactor that we can submit PR to. > > > > For Legacy vectors, I think it's fine because it's really just a > migration > > path to help Dremio to migrate to the new vectors. I don't think other > > users, i.e., Spark will use the Legacy vector class. Bryan and I will > just > > migrate Spark to new vectors directly because Spark's use of Arrow is > very > > simple. > > > > > > > > On Thu, Oct 12, 2017 at 2:08 PM, Siddharth Teotia <[email protected]> > > wrote: > > > >> Thanks Bryan and Li. > >> > >> Yes, the goal is to get this (and the subsequent patches) merged to the > >> new > >> branch. Once it is stabilized from different aspects, we can move to > >> master. I am not sure of the exact mechanics when we work off a > different > >> project branch and not master. > >> > >> Does that sound good? > >> > >> Regarding compatibility, are we suggesting that let's not create Legacy > >> Nullable vectors at all? The initial thoughts were to generate Legacy > >> vectors from NullableValueVectors template and these vectors are > >> mutator/accessor based (in today's world). Internally each operation > will > >> be delegated to new vectors (non code generated). > >> > >> On Thu, Oct 12, 2017 at 10:58 AM, Bryan Cutler <[email protected]> > wrote: > >> > >> > Thanks for the update Siddharth. From the Spark side of this, I > >> definitely > >> > want to try to upgrade to the latest Arrow before the Spark 2.3 > release > >> but > >> > if it the refactor is too disruptive then others might get squeamish > >> about > >> > upgrading. On the other hand, I don't think we should hold back on > >> > refactoring for compatibility sake and the way it's looking now trying > >> to > >> > be backwards-compatible will be too much of a pain. I will try to > >> figure > >> > out the timeline for Spark 2.3 and what the feeling is for upgrading > >> > Arrow. Can we hold off on merging this to master for now and just > work > >> out > >> > of the separate branch until we can get a better feeling for the > impact? > >> > > >> > Bryan > >> > > >> > On Wed, Oct 11, 2017 at 8:19 AM, Li Jin <[email protected]> > wrote: > >> > > >> > > Hi Siddharth, > >> > > > >> > > Thanks for the update. This looks good. > >> > > > >> > > A few thoughts: > >> > > > >> > > *Compatibility:* > >> > > It sounds like we will introduce some back-compatibility with the > new > >> > > Vector class. At this point I think our main Java users should be > >> Spark > >> > and > >> > > Dremio, is this right? > >> > > > >> > > > >> > > - For Spark: > >> > > > >> > > It seems fine since Spark uses just the basic functionality of > Vector > >> > > classes and the existing code should work with the new Vector > classes, > >> > > maybe even without any code change on the Spark side. > >> > > > >> > > > >> > > - For Dremio: > >> > > > >> > > Sounds like you are already taking care of this by introducing the > >> > > LegacyVector classes. > >> > > > >> > > > >> > > *Testing:* > >> > > > >> > > - Spark Integration Tests: > >> > > > >> > > Bryan and I can help with integration test with Spark. I think the > >> target > >> > > timeline for Spark 2.3 release is some time in mid Nov (Bryan please > >> > > correct me if I am wrong). > >> > > > >> > > I will take a look at the PR today. > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia < > >> [email protected]> > >> > > wrote: > >> > > > >> > > > Hi All, > >> > > > > >> > > > I wanted to update everyone on state of this mini-project: > >> > > > > >> > > > > >> > > > - Requirements document and initial design proposal were sent > >> out to > >> > > the > >> > > > community for review and we have received some good feedback. > All > >> > > > required > >> > > > docs are attached with corresponding JIRAs. > >> > > > > >> > > > > >> > > > - The initial prototype is in a reasonable state > (code-complete). > >> > You > >> > > > can see the PR here - https://github.com/apache/ > arrow/pull/1164 > >> > > > > >> > > > > >> > > > - The prototype has code changes for the new hierarchy, > abstract > >> > > > interfaces for fixed width and variable width vectors and > >> concrete > >> > > > implementation of NullableIntVector and NullableVarCharVector. > >> > > > > >> > > > > >> > > > Plan for testing and integrating into existing infrastructure: > >> > > > > >> > > > > >> > > > - My initial thoughts are that this particular patch will > >> require a > >> > > lot > >> > > > of testing, reviews etc since the foundation of rest of the > >> > > > implementation > >> > > > more or less depends on how the APIs are flushed out here. > >> > > > > >> > > > > >> > > > - So the goal is to get this properly tested and merged into > >> master > >> > > > first. > >> > > > > >> > > > > >> > > > - The idea is to slowly deprecate and remove the existing > >> vectors in > >> > > > stages. In this patch itself, we change the existing > >> > > > NullableValueVectors.java template to generate > >> > LegacyNullableIntVector > >> > > > and > >> > > > LegacyNullableVarCharVector. Each operation on these vectors > will > >> > > > delegate > >> > > > to the corresponding NullableIntVector and > NullableVarCharVector > >> > that > >> > > > are > >> > > > newly implemented. > >> > > > > >> > > > > >> > > > - This achieves two goals w.r.t testing: > >> > > > > >> > > > > >> > > > - Firstly, our existing JAVA unit tests will automatically > >> exercise > >> > > the > >> > > > newly written code and its APIs (API names have not changed) > >> for > >> > > > NullableInt and NullableVarChar vectors. > >> > > > > >> > > > > >> > > > - Secondly, let's say we rebase Dremio on top of Arrow master > and > >> > > > replace all references to NullableIntVector and > >> > > > NullableVarCharVector with > >> > > > their Legacy counterparts, things should still work. > >> > > > > >> > > > > >> > > > - After this patch gets merged, we can do the following work in > >> > > multiple > >> > > > patches: > >> > > > - Write concrete implementations for rest of the nullable > >> types > >> > -- > >> > > > FLOAT4, FLOAT8, BIGINT, VARBINARY etc > >> > > > > >> > > > > >> > > > - Write additional tests (definitely needed but the first goal > >> is to > >> > > > make sure existing tests are not broken). > >> > > > > >> > > > > >> > > > - Ensure NullableValueVectors template generates Legacy vectors > >> and > >> > > each > >> > > > operation is merely a delegation to the API in new > >> > implementation. > >> > > > > >> > > > > >> > > > - In the next Arrow release, remove all Legacy vectors and > >> > > > NullableValueVectors template since we will have the > >> > implementation > >> > > > for > >> > > > each type that passes existing tests. > >> > > > > >> > > > > >> > > > - I am currently inspecting the newly written code and making > >> > changes > >> > > to > >> > > > the template to generate Legacy vector types for Nullable > Int > >> > > > and Nullable > >> > > > VarChar and delegating the operations. The changes should be > >> > > > available in > >> > > > the PR in a couple of hours. > >> > > > > >> > > > > >> > > > I am wondering if there are any other ideas around testing, > merging > >> > etc. > >> > > > Please feel free to reply here or comment on the PR. > >> > > > > >> > > > I would appreciate if people can take time to review the code in > PR > >> -- > >> > > > especially the abstract classes BaseNullableFixedWidth and > >> > > > BaseNullableVariableWidth. Writing concrete implementations for > >> other > >> > > types > >> > > > will be much less hassle if these abstract classes have proper > code. > >> > > > > >> > > > Thanks, > >> > > > Siddharth > >> > > > > >> > > > >> > > >> > > > > >
