To make clear, I think it's fine to have Legacy Vectors in 0.8 as a deprecated API.
On Thu, Oct 12, 2017 at 2:19 PM, Li Jin <ice.xell...@gmail.com> wrote: > Siddharth, > > For working off a branch, Wes has created https://github.com/apache/ > arrow/tree/java-vector-refactor that we can submit PR to. > > For Legacy vectors, I think it's fine because it's really just a migration > path to help Dremio to migrate to the new vectors. I don't think other > users, i.e., Spark will use the Legacy vector class. Bryan and I will just > migrate Spark to new vectors directly because Spark's use of Arrow is very > simple. > > > > On Thu, Oct 12, 2017 at 2:08 PM, Siddharth Teotia <siddha...@dremio.com> > wrote: > >> Thanks Bryan and Li. >> >> Yes, the goal is to get this (and the subsequent patches) merged to the >> new >> branch. Once it is stabilized from different aspects, we can move to >> master. I am not sure of the exact mechanics when we work off a different >> project branch and not master. >> >> Does that sound good? >> >> Regarding compatibility, are we suggesting that let's not create Legacy >> Nullable vectors at all? The initial thoughts were to generate Legacy >> vectors from NullableValueVectors template and these vectors are >> mutator/accessor based (in today's world). Internally each operation will >> be delegated to new vectors (non code generated). >> >> On Thu, Oct 12, 2017 at 10:58 AM, Bryan Cutler <cutl...@gmail.com> wrote: >> >> > Thanks for the update Siddharth. From the Spark side of this, I >> definitely >> > want to try to upgrade to the latest Arrow before the Spark 2.3 release >> but >> > if it the refactor is too disruptive then others might get squeamish >> about >> > upgrading. On the other hand, I don't think we should hold back on >> > refactoring for compatibility sake and the way it's looking now trying >> to >> > be backwards-compatible will be too much of a pain. I will try to >> figure >> > out the timeline for Spark 2.3 and what the feeling is for upgrading >> > Arrow. Can we hold off on merging this to master for now and just work >> out >> > of the separate branch until we can get a better feeling for the impact? >> > >> > Bryan >> > >> > On Wed, Oct 11, 2017 at 8:19 AM, Li Jin <ice.xell...@gmail.com> wrote: >> > >> > > Hi Siddharth, >> > > >> > > Thanks for the update. This looks good. >> > > >> > > A few thoughts: >> > > >> > > *Compatibility:* >> > > It sounds like we will introduce some back-compatibility with the new >> > > Vector class. At this point I think our main Java users should be >> Spark >> > and >> > > Dremio, is this right? >> > > >> > > >> > > - For Spark: >> > > >> > > It seems fine since Spark uses just the basic functionality of Vector >> > > classes and the existing code should work with the new Vector classes, >> > > maybe even without any code change on the Spark side. >> > > >> > > >> > > - For Dremio: >> > > >> > > Sounds like you are already taking care of this by introducing the >> > > LegacyVector classes. >> > > >> > > >> > > *Testing:* >> > > >> > > - Spark Integration Tests: >> > > >> > > Bryan and I can help with integration test with Spark. I think the >> target >> > > timeline for Spark 2.3 release is some time in mid Nov (Bryan please >> > > correct me if I am wrong). >> > > >> > > I will take a look at the PR today. >> > > >> > > >> > > >> > > >> > > >> > > >> > > On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia < >> siddha...@dremio.com> >> > > wrote: >> > > >> > > > Hi All, >> > > > >> > > > I wanted to update everyone on state of this mini-project: >> > > > >> > > > >> > > > - Requirements document and initial design proposal were sent >> out to >> > > the >> > > > community for review and we have received some good feedback. All >> > > > required >> > > > docs are attached with corresponding JIRAs. >> > > > >> > > > >> > > > - The initial prototype is in a reasonable state (code-complete). >> > You >> > > > can see the PR here - https://github.com/apache/arrow/pull/1164 >> > > > >> > > > >> > > > - The prototype has code changes for the new hierarchy, abstract >> > > > interfaces for fixed width and variable width vectors and >> concrete >> > > > implementation of NullableIntVector and NullableVarCharVector. >> > > > >> > > > >> > > > Plan for testing and integrating into existing infrastructure: >> > > > >> > > > >> > > > - My initial thoughts are that this particular patch will >> require a >> > > lot >> > > > of testing, reviews etc since the foundation of rest of the >> > > > implementation >> > > > more or less depends on how the APIs are flushed out here. >> > > > >> > > > >> > > > - So the goal is to get this properly tested and merged into >> master >> > > > first. >> > > > >> > > > >> > > > - The idea is to slowly deprecate and remove the existing >> vectors in >> > > > stages. In this patch itself, we change the existing >> > > > NullableValueVectors.java template to generate >> > LegacyNullableIntVector >> > > > and >> > > > LegacyNullableVarCharVector. Each operation on these vectors will >> > > > delegate >> > > > to the corresponding NullableIntVector and NullableVarCharVector >> > that >> > > > are >> > > > newly implemented. >> > > > >> > > > >> > > > - This achieves two goals w.r.t testing: >> > > > >> > > > >> > > > - Firstly, our existing JAVA unit tests will automatically >> exercise >> > > the >> > > > newly written code and its APIs (API names have not changed) >> for >> > > > NullableInt and NullableVarChar vectors. >> > > > >> > > > >> > > > - Secondly, let's say we rebase Dremio on top of Arrow master and >> > > > replace all references to NullableIntVector and >> > > > NullableVarCharVector with >> > > > their Legacy counterparts, things should still work. >> > > > >> > > > >> > > > - After this patch gets merged, we can do the following work in >> > > multiple >> > > > patches: >> > > > - Write concrete implementations for rest of the nullable >> types >> > -- >> > > > FLOAT4, FLOAT8, BIGINT, VARBINARY etc >> > > > >> > > > >> > > > - Write additional tests (definitely needed but the first goal >> is to >> > > > make sure existing tests are not broken). >> > > > >> > > > >> > > > - Ensure NullableValueVectors template generates Legacy vectors >> and >> > > each >> > > > operation is merely a delegation to the API in new >> > implementation. >> > > > >> > > > >> > > > - In the next Arrow release, remove all Legacy vectors and >> > > > NullableValueVectors template since we will have the >> > implementation >> > > > for >> > > > each type that passes existing tests. >> > > > >> > > > >> > > > - I am currently inspecting the newly written code and making >> > changes >> > > to >> > > > the template to generate Legacy vector types for Nullable Int >> > > > and Nullable >> > > > VarChar and delegating the operations. The changes should be >> > > > available in >> > > > the PR in a couple of hours. >> > > > >> > > > >> > > > I am wondering if there are any other ideas around testing, merging >> > etc. >> > > > Please feel free to reply here or comment on the PR. >> > > > >> > > > I would appreciate if people can take time to review the code in PR >> -- >> > > > especially the abstract classes BaseNullableFixedWidth and >> > > > BaseNullableVariableWidth. Writing concrete implementations for >> other >> > > types >> > > > will be much less hassle if these abstract classes have proper code. >> > > > >> > > > Thanks, >> > > > Siddharth >> > > > >> > > >> > >> > >