houqp commented on issue #1532:
URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012789934


   > I would personally prefer an approach that sees the great work on arrow2 
cherry-picked into arrow-rs, with arrow2 serving as an incubator for new ideas. 
I am happy to help out with this if there are things people would particularly 
like to see ported across? 
   
   For me personally, on top of the highly optimized Parquet, Avro, and JSON IO 
modules, I really like its transmute-free design and the mutable array 
abstraction. The latter is the main reason why delta-rs is also in the process 
of migrating to arrow2.
   
   From previous discussions on the arrow dev list, I believe Jorge tried 
applying his arrow2 learnings back to arrow-rs last year, but decided that it 
was not worth the effort because it would require rewriting the majority of the 
code base. My main concern with cherry-picking arrow2 designs into arrow-rs is 
that we would be spending all this effort on making arrow-rs as good as arrow2 
when we could instead spend the same effort on making arrow2 even better, which 
would benefit not only datafusion but a much larger community, including other 
projects that currently use arrow2.
   
   IMHO, there is value in forking an open-source repo when fundamental design 
tradeoffs diverge. But from what I have seen so far, both arrow2 and arrow-rs 
contributors are pretty well aligned on where an ideal Arrow Rust 
implementation should go.
   
   > The current ecosystem fragmentation is just unfortunate for both users and 
contributors imo...
   
   I agree 100%. That's why I think it would be good if we can come up with a 
way to avoid cherry-picking commits from arrow2 into arrow-rs. Perhaps we can 
have arrow-rs build on top of arrow2 so they still share the majority of the 
code base? For example, arrow-rs could focus on providing a higher-level, 
stable API for consumers while using arrow2 as the core. That way, from a 
contributor's point of view, it would be clear where to send patches depending 
on which layer they work on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

