houqp commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012789934
> I would personally prefer an approach that sees the great work on arrow2 cherry-picked into arrow-rs, with arrow2 serving as an incubator for new ideas. I am happy to help out with this if there are things people would particularly like to see ported across?

For me personally, on top of the highly optimized Parquet, Avro and JSON IO modules, I really like arrow2's transmute-free design and its mutable array abstraction. The latter is the main reason why delta-rs is also in the process of migrating to arrow2.

From previous discussions on the arrow dev list, I believe Jorge tried applying his arrow2 learnings back to arrow-rs last year. He decided it was not worth the effort because it would have required rewriting most of the code base.

My main concern with cherry-picking arrow2 designs into arrow-rs is that we would be spending all this effort making arrow-rs as good as arrow2, when we could instead spend the same effort making arrow2 even better. That would benefit not only datafusion but a much larger community, including other projects that already use arrow2.

IMHO, forking an open-source repo makes sense when the fundamental design tradeoffs diverge. But from what I have seen so far, aren't arrow2 and arrow-rs contributors pretty aligned on where an ideal Arrow Rust implementation should go?

> The current ecosystem fragmentation is just unfortunate for both users and contributors imo...

I agree 100%. That's why I think it would be good if we could come up with a way to avoid cherry-picking commits from arrow2 into arrow-rs. Perhaps arrow-rs could be built on top of arrow2, so that the two still share the majority of the code base? For example, arrow-rs could focus on providing a higher-level, stable API for consumers while using arrow2 as the core. That way, contributors would know where to send their patches based on which layer they work on.
-- This is an automated message from the Apache Git Service.