houqp commented on issue #1176:
URL: https://github.com/apache/arrow-rs/issues/1176#issuecomment-1013781098


   Thank you @alamb for starting and driving this discussion. Great summary of 
the current community consensus.
   
   > It is not clear to me if there is a consensus on:
   > How important the Apache Governance model is (please lend your opinions 
here!)
   
   Personally, I think the Apache model works better for relatively slow moving 
monolith projects, while arrow2/parquet2 are fast evolving projects with a 
vision to be broken up into even smaller modular crates. @alamb has done 
exceptional work driving the arrow-rs releases, but seeing how much effort 
and time it takes, I would consider it unnecessary overhead for arrow2 at its 
current stage. @jorgecarleitao has been able to react to user feedback quickly 
and release 3-4 new versions of arrow2 in a week, which is simply not possible 
with the Apache Governance model. That said, I think the Apache voting process 
is very useful when you need high confidence in the quality of every single 
release and have a large, diverse set of PMC members who can participate in 
the voting in a timely manner. But arrow2 still seems pretty far away from 
this.
   
   @andygrove brought up a good point that the lack of formal governance might 
become an issue for large corporations with restrictive open source 
contribution guidelines. This is the first time I have heard of this issue; 
previously I was under the impression that the software license is all that 
matters. On the other hand, I am guessing the ASF is not the only governance 
model that's allowed? Perhaps we could help @jorgecarleitao come up with a 
different compatible governance model for arrow2 until it's ready to be 
donated to the ASF? If Andy wants to contribute to arrow2 now but is blocked 
by the lack of governance, then I would consider this a serious issue that we 
should address. Otherwise I would optimize for iteration velocity over 
governance until it becomes a real problem.
   
   In short, from what I have seen so far, the upside of adopting the Apache 
governance model is that it would unblock potential contributions from big 
corporations. The downside is that it would slow down our iteration process 
and potentially even disincentivize @jorgecarleitao from actively working on 
the project. Reading his past emails, I get the feeling that he tried very 
hard to pass the IP clearance and donate arrow2 to the ASF last year, but got 
frustrated by the bureaucracy. I am personally much more concerned about the 
latter than the former.
   
   > How important the stability of APIs / the specific versioning scheme (0.x 
vs 1.x or later)
   
   IMHO, this is not important as long as it is well communicated to the users, 
i.e. be explicit that we are special and ask them to treat our 8.x as 0.x until 
we say otherwise. But Jorge has a strong opinion on this and wants to strictly 
follow what the rest of the Rust ecosystem does. I understand where he is 
coming from and respect his stance on this.
   
   > Switch datafusion to arrow2, making no changes to arrow-rs. It could be 
maintained by anyone who wished to contribute,
   
   I agree with @andygrove on this. As long as there is community interest in 
this, we should probably still keep arrow-rs open for contributions. This is not 
the result I want to see, but I have a feeling that this is likely what is 
going to happen :(
   
   > Start more actively porting the more ergonomic parts of arrow2 into 
arrow-rs 
   
   I think this is certainly doable, but I stand by my previous comment 
that it won't be a good use of our time unless there are fundamental design 
tradeoffs in arrow-rs that are not compatible with arrow2's design. Simply 
replicating the design another project has is not a good reason to start a fork 
IMHO. I know @tustvold has a fairly strong opinion on this option and is more 
familiar with the parquet code base than I am, so perhaps he could help shed 
some light on this.
   
   > Option 2 leaves open the question of “how does arrow2 development move 
forward” – where would patches be sent, for example?
   
   Just throwing out a random idea here: one potential variant of option 2 is 
that we use arrow-rs as the place to maintain stable arrow2 branches and let 
arrow2 iterate as fast as it can without the fear of introducing breaking 
changes. The stable branch would cherry-pick compatible commits for a specific 
0.x release that we want to maintain for X months. This way, we can still 
direct all contributions back to arrow2. The downside is that I don't know how 
much interest the community has in a stable API, considering we just decided 
to stop maintaining stable releases for arrow-rs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
