alamb opened a new issue, #16622:
URL: https://github.com/apache/datafusion/issues/16622

   ### Is your feature request related to a problem or challenge?
   
   One of the dreams of the composable data ecosystem is to quickly assemble a 
system from various components (DataFusion, data formats 
   
   DataFusion still releases once a month, which allows code to flow quickly 
but also causes at least two challenges:
   1. Upgrading downstream projects takes non-trivial work, as 
mentioned in https://github.com/apache/datafusion/issues/5269
   2. Upgrading and using downstream third-party extensions is hard
   
   Third-party extensions like delta-rs and iceberg provide `TableProviders` 
for DataFusion, which is really nice. However, to use those packages, the 
versions of DataFusion that they and the application depend on must match exactly. 
   
   This means that an application that relies on multiple downstream packages 
must wait until **ALL** of them have upgraded to the new version before it can 
upgrade DataFusion. Any delay in a downstream library's update delays the 
application's upgrade. 
   
   For example, for an application that wants to use delta-rs, iceberg, and the 
`table-providers` crate, there is a race after each upgrade of DataFusion.
   
   Consider a release timeline like this:
   1. +0 days: DataFusion version `X` is released
   2. +7 days: New delta-rs version is released, upgraded to DataFusion `X`
   3. +11 days: New iceberg crate is released, upgraded to DataFusion `X`
   4. +12 days: New table-providers version is released
   5. +13-30 days: End-user app can upgrade DataFusion, delta-rs, and iceberg
   6. +31 days: The next DataFusion version is released
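
The arithmetic behind this timeline can be sketched in a few lines of Rust. This is only an illustration of the reasoning above, not part of DataFusion: the function name `upgrade_window` and its parameters are hypothetical, and it assumes the application can upgrade the day after the slowest downstream crate has released.

```rust
/// Returns the inclusive range of days (relative to the DataFusion release)
/// during which an application can upgrade, or `None` if there is no such day.
///
/// `downstream_release_days` holds the day on which each downstream crate
/// ships a release compatible with the new DataFusion version.
/// `next_release_day` is the day the following DataFusion version ships.
/// Both names are hypothetical and exist only for this sketch.
fn upgrade_window(
    downstream_release_days: &[u32],
    next_release_day: u32,
) -> Option<(u32, u32)> {
    // The application is blocked until the slowest dependency has released.
    let slowest = *downstream_release_days.iter().max()?;
    // Assumption: the upgrade can start the day after that release.
    let first_usable = slowest + 1;
    // The window closes the day before the next DataFusion release.
    let last_usable = next_release_day.checked_sub(1)?;
    if first_usable > last_usable {
        None
    } else {
        Some((first_usable, last_usable))
    }
}
```

With the numbers from the timeline, `upgrade_window(&[7, 11, 12], 31)` yields `Some((13, 30))`. If any one of the downstream crates slips past day 30, the result is `None`, meaning the application cannot upgrade before the next DataFusion release.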
   
   
   
   
   ### Describe the solution you'd like
   
   I would like downstream libraries to have more time and schedule flexibility 
when upgrading DataFusion and other dependent crates, so that it is easier to 
construct a system from different components
   
   ### Describe alternatives you've considered
   
   ## Option 1: Switch to major/minor release cadence
   We could follow the model of arrow-rs, which releases monthly but makes 
breaking releases only quarterly. Here is how it works in arrow-rs: 
https://github.com/apache/arrow-rs?tab=readme-ov-file#release-versioning-and-schedule
   
   The major cost here is that maintainers and contributors would have to be 
diligent about not merging breaking API changes until a major release
   
   This is possible to automate somewhat:
   - https://github.com/apache/datafusion/pull/16078 from @logan-keede 
   - https://github.com/apache/datafusion/pull/16541 from @lic
   
   
   ## Option 2: LTS and feature branch
   Keep (at least) two branches going: LTS and main, as proposed by 
@andygrove in https://github.com/apache/datafusion/issues/5269
   
   In this model we would likely backport changes to the LTS branch and make 
releases from there. The downside of this approach is that there is extra work 
to backport changes to LTS.
   
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

