To clarify: I don't think we should try to get rid of all forking, and I didn't read our prior discussion or this one as that absolute. I think some reasons for forking are healthy, and others are redundant and wasteful. We should celebrate the former and try to root out the latter.
For example: if someone has written a feature to a GA branch and upstreamed it, and then we don't cut a release for multiple years and don't cut alphas from trunk, I'd contend our processes are encouraging low-value, redundant, wasteful fork maintenance. Or they are forcing people to qualify bespoke releases from arbitrary SHAs on trunk and run the database on those, which I expect most people wouldn't be too comfortable doing.

Hence my desire to try to map out why people are forking.

For instance: I think most people could agree that we shouldn't target completely getting rid of forks for reason #1: "You need bespoke code to integrate with internal infrastructure". We could take that as a sign to improve users' lives there by making integration points easier to write to, formalizing some APIs, documenting them, etc.

On Tue, Oct 21, 2025, at 2:07 PM, C. Scott Andreas wrote:
> There’s a common motivation at the root of any fork: having at least one
> patch that matters to you – perhaps one you’ve written yourself – and having
> complete and total control over your ability to run it. Isaac's example of
> C-20749 is a good example of this.
>
> This is the classic story of open source, and for me it’s a positive one.
>
> Releases published by the Apache Cassandra project are solid and stable. It’s
> *also* true that many users pretty aggressively editorialize distributions of
> the database they deploy. These can include urgent fixes for bugs they’ve
> identified and patched; removal/disabling of features they haven’t qualified;
> or incubation of patches they intend to contribute upstream. These are all
> positive motivations, and something that open source enables.
>
> I don’t agree that the existence of “forks” is a negative or that eliminating
> them is a desirable or achievable goal. Folks will always want or need to be
> able to apply a patch that matters to them to solve a problem or scratch an
> itch - and they should.
>
> If and as we learn that many users of a particular release have a common
> challenge, our process of DISCUSS + backport provides a path to meet that
> need, and I think we should exercise it more often. I see Isaac's example of
> C-20749 as a good candidate for that as well.
>
> – Scott
>
>> On Oct 21, 2025, at 9:12 AM, Isaac Reath <[email protected]> wrote:
>>
>> I'd say the biggest reasons to me are (1) and (2). For example, we recently
>> worked on _CASSANDRA-20749_
>> <https://issues.apache.org/jira/browse/CASSANDRA-20749> due to an internal
>> need for this functionality, and fortunately we’ve been able to bring it
>> upstream. But, since we run 4.1 and 5.0, we brought this patch back to these
>> versions so that we are able to use this feature today.
>>
>> To your point in (3), I'd say it's easier to qualify a release built off of
>> a stable GA than trunk, especially once the GA release has gotten to where
>> 4.1 is. I’d love to get to the point where we’re qualifying and running
>> trunk builds in production, but even when we get there I still see a world
>> where we still need to run the latest GA alongside trunk which would still
>> motivate us to bring patches back to the latest GA where we need them.
>>
>> On Mon, Oct 20, 2025 at 3:24 PM Josh McKenzie <[email protected]> wrote:
>>>
>>> We had a long conversation about the potential of piloting a supported
>>> backport release branch here:
>>> https://lists.apache.org/thread/xbxt21rttsqvhmh8ds9vs2cr7fx27w3k
>>>
>>> When I tried to summarize the thread and identify next steps, one
>>> observation stood out: I think we did a good job establishing the shape of
>>> the challenge we'd like to address (people want to work on OSS, not
>>> maintain private forks), but I don't think we got to the root of why this
>>> challenge exists. If we take action now we run the risk of having the wrong
>>> solution to the right problem.
>>>
>>> So: why are people running forks?
>>> Some reasons I've seen brought up:
>>> 1. You need bespoke code to integrate with internal infrastructure
>>> 2. You've written a new feature targeting your internal version,
>>>    upstreamed the code, and have to wait for the feature to be in a GA release
>>> 3. Someone else has contributed a feature to trunk that's attractive and
>>>    it's less work or more palatable to back-port it to your private fork and
>>>    maintain the diff than to qualify a custom release off trunk
>>> 4. You have stability concerns with GA releases or running a release based
>>>    off trunk
>>>
>>> The backport branch we discussed in the previous thread would primarily
>>> address #4 (stability concerns) and secondarily #2 and #3 (feature
>>> availability). All four motivations could be addressed in other
>>> ways—ideally by reducing the pressure to fork in the first place, rather
>>> than accommodating forks as inevitable.
>>>
>>> If you're running a fork and open to sharing your experience, do these
>>> reasons match yours, or is something else at play? The more detail we can
>>> gather, the better we can target improvements where they'll actually help.
>>>
>>> I know these conversations take time and can be hard; I appreciate everyone
>>> taking the time and energy to help us collectively improve.
>>>
>>> ~Josh
