Hi Druid devs! I've been thinking about our release process and would love to get your thoughts on how we manage new features.

When a new feature is added, is it first marked as experimental? How do users know which features are experimental? How do we ensure that features do not break with each new release? Should the release manager manually check that each feature works as part of the release process? This doesn't seem like it can scale. Should integration tests always be required if the feature is being added to core?

To address these issues, I'd like to propose we introduce a feature lifecycle for all features so that we can set expectations for users appropriately - either in the docs, product or both. I'd like to propose something like this:

* Alpha - Known major bugs / performance issues. Incomplete functionality. Disabled by default.
* Beta - Feature is not yet battle tested in production. API and compatibility may change in the future. May not be forward / backward compatible.
* GA - Feature has appropriate user facing documentation and testing so that it won't regress with a version upgrade. Will be forward / backward compatible for x releases (maybe 4? ~ 1 year)

I think a model like this will allow us to continue to ship features quickly while keeping the release quality bar high, so that our users can continue to rely on Druid without worrying about upgrade issues. I understand that adding integration tests may not always make sense for early / experimental features when we're uncertain of the API or the broader use case we're trying to solve. This model would make it clear to our users which features are still a work in progress, and which ones they can expect to remain stable for a longer time.

Below is an example of how I think this model can be applied to a new feature:

This PR adds support for a new feature - https://github.com/apache/druid/pull/9449

While it has been tested locally, there may be changes that enter Druid before the 0.19 release that break this feature, or more likely - a refactoring after 0.19 that breaks something in this feature.

In this example, I think the feature should be marked as Alpha, since there are future changes expected to the functionality. At this stage integration tests are not expected. Once the feature is complete, there should be happy path integration tests for the feature and it can graduate to Beta. After it has been running in production for a while, the feature can graduate to GA once we've added enough integration tests that we feel confident that the feature will continue to work if the integration tests run successfully.

I know this is a very long email, but I look forward to hearing your thoughts on this.

Suneet