alamb opened a new issue, #16886: URL: https://github.com/apache/datafusion/issues/16886
### Is your feature request related to a problem or challenge?

@viirya says in https://github.com/apache/datafusion/issues/16800#issuecomment-3084789737:

> Sometimes, I feel that some important proposals in DataFusion lack sufficient context, or that the relevant context is scattered across various issues and PR comments. This makes it difficult to fully understand the proposals or to trace their motivations and evaluate their soundness. As a result, we sometimes see large PRs — hundreds or even thousands of lines — that are based on these proposals, making the review process even more challenging. Only the author or those who were involved in the initial discussions seem to be in a position to effectively review them.
>
> For example, Spark has the SPIP (Spark Project Improvement Proposal) mechanism, where contributors submit formal documents for review when proposing significant changes. These documents typically consolidate the technical details, motivation, and background of the proposal into a single place. This approach helps the community better understand and participate in discussions around major changes.
>
> I wonder if it would be beneficial for DataFusion to adopt a similar lightweight proposal process for major design changes — something that allows ideas and context to be collected and reviewed before implementation begins. It could help improve transparency, facilitate broader community involvement, and make the review process more accessible.
>
> If the full SPIP process — including voting and formal approval — feels too heavy or unnecessary for our context, perhaps we could at least establish a lightweight template for major change proposals. This template could include sections for motivation, background, technical details, and other relevant context. Having a consistent format would make it easier for the community to follow and engage with significant design discussions.

My opinions:
1. Finding the outstanding proposals and discussions is difficult. They are all public, but there are lots of them going on
2. The context for proposals is often scattered across issues and PRs
3. It is hard to know when "enough" communication has been done for a proposal to move forward and when it needs more work
4. Improving the communication around major changes is becoming more important as the project grows and we have more users and contributors

For example, there are several recent discussions that could benefit from a more formal proposal process, including but not limited to:
- https://github.com/apache/datafusion/issues/16800 itself (along with this one)
- https://github.com/apache/datafusion/pull/16625 from @findepi
- https://github.com/apache/datafusion/issues/13704 in general, and https://github.com/apache/datafusion/issues/13704#issuecomment-3109180176 recently with @berkasynnada
- https://github.com/apache/datafusion/issues/16841 from @gabotechs
- https://github.com/apache/datafusion/issues/16677 with @findepi

### Describe the solution you'd like

Some sort of "process" that
1. Makes it easy to find outstanding community improvement proposals
2. Makes it easy to know the steps to create a new improvement proposal
3. Is documented

### Describe alternatives you've considered

Here is a strawman (for discussion) proposal:
1. Add a new tag in the DataFusion repo ("DIP - DataFusion Improvement Proposal")
2. Add a new [ISSUE_TEMPLATE](https://github.com/apache/datafusion/tree/main/.github/ISSUE_TEMPLATE) for proposal issues based on the SPIP one and the current DataFusion issue template
3. Add a section to the site documentation describing the process

I personally worry that DataFusion is not at a point where formal voting / formal approval would add a lot of value, but I do think formalizing the proposal format and making proposals easier to find would be beneficial.
I propose starting with more formalization around the communication of proposals, and we can add more explicit approval / consensus standards if and when they become necessary.

### Additional context

Here is the documentation for the Spark process: https://spark.apache.org/improvement-proposals.html

I looked through the [list of SPIPs](https://issues.apache.org/jira/browse/SPARK-51162?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20Reopened%2C%20%22In%20Progress%22)%20AND%20(labels%20%3D%20SPIP%20OR%20summary%20~%20%22SPIP%22)%20ORDER%20BY%20createdDate%20DESC) in Spark and the few I looked at didn't have huge amounts of discussion. They often linked to a Google doc with more details.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org