Omega359 opened a new issue, #14777: URL: https://github.com/apache/datafusion/issues/14777
With the extraction of built-in functions into UDFs last year, it has become much easier to add new functions to DataFusion, and if @findepi's [simple functions](https://github.com/apache/datafusion/issues/12635) work is merged into main, it will only get easier.

There have been a number of comments in the past about functions in DataFusion: what should and should not be in core, and what should likely live in an external repository. The only guidance right now in the [contributor guide](https://datafusion.apache.org/contributor-guide/index.html#what-contributions-are-good-fits) is:

```
Contributions that will likely involve more discussion (see Discussing New Features above) prior to acceptance include:

* Major new functionality (even if it is part of the “standard SQL”)
* New functions, especially if they aren’t part of “standard SQL”
* New data sources (e.g. support for Apache ORC)
```

That's great, except that there is no definition of what 'standard SQL' is (typically I would think we would point to PostgreSQL as the 'standard'), and there are many useful functions in other systems such as DuckDB, SingleStore, Spark, etc. that could be candidates for inclusion. There is already a PR to [include Spark functions](https://github.com/apache/datafusion/pull/14392) in DataFusion as a separate crate, spearheaded by the Comet team.

The concern with having so many functions in core is the maintenance burden it incurs. The community has been able to handle it so far, but if we keep adding more functions that may no longer be the case.

I would like to come to a consensus, in more specific wording, on what we would accept. Perhaps more importantly, can we provide a home for non-core functions where the community could maintain them outside of DataFusion core?

My proposal is:

* New functions will only be accepted in DataFusion if they fill a gap compared to PostgreSQL, or fill a gap identified by the community compared to an alternative system such as DuckDB. In the latter case, the functions should be contributed as a group within an epic that fills out the specified gap (for example, the union* functions in DuckDB), not as single functions coming in piecemeal.
* A new Apache repository is set up (datafusion-additional-functions?) where we provide the framework for adding, testing, and packaging new functions, but with the explicitly stated understanding that the functions it contains have lower maintenance priority for the DataFusion team and that releases may or may not coincide with DataFusion releases.

Thoughts?