Omega359 opened a new issue, #14777:
URL: https://github.com/apache/datafusion/issues/14777

   With the extraction of built-in functions to UDFs last year, it has become 
much easier to add new functions to DataFusion, and if @findepi's [simple 
functions](https://github.com/apache/datafusion/issues/12635) is merged into 
main in the future, it will only get easier.
   
   There have been a number of comments in the past about which functions 
belong in DataFusion core and which should likely live in an external 
repository. The only guidance right now in the [contributor 
guide](https://datafusion.apache.org/contributor-guide/index.html#what-contributions-are-good-fits)
 is:
   ```
   Contributions that will likely involve more discussion (see Discussing New 
Features above) prior to acceptance include:
   
   * Major new functionality (even if it is part of the “standard SQL”)
   * New functions, especially if they aren’t part of “standard SQL”
   * New data sources (e.g. support for Apache ORC)
   ```
   
   That's great, except that there is no definition of what 'standard SQL' is 
(I would typically expect us to point to PostgreSQL as the 'standard'), and 
there are many useful functions in other systems such as DuckDB, 
SingleStore, Spark, etc. that could be candidates for inclusion.
   
   There is already a PR to [include Spark 
functions](https://github.com/apache/datafusion/pull/14392) in DataFusion as a 
separate crate, spearheaded by the Comet team.
   
   The concern with having so many functions in core is the maintenance burden 
it incurs. The community has been able to handle it so far, but if we keep 
adding more functions, that may no longer be the case.
   
   I would like to come to a consensus, in more specific terms, on what we 
would accept. Perhaps more importantly, can we provide a home for 
non-core functions where the community could maintain them outside of 
DataFusion core?
     
   My proposal is:
   
   * New functions will only be accepted in DataFusion if they fill a gap 
compared to PostgreSQL, or fill a gap identified by the community compared to 
an alternative system such as DuckDB. In the latter case, the functions should 
be contributed as a group within an epic that fills out the specified gap (for 
example, the union* functions in DuckDB), not as single functions coming in 
piecemeal.
   * A new Apache repository is set up (datafusion-additional-functions ?) where 
we provide the framework for adding, testing and packaging new functions, but 
with the explicitly stated understanding that any functions it contains have 
lower maintenance priority for the DataFusion team and that releases 
may or may not coincide with DataFusion releases.
   
   
   Thoughts?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

