Hello MADlib community,

I think it might make sense to add a module to MADlib for prediction metrics. Since there are quite a few options, I decided to start with the list of metrics from PDLTools [1]. You can see my proposed interface in the attachment to the associated JIRA ticket [2,3]; I'll paste a snippet below as an example. I would like the community's feedback on a few questions that came up.
1) Are there other metrics that should take precedence over these? Note that binary_classifier reports multiple metrics (tpr, fpr, acc, f1, etc.).

2) How should we handle grouping? As the example below shows, area_under_roc returns a DOUBLE PRECISION value for regular (ungrouped) execution, but writes to an output table when the grouping parameter is passed. This dual interface doesn't seem clean, yet returning a table that holds a single value for the ungrouped case feels wrong.

Thanks,
Orhan Kislal

[1] http://pivotalsoftware.github.io/PDLTools/group__grp__prediction__metrics.html
[2] https://issues.apache.org/jira/browse/MADLIB-907
[3] https://issues.apache.org/jira/secure/attachment/12797816/interface_v1.sql

-----------------------------------------------------------------------
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
    table_in        TEXT,
    prediction_col  TEXT,
    observed_col    TEXT,
    table_out       TEXT,
    grouping_col    TEXT
) RETURNS VOID AS $$
    PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
    return pred_metrics.area_under_roc(schema_madlib, table_in, prediction_col,
                                       observed_col, table_out, grouping_col)
$$ LANGUAGE plpythonu
m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
    table_in        TEXT,
    prediction_col  TEXT,
    observed_col    TEXT
) RETURNS DOUBLE PRECISION AS $$
    PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
    return pred_metrics.area_under_roc(schema_madlib, table_in, prediction_col,
                                       observed_col)
$$ LANGUAGE plpythonu
m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
-----------------------------------------------------------------------
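For readers less familiar with the metric, here is a minimal, standalone Python sketch of what the two area_under_roc overloads compute. It is not the pred_metrics module from the JIRA attachment. It assumes observed labels are 0/1, computes AUC with the trapezoidal rule over the ROC curve (rows with tied scores form a single threshold step), and uses a hypothetical area_under_roc_grouped helper that returns one value per group, roughly what a grouped output table would hold.

```python
from collections import defaultdict


def area_under_roc(pairs):
    """AUC of (score, label) pairs via the trapezoidal rule.

    Assumption: labels are 0/1 (True/False also works).
    Rows with equal scores are treated as one threshold step.
    """
    data = sorted(pairs, key=lambda p: p[0], reverse=True)
    pos = sum(1 for _, y in data if y == 1)
    neg = len(data) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative row")

    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(data):
        score = data[i][0]
        # Consume every row sharing this score before emitting a ROC point.
        while i < len(data) and data[i][0] == score:
            if data[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # float() keeps true division under Python 2 as well.
        tpr = float(tp) / pos
        fpr = float(fp) / neg
        # Trapezoid between the previous ROC point and this one.
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
    return auc


def area_under_roc_grouped(rows):
    """Hypothetical grouped variant: rows are (group, score, label).

    Returns {group: AUC}, i.e. one value per group, which is what an
    output table with one row per group would contain.
    """
    groups = defaultdict(list)
    for group, score, label in rows:
        groups[group].append((score, label))
    return {g: area_under_roc(p) for g, p in groups.items()}
```

For example, area_under_roc_grouped([("a", 0.9, 1), ("a", 0.1, 0), ("b", 0.9, 0), ("b", 0.1, 1)]) returns {"a": 1.0, "b": 0.0}. In this sketch the ungrouped call is simply the single-group case, which is the scalar-versus-table tension raised in question 2.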