Hello MADlib community,

I think it might make sense to add a module to MADlib for prediction
metrics. Since there are quite a few options, I decided to start with the
list of metrics from PDLTools [1]. You can see my proposed interface in the
attachment to the associated JIRA [2,3]. I'll paste a snippet below as an
example. I would like the community's feedback on a number of questions
that came up.

1) Are there any other metrics that should take precedence over these?
Please note that binary_classifier reports multiple metrics (tpr, fpr, acc,
f1, etc.).

2) How should we handle grouping? As you can see in the example, the
function returns a double value for regular execution, but an output table
is used if a grouping parameter is passed. This dual interface doesn't seem
clean, but returning a table with a single value for regular execution
feels wrong as well.

Thanks

Orhan Kislal


[1] http://pivotalsoftware.github.io/PDLTools/group__grp__prediction__metrics.html

[2] https://issues.apache.org/jira/browse/MADLIB-907

[3] https://issues.apache.org/jira/secure/attachment/12797816/interface_v1.sql

-----------------------------------------------------------------------

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
    table_in        TEXT,
    prediction_col  TEXT,
    observed_col    TEXT,
    table_out       TEXT,
    grouping_col    TEXT
) RETURNS VOID
AS $$
    PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
    return pred_metrics.area_under_roc(schema_madlib,
        table_in, prediction_col, observed_col, table_out, grouping_col)
$$ LANGUAGE plpythonu
m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');

CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
    table_in        TEXT,
    prediction_col  TEXT,
    observed_col    TEXT
) RETURNS DOUBLE PRECISION
AS $$
    PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
    return pred_metrics.area_under_roc(schema_madlib,
        table_in, prediction_col, observed_col)
$$ LANGUAGE plpythonu
m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');

-----------------------------------------------------------------------
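
To make question 2 more concrete, here is a minimal plain-Python sketch of
the dual behavior: a scalar for regular execution, one value per group when
a grouping key is given. It is only an illustration, not the proposed MADlib
implementation. The names auc and area_under_roc_sketch, the in-memory row
format, and the 'pred'/'obs' keys are all assumptions made for this example.
The AUC is computed with the pairwise (Mann-Whitney) formulation, which is
quadratic and would not be suitable for large tables.

```python
from collections import defaultdict


def auc(pairs):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    pairs: iterable of (prediction_score, observed_label), labels 0 or 1.
    A tie between a positive and a negative score counts as half a win.
    """
    pairs = list(pairs)
    pos = [p for p, y in pairs if y == 1]
    neg = [p for p, y in pairs if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def area_under_roc_sketch(rows, grouping_key=None):
    """Hypothetical stand-in for the two SQL overloads.

    rows: list of dicts with 'pred' and 'obs' keys (assumed format).
    Without grouping_key: returns a single float (the "regular" case).
    With grouping_key: returns {group_value: auc}, standing in for the
    output table of the grouped overload.
    """
    if grouping_key is None:
        return auc((r["pred"], r["obs"]) for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[grouping_key]].append((r["pred"], r["obs"]))
    return {g: auc(pairs) for g, pairs in groups.items()}
```

The caller has to handle two different return types depending on one
argument, which is the same awkwardness as the SQL overloads returning
DOUBLE PRECISION in one case and VOID plus an output table in the other.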
