Orhan,

I think this is a good addition to MADlib.  Regarding your questions:

1) Seems like a good set of prediction metrics to start with.  If other
members of the community would like to add more, they are welcome to create
a JIRA for those and work on them.

2) I suggest we include grouping as an optional parameter, since it could be
very useful. That means an output table is the way to go. Without grouping,
an output table holding a single value is not ideal, but it is acceptable
because a consistent output format is useful.
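
To make point 2 concrete, here is a rough Python sketch of the "always
return a table" idea. It is not the actual pred_metrics code: the in-memory
row layout (a list of dicts), the "pred" and "obs" key names, and the
"area_under_roc" output column are all assumptions for illustration. It
uses a simple pairwise count for the AUC and returns one output row per
group, or exactly one row when no grouping column is given.

```python
from collections import defaultdict


def area_under_roc(rows, grouping_col=None):
    """Return a list of output rows, one per group (hypothetical sketch).

    rows: list of dicts with "pred" (score) and "obs" (0 or 1) keys, plus
    the grouping column if one is named. These key names are assumptions.
    """
    # Bucket (prediction, observation) pairs by group; a single bucket
    # keyed by None is used when no grouping column is given.
    groups = defaultdict(list)
    for row in rows:
        key = row[grouping_col] if grouping_col is not None else None
        groups[key].append((row["pred"], row["obs"]))

    out = []
    for key, pairs in groups.items():
        pos = [p for p, o in pairs if o == 1]
        neg = [p for p, o in pairs if o == 0]
        if not pos or not neg:
            auc = None  # undefined when a class is missing from the group
        else:
            # Fraction of (positive, negative) pairs ranked correctly,
            # counting ties as half a win.
            wins = sum(
                1.0 if p > n else 0.5 if p == n else 0.0
                for p in pos
                for n in neg
            )
            auc = wins / (len(pos) * len(neg))
        result = {"area_under_roc": auc}
        if grouping_col is not None:
            result[grouping_col] = key
        out.append(result)
    return out
```

With this shape, callers read the result the same way in both cases: the
ungrouped call simply yields a one-row "table".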

Frank



On Fri, Apr 8, 2016 at 3:54 PM, Orhan Kislal <okis...@pivotal.io> wrote:

> Hello MADlib community,
>
> I think it might make sense to add a module to MADlib for prediction
> metrics. Since there are quite a few options, I decided to start with the
> list of metrics from PDLTools [1]. You can see my proposed interface in the
> attachment to the associated JIRA [2,3]. I'll paste a snippet just as an
> example. I would like the community's feedback on a number of questions
> that came up.
>
> 1) Are there any other metrics that should take precedence over these?
> Please note that binary_classifier reports multiple metrics (tpr, fpr, acc,
> f1, etc.).
>
> 2) How should we handle grouping? As you can see in the example, the
> function returns a double value for regular execution, but an output table
> is used if a grouping parameter is passed. This dual interface doesn't seem
> clean, and returning a table with a single value for regular execution
> feels wrong.
>
> Thanks
>
> Orhan Kislal
>
>
> [1] http://pivotalsoftware.github.io/PDLTools/group__grp__prediction__metrics.html
>
> [2] https://issues.apache.org/jira/browse/MADLIB-907
>
> [3] https://issues.apache.org/jira/secure/attachment/12797816/interface_v1.sql
>
> -----------------------------------------------------------------------
>
> CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
>     table_in       TEXT,
>     prediction_col TEXT,
>     observed_col   TEXT,
>     table_out      TEXT,
>     grouping_col   TEXT
> ) RETURNS VOID
> AS $$
>     PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
>     return pred_metrics.area_under_roc(schema_madlib,
>     table_in, prediction_col, observed_col, table_out, grouping_col)
> $$ LANGUAGE plpythonu
> m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
>
> CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
>     table_in       TEXT,
>     prediction_col TEXT,
>     observed_col   TEXT
> ) RETURNS DOUBLE PRECISION
> AS $$
>     PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
>     return pred_metrics.area_under_roc(schema_madlib,
>     table_in, prediction_col, observed_col)
> $$ LANGUAGE plpythonu
> m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
>
> -----------------------------------------------------------------------
>
