[ https://issues.apache.org/jira/browse/MADLIB-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1171:
------------------------------------
Attachment: p100.png
An example for logistic regression is attached (p100.png).
Advantages:
* Input parameters are saved with the results, so it is easy to see how
changes to the inputs affect the output
* The model training function can be re-executed as-is, without changing the
call or dropping tables first
* Training information, such as start/end time and train_label (a
user-defined name), is saved with the results
* A single common output table, which is easy to maintain
Notes:
* For some models we may need to normalize the schema (e.g., store the output
separately from the input when the output consists of more than one row)
* Also add the MADlib version to this table
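To make the proposed single output table concrete, here is a minimal Python sketch. It uses SQLite from the standard library only so that it runs on its own; MADlib itself runs inside PostgreSQL/Greenplum. The table name model_runs, its columns, the helpers record_run and rerun, and the fake_logregr stand-in are all hypothetical, not MADlib's actual schema or API. Each training run appends one row holding the input parameters, start/end time, train_label, and MADlib version, so nothing has to be dropped between runs, and a past run can be repeated from its stored parameters.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one shared table holding every training run.
# Column names are illustrative only, not an actual MADlib schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS model_runs (
    run_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    train_label    TEXT,   -- user-defined name for the run
    madlib_version TEXT,
    start_time     TEXT,
    end_time       TEXT,
    params         TEXT,   -- input parameters, stored as JSON
    model          TEXT    -- model output, stored as JSON
)
"""


def record_run(conn, train_label, madlib_version, params, train_fn):
    """Run train_fn(params) and append one versioned row; nothing is dropped."""
    start = datetime.now(timezone.utc).isoformat()
    model = train_fn(params)
    end = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO model_runs (train_label, madlib_version, start_time,"
        " end_time, params, model) VALUES (?, ?, ?, ?, ?, ?)",
        (train_label, madlib_version, start, end,
         json.dumps(params, sort_keys=True), json.dumps(model)),
    )
    conn.commit()
    return cur.lastrowid


def rerun(conn, run_id, train_fn, train_label):
    """Repeat an earlier run using its stored parameters, as a new row."""
    params_json, version = conn.execute(
        "SELECT params, madlib_version FROM model_runs WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    return record_run(conn, train_label, version, json.loads(params_json), train_fn)


def fake_logregr(params):
    # Hypothetical stand-in for a real training call; returns dummy coefficients.
    return {"coef": [0.0] * params["num_features"], "max_iter": params["max_iter"]}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# "x.y.z" is a placeholder version string, not a real MADlib release.
r1 = record_run(conn, "baseline", "x.y.z",
                {"num_features": 3, "max_iter": 20}, fake_logregr)
r2 = record_run(conn, "more_iters", "x.y.z",
                {"num_features": 3, "max_iter": 100}, fake_logregr)
r3 = rerun(conn, r1, fake_logregr, "baseline_rerun")

for row in conn.execute(
        "SELECT run_id, train_label, params FROM model_runs ORDER BY run_id"):
    print(row)
```

Storing the parameters as JSON text keeps the sketch independent of any one model's signature; per the normalization note, models whose output spans many rows would likely need a separate child table keyed by run_id instead of a single model column.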
> Support model versioning in output tables
> -----------------------------------------
>
> Key: MADLIB-1171
> URL: https://issues.apache.org/jira/browse/MADLIB-1171
> Project: Apache MADlib
> Issue Type: New Feature
> Components: All Modules
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v2.0
>
> Attachments: p100.png
>
>
> Context
> For many MADlib modules, <out_table> contains the separate models for each
> group and <out_table>_summary contains the common model data for all groups.
> Model versioning is awkward because the model output table and the model
> summary table must be explicitly dropped between runs.
> Story
> As a data scientist, I want to perform multiple runs without having to drop
> tables, so that I can easily get a history of the model runs with clear
> versioning.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)