[ 
https://issues.apache.org/jira/browse/MADLIB-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234798#comment-16234798
 ] 

Frank McQuillan edited comment on MADLIB-1171 at 11/1/17 9:52 PM:
------------------------------------------------------------------

An example for logistic regression is attached.

Advantages:
* Input parameters are saved with the results, so it is easy to see how input 
changes affect the output
* The model train function can be re-executed without changes
* Training information such as start/end time and train_label (a user-defined 
name) is saved with the results
* One common output table, which is easy to maintain
* The MADlib version could also be added to this table
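
As a rough illustration of the idea in the list above, the following is a minimal sketch of a single shared run-history table. It is not MADlib code and does not call any MADlib function. It uses Python's built-in sqlite3 module as a stand-in for the database, and the table name model_runs, its columns, the helper record_run, and the placeholder fake_train are all hypothetical names chosen for this example.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one shared table that gains a row per training run,
# so nothing has to be dropped between runs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS model_runs (
    run_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    train_label  TEXT,
    start_time   TEXT,
    end_time     TEXT,
    lib_version  TEXT,
    input_params TEXT,   -- JSON-encoded training arguments
    coefficients TEXT    -- JSON-encoded model output
)
"""


def record_run(conn, train_label, input_params, train_fn, lib_version="unknown"):
    """Run train_fn with input_params and append inputs and outputs as one row."""
    conn.execute(SCHEMA)
    start = datetime.now(timezone.utc).isoformat()
    coefficients = train_fn(**input_params)
    end = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO model_runs "
        "(train_label, start_time, end_time, lib_version, input_params, coefficients) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (train_label, start, end, lib_version,
         json.dumps(input_params), json.dumps(coefficients)),
    )
    conn.commit()
    return cur.lastrowid


def fake_train(max_iter, tolerance):
    # Placeholder for a real training call; returns made-up numbers.
    return [0.1 * max_iter, tolerance]


conn = sqlite3.connect(":memory:")
first = record_run(conn, "baseline", {"max_iter": 10, "tolerance": 1e-4}, fake_train)
second = record_run(conn, "more-iterations", {"max_iter": 20, "tolerance": 1e-4}, fake_train)

# The stored parameters are enough to re-execute a past run unchanged.
row = conn.execute(
    "SELECT input_params FROM model_runs WHERE run_id = ?", (second,)
).fetchone()
replayed = fake_train(**json.loads(row[0]))
```

Because each run only appends a row, the history of runs accumulates in one place, and the saved input_params can be fed back into the training call to repeat a run.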



was (Author: fmcquillan):
Example for logistic regression attached

Advantages:
* Input parameters saved with results – easy to understand impact of input 
changes on the output
* Model train function can be re-executed without changes
* Train information like start/end time and train_label (some user-defined 
name) saved with results
* One common output table – easy to maintain
 
Notes:
* For some models we may need to normalize (e.g. keep output separately from 
input when output consists of more than one row)
* Also add MADlib version to this table


> Support model versioning in output tables
> -----------------------------------------
>
>                 Key: MADLIB-1171
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1171
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: All Modules
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v2.0
>
>         Attachments: p100.png, p101.png
>
>
> Context
> For many MADlib modules, <out_table> contains the separate models for each 
> group and <out_table>_summary contains the common model data for all groups.  
> Model versioning can be awkward since the model output table and model 
> summary table need to be explicitly dropped between runs.
> Story
> As a data scientist, I want to perform multiple runs without having to drop 
> tables, so that I can easily get a history of the model runs with clear 
> versioning.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
