[ 
https://issues.apache.org/jira/browse/MADLIB-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606315#comment-16606315
 ] 

Frank McQuillan commented on MADLIB-1171:
-----------------------------------------

Attached is an updated approach to model versioning.
 [^model-versioning-work2.pdf] 

The changes from the previous version are:

1) model summary in JSON format
2) model in serialized format
3) different model types can be mixed in the same summary/repo tables
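
As a rough illustration of how these three changes fit together, here is a minimal Python sketch. It is not taken from the attached PDF: the in-memory `repo` list, the `save_model` helper, the row field names, and the use of `pickle` as the serialized format are all assumptions made for illustration.

```python
import json
import pickle

# Hypothetical in-memory stand-in for a shared summary/repo table.
repo = []

def save_model(repo, model_type, summary, model):
    """Append one row holding a JSON summary and a serialized model."""
    row = {
        "model_id": len(repo) + 1,                       # simple running id
        "model_type": model_type,                        # lets types share one table
        "summary": json.dumps(summary, sort_keys=True),  # change 1: JSON summary
        "model": pickle.dumps(model),                    # change 2: serialized model (pickle is assumed)
    }
    repo.append(row)
    return row["model_id"]

# Change 3: two different model types stored side by side.
linregr_id = save_model(
    repo, "linregr",
    {"dependent_varname": "y", "num_rows": 100},
    {"coef": [1.5, -0.2]},
)
tree_id = save_model(
    repo, "decision_tree",
    {"max_depth": 5, "split_criterion": "gini"},
    {"nodes": [0, 1, 2]},
)
```

Because each summary is free-form JSON, the two rows can carry different keys without a table column per model type.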

Why JSON?

* enables change 3 above (mixing model types)
* makes backward compatibility easier
* better for portability (e.g., to a low-latency prediction server running 
outside the database)
* enables easier integration with third-party model management tools
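
To make the backward-compatibility and portability points concrete, here is a small hedged sketch in Python. The `read_summary` helper and the keys `schema_version` and `grouping_cols` are hypothetical and are not part of any MADlib API. The sketch only shows that a JSON summary written by an older release can still be read after new keys are introduced, and that nothing beyond a JSON parser is needed to read it outside the database.

```python
import json

def read_summary(summary_json):
    """Parse a JSON model summary, filling defaults for keys that
    older summaries may not contain (key names are hypothetical)."""
    summary = json.loads(summary_json)
    summary.setdefault("schema_version", 1)   # assume the oldest layout if unset
    summary.setdefault("grouping_cols", None)  # key added in a later layout
    return summary

# A summary written before "grouping_cols" existed still loads.
old = read_summary('{"model_type": "linregr"}')

# A newer summary keeps its own values.
new = read_summary(
    '{"model_type": "linregr", "schema_version": 2, "grouping_cols": "region"}'
)
```

With fixed table columns, the same change would typically need a schema migration or a new output table.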

> Support model versioning in output tables
> -----------------------------------------
>
>                 Key: MADLIB-1171
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1171
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: All Modules
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v2.0
>
>         Attachments: model-versioning-work1.pdf, model-versioning-work2.pdf, 
> p100.png, p101.png
>
>
> Context
> For many MADlib modules, <out_table> contains the separate models for each 
> group and <out_table>_summary contains the common model data for all groups. 
> Model versioning can be awkward since the model output table and model 
> summary table need to be explicitly dropped between runs.
> Story
> As a data scientist, I want to perform multiple runs without having to drop 
> tables, so that I can easily get a history of the model runs with clear 
> versioning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
