[
https://issues.apache.org/jira/browse/MADLIB-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234806#comment-16234806
]
Frank McQuillan commented on MADLIB-1171:
-----------------------------------------
Other:
If the model output table needs specific storage attributes in order to support
a unique index, the code that creates the table should specify them.
Some users have append-optimized compressed tables as their default, which does
not work with most MADlib functions.
gprdsu 09:39:02=> SELECT madlib.logregr_train(
gprdsu(> 'patients', -- source table
gprdsu(> 'patients_logregr', -- output table
gprdsu(> 'second_attack', -- labels
gprdsu(> 'ARRAY[1, treatment, trait_anxiety]', -- features
gprdsu(> NULL, -- grouping columns
gprdsu(> 20, -- max number of iteration
gprdsu(> 'irls' -- optimizer
gprdsu(> );
ERROR: plpy.SPIError: append-only tables do not support unique indexes
(plpython.c:4656)
CONTEXT: Traceback (most recent call last):
PL/Python function "logregr_train", line 19, in <module>
return logistic.logregr_train(**globals())
PL/Python function "logregr_train", line 133, in logregr_train
PL/Python function "logregr_train", line 260, in __logregr_train_compute
PL/Python function "logregr_train", line 75, in __compute_logregr
PL/Python function "logregr_train", line 127, in __enter__
PL/Python function "logregr_train", line 197, in runSQL
PL/Python function "logregr_train"
We don't want to force users to reset their storage parameters every time they
need to call a MADlib function, for example:
ALTER ROLE xxx SET gp_default_storage_options TO 'appendonly=false';
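A minimal sketch of what "specified in the code that creates the table" could
look like, written in Python because the MADlib driver code is PL/Python. The
helper name create_table_sql, the table name patients_logregr_state and its
columns are hypothetical and are not taken from the MADlib source. The only
thing assumed about Greenplum is that CREATE TABLE accepts a
WITH (appendonly=false) clause, which overrides the role or session default.

```python
def create_table_sql(table_name, column_defs, storage_options=None):
    """Build a CREATE TABLE statement with an explicit WITH clause.

    Hypothetical helper, not part of MADlib. Stating the storage options
    in the DDL means the table is a heap table regardless of the caller's
    gp_default_storage_options setting.
    """
    if storage_options is None:
        # Heap storage, so that a unique index can be created on the table.
        storage_options = {"appendonly": "false"}
    cols = ", ".join("{0} {1}".format(name, typ) for name, typ in column_defs)
    opts = ", ".join(
        "{0}={1}".format(key, value)
        for key, value in sorted(storage_options.items())
    )
    return "CREATE TABLE {0} ({1}) WITH ({2})".format(table_name, cols, opts)


# Illustrative table and columns only.
sql = create_table_sql(
    "patients_logregr_state",
    [("iteration", "INTEGER"), ("state", "DOUBLE PRECISION[]")],
)
print(sql)
```

Inside a PL/Python function the resulting string would be passed to
plpy.execute(), so the user never has to change role-level defaults.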
> Support model versioning in output tables
> -----------------------------------------
>
> Key: MADLIB-1171
> URL: https://issues.apache.org/jira/browse/MADLIB-1171
> Project: Apache MADlib
> Issue Type: New Feature
> Components: All Modules
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v2.0
>
> Attachments: p100.png, p101.png
>
>
> Context
> For many MADlib modules, <out_table> contains the separate models for each
> group and <out_table>_summary contains the common model data for all groups.
> Model versioning can be awkward since the model output table and model
> summary table need to be explicitly dropped between runs.
> Story
> As a data scientist, I want to perform multiple runs without having to drop
> tables, so that I can easily get a history of the model runs with clear
> versioning.