[ 
https://issues.apache.org/jira/browse/MADLIB-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532078#comment-16532078
 ] 

Rahul Iyer commented on MADLIB-1250:
------------------------------------

PR for the fix: https://github.com/apache/madlib/pull/287

> Can't generate cross validation table for SVM
> ---------------------------------------------
>
>                 Key: MADLIB-1250
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1250
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Support Vector Machines
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>            Priority: Minor
>             Fix For: v1.15
>
>
> SVM does support cross validation (CV):
> 1) The CV results table can be obtained by setting the validation_result 
> variable in the params parameter. It can be given any name, including 
> <output_table>_cv.
> 2) The _summary table reports the best cross-validated parameters, which 
> correspond to the model in the output table. This gives the user the exact 
> parameters needed to recreate the model. Whether that is the purpose of the 
> summary table is open for debate.
> 3) The docs are definitely missing examples for CV.
> But there seems to be a bug. Create the sample data:
> {code}
> DROP TABLE IF EXISTS houses;
> CREATE TABLE houses (id INT, tax INT, bedroom INT, bath FLOAT, price INT,
>             size INT, lot INT);
> INSERT INTO houses VALUES   
>   (1 ,  590 ,       2 ,    1 ,  50000 ,  770 , 22100),
>   (2 , 1050 ,       3 ,    2 ,  85000 , 1410 , 12000),
>   (3 ,   20 ,       3 ,    1 ,  22500 , 1060 ,  3500),
>   (4 ,  870 ,       2 ,    2 ,  90000 , 1300 , 17500),
>   (5 , 1320 ,       3 ,    2 , 133000 , 1500 , 30000),
>   (6 , 1350 ,       2 ,    1 ,  90500 ,  820 , 25700),
>   (7 , 2790 ,       3 ,  2.5 , 260000 , 2130 , 25000),
>   (8 ,  680 ,       2 ,    1 , 142500 , 1170 , 22000),
>   (9 , 1840 ,       3 ,    2 , 160000 , 1500 , 19000),
>  (10 , 3680 ,       4 ,    2 , 240000 , 2790 , 20000),
>  (11 , 1660 ,       3 ,    1 ,  87000 , 1030 , 17500),
>  (12 , 1620 ,       3 ,    2 , 118600 , 1250 , 20000),
>  (13 , 3100 ,       3 ,    2 , 140000 , 1760 , 38000),
>  (14 , 2070 ,       2 ,    3 , 148000 , 1550 , 14000),
>  (15 ,  650 ,       3 ,  1.5 ,  65000 , 1450 , 12000);
> {code}
> Run training with CV:
> {code}
> DROP TABLE IF EXISTS houses_svm_gaussian_regression,
>     houses_svm_gaussian_regression_summary,
>     houses_svm_gaussian_regression_random,
>     houses_svm_gaussian_regression_cv;
> SELECT madlib.svm_regression( 'houses',
>                               'houses_svm_gaussian_regression',
>                               'price',
>                               'ARRAY[1, tax, bath, size]',
>                               'gaussian',
>                               'n_components=10',
>                               '',
>                               'init_stepsize=[0.01, 1], max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'
>                            );
> SELECT * FROM houses_svm_gaussian_regression_cv;
> {code}
> Results in error:
> {code}
> InternalError: (psycopg2.InternalError) KeyError: 'params_dict' (plpython.c:4960)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "svm_regression", line 23, in <module>
>     return svm.svm(**globals())
>   PL/Python function "svm_regression", line 970, in svm
>   PL/Python function "svm_regression", line 1033, in _cross_validate_svm
>   PL/Python function "svm_regression", line 146, in output_tbl
> PL/Python function "svm_regression"
>  [SQL: "SELECT madlib.svm_regression( 'houses',\n                              'houses_svm_gaussian_regression',\n                              'price',\n                              'ARRAY[1, tax, bath, size]',\n                              'gaussian',\n                              'n_components=10',\n                              '',\n                              'init_stepsize=[0.01, 1], max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'\n                           );"]
> {code}
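> When the training call completes without the error, the three tables named 
> in the example can be inspected as sketched below. This is only an 
> illustration: it assumes training succeeded, reuses the table names from 
> the example, and uses SELECT * because no column names are assumed.
> {code}
> -- CV results table, named through validation_result in params.
> SELECT * FROM houses_svm_gaussian_regression_cv;
> -- Summary table, described above as reporting the best cross-validated
> -- parameters.
> SELECT * FROM houses_svm_gaussian_regression_summary;
> -- Output (model) table, described above as corresponding to those
> -- parameters.
> SELECT * FROM houses_svm_gaussian_regression;
> {code}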



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
