[ https://issues.apache.org/jira/browse/MADLIB-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537666#comment-16537666 ]

Frank McQuillan commented on MADLIB-1250:
-----------------------------------------

Sorry, I was looking at the wrong commit. It does work and produces a CV table.

Follow-on question: What is the relationship between the loss in the model table
(53.304...) and the loss in the CV table (-3.967...)?


{code}
SELECT * FROM abalone_svm_gaussian_regression;

-[ RECORD 1 ]------+--------------------------------------------------------------------------------
coef               | {4.49159888608739,2.15910382155845,-2.08509487146716,1.11174909585583,2.80827257045678,-4.2929982834413,4.17107692883502,-5.3961512636042,0.843266166563478,-3.63205371428082}
loss               | 53.3043777086007
norm_of_gradient   | 0.973725991236767
num_iterations     | 167
num_rows_processed | 20
num_rows_skipped   | 0
dep_var_mapping    | {NULL}
{code}



{code}
SELECT * FROM abalone_svm_gaussian_regression_cv;

 init_stepsize | lambda | mean_neg_loss  | std_neg_loss  
---------------+--------+----------------+---------------
           1.0 |   0.01 | -3.96735188976 |  1.3798803678
           1.0 |    0.1 |   -4.095856273 |  1.6177430524
           1.0 |    0.5 | -4.46655412608 | 1.84835810144
          0.01 |   0.01 | -10.6320659903 | 2.81168056113
          0.01 |    0.1 | -10.6334744319 | 2.81159136886
          0.01 |    0.5 | -10.6396840998 | 2.81119803995
(6 rows)
{code}
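One reading of the CV table, based only on its column names, is that each row holds the mean and standard deviation, across the folds, of the negated validation loss for one (init_stepsize, lambda) pair. Under that reading, larger values are better and the best row is the one whose mean_neg_loss is closest to zero. That would explain the sign difference from the model table's loss but not the magnitude: the model table's loss comes from the training run and may be normalized differently (for example, summed rather than averaged over rows). This is an assumption, not confirmed MADlib behavior. The minimal Python sketch below illustrates that aggregation and picks the best row from the output above. The helper names `summarize_folds` and `best_params` are hypothetical, and using a population standard deviation is a guess.

```python
from statistics import mean, pstdev

# Rows copied from the abalone_svm_gaussian_regression_cv output above:
# (init_stepsize, lambda, mean_neg_loss, std_neg_loss)
CV_ROWS = [
    (1.0, 0.01, -3.96735188976, 1.3798803678),
    (1.0, 0.1, -4.095856273, 1.6177430524),
    (1.0, 0.5, -4.46655412608, 1.84835810144),
    (0.01, 0.01, -10.6320659903, 2.81168056113),
    (0.01, 0.1, -10.6334744319, 2.81159136886),
    (0.01, 0.5, -10.6396840998, 2.81119803995),
]


def summarize_folds(fold_losses):
    """Hypothetical helper: return (mean, std) of the negated per-fold losses.

    Assumption: the CV table negates each fold's validation loss so that
    larger is better. pstdev (population std) is a guess; MADlib may use
    the sample standard deviation instead.
    """
    neg = [-loss for loss in fold_losses]
    return mean(neg), pstdev(neg)


def best_params(rows):
    """Hypothetical helper: return the (init_stepsize, lambda) pair with the
    highest mean_neg_loss, i.e. the lowest average validation loss."""
    stepsize, lam, _, _ = max(rows, key=lambda row: row[2])
    return stepsize, lam


if __name__ == "__main__":
    print(best_params(CV_ROWS))  # (1.0, 0.01)
```

If this reading is right, init_stepsize=1.0 with lambda=0.01 is the pair the _summary table should report as the best cross-validated parameters.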



> Can't generate cross validation table for SVM
> ---------------------------------------------
>
>                 Key: MADLIB-1250
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1250
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Support Vector Machines
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>            Priority: Minor
>             Fix For: v1.15
>
>         Attachments: svm_cv_output.sql
>
>
> SVM does provide cross-validation (CV):
> 1) The CV results table can be obtained by setting the validation_result
> option in the params argument. The table can have any name, including
> <output_table>_cv.
> 2) The _summary table reports the best cross-validated parameters, which
> correspond to the model in the output table. This gives the user the exact
> parameters to recreate the model. Whether that should be the purpose of the
> summary table is open for debate.
> 3) The docs are definitely missing examples for CV.
> But there seems to be a bug:
> {code}
> DROP TABLE IF EXISTS houses;
> CREATE TABLE houses (id INT, tax INT, bedroom INT, bath FLOAT, price INT,
>             size INT, lot INT);
> INSERT INTO houses VALUES   
>   (1 ,  590 ,       2 ,    1 ,  50000 ,  770 , 22100),
>   (2 , 1050 ,       3 ,    2 ,  85000 , 1410 , 12000),
>   (3 ,   20 ,       3 ,    1 ,  22500 , 1060 ,  3500),
>   (4 ,  870 ,       2 ,    2 ,  90000 , 1300 , 17500),
>   (5 , 1320 ,       3 ,    2 , 133000 , 1500 , 30000),
>   (6 , 1350 ,       2 ,    1 ,  90500 ,  820 , 25700),
>   (7 , 2790 ,       3 ,  2.5 , 260000 , 2130 , 25000),
>   (8 ,  680 ,       2 ,    1 , 142500 , 1170 , 22000),
>   (9 , 1840 ,       3 ,    2 , 160000 , 1500 , 19000),
>  (10 , 3680 ,       4 ,    2 , 240000 , 2790 , 20000),
>  (11 , 1660 ,       3 ,    1 ,  87000 , 1030 , 17500),
>  (12 , 1620 ,       3 ,    2 , 118600 , 1250 , 20000),
>  (13 , 3100 ,       3 ,    2 , 140000 , 1760 , 38000),
>  (14 , 2070 ,       2 ,    3 , 148000 , 1550 , 14000),
>  (15 ,  650 ,       3 ,  1.5 ,  65000 , 1450 , 12000);
> {code}
> Run training with CV:
> {code}
> DROP TABLE IF EXISTS houses_svm_gaussian_regression, 
> houses_svm_gaussian_regression_summary, 
> houses_svm_gaussian_regression_random, houses_svm_gaussian_regression_cv;
> SELECT madlib.svm_regression( 'houses',
>                               'houses_svm_gaussian_regression',
>                               'price',
>                               'ARRAY[1, tax, bath, size]',
>                               'gaussian',
>                               'n_components=10',
>                               '',
>                               'init_stepsize=[0.01, 1], max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'
>                            );
> SELECT * FROM houses_svm_gaussian_regression_cv;
> {code}
> Results in error:
> {code}
> InternalError: (psycopg2.InternalError) KeyError: 'params_dict' (plpython.c:4960)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "svm_regression", line 23, in <module>
>     return svm.svm(**globals())
>   PL/Python function "svm_regression", line 970, in svm
>   PL/Python function "svm_regression", line 1033, in _cross_validate_svm
>   PL/Python function "svm_regression", line 146, in output_tbl
> PL/Python function "svm_regression"
>  [SQL: "SELECT madlib.svm_regression( 'houses',\n                              'houses_svm_gaussian_regression',\n                              'price',\n                              'ARRAY[1, tax, bath, size]',\n                              'gaussian',\n                              'n_components=10',\n                              '',\n                              'init_stepsize=[0.01, 1], max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'\n                           );"]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
