[
https://issues.apache.org/jira/browse/MADLIB-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan updated MADLIB-1250:
------------------------------------
Priority: Minor (was: Major)
> Can't generate cross validation table for SVM
> ---------------------------------------------
>
> Key: MADLIB-1250
> URL: https://issues.apache.org/jira/browse/MADLIB-1250
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Support Vector Machines
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.15
>
>
> SVM does provide cross validation (CV):
> 1) The CV results table can be obtained by setting validation_result in the
> params argument. The table can have any name, including
> <output_table>_cv.
> 2) The _summary table reports the best cross-validated parameters, which
> correspond to the model in the output table. This gives the user the exact
> parameters to recreate the model. Whether that is the purpose of the summary
> table is open for debate.
> 3) The docs are definitely missing examples for CV.
> But there seems to be a bug. Create the test data:
> {code}
> DROP TABLE IF EXISTS houses;
> CREATE TABLE houses (id INT, tax INT, bedroom INT, bath FLOAT, price INT,
> size INT, lot INT);
> INSERT INTO houses VALUES
> (1 , 590 , 2 , 1 , 50000 , 770 , 22100),
> (2 , 1050 , 3 , 2 , 85000 , 1410 , 12000),
> (3 , 20 , 3 , 1 , 22500 , 1060 , 3500),
> (4 , 870 , 2 , 2 , 90000 , 1300 , 17500),
> (5 , 1320 , 3 , 2 , 133000 , 1500 , 30000),
> (6 , 1350 , 2 , 1 , 90500 , 820 , 25700),
> (7 , 2790 , 3 , 2.5 , 260000 , 2130 , 25000),
> (8 , 680 , 2 , 1 , 142500 , 1170 , 22000),
> (9 , 1840 , 3 , 2 , 160000 , 1500 , 19000),
> (10 , 3680 , 4 , 2 , 240000 , 2790 , 20000),
> (11 , 1660 , 3 , 1 , 87000 , 1030 , 17500),
> (12 , 1620 , 3 , 2 , 118600 , 1250 , 20000),
> (13 , 3100 , 3 , 2 , 140000 , 1760 , 38000),
> (14 , 2070 , 2 , 3 , 148000 , 1550 , 14000),
> (15 , 650 , 3 , 1.5 , 65000 , 1450 , 12000);
> {code}
> Run training with CV:
> {code}
> DROP TABLE IF EXISTS houses_svm_gaussian_regression,
> houses_svm_gaussian_regression_summary,
> houses_svm_gaussian_regression_random, houses_svm_gaussian_regression_cv;
> SELECT madlib.svm_regression( 'houses',
> 'houses_svm_gaussian_regression',
> 'price',
> 'ARRAY[1, tax, bath, size]',
> 'gaussian',
> 'n_components=10',
> '',
> 'init_stepsize=[0.01, 1], max_iter=200,
> validation_result=houses_svm_gaussian_regression_cv, n_folds=3'
> );
> SELECT * FROM houses_svm_gaussian_regression_cv;
> {code}
> Results in error:
> {code}
> InternalError: (psycopg2.InternalError) KeyError: 'params_dict'
> (plpython.c:4960)
> CONTEXT: Traceback (most recent call last):
> PL/Python function "svm_regression", line 23, in <module>
> return svm.svm(**globals())
> PL/Python function "svm_regression", line 970, in svm
> PL/Python function "svm_regression", line 1033, in _cross_validate_svm
> PL/Python function "svm_regression", line 146, in output_tbl
> PL/Python function "svm_regression"
> [SQL: "SELECT madlib.svm_regression( 'houses',\n
> 'houses_svm_gaussian_regression',\n 'price',\n
> 'ARRAY[1, tax, bath, size]',\n
> 'gaussian',\n 'n_components=10',\n
> '',\n
> 'init_stepsize=[0.01, 1], max_iter=200,
> validation_result=houses_svm_gaussian_regression_cv, n_folds=3'\n
> );"]
> {code}
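> The summary table described in point 2 can be inspected separately from the
> CV results table. A minimal sketch, assuming the same training call is run
> without validation_result; this report does not confirm whether training
> completes in that case:
> {code}
> -- Assumption: same arguments as the failing call, minus validation_result.
> DROP TABLE IF EXISTS houses_svm_gaussian_regression,
> houses_svm_gaussian_regression_summary,
> houses_svm_gaussian_regression_random;
> SELECT madlib.svm_regression( 'houses',
> 'houses_svm_gaussian_regression',
> 'price',
> 'ARRAY[1, tax, bath, size]',
> 'gaussian',
> 'n_components=10',
> '',
> 'init_stepsize=[0.01, 1], max_iter=200, n_folds=3'
> );
> -- Per point 2, the best cross-validated parameters should be reported here.
> SELECT * FROM houses_svm_gaussian_regression_summary;
> {code}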