[ https://issues.apache.org/jira/browse/MADLIB-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532078#comment-16532078 ]
Rahul Iyer commented on MADLIB-1250:
------------------------------------

PR for the fix: https://github.com/apache/madlib/pull/287

> Can't generate cross validation table for SVM
> ---------------------------------------------
>
>                 Key: MADLIB-1250
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1250
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Support Vector Machines
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>            Priority: Minor
>             Fix For: v1.15
>
>
> SVM does provide the CV:
> 1) The CV results table can be obtained by setting the validation_result
> variable in params parameter. This can be any arbitrary name, including
> <output_table>_cv.
> 2) The _summary table reports the best cross-validated parameter, which
> corresponds to the model in the output table. This gives the user the exact
> parameters to recreate the model. It's open for debate if that is the purpose
> of the summary table.
> 3) The docs are definitely missing examples for CV.
> But there seems to be a bug:
> {code}
> DROP TABLE IF EXISTS houses;
> CREATE TABLE houses (id INT, tax INT, bedroom INT, bath FLOAT, price INT,
>             size INT, lot INT);
> INSERT INTO houses VALUES
>   (1 , 590 , 2 , 1 , 50000 , 770 , 22100),
>   (2 , 1050 , 3 , 2 , 85000 , 1410 , 12000),
>   (3 , 20 , 3 , 1 , 22500 , 1060 , 3500),
>   (4 , 870 , 2 , 2 , 90000 , 1300 , 17500),
>   (5 , 1320 , 3 , 2 , 133000 , 1500 , 30000),
>   (6 , 1350 , 2 , 1 , 90500 , 820 , 25700),
>   (7 , 2790 , 3 , 2.5 , 260000 , 2130 , 25000),
>   (8 , 680 , 2 , 1 , 142500 , 1170 , 22000),
>   (9 , 1840 , 3 , 2 , 160000 , 1500 , 19000),
>   (10 , 3680 , 4 , 2 , 240000 , 2790 , 20000),
>   (11 , 1660 , 3 , 1 , 87000 , 1030 , 17500),
>   (12 , 1620 , 3 , 2 , 118600 , 1250 , 20000),
>   (13 , 3100 , 3 , 2 , 140000 , 1760 , 38000),
>   (14 , 2070 , 2 , 3 , 148000 , 1550 , 14000),
>   (15 , 650 , 3 , 1.5 , 65000 , 1450 , 12000);
> {code}
> Run training with CV:
> {code}
> DROP TABLE IF EXISTS houses_svm_gaussian_regression,
> houses_svm_gaussian_regression_summary,
> houses_svm_gaussian_regression_random, houses_svm_gaussian_regression_cv;
> SELECT madlib.svm_regression( 'houses',
>                               'houses_svm_gaussian_regression',
>                               'price',
>                               'ARRAY[1, tax, bath, size]',
>                               'gaussian',
>                               'n_components=10',
>                               '',
>                               'init_stepsize=[0.01, 1], max_iter=200,
> validation_result=houses_svm_gaussian_regression_cv, n_folds=3'
>                            );
> SELECT * FROM houses_svm_gaussian_regression_cv;
> {code}
> Results in error:
> {code}
> InternalError: (psycopg2.InternalError) KeyError: 'params_dict'
> (plpython.c:4960)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "svm_regression", line 23, in <module>
>     return svm.svm(**globals())
>   PL/Python function "svm_regression", line 970, in svm
>   PL/Python function "svm_regression", line 1033, in _cross_validate_svm
>   PL/Python function "svm_regression", line 146, in output_tbl
> PL/Python function "svm_regression"
>  [SQL: "SELECT madlib.svm_regression( 'houses',\n
> 'houses_svm_gaussian_regression',\n                              'price',\n
> 'ARRAY[1, tax, bath, size]',\n
> 'gaussian',\n                              'n_components=10',\n
> '',\n
> 'init_stepsize=[0.01, 1], max_iter=200,
> validation_result=houses_svm_gaussian_regression_cv, n_folds=3'\n
> );"]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)