[
https://issues.apache.org/jira/browse/MADLIB-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584353#comment-16584353
]
Frank McQuillan commented on MADLIB-1262:
-----------------------------------------
Based on commit
https://github.com/apache/madlib/commit/ec328dba6853d31df5b1bd6bbdcd35933596fe78
this seems to work now.
For data set:
{code}
DROP TABLE IF EXISTS houses;
CREATE TABLE houses ( id INT,
tax INT,
bedroom INT,
bath FLOAT,
price INT,
size INT,
lot INT,
zipcode TEXT);
INSERT INTO houses (id, tax, bedroom, bath, price, size, lot, zipcode) VALUES
(1 , 590 , 2 , 1 , 50000 , 770 , 22100 , 'x94301'),
(2 , 1050 , 3 , 2 , 85000 , 1410 , 12000 , 'x94301'),
(3 , 20 , 3 , 1 , 22500 , 1060 , 3500 , 'x94301'),
(4 , 870 , 2 , 2 , 90000 , 1300 , 17500 , 'x94301'),
(5 , 1320 , 3 , 2 , 133000 , 1500 , 30000 , 'x94301'),
(6 , 1350 , 2 , 1 , 90500 , 820 , 25700 , 'x94301'),
(7 , 2790 , 3 , 2.5 , 260000 , 2130 , 25000 , 'x94301'),
(8 , 680 , 2 , 1 , 142500 , 1170 , 22000 , 'x94301'),
(9 , 1840 , 3 , 2 , 160000 , 1500 , 19000 , 'x94301'),
(10 , 3680 , 4 , 2 , 240000 , 2790 , 20000 , 'x94301'),
(11 , 1660 , 3 , 1 , 87000 , 1030 , 17500 , 'x94301'),
(12 , 1620 , 3 , 2 , 118600 , 1250 , 20000 , 'x94301'),
(13 , 3100 , 3 , 2 , 140000 , 1760 , 38000 , 'x94301'),
(14 , 2070 , 2 , 3 , 148000 , 1550 , 14000 , 'x94301'),
(15 , 650 , 3 , 1.5 , 65000 , 1450 , 12000 , 'x94301'),
(16 , 770 , 2 , 2 , 91000 , 1300 , 17500 , 'x76010'),
(17 , 1220 , 3 , 2 , 132300 , 1500 , 30000 , 'x76010'),
(18 , 1150 , 2 , 1 , 91100 , 820 , 25700 , 'x76010'),
(19 , 2690 , 3 , 2.5 , 260011 , 2130 , 25000 , 'x76010'),
(20 , 780 , 2 , 1 , 141800 , 1170 , 22000 , 'x76010'),
(21 , 1910 , 3 , 2 , 160900 , 1500 , 19000 , 'x76010'),
(22 , 3600 , 4 , 2 , 239000 , 2790 , 20000 , 'x76010'),
(23 , 1600 , 3 , 1 , 81010 , 1030 , 17500 , 'x76010'),
(24 , 1590 , 3 , 2 , 117910 , 1250 , 20000 , 'x76010'),
(25 , 3200 , 3 , 2 , 141100 , 1760 , 38000 , 'x76010'),
(26 , 2270 , 2 , 3 , 148011 , 1550 , 14000 , 'x76010'),
(27 , 750 , 3 , 1.5 , 66000 , 1450 , 12000 , 'x76010');
{code}
For query:
{code}
DROP TABLE IF EXISTS houses_en1, houses_en1_summary;
SELECT madlib.elastic_net_train( 'houses', -- Source table
'houses_en1', -- Result table
'price', -- Dependent
variable
'array[tax, bath, size]', -- Independent
variable
'gaussian', -- Regression family
0.5, -- Alpha value
0.1, -- Lambda value
TRUE, -- Standardize
'zipcode', -- Grouping
column(s)
'fista', -- Optimizer
'', -- Optimizer
parameters
NULL, -- Excluded columns
10000, -- Maximum
iterations
1e-6 -- Tolerance value
);
SELECT * FROM houses_en1;
{code}
In 1.15 the error is:
{code}
Done.
(psycopg2.ProgrammingError) spiexceptions.UndefinedColumn: column "x94301" does
not exist
LINE 3: (x94301, 'gaussian', '{tax,bath,size}'::text...
^
QUERY:
INSERT INTO houses_en1 VALUES
(x94301, 'gaussian', '{tax,bath,size}'::text[],
'{tax,bath,size}'::text[],
'{27.6945611116,11509.0099645,49.094555881}'::double
precision[], '{27.6945611116,11509.0099645,49.094555881}'::double precision[],
-11145.5062166, -520358787.006, True, 10000)
CONTEXT: Traceback (most recent call last):
PL/Python function "elastic_net_train", line 24, in <module>
excluded, max_iter, tolerance)
PL/Python function "elastic_net_train", line 469, in elastic_net_train
PL/Python function "elastic_net_train", line 510, in
_internal_elastic_net_train
PL/Python function "elastic_net_train", line 24, in
_elastic_net_gaussian_fista_train
PL/Python function "elastic_net_train", line 171, in _elastic_net_fista_train
PL/Python function "elastic_net_train", line 297, in
_elastic_net_fista_train_compute
PL/Python function "elastic_net_train", line 120, in
_elastic_net_generate_result
PL/Python function "elastic_net_train", line 191, in build_output_table
PL/Python function "elastic_net_train"
[SQL: "SELECT madlib.elastic_net_train( 'houses', -- Source
table\n 'houses_en1', -- Result
table\n 'price', -- Dependent
variable\n 'array[tax, bath, size]', --
Independent variable\n 'gaussian',
-- Regression family\n 0.5,
-- Alpha value\n 0.1,
-- Lambda value\n TRUE, --
Standardize\n 'zipcode', --
Grouping column(s)\n 'fista',
-- Optimizer\n '', --
Optimizer parameters\n NULL,
-- Excluded columns\n 10000,
-- Maximum iterations\n 1e-6
-- Tolerance value\n );"]
{code}
But for 1.15.1 with the commit applied, I get
{code}
-[ RECORD 1 ]-----+--------------------------------------------
zipcode | x94301
family | gaussian
features | {tax,bath,size}
features_selected | {tax,bath,size}
coef_nonzero | {27.6945595076,11509.0096164,49.0945593644}
coef_all | {27.6945595076,11509.0096164,49.0945593644}
intercept | -11145.5081132
log_likelihood | -520358783.294
standardize | t
iteration_run | 10000
-[ RECORD 2 ]-----+--------------------------------------------
zipcode | x76010
family | gaussian
features | {tax,bath,size}
features_selected | {tax,bath,size}
coef_nonzero | {14.9802170998,9133.17395416,62.8225354177}
coef_all | {14.9802170998,9133.17395416,62.8225354177}
intercept | 14.7356231281
log_likelihood | -525667072.871
standardize | t
iteration_run | 10000
{code}
> Error for elastic net when the group column contains no-numeric characters
> --------------------------------------------------------------------------
>
> Key: MADLIB-1262
> URL: https://issues.apache.org/jira/browse/MADLIB-1262
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Regularized Regression
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.15.1
>
>
> Minor error for elastic net when the group column contains no-numeric
> characters.
> It can be fixed by changing the below code in elastic_net_generate_result.py
> {code}
> grouping_info = "\'"+ '\',\''.join([str(res[grp_col.strip()])
> for grp_col in grouping_str1.split(',')
> if grp_col.strip() in res.keys()]) +
> "\',"
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)