[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131324#comment-16131324 ]

Frank McQuillan commented on MADLIB-1094:
-----------------------------------------

This works now:

{code}
DROP TABLE IF EXISTS house_en, house_en_summary;
SELECT madlib.elastic_net_train(
    'houses',                  -- input
    'house_en',                -- output
    'price',                   -- dependent var
    'ARRAY[tax, bath, size]',  -- independent vars
    'gaussian',                -- regression family
    0.5,                       -- alpha
    0.5,                       -- lambda
    True,                      -- normalize?
    NULL,                      -- grouping col
    'igd',                     -- optimizer
    '',                        -- optimizer params
    NULL,                      -- excluded cols
    1,                         -- max iterations
    1e-6                       -- tolerance
);
{code}

> Elastic Net fails when used without normalization
> -------------------------------------------------
>
>                 Key: MADLIB-1094
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1094
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Regularized Regression
>            Reporter: Nandish Jayaram
>            Priority: Minor
>             Fix For: v1.12
>
>
> Using Elastic Net with the normalization/standardize flag turned off (for
> Gaussian IGD) results in failure, with the following error:
> {code:sql}
> madlib-pg94=# SELECT madlib.elastic_net_train(
>     'houses1',
>     'houses_en',
>     'array[tax, bath, size]',
>     'gaussian',
>     0.5,
>     0.1,
>     FALSE,  -- Standardize
>     NULL,
>     'igd',
>     '',
>     NULL,
>     1,1e-6);
> ERROR:  spiexceptions.NumericValueOutOfRange: value out of range: overflow
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "elastic_net_train", line 23, in <module>
>     return elastic_net.elastic_net_train(**globals())
>   PL/Python function "elastic_net_train", line 332, in elastic_net_train
>   PL/Python function "elastic_net_train", line 42, in __elastic_net_gaussian_igd_train
>   PL/Python function "elastic_net_train", line 268, in __elastic_net_igd_train
>   PL/Python function "elastic_net_train", line 373, in __elastic_net_igd_train_compute
>   PL/Python function "elastic_net_train", line 69, in __elastic_net_generate_result
>   PL/Python function "elastic_net_train", line 154, in __compute_log_likelihood
> PL/Python function "elastic_net_train"
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
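For readers unfamiliar with the alpha and lambda arguments in the elastic_net_train call above, here is a minimal sketch of what they control. It assumes the standard elastic net penalty, lambda * (alpha * L1 + (1 - alpha) / 2 * L2^2); the function name and the sample coefficients are hypothetical, and the MADlib documentation remains the reference for the exact objective and scaling it uses.

```python
def elastic_net_penalty(coefficients, alpha, lambda_value):
    """Standard elastic net penalty (an assumed form, not MADlib source code).

    alpha mixes the two norms: 1.0 is pure lasso (L1), 0.0 is pure ridge (L2).
    lambda_value scales the whole penalty.
    """
    l1 = sum(abs(c) for c in coefficients)
    l2_squared = sum(c * c for c in coefficients)
    return lambda_value * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


# Hypothetical coefficient vector, chosen only to make the arithmetic easy to follow.
coefficients = [3.0, -4.0]
mixed = elastic_net_penalty(coefficients, alpha=0.5, lambda_value=0.5)
lasso = elastic_net_penalty(coefficients, alpha=1.0, lambda_value=0.5)
ridge = elastic_net_penalty(coefficients, alpha=0.0, lambda_value=0.5)
print(mixed, lasso, ridge)
```

With alpha = 0.5, as in the call above, both norms contribute, so large coefficients are shrunk (the L2 part) and small ones can be driven to zero (the L1 part).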
[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123805#comment-16123805 ]

ASF GitHub Bot commented on MADLIB-1094:
----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/164
[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122503#comment-16122503 ]

ASF GitHub Bot commented on MADLIB-1094:
----------------------------------------

GitHub user cooper-sloan opened a pull request:

    https://github.com/apache/incubator-madlib/pull/164

    Elastic Net: Fix normalization issue

    MADLIB-1094 and MADLIB-1146

    avg in psql is numerically unstable
    Data scaling was not occurring when grouping is true.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cooper-sloan/incubator-madlib elastic_net_normalization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #164

----
commit 0b00513bf20e7f0b9032b267472321bd6cfc4355
Author: Cooper Sloan
Date:   2017-08-10T19:04:04Z

    Elastic Net: Fix normalization issue

    MADLIB-1094 and MADLIB-1146

    avg in psql is numerically unstable
    Data scaling was not occurring when grouping is true.
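The pull request description above says that avg is numerically unstable and that data scaling was being skipped. The patch itself is not reproduced in this thread, so the following is only a hedged illustration of one well-known alternative, Welford's one-pass update, used here to standardize a column. The function names and sample values are hypothetical, and nothing here is taken from the MADlib source.

```python
import math


def running_mean_variance(values):
    """Welford's one-pass update: returns (mean, population variance).

    Each step adjusts the mean by a small correction instead of dividing one
    large accumulated sum at the end, which limits rounding error when the
    values are large relative to their spread.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for value in values:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    if count == 0:
        raise ValueError("need at least one value")
    return mean, m2 / count


def standardize(values):
    """Rescale values to zero mean and unit variance (assumes a non-constant column)."""
    mean, variance = running_mean_variance(values)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


# Hypothetical column: large offset, small spread.
column = [1000000004.0, 1000000007.0, 1000000013.0, 1000000016.0]
mean, variance = running_mean_variance(column)
scaled = standardize(column)
print(mean, variance, scaled)
```

The same pattern extends to a grouped computation by keeping one (count, mean, m2) triple per group.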
[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122289#comment-16122289 ]

Cooper Sloan commented on MADLIB-1094:
--------------------------------------

Updated repro:

```
DROP TABLE IF EXISTS houses;
CREATE TABLE houses (
    id INT,
    tax INT,
    bedroom INT,
    bath FLOAT,
    price INT,
    size INT,
    lot INT,
    zipcode INT);
INSERT INTO houses VALUES
(1, 590, 2, 1, 5, 770, 22100, 94301),
(2, 1050, 3, 2, 85000, 1410, 12000, 94301),
(3, 20, 3, 1, 22500, 1060, 3500, 94301),
(4, 870, 2, 2, 9, 1300, 17500, 94301),
(5, 1320, 3, 2, 133000, 1500, 3, 94301),
(6, 1350, 2, 1, 90500, 820, 25700, 94301),
(7, 2790, 3, 2.5, 26, 2130, 25000, 94301),
(8, 680, 2, 1, 142500, 1170, 22000, 94301),
(9, 1840, 3, 2, 16, 1500, 19000, 94301),
(10, 3680, 4, 2, 24, 2790, 2, 94301),
(11, 1660, 3, 1, 87000, 1030, 17500, 94301),
(12, 1620, 3, 2, 118600, 1250, 2, 94301),
(13, 3100, 3, 2, 14, 1760, 38000, 94301),
(14, 2070, 2, 3, 148000, 1550, 14000, 94301),
(15, 650, 3, 1.5, 65000, 1450, 12000, 94301),
(16, 770, 2, 2, 91000, 1300, 17500, 76010),
(17, 1220, 3, 2, 132300, 1500, 3, 76010),
(18, 1150, 2, 1, 91100, 820, 25700, 76010),
(19, 2690, 3, 2.5, 260011, 2130, 25000, 76010),
(20, 780, 2, 1, 141800, 1170, 22000, 76010),
(21, 1910, 3, 2, 160900, 1500, 19000, 76010),
(22, 3600, 4, 2, 239000, 2790, 2, 76010),
(23, 1600, 3, 1, 81010, 1030, 17500, 76010),
(24, 1590, 3, 2, 117910, 1250, 2, 76010),
(25, 3200, 3, 2, 141100, 1760, 38000, 76010),
(26, 2270, 2, 3, 148011, 1550, 14000, 76010),
(27, 750, 3, 1.5, 66000, 1450, 12000, 76010);

DROP TABLE IF EXISTS house_en, house_en_summary;
SELECT madlib.elastic_net_train(
    'houses',
    'house_en',
    'price',
    'ARRAY[tax, bath, size]',
    'gaussian',
    0.5,
    0.5,
    False,
    NULL,
    'igd',
    '',
    NULL,
    1,
    1e-6
);

psql:/Users/csloan/elastic_net.sql:17: ERROR:  spiexceptions.NumericValueOutOfRange: value out of range: overflow
CONTEXT:  Traceback (most recent call last):
  PL/Python function "elastic_net_train", line 27, in <module>
    excluded, max_iter, tolerance)
  PL/Python function "elastic_net_train", line 467, in elastic_net_train
  PL/Python function "elastic_net_train", line 495, in _internal_elastic_net_train
  PL/Python function "elastic_net_train", line 46, in _elastic_net_gaussian_igd_train
  PL/Python function "elastic_net_train", line 169, in _elastic_net_igd_train
  PL/Python function "elastic_net_train", line 297, in _elastic_net_igd_train_compute
  PL/Python function "elastic_net_train", line 121, in _elastic_net_generate_result
  PL/Python function "elastic_net_train", line 174, in build_output_table
  PL/Python function "elastic_net_train", line 143, in _compute_log_likelihood
PL/Python function "elastic_net_train"
```
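One plausible reading of the overflow in the repro above is that a gradient-based optimizer diverges when features on the scale of tax and size are left unscaled. The sketch below illustrates that general effect only; it is not MADlib's IGD implementation. It uses plain per-row gradient descent on squared error with no regularization, and the step size of 0.01, the pass count, and the helper names are all assumptions. The six rows are (tax, bath, size) and price values copied from the repro table.

```python
import math

# (tax, bath, size) and price for ids 2, 3, 5, 6, 8 and 11 in the repro table.
FEATURES = [
    (1050.0, 2.0, 1410.0),
    (20.0, 1.0, 1060.0),
    (1320.0, 2.0, 1500.0),
    (1350.0, 1.0, 820.0),
    (680.0, 1.0, 1170.0),
    (1660.0, 1.0, 1030.0),
]
PRICES = [85000.0, 22500.0, 133000.0, 90500.0, 142500.0, 87000.0]


def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
            for col, m in zip(columns, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]


def sgd_fit(rows, targets, step=0.01, passes=50):
    """Plain per-row gradient descent on squared error; weights[0] is the intercept."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(passes):
        for row, target in zip(rows, targets):
            x = (1.0,) + tuple(row)
            error = sum(w * xi for w, xi in zip(weights, x)) - target
            weights = [w - step * error * xi for w, xi in zip(weights, x)]
            if not all(math.isfinite(w) for w in weights):
                return weights  # diverged to inf or nan: stop early
    return weights


raw_weights = sgd_fit(FEATURES, PRICES)
scaled_weights = sgd_fit(standardize(FEATURES), PRICES)
print("raw features finite:   ", all(math.isfinite(w) for w in raw_weights))
print("scaled features finite:", all(math.isfinite(w) for w in scaled_weights))
```

With raw features, each update multiplies the weights by a factor on the order of step * |x|^2, which is in the thousands for these rows, so the weights leave the floating-point range after a few passes. After standardization the same step size keeps every update bounded.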
[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983663#comment-15983663 ]

Frank McQuillan commented on MADLIB-1094:
-----------------------------------------

Looks like this bug may have been there before. I'd suggest we have a look in 1.12 but not hold up the 1.11 release for this one.
[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983336#comment-15983336 ]

Rashmi Raghu commented on MADLIB-1094:
--------------------------------------

The scikit-learn result in the previous comment is with data that was not standardized and not normalized.
[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983323#comment-15983323 ]

Nandish Jayaram commented on MADLIB-1094:
-----------------------------------------

Thank you for trying it out on scikit-learn, Rashmi.
To give you a bit more information from my end, I did get similar coefficients with MADlib when I used Gaussian IGD *with* normalization. The exact coefficients I got are:
bq. {22.8037405001, 10717.6300401, 54.8314625851}
[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization
[ https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983299#comment-15983299 ]

Rashmi Raghu commented on MADLIB-1094:
--------------------------------------

Trying the same problem with scikit-learn works fine. The resulting coefficients for the features are: {{[23.15135449, 7759.59888732, 59.03219813]}}.
It is not yet clear which optimizer scikit-learn is using, though.