[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-08-17 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131324#comment-16131324
 ] 

Frank McQuillan commented on MADLIB-1094:
-

This works now:

{code}
DROP TABLE IF EXISTS house_en, house_en_summary;
SELECT madlib.elastic_net_train(
    'houses',                  -- input
    'house_en',                -- output
    'price',                   -- dependent var
    'ARRAY[tax, bath, size]',  -- independent vars
    'gaussian',                -- regression family
    0.5,                       -- alpha
    0.5,                       -- lambda
    True,                      -- normalize?
    NULL,                      -- grouping col
    'igd',                     -- optimizer
    '',                        -- optimizer params
    NULL,                      -- excluded cols
    1,                         -- max iterations
    1e-6                       -- tolerance
);
{code}

> Elastic Net fails when used without normalization
> -
>
> Key: MADLIB-1094
> URL: https://issues.apache.org/jira/browse/MADLIB-1094
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Regularized Regression
>Reporter: Nandish Jayaram
>Priority: Minor
> Fix For: v1.12
>
>
> Using Elastic Net with the normalization/standardize flag turned off (for 
> Gaussian IGD) results in failure, with the following error:
> {code:sql}
> madlib-pg94=# SELECT madlib.elastic_net_train(
> 'houses1',
> 'houses_en',
> 'array[tax, bath, size]',
> 'gaussian',
> 0.5,
> 0.1, 
> FALSE,  -- Standardize 
> NULL,
> 'igd',
> '',
> NULL,
> 1,1e-6);
> ERROR:  spiexceptions.NumericValueOutOfRange: value out of range: overflow
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "elastic_net_train", line 23, in 
> return elastic_net.elastic_net_train(**globals())
>   PL/Python function "elastic_net_train", line 332, in elastic_net_train
>   PL/Python function "elastic_net_train", line 42, in 
> __elastic_net_gaussian_igd_train
>   PL/Python function "elastic_net_train", line 268, in __elastic_net_igd_train
>   PL/Python function "elastic_net_train", line 373, in 
> __elastic_net_igd_train_compute
>   PL/Python function "elastic_net_train", line 69, in 
> __elastic_net_generate_result
>   PL/Python function "elastic_net_train", line 154, in 
> __compute_log_likelihood
> PL/Python function "elastic_net_train"
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-08-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123805#comment-16123805
 ] 

ASF GitHub Bot commented on MADLIB-1094:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-madlib/pull/164




[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122503#comment-16122503
 ] 

ASF GitHub Bot commented on MADLIB-1094:


GitHub user cooper-sloan opened a pull request:

https://github.com/apache/incubator-madlib/pull/164

Elastic Net: Fix normalization issue

MADLIB-1094 and MADLIB-1146

avg in psql is numerically unstable
Data scaling was not occurring when
grouping is true.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cooper-sloan/incubator-madlib elastic_net_normalization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #164


commit 0b00513bf20e7f0b9032b267472321bd6cfc4355
Author: Cooper Sloan 
Date:   2017-08-10T19:04:04Z

Elastic Net: Fix normalization issue

MADLIB-1094 and MADLIB-1146

avg in psql is numerically unstable
Data scaling was not occurring when
grouping is true.
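
The commit message above does not say which computation was unstable or how the pull request changes it. As a generic illustration only, the Python sketch below compares a one-pass "sum and sum of squares" variance with Welford's running update on values that share a large offset. The helper names `naive_variance` and `welford_variance` are hypothetical; they are not MADlib or PostgreSQL functions.

```python
def naive_variance(values):
    """Sample variance from a running sum and sum of squares (unstable)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    # Subtracting two nearly equal, very large numbers loses the small signal.
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(values):
    """Sample variance with Welford's running mean/M2 update (stable)."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Illustrative data: small spread on top of a large offset.
# The true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:  ", naive_variance(data))
print("welford:", welford_variance(data))
```

With IEEE-754 doubles the naive form comes out negative for this input, while Welford's update returns 30.0. Statistics computed this way feed any later scaling step, which is why the choice of formula matters.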






[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-08-10 Thread Cooper Sloan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122289#comment-16122289
 ] 

Cooper Sloan commented on MADLIB-1094:
--

Updated repro
```
DROP TABLE IF EXISTS houses;
CREATE TABLE houses ( id INT,
tax INT,
bedroom INT,
bath FLOAT,
price INT,
size INT,
lot INT,
zipcode INT);
INSERT INTO houses VALUES
(1,  590, 2,   1,  5,  770, 22100, 94301),
(2, 1050, 3,   2,  85000, 1410, 12000, 94301),
(3,   20, 3,   1,  22500, 1060,  3500, 94301),
(4,  870, 2,   2,  9, 1300, 17500, 94301),
(5, 1320, 3,   2, 133000, 1500, 3, 94301),
(6, 1350, 2,   1,  90500,  820, 25700, 94301),
(7, 2790, 3, 2.5, 26, 2130, 25000, 94301),
(8,  680, 2,   1, 142500, 1170, 22000, 94301),
(9, 1840, 3,   2, 16, 1500, 19000, 94301),
(10, 3680, 4,   2, 24, 2790, 2, 94301),
(11, 1660, 3,   1,  87000, 1030, 17500, 94301),
(12, 1620, 3,   2, 118600, 1250, 2, 94301),
(13, 3100, 3,   2, 14, 1760, 38000, 94301),
(14, 2070, 2,   3, 148000, 1550, 14000, 94301),
(15,  650, 3, 1.5,  65000, 1450, 12000, 94301),
(16,  770, 2,   2,  91000, 1300, 17500, 76010),
(17, 1220, 3,   2, 132300, 1500, 3, 76010),
(18, 1150, 2,   1,  91100,  820, 25700, 76010),
(19, 2690, 3, 2.5, 260011, 2130, 25000, 76010),
(20,  780, 2,   1, 141800, 1170, 22000, 76010),
(21, 1910, 3,   2, 160900, 1500, 19000, 76010),
(22, 3600, 4,   2, 239000, 2790, 2, 76010),
(23, 1600, 3,   1,  81010, 1030, 17500, 76010),
(24, 1590, 3,   2, 117910, 1250, 2, 76010),
(25, 3200, 3,   2, 141100, 1760, 38000, 76010),
(26, 2270, 2,   3, 148011, 1550, 14000, 76010),
(27,  750, 3, 1.5,  66000, 1450, 12000, 76010);

DROP TABLE IF EXISTS house_en,house_en_summary;
SELECT madlib.elastic_net_train(
'houses',
'house_en',
'price',
'ARRAY[tax, bath, size]',
'gaussian',
0.5,
0.5,
False,
NULL,
'igd',
'',
NULL,
1,
1e-6
);
psql:/Users/csloan/elastic_net.sql:17: ERROR:  
spiexceptions.NumericValueOutOfRange: value out of range: overflow
CONTEXT:  Traceback (most recent call last):
  PL/Python function "elastic_net_train", line 27, in 
excluded, max_iter, tolerance)
  PL/Python function "elastic_net_train", line 467, in elastic_net_train
  PL/Python function "elastic_net_train", line 495, in 
_internal_elastic_net_train
  PL/Python function "elastic_net_train", line 46, in 
_elastic_net_gaussian_igd_train
  PL/Python function "elastic_net_train", line 169, in _elastic_net_igd_train
  PL/Python function "elastic_net_train", line 297, in 
_elastic_net_igd_train_compute
  PL/Python function "elastic_net_train", line 121, in 
_elastic_net_generate_result
  PL/Python function "elastic_net_train", line 174, in build_output_table
  PL/Python function "elastic_net_train", line 143, in _compute_log_likelihood
PL/Python function "elastic_net_train"
```
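
One plausible reading of the overflow above, not confirmed anywhere in this thread, is that incremental gradient descent with a fixed step size diverges when the features are left on their raw scale. The Python sketch below shows that effect in isolation. The functions `igd_fit` and `standardize`, the step size, and the pass count are illustrative assumptions, not MADlib code, and there is no elastic net penalty. The rows are a subset of the `houses` table above: (tax, bath, size) and price.

```python
import math
import statistics

# Subset of the houses rows from the repro: (tax, bath, size) -> price.
FEATURES = [
    (1050.0, 2.0, 1410.0),
    (1350.0, 1.0, 820.0),
    (680.0, 1.0, 1170.0),
    (1660.0, 1.0, 1030.0),
    (2070.0, 3.0, 1550.0),
]
PRICES = [85000.0, 90500.0, 142500.0, 87000.0, 148000.0]


def standardize(rows):
    """Rescale each column to zero mean and unit (population) std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


def igd_fit(rows, targets, step=0.01, passes=50):
    """Least-squares fit by plain incremental gradient descent (no penalty)."""
    w = [0.0] * len(rows[0])
    for _ in range(passes):
        for x, y in zip(rows, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - step * err * xi for wi, xi in zip(w, x)]
    return w


raw_w = igd_fit(FEATURES, PRICES)
scaled_w = igd_fit(standardize(FEATURES), PRICES)

# Raw features: every update overshoots, so the weights blow up to inf/nan.
print("raw features finite:   ", all(math.isfinite(v) for v in raw_w))
# Standardized features: the same step size keeps the weights bounded.
print("scaled features finite:", all(math.isfinite(v) for v in scaled_w))
```

Any loss or log-likelihood evaluated on the diverged weights would itself be out of range, which is at least consistent with the error being raised from `_compute_log_likelihood`.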



[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-04-25 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983663#comment-15983663
 ] 

Frank McQuillan commented on MADLIB-1094:
-

Looks like this bug may have been there before. I'd suggest we have a look in 
1.12 but not hold up the 1.11 release for this one.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-04-25 Thread Rashmi Raghu (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983336#comment-15983336
 ] 

Rashmi Raghu commented on MADLIB-1094:
--

The scikit-learn result in the previous comment is with data that was not 
standardized and not normalized.

> Elastic Net fails when used without normalization
> -
>
> Key: MADLIB-1094
> URL: https://issues.apache.org/jira/browse/MADLIB-1094
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Regularized Regression
>Reporter: Nandish Jayaram
> Fix For: v1.11
>
>
> Using Elastic Net with the normalization/standardize flag turned off (for 
> Gaussian IGD) results in failure, with the following error:
> {code:sql}
> madlib-pg94=# SELECT madlib.elastic_net_train(
> 'houses1',
> 'houses_en',
> 'array[tax, bath, size]',
> 'gaussian',
> 0.5,
> 0.1, 
> FALSE,  -- Standardize 
> NULL,
> 'igd',
> '',
> NULL,
> 1,1e-6);
> ERROR:  spiexceptions.NumericValueOutOfRange: value out of range: overflow
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "elastic_net_train", line 23, in 
> return elastic_net.elastic_net_train(**globals())
>   PL/Python function "elastic_net_train", line 332, in elastic_net_train
>   PL/Python function "elastic_net_train", line 42, in 
> __elastic_net_gaussian_igd_train
>   PL/Python function "elastic_net_train", line 268, in __elastic_net_igd_train
>   PL/Python function "elastic_net_train", line 373, in 
> __elastic_net_igd_train_compute
>   PL/Python function "elastic_net_train", line 69, in 
> __elastic_net_generate_result
>   PL/Python function "elastic_net_train", line 154, in 
> __compute_log_likelihood
> PL/Python function "elastic_net_train"
> {code}





[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-04-25 Thread Nandish Jayaram (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983323#comment-15983323
 ] 

Nandish Jayaram commented on MADLIB-1094:
-

Thank you for trying it out on scikit-learn, Rashmi. To give you a bit more 
information from my end, I did get similar coefficients with MADlib when I used 
Gaussian IGD *with* normalization. The exact coefficients I got are:
bq. {22.8037405001, 10717.6300401, 54.8314625851}




[jira] [Commented] (MADLIB-1094) Elastic Net fails when used without normalization

2017-04-25 Thread Rashmi Raghu (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983299#comment-15983299
 ] 

Rashmi Raghu commented on MADLIB-1094:
--

Trying the same problem with scikit-learn works fine. The resulting coefficients 
for the features are: {{[23.15135449, 7759.59888732, 59.03219813]}}. It 
is not yet clear which optimizer scikit-learn is using, though.
