[jira] [Commented] (MATH-1428) OLSMultipleLinearRegression estimates different residuals with different order of input
[ https://issues.apache.org/jira/browse/MATH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151110#comment-17151110 ]

Gilles Sadowski commented on MATH-1428:
---------------------------------------

Thanks for the additional information.
Hopefully someone will delve into the code and make it more robust. ;-)

> OLSMultipleLinearRegression estimates different residuals with different order of input
> ---------------------------------------------------------------------------------------
>
>                 Key: MATH-1428
>                 URL: https://issues.apache.org/jira/browse/MATH-1428
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.4.1
>         Environment: win7 64bit jdk1.8 IntelliJ IDEA
>            Reporter: butchild
>            Priority: Major
>              Labels: ols, regression, residuals
>
> I have a regression job with 31 X variables, 30 of which are dummies.
> The data set has 800+ rows.
> I'm using OLSMultipleLinearRegression to do the regression.
> I found that if I change the order of the 800+ rows, the residuals I get from
> ols.estimateResiduals()
> are different, and the correlation between the two sets of residuals is near
> 100%, e.g. 99.8%.
> My data is below in the Docs Text area.
> The fields of each column are:
> sig,y,x1,x2,xn

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (MATH-1428) OLSMultipleLinearRegression estimates different residuals with different order of input
[ https://issues.apache.org/jira/browse/MATH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150921#comment-17150921 ]

David Hudson commented on MATH-1428:
------------------------------------

I encountered this issue recently, also with a dataset that has multiple dummy variables. It turned out that the columns were not linearly independent. After removing one of the dummy variables, different column orders produced stable output, as expected.

It's worth noting that I compared the results against some Python libraries (sklearn/statsmodels), and these gave the correct results for things like the intercept and the regular variables even with the dependent columns.
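The linear dependence described in the comment above can be illustrated with a small, self-contained sketch. It is not Commons Math code and does not reproduce the reporter's data: the class name DummyRankCheck, its helper methods, and the six-row sample are all hypothetical. It assumes the regression includes an intercept (which, as far as I know, OLSMultipleLinearRegression does unless setNoIntercept(true) is called), so a full set of dummy columns sums to the intercept column and the design matrix loses one rank.

```java
class DummyRankCheck {

    /** Numerical rank via Gaussian elimination with partial pivoting (illustrative only). */
    public static int rank(double[][] a, double tol) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] m = new double[rows][];
        for (int i = 0; i < rows; i++) {
            m[i] = a[i].clone(); // work on a copy, leave the input untouched
        }
        int rank = 0;
        for (int c = 0; c < cols && rank < rows; c++) {
            // Pick the largest remaining entry in this column as the pivot.
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++) {
                if (Math.abs(m[r][c]) > Math.abs(m[pivot][c])) {
                    pivot = r;
                }
            }
            if (Math.abs(m[pivot][c]) <= tol) {
                continue; // column depends on the previous ones
            }
            double[] tmp = m[pivot];
            m[pivot] = m[rank];
            m[rank] = tmp;
            for (int r = rank + 1; r < rows; r++) {
                double f = m[r][c] / m[rank][c];
                for (int k = c; k < cols; k++) {
                    m[r][k] -= f * m[rank][k];
                }
            }
            rank++;
        }
        return rank;
    }

    /** Returns a copy of x with a leading column of ones (the intercept). */
    public static double[][] withIntercept(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length + 1];
            out[i][0] = 1.0;
            System.arraycopy(x[i], 0, out[i], 1, x[i].length);
        }
        return out;
    }

    /** Returns a copy of x without the column at index col. */
    public static double[][] dropColumn(double[][] x, int col) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length - 1];
            for (int j = 0, k = 0; j < x[i].length; j++) {
                if (j != col) {
                    out[i][k++] = x[i][j];
                }
            }
        }
        return out;
    }

    /** Hypothetical data: dummies for categories A, B, C plus one regular regressor. */
    public static double[][] sampleWithAllDummies() {
        return new double[][] {
            {1, 0, 0, 0.5},
            {0, 1, 0, 1.5},
            {0, 0, 1, 2.5},
            {1, 0, 0, 3.0},
            {0, 1, 0, 4.5},
            {0, 0, 1, 5.0},
        };
    }

    public static void main(String[] args) {
        double[][] full = withIntercept(sampleWithAllDummies());
        double[][] reduced = withIntercept(dropColumn(sampleWithAllDummies(), 2));
        // With all three dummies the rank is below the column count.
        System.out.println("all dummies: columns=" + full[0].length + " rank=" + rank(full, 1e-9));
        // After dropping one dummy the matrix has full column rank.
        System.out.println("one dropped: columns=" + reduced[0].length + " rank=" + rank(reduced, 1e-9));
    }
}
```

When the rank is smaller than the number of columns, the least-squares coefficients are not unique, so a solver that does not detect the deficiency may return different solutions for reordered input. Running a check like this before fitting, and dropping one dummy per category when it fails, is one way to apply the workaround described above.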
[jira] [Commented] (MATH-1428) OLSMultipleLinearRegression estimates different residuals with different order of input
[ https://issues.apache.org/jira/browse/MATH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123215#comment-16123215 ]

Gilles commented on MATH-1428:
------------------------------

What result did you expect?
What do other libraries produce?
Also, please provide a _minimal_ working code example (preferably a JUnit test).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)