[statistics] Pull request for GLSMultipleLinearRegression

Елена Картышева Thu, 23 May 2019 06:44:31 -0700

Hello.

I would like to propose a pull request that adds an option to pass a variance 
vector instead of a full covariance matrix. When the errors are uncorrelated 
but heteroscedastic, the covariance matrix is diagonal, so storing and 
processing all of it wastes memory and computation. Using a variance vector in 
such cases reduces time complexity from O(N^2) to O(N), where N is the number 
of observations, and dramatically reduces memory usage, which makes it possible 
to work with huge input matrices.

For example, I needed to train a generalized linear model. The iteratively 
reweighted least squares (IRLS) algorithm requires a weighted regression, in my 
case with more than a million observations. The current implementation would 
require approximately 12 terabytes of memory, while the patched version needs 
only 8 megabytes. Since IRLS is an iterative algorithm and repeats the 
regression on every pass, a million-fold reduction in complexity is also very 
welcome.
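
To illustrate the idea, here is a minimal sketch of a regression that takes a 
variance vector directly. It is not the code from the pull request: the class 
name DiagonalGlsSketch and the method fit are hypothetical, and it solves the 
weighted normal equations with plain Gaussian elimination for brevity, which is 
less numerically robust than a QR-based solver. The point is only that each 
observation contributes through its weight 1/variance, so no N x N matrix is 
ever allocated.

```java
public final class DiagonalGlsSketch {

    private DiagonalGlsSketch() { }

    /**
     * Estimates b in y = X b + e, where the errors e are uncorrelated and
     * e[i] has variance variances[i] (hypothetical helper, for illustration).
     */
    public static double[] fit(double[][] x, double[] y, double[] variances) {
        int n = x.length;
        int p = x[0].length;

        // Augmented system [X' W X | X' W y] with W = diag(1 / variances).
        // Built in a single pass over the observations: O(N * p^2) time,
        // O(p^2) extra memory.
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < n; i++) {
            double w = 1.0 / variances[i];
            for (int j = 0; j < p; j++) {
                double wx = w * x[i][j];
                for (int k = 0; k < p; k++) {
                    a[j][k] += wx * x[i][k];
                }
                a[j][p] += wx * y[i];
            }
        }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (a[col][col] == 0.0) {
                throw new IllegalArgumentException("Design matrix is rank deficient");
            }
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }

        // Back substitution.
        double[] b = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int c = r + 1; c < p; c++) {
                s -= a[r][c] * b[c];
            }
            b[r] = s / a[r][r];
        }
        return b;
    }
}
```

The only per-observation state besides the design matrix and the response is 
one double per variance, so a million observations need about 8 MB for the 
weights, whereas a dense covariance matrix for the same data would need a 
million times a million doubles.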


 
-- 
Sincerely yours, Elena Kartysheva.

