I would like to propose a pull request that adds an option to use a
variance vector instead of a full covariance matrix. When the errors are
uncorrelated but heteroscedastic, the covariance matrix is diagonal, so
storing only its diagonal avoids unnecessary memory usage and excessive
computation, which makes it possible to work with huge input matrices.
Using a variance vector in such cases reduces the time complexity from
O(N^2) to O(N), where N is the number of observations, and dramatically
reduces memory usage.

For example, I needed to train a generalized linear model. The
iteratively reweighted least squares (IRLS) algorithm requires a weighted
regression, in my case with more than a million observations. The current
implementation would require approximately 12 terabytes of memory, while
the patched version needs only 8 megabytes. Since IRLS is an iterative
algorithm, a roughly million-fold reduction in complexity is also very
welcome.
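
To illustrate the idea, here is a minimal sketch of a weighted fit driven
by a variance vector. It is not the code from the pull request and does not
use the Commons Math API: the class name DiagonalWeightSketch and its
methods are hypothetical, and the model is reduced to a straight line
y = b0 + b1 * x so that the example stays self-contained. It assumes
uncorrelated errors, so each observation is simply weighted by
1 / variance in a single O(N) pass.

```java
/** Hypothetical illustration only; not part of Commons Math or the pull request. */
final class DiagonalWeightSketch {

    /**
     * Weighted least-squares fit of y = b0 + b1 * x, assuming uncorrelated
     * errors where observation i has error variance variances[i].
     * One pass over the data: O(N) time, O(1) extra memory.
     *
     * @return {b0, b1}
     */
    static double[] fitLine(double[] x, double[] y, double[] variances) {
        if (x.length != y.length || x.length != variances.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
        for (int i = 0; i < x.length; i++) {
            if (!(variances[i] > 0)) {
                throw new IllegalArgumentException("variances must be positive");
            }
            double w = 1.0 / variances[i]; // diagonal entry of the inverse covariance
            sw += w;
            swx += w * x[i];
            swy += w * y[i];
            swxx += w * x[i] * x[i];
            swxy += w * x[i] * y[i];
        }
        // Solve the 2x2 weighted normal equations in closed form.
        double det = sw * swxx - swx * swx;
        if (det == 0) {
            throw new IllegalArgumentException("x values do not determine a line");
        }
        double b1 = (sw * swxy - swx * swy) / det;
        double b0 = (swxx * swy - swx * swxy) / det;
        return new double[] {b0, b1};
    }

    /** Bytes needed to store a dense N x N covariance matrix of doubles. */
    static long denseCovarianceBytes(long n) {
        return Math.multiplyExact(Math.multiplyExact(n, n), 8L);
    }

    /** Bytes needed to store a length-N variance vector of doubles. */
    static long varianceVectorBytes(long n) {
        return Math.multiplyExact(n, 8L);
    }
}
```

For exactly 1,000,000 observations the two helper methods give 8 * 10^12
bytes (8 terabytes) for the dense matrix and 8 * 10^6 bytes (8 megabytes)
for the vector, which is the same order of magnitude as the figures quoted
above.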

https://github.com/apache/commons-math/pull/106
