[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060800#comment-13060800 ]
greg sterijevski commented on MATH-607:
---------------------------------------

On the results object: There are vars * (vars + 1) / 2 elements in the cov matrix, vars parameters, vars standard errors and some other assorted stuff. Not terribly large at first. However, consider doing panel regression via dummy variables: the covariance matrix can get large very quickly. That being said, I don't think making RegressionResults a concrete class is a showstopper. Should I send a follow-up patch with results made concrete?

On the regression object: Are you concerned that we will be removing methods from any interface we specify today? Or do you think the contract is too restrictive? The reason I am pushing for an interface is that I have two candidates for a concrete implementation of updating regression.

The first implementation is based on Gentleman's lemma and is detailed in the following article:

    Algorithm AS 274: Least Squares Routines to Supplement those of Gentleman
    Author: Alan J Miller
    Source: Journal of the Royal Statistical Society, Vol 41, No 2 (1992)

The second approach is the one detailed in this article by Goodnight:

    A Tutorial on the SWEEP Operator
    James H. Goodnight
    The American Statistician, Vol. 33, No. 3 (Aug., 1979), pp. 149-158.

The first approach never forms the cross products matrix, the second does. They are significantly different approaches to dealing with large data sets. How would I do this in the concrete class you propose?

Thanks,

-Greg

> Current Multiple Regression Object does calculations with all data incore.
> There are non incore techniques which would be useful with large datasets.
> ---------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete
> data set. This necessitates the loading incore of the complete dataset. For
> large datasets, or large datasets and a requirement to do datamining or
> stepwise regression this is not practical. There are techniques which form
> the normal equations on the fly, as well as ones which form the QR
> decomposition on an update basis. I am proposing, first, the specification of
> an "UpdatingLinearRegression" interface which defines basic functionality all
> such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on
> some subset of the data should be encapsulated in an immutable object. This
> is to ensure that subsequent additions of observations do not corrupt or
> render inconsistent parameter estimates. I am calling this interface
> "RegressionResults".
> Once the community has reached a consensus on the interface, work on the
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
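
As a rough illustration of the storage estimate in the comment above (vars * (vars + 1) / 2 covariance elements plus vars parameters), here is a minimal Java sketch of an immutable results holder that keeps only the upper triangle of the symmetric covariance matrix. The class name PackedCovarianceResults, its method names, and the column-by-column packing order are all assumptions made for illustration; none of them is taken from the attached updating_reg_ifaces patch.

```java
/**
 * Hypothetical sketch of an immutable results holder.
 * Not the RegressionResults type from the attached patch.
 */
final class PackedCovarianceResults {
    private final double[] parameters;
    // Upper triangle of the symmetric covariance matrix, stored column by column:
    // (0,0), (0,1), (1,1), (0,2), (1,2), (2,2), ...
    private final double[] packedCov;

    PackedCovarianceResults(double[] parameters, double[] packedCov) {
        int vars = parameters.length;
        if (packedCov.length != packedSize(vars)) {
            throw new IllegalArgumentException(
                "expected " + packedSize(vars) + " covariance elements");
        }
        // Defensive copies: later changes to the caller's arrays
        // (for example, after more observations are added) cannot leak in.
        this.parameters = parameters.clone();
        this.packedCov = packedCov.clone();
    }

    /** Number of distinct elements in a symmetric vars-by-vars matrix. */
    static int packedSize(int vars) {
        return vars * (vars + 1) / 2;
    }

    /** Position of element (i, j) in the packed upper triangle. */
    static int packedIndex(int i, int j) {
        int row = Math.min(i, j);
        int col = Math.max(i, j);
        return col * (col + 1) / 2 + row;
    }

    int getNumberOfParameters() {
        return parameters.length;
    }

    double getParameter(int i) {
        return parameters[i];
    }

    /** Symmetric access: getCovariance(i, j) equals getCovariance(j, i). */
    double getCovariance(int i, int j) {
        return packedCov[packedIndex(i, j)];
    }
}
```

With 1,000 dummy variables this layout holds 500,500 covariance values instead of 1,000,000, which is the growth the comment is pointing at. The copies made in the constructor are one simple way to get the immutability the issue description asks for.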