[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060800#comment-13060800
 ] 

greg sterijevski commented on MATH-607:
---------------------------------------

On the results object:

There are vars *( vars + 1 ) /2 elements in the cov matrix, vars int
parameters, vars int standard errors and a some other assorted stuff. Not
terribly large at first. However, consider doing panel regression via dummy
variables, the covariance matrix can get fast very quickly. That being said,
I don't think making RegressionResults a concrete class is a gamestopper.
Should I send a follow up patch with results made concrete?

On the regression object:

Are you concerned that we will be removing methods from any interface we
specify today? Or do you think the contract is too restrictive? The reason I
am pushing for interface is that I have two candidates for concrete
implementation of updating regression. The first implementation is based on
Gentleman's lemma and is detailed in the following article:

Algorithm AS 274: Least Squares Routines to Supplement those of Gentleman
Author: Alan J Miller
Source Journal of the Royal Statistical Society Vol 41 No 2 (1992)

The second approach is one detailed by this article by Goodnight:
A Tutorial on the SWEEP Operator
James H. Goodnight
The American Statistician, Vol. 33, No. 3. (Aug., 1979), pp. 149-158.

The first approach never forms the cross products matrix, the second does.
They are significantly different approaches to dealing with large data sets.


How would I do this in the concrete class you propose?

Thanks,

-Greg





> Current Multiple Regression Object does calculations with all data incore. 
> There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, 
> lemma
>             Fix For: 3.0
>
>         Attachments: updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete 
> data set. This necessitates the loading incore of the complete dataset. For 
> large datasets, or large datasets and a requirement to do datamining or 
> stepwise regression this is not practical. There are techniques which form 
> the normal equations on the fly, as well as ones which form the QR 
> decomposition on an update basis. I am proposing, first, the specification of 
> an "UpdatingLinearRegression" interface which defines basic functionality all 
> such techniques must fulfill. 
> Related to this 'updating' regression, the results of running a regression on 
> some subset of the data should be encapsulated in an immutable object. This 
> is to ensure that subsequent additions of observations do not corrupt or 
> render inconsistent parameter estimates. I am calling this interface 
> "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the 
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to