[ 
https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088996#comment-13088996
 ] 

Phil Steitz commented on MATH-449:
----------------------------------

Good point on the stored data version.  This is really our first foray into 
meaningful management of missing data, and now is a great time to start dealing 
with it.  In the correlation package, at this point, we can fairly easily 
support casewise "deletion", pairwise "deletion", or both, so it is probably 
best to make the choice configurable.  Also, we need to agree on and advertise 
the fact that NaNs should be used to signal missing data.  Let's start by 
implementing things this way in the new storeless covariance classes and then 
open new tickets to add support for missing data, first in the rest of the 
correlation package and then in regression.
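
To make the two deletion modes concrete, here is a minimal sketch, assuming 
NaN marks a missing value.  The class NaNAwareCovariance and its methods are 
hypothetical names for illustration only, not existing Commons Math API.  The 
update step is the same on-line formula used in the class attached to the 
issue.

```java
// Hypothetical sketch: none of these names exist in Commons Math.
final class NaNAwareCovariance {
    private long n;
    private double meanX;
    private double meanY;
    private double numerator;

    /** Adds one (x, y) pair; the pair is skipped if either value is NaN. */
    void increment(double x, double y) {
        if (Double.isNaN(x) || Double.isNaN(y)) {
            return;
        }
        n++;
        double deltaX = x - meanX;
        double deltaY = y - meanY;
        meanX += deltaX / n;
        meanY += deltaY / n;
        numerator += ((n - 1.0) / n) * deltaX * deltaY;
    }

    /** Unbiased covariance, or NaN when fewer than two pairs were kept. */
    double getResult() {
        return n < 2 ? Double.NaN : numerator / (n - 1.0);
    }

    /** True if any variable in the case (row) is missing. */
    static boolean hasNaN(double[] row) {
        for (double v : row) {
            if (Double.isNaN(v)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Covariance of columns i and j (rows are cases).  With casewise = true,
     * a row is dropped if ANY of its variables is NaN; otherwise only rows
     * where column i or column j is NaN are dropped (pairwise deletion).
     */
    static double covariance(double[][] data, int i, int j, boolean casewise) {
        NaNAwareCovariance cov = new NaNAwareCovariance();
        for (double[] row : data) {
            if (casewise && hasNaN(row)) {
                continue;
            }
            cov.increment(row[i], row[j]);
        }
        return cov.getResult();
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double[][] data = {
            {1, 2, nan}, {2, 4, 1}, {3, 6, 2}, {4, nan, 3}
        };
        System.out.println("pairwise: " + covariance(data, 0, 1, false));
        System.out.println("casewise: " + covariance(data, 0, 1, true));
    }
}
```

In this data set pairwise deletion keeps three cases for columns 0 and 1, 
while casewise deletion keeps only the two complete rows, so the two modes 
give different estimates.  That difference is the reason to make the mode an 
explicit configuration choice.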

One thing that is bugging me a little is convincing myself that if we allow 
pairwise deletion, the covariance matrix will be legitimate (i.e., have all of 
the analytical properties associated with a covariance matrix).  Also, are 
there negative implications of using NaNs to signal missing data that I have 
not thought about?
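
On the first question, a small constructed example suggests the concern is 
real: with pairwise deletion each entry can be estimated from a different 
subset of cases, and the resulting matrix need not be positive semidefinite.  
The sketch below uses made-up data and hypothetical helper names (nothing here 
is Commons Math API) and evaluates the quadratic form v'Cv, which cannot be 
negative for a true covariance matrix.

```java
// Hypothetical sketch with made-up data: none of these names exist in Commons Math.
final class PairwiseDeletionCounterexample {

    /** Unbiased covariance of columns i and j using only rows where both are present. */
    static double pairwiseCovariance(double[][] data, int i, int j) {
        double sumX = 0.0;
        double sumY = 0.0;
        int n = 0;
        for (double[] row : data) {
            if (!Double.isNaN(row[i]) && !Double.isNaN(row[j])) {
                sumX += row[i];
                sumY += row[j];
                n++;
            }
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double s = 0.0;
        for (double[] row : data) {
            if (!Double.isNaN(row[i]) && !Double.isNaN(row[j])) {
                s += (row[i] - meanX) * (row[j] - meanY);
            }
        }
        return s / (n - 1);
    }

    /** Builds the full matrix entry by entry, each with its own pairwise subset. */
    static double[][] pairwiseCovarianceMatrix(double[][] data) {
        int p = data[0].length;
        double[][] c = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                c[i][j] = pairwiseCovariance(data, i, j);
            }
        }
        return c;
    }

    /** Computes v'Cv. */
    static double quadraticForm(double[][] c, double[] v) {
        double q = 0.0;
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < v.length; j++) {
                q += v[i] * c[i][j] * v[j];
            }
        }
        return q;
    }

    /** x and y move together, y and z move together, but x and z move oppositely. */
    static double[][] exampleData() {
        double nan = Double.NaN;
        return new double[][] {
            {1, 1, nan}, {2, 2, nan},
            {nan, 1, 1}, {nan, 2, 2},
            {1, nan, 2}, {2, nan, 1}
        };
    }

    public static void main(String[] args) {
        double[][] c = pairwiseCovarianceMatrix(exampleData());
        double q = quadraticForm(c, new double[] {1, -1, 1});
        System.out.println("v'Cv = " + q); // negative, so C is not positive semidefinite
    }
}
```

Here every variance is 1/3, cov(x,y) = cov(y,z) = 0.5 and cov(x,z) = -0.5, so 
for v = (1, -1, 1) the quadratic form is about -2.  Casewise deletion avoids 
this because every entry is computed from the same cases.  On the NaN 
question, one known pitfall in Java is that NaN == NaN is false, so missing 
values have to be detected with Double.isNaN, as above.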

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently there is no storeless version for computing the covariance. 
> However, Pebay (2008) describes algorithms for on-line covariance 
> computations, [http://infoserve.sandia.gov/sand_doc/2008/086212.pdf]. I have 
> provided a simple class for implementing this algorithm. It would be nice to 
> have this integrated into org.apache.commons.math.stat.correlation.Covariance.
> {code}
> // This code is granted for inclusion in the Apache Commons under the terms of
> // the ASL.
> public class StorelessCovariance {
>     private double meanX = 0.0;
>     private double meanY = 0.0;
>     private long n = 0;
>     private double covarianceNumerator = 0.0;
>     private final boolean unbiased;
>
>     public StorelessCovariance(boolean unbiased) {
>         this.unbiased = unbiased;
>     }
>
>     // Adds one (x, y) pair; a pair with a null member is skipped.
>     public void increment(Double x, Double y) {
>         if (x != null && y != null) {
>             n++;
>             double deltaX = x - meanX;
>             double deltaY = y - meanY;
>             meanX += deltaX / n;
>             meanY += deltaY / n;
>             covarianceNumerator += ((n - 1.0) / n) * deltaX * deltaY;
>         }
>     }
>
>     // Divides by n - 1 when unbiased is true, by n otherwise.
>     public double getResult() {
>         if (unbiased) {
>             return covarianceNumerator / (n - 1.0);
>         } else {
>             return covarianceNumerator / n;
>         }
>     }
> }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
