[ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088719#comment-13088719 ]
Patrick Meyer commented on MATH-449:
------------------------------------

I like all of these ideas. When I wrote the patch, I didn't know whether forcing a square matrix was preferred, so I wrote it more generally. A square matrix is fine with me.

Incrementing the full vector of new values is definitely the safest way to do it. However, it forces the user into listwise deletion if a case has any missing data. The more granular version allows a user to implement pairwise deletion. Neither option is a great way to handle missing data, but do we want to force one approach on the user? Is there a way to increment the full vector of values and account for missing data on one or more variables?

Thanks,
Patrick

> Storeless covariance
> --------------------
>
> Key: MATH-449
> URL: https://issues.apache.org/jira/browse/MATH-449
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Patrick Meyer
> Assignee: Phil Steitz
> Fix For: 3.1
>
> Attachments: MATH-449.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Currently there is no storeless version for computing the covariance.
> However, Pebay (2008) describes algorithms for on-line covariance
> computations, [http://infoserve.sandia.gov/sand_doc/2008/086212.pdf]. I have
> provided a simple class for implementing this algorithm. It would be nice to
> have this integrated into org.apache.commons.math.stat.correlation.Covariance.
> {code}
> // This code is granted for inclusion in the Apache Commons under the terms of the ASL.
> public class StorelessCovariance {
>     private double meanX = 0.0;
>     private double meanY = 0.0;
>     private long n = 0;
>     private double covarianceNumerator = 0.0;
>     private final boolean unbiased;
>
>     // The constructor name must match the class name.
>     public StorelessCovariance(boolean unbiased) {
>         this.unbiased = unbiased;
>     }
>
>     // Adds one (x, y) pair; the pair is skipped if either value is null.
>     public void increment(Double x, Double y) {
>         if (x != null && y != null) {
>             n++;
>             double deltaX = x - meanX;
>             double deltaY = y - meanY;
>             meanX += deltaX / n;
>             meanY += deltaY / n;
>             covarianceNumerator += ((n - 1.0) / n) * deltaX * deltaY;
>         }
>     }
>
>     // Divides by n - 1 when unbiased, otherwise by n.
>     public double getResult() {
>         return unbiased ? covarianceNumerator / (n - 1.0) : covarianceNumerator / n;
>     }
> }
> {code}
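
On the question of incrementing a full vector while still tolerating missing values, one possible approach is sketched below. It is not the attached patch. It keeps a separate count, pair of running means, and co-moment for each (i, j) cell of a square matrix, and a full-vector increment updates only the cells whose two values are both present, which amounts to pairwise deletion. The class name PairwiseStorelessCovariance, the methods getCovariance and getN, and the use of Double.NaN as the missing-value marker are all assumptions made for illustration; none of them is existing Commons Math API.

```java
// Hypothetical sketch: storeless covariance matrix with pairwise deletion.
// Assumption: Double.NaN marks a missing value in the incoming row.
final class PairwiseStorelessCovariance {
    private final int dim;
    private final long[][] n;          // complete pairs seen for cell (i, j), i <= j
    private final double[][] meanI;    // running mean of variable i within cell (i, j)
    private final double[][] meanJ;    // running mean of variable j within cell (i, j)
    private final double[][] coMoment; // running sum of cross-deviations for cell (i, j)
    private final boolean unbiased;

    PairwiseStorelessCovariance(int dim, boolean unbiased) {
        this.dim = dim;
        this.unbiased = unbiased;
        this.n = new long[dim][dim];
        this.meanI = new double[dim][dim];
        this.meanJ = new double[dim][dim];
        this.coMoment = new double[dim][dim];
    }

    // Takes a full row of values; cells that involve a NaN are left unchanged.
    void increment(double[] row) {
        if (row.length != dim) {
            throw new IllegalArgumentException("expected " + dim + " values");
        }
        for (int i = 0; i < dim; i++) {
            if (Double.isNaN(row[i])) {
                continue;
            }
            // Only the upper triangle is stored, since covariance is symmetric.
            for (int j = i; j < dim; j++) {
                if (Double.isNaN(row[j])) {
                    continue;
                }
                n[i][j]++;
                double count = n[i][j];
                double deltaI = row[i] - meanI[i][j];
                double deltaJ = row[j] - meanJ[i][j];
                meanI[i][j] += deltaI / count;
                meanJ[i][j] += deltaJ / count;
                coMoment[i][j] += ((count - 1.0) / count) * deltaI * deltaJ;
            }
        }
    }

    // Covariance of variables i and j over the rows where both were present.
    double getCovariance(int i, int j) {
        int r = Math.min(i, j);
        int c = Math.max(i, j);
        double count = n[r][c];
        return unbiased ? coMoment[r][c] / (count - 1.0) : coMoment[r][c] / count;
    }

    // Number of rows in which both variable i and variable j were present.
    long getN(int i, int j) {
        return n[Math.min(i, j)][Math.max(i, j)];
    }

    public static void main(String[] args) {
        PairwiseStorelessCovariance cov = new PairwiseStorelessCovariance(3, true);
        cov.increment(new double[] {1.0, 2.0, 3.0});
        cov.increment(new double[] {2.0, 4.0, Double.NaN}); // third variable missing
        cov.increment(new double[] {3.0, 6.0, 1.0});
        System.out.println("n(0,1) = " + cov.getN(0, 1) + ", n(0,2) = " + cov.getN(0, 2));
        System.out.println("cov(0,1) = " + cov.getCovariance(0, 1));
    }
}
```

The cost of this design is that each cell can end up with a different sample size, so the resulting matrix is not guaranteed to be positive semidefinite. A caller who wants listwise deletion could instead skip the whole row whenever any value is NaN, which would let all cells share one count and one mean per variable.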