Maybe it would be possible to include something about the difference in method/results between the cov and corrcoef functions in the documentation? Right now I don't see it explained.
Thanks,
Nir

On Thu, Mar 8, 2012 at 6:06 PM, Alois Schlögl <alois.schlo...@ist.ac.at> wrote:
> On 03/08/2012 05:41 PM, Nir Krakauer wrote:
>>
>> With the current package version (2.5.2) running in Octave 3.6.1, cov
>> may return infinite covariances when there is only a single non-NaN
>> overlap between two data series. For example,
>>
>> C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]')
>>
>> returns
>>
>> C =
>>
>>    1.66667       Inf
>>        Inf   0.50000
>>
>>
>> I assume that is not the intended outcome?
>
>
>
> Hi Nir,
>
> this is a strange question, how should I answer? The brief answer is: the
> outcome itself is not intended by me, but the behavior of the function cov()
> is intended. Let me explain:
>
> Inf is caused by the default normalization with (N-1) in cov(), resulting in
> a division by zero. You can avoid this by using a normalization with N, as
> documented:
>
> C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]', 1)
> C =
>
>    1.25000   0.75000
>    0.75000   0.25000
>
> In any case, the function cov() is rather strange in its behavior with
> respect to data containing missing values. Unlike for data w/o missing
> values, there is no guarantee that the outcome of cov() is "positive
> definite", or that the (magnitude of the) elements are always smaller than
> 1.
>
> det(C)
> ans = -0.25000
>
> In the NaN-tb, the nice properties we know from data w/o missing values
> (positive definiteness, and magnitude of elements <=1) are maintained by
> the function corrcoef().
>
> C = corrcoef([NaN 1 2 3 4; 1 NaN NaN NaN 2]'*10)
> C =
>
>      1   NaN
>    NaN     1
>
> Here, NaN indicates a 0/0, just as the std(x) of a single element x is
> (x-mean(x))/(N-1) = 0/0 = NaN, resulting in an undefined value.
> (Note, it does not mean that the off-diagonals can be outside the interval
> ]-1,1[; it means the value can take any value in the interval +-1.)
>
> cov() is fast and might be suitable for large data sets; with corrcoef() we
> get the nice properties even for very small data sets.
> Therefore, I did not try to make cov() and corrcoef() similar; it is intended
> by me that the two functions can behave differently, and that the user can
> choose which one (s)he wants. Note also that the functions yield the same
> outcome for data w/o NaNs. So, the compatibility w.r.t. data w/o NaN is
> maintained.
>
> I hope this answers your question.
>
>    Alois

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev
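
For anyone wanting to reproduce the figures quoted in this thread without the NaN package installed, here is a minimal sketch in plain Octave. It assumes the following scheme, which happens to match the quoted numbers: each column is centred on the mean of its own non-NaN values, products are summed over the rows where both columns are present, and the sum is divided by that pairwise count (or the count minus one). Whether the package computes cov() exactly this way is an assumption on my part; pairwise_cov and use_n are hypothetical names, not part of the package.

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical helper, NOT the NaN package source: pairwise-complete covariance.
% X is observations-by-variables; use_n = true divides by N, false by N-1.
function C = pairwise_cov(X, use_n)
  valid = ~isnan(X);                     % true where a value is present
  n_col = sum(valid, 1);                 % non-NaN count per column
  X0 = X;
  X0(~valid) = 0;                        % NaNs must not contribute to sums
  mu = sum(X0, 1) ./ n_col;              % per-column mean over valid values
  Xc = X0 - ones(size(X, 1), 1) * mu;    % centre every column
  Xc(~valid) = 0;                        % missing entries contribute nothing
  V = double(valid);
  NN = V' * V;                           % NN(i,j): rows where i and j both valid
  S = Xc' * Xc;                          % pairwise sums of products
  if use_n
    C = S ./ NN;                         % normalization with N
  else
    C = S ./ (NN - 1);                   % default normalization with N-1
  end
end

X = [NaN 1 2 3 4; 1 NaN NaN NaN 2]';

% The two series overlap in one row only, so N-1 = 0 off the diagonal
% and 0.75/0 gives Inf: [1.66667 Inf; Inf 0.50000]
C_nm1 = pairwise_cov(X, false)

% Normalizing with N avoids the division by zero:
% [1.25000 0.75000; 0.75000 0.25000]
C_n = pairwise_cov(X, true)

% 1.25*0.25 - 0.75^2 = -0.25, so this matrix is not positive definite
d = det(C_n)
```

The negative determinant comes from mixing different subsets of rows in one matrix: the diagonal uses 4 and 2 observations, the off-diagonal only 1. That is the trade-off described in the quoted message.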