Maybe it would be possible to include something about the difference
in method/results between the cov and corrcoef functions in the
documentation? Right now I don't see it explained.

Thanks,

Nir

On Thu, Mar 8, 2012 at 6:06 PM, Alois Schlögl <alois.schlo...@ist.ac.at> wrote:
> On 03/08/2012 05:41 PM, Nir Krakauer wrote:
>>
>> With the current package version (2.5.2) running in Octave 3.6.1, cov
>> may return infinite covariances when there is only a single non-NaN
>> overlap between two data series. For example,
>>
>> C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]')
>>
>> returns
>>
>> C =
>>
>>    1.66667       Inf
>>        Inf   0.50000
>>
>>
>> I assume that is not the intended outcome?
>
>
>
> Hi Nir,
>
> this is a strange question, how should I answer? The brief answer is: the
> outcome itself is not intended by me, but the behavior of the function cov()
> is intended. Let me explain:
>
> Inf is caused by the default normalization with (N-1) in cov(): with only
> one overlapping sample, N-1 = 0 and the result is a division by zero. You
> can avoid this by using a normalization with N, as documented:
>
> C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]', 1)
> C =
>
>   1.25000   0.75000
>   0.75000   0.25000
>
> In any case, the function cov() behaves rather strangely with respect to
> data containing missing values. Unlike for data w/o missing values, there
> is no guarantee that the outcome of cov() is "positive definite", or that
> the magnitude of the implied correlations, C(i,j)/sqrt(C(i,i)*C(j,j)), is
> always at most 1.
>
> det(C)
>  ans = -0.25000
>
> In the NaN toolbox, the nice properties we know from data w/o missing
> values (positive definiteness, and magnitude of elements <= 1) are
> maintained by the function corrcoef().
>
> C= corrcoef([NaN 1 2 3 4; 1 NaN NaN NaN 2]'*10)
> C =
>
>     1   NaN
>   NaN     1
>
> Here, NaN indicates a 0/0, just as the std(x) of a single element x is
> sqrt((x-mean(x))^2/(N-1)) = sqrt(0/0) = NaN, i.e. an undefined value.
> (Note that this does not mean the off-diagonals can lie outside the
> interval [-1,1]; it means the value is undetermined and could be anything
> within that interval.)
>
> cov() is fast and might be suitable for large data sets; with corrcoef() we
> get the nice properties even for very small data sets. Therefore, I did not
> try to make cov() and corrcoef() similar: it is intended that the two
> functions can behave differently, and that the user can choose which one
> (s)he wants. Note also that the functions yield the same outcome for data
> w/o NaNs, so compatibility w.r.t. data w/o NaNs is maintained.
>
> I hope this answers your question.
>
>   Alois
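
For anyone reading this in the archive, here is a minimal sketch of the
division by zero described above, in plain Octave without the NaN toolbox.
It assumes each column is centered with its own non-NaN mean and that the
products are summed over the overlapping rows only. That is an assumption,
chosen because it reproduces the numbers quoted in this thread; it is not
taken from the toolbox source, and the variable names are illustrative.

```octave
% Sketch only: pairwise-deletion covariance of two columns, assuming each
% column is centered with the mean of its own non-NaN values (assumption).
X = [NaN 1 2 3 4; 1 NaN NaN NaN 2]';
x = X(:,1);
y = X(:,2);

ok = ~isnan(x) & ~isnan(y);   % rows where both series are observed
n  = sum(ok);                 % number of overlapping samples, here 1

mx = mean(x(~isnan(x)));      % mean of 1,2,3,4 = 2.5
my = mean(y(~isnan(y)));      % mean of 1,2     = 1.5

% Sum of centered products over the overlap: (4-2.5)*(2-1.5) = 0.75
s = sum((x(ok) - mx) .* (y(ok) - my));

c_unbiased = s / (n - 1)      % 0.75/0 gives Inf
c_biased   = s / n            % 0.75/1 gives 0.75
```

With a single overlapping row, the (N-1) normalization divides by zero,
while the N normalization gives the finite off-diagonal value 0.75.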

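The loss of positive definiteness can be checked directly from the matrix
that cov(..., 1) returned in the quoted message. This is only a sketch in
plain Octave; the implied correlation r is an illustrative quantity computed
here, not something the toolbox reports.

```octave
% Covariance matrix quoted in the thread (normalization by N).
C = [1.25 0.75; 0.75 0.25];

d = det(C)   % 1.25*0.25 - 0.75^2 = -0.25
e = eig(C)   % one eigenvalue is negative, so C is indefinite

% Correlation implied by C: off-diagonal scaled by both standard deviations.
r = C(1,2) / sqrt(C(1,1) * C(2,2))   % about 1.34, outside [-1, 1]
```

A covariance matrix computed from complete data can never give a negative
determinant or an implied correlation above 1, which is why this result is
specific to the missing-value case.
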
_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev
