In article <[EMAIL PROTECTED]>, Fernando De la Torre
<[EMAIL PROTECTED]> writes
>Hi,
>
>If I have n samples of d-dimensional vectors, then for the covariance matrix
>not to be rank deficient I would need n>d (assuming the samples are
>independent). However, for high d, if I want a good estimate of the
>covariance matrix (not just full rank), I would usually need many more
>samples than d, i.e. n>>d. Does anybody know of any theoretical study of how
>many samples I would need for a good estimate of the covariance?
>Since the covariance has d(d+1)/2 parameters, do I need at least this many
>samples?
>

You have n d-dimensional samples, so n*d observed numbers with which
to estimate about d*d/2 parameters. If you think 10 observed numbers per
parameter is enough for a 'good' estimate, n>=5d would be enough.
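
The counting argument above can be sketched in a few lines of Python. This
is only an illustration of the arithmetic, not a theoretical result: the
function names are made up for this example, and the figure of 10 observed
numbers per parameter is the same rule of thumb assumed above. The sketch
uses the exact count d(d+1)/2 rather than the approximation d*d/2, so it
gives n = 5(d+1) instead of 5d.

```python
def covariance_parameter_count(d):
    """Number of free parameters in a symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def samples_for_ratio(d, numbers_per_parameter=10):
    """Smallest n such that n*d observed numbers give at least
    `numbers_per_parameter` numbers for each covariance parameter.

    The ratio is an assumed rule of thumb, not a guarantee of accuracy.
    """
    needed_numbers = numbers_per_parameter * covariance_parameter_count(d)
    # Ceiling division: each sample contributes d observed numbers.
    return -(-needed_numbers // d)


if __name__ == "__main__":
    for d in (3, 10, 100):
        print(d, covariance_parameter_count(d), samples_for_ratio(d))
```

For d = 100 this gives 5050 parameters and n = 505, close to the 5d = 500
obtained from the d*d/2 approximation.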

Unfortunately, you have probably asked the wrong question! I'm assuming
your real interest is in classification, since you asked about that a
couple of weeks ago. For classification, assuming the classes are
normally distributed, the answer is much more complicated and less well
understood, and depends on the unknown true covariances (and means). The
answer is also very different for a linear classifier (pooled
covariance) and a quadratic classifier (one covariance per class). Try
searching for "Hughes phenomenon" for more information.
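
One reason the linear and quadratic cases differ can be seen just by
counting the parameters each classifier has to estimate. The sketch below
is a rough illustration only: the function names and the choice of d = 50
and 3 classes are made up for the example, and the parameter count says
nothing about the dependence on the true covariances and means mentioned
above.

```python
def covariance_params(d):
    """Free parameters in one symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def gaussian_classifier_params(d, k, pooled):
    """Parameters estimated by a Gaussian classifier with k classes.

    pooled=True  -> linear classifier (one covariance shared by all classes)
    pooled=False -> quadratic classifier (one covariance per class)
    Class priors are ignored in this count.
    """
    mean_params = k * d                      # one d-dimensional mean per class
    n_cov_matrices = 1 if pooled else k
    return mean_params + n_cov_matrices * covariance_params(d)


if __name__ == "__main__":
    d, k = 50, 3                             # hypothetical example values
    print("linear:   ", gaussian_classifier_params(d, k, pooled=True))
    print("quadratic:", gaussian_classifier_params(d, k, pooled=False))
```

With these example values the linear classifier has 1425 parameters and
the quadratic one 3975, and in the quadratic case each covariance is
estimated from only one class's samples.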

-- 
Graham Jones
http://www.visiv.co.uk
Emails to [EMAIL PROTECTED] may be deleted as spam
Please add a j just before the @ to ensure delivery

