In article <[EMAIL PROTECTED]>, Fernando De la Torre <[EMAIL PROTECTED]> writes:
>Hi,
>
>If I have n samples of d-dimensional vectors, then for the sample covariance
>not to be rank deficient I need n>d (assuming the samples are independent).
>However, for high d, if I want a good estimate of the covariance matrix (not
>just a full-rank one), I usually need many more than d samples, i.e. n>>d.
>Does anybody know of any theoretical study of how many samples I would need
>to get a good estimate of the covariance? Since the covariance has d(d+1)/2
>free parameters, do I need at least that many samples?
>
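The n>d condition in the question can be checked numerically. What follows is a minimal sketch, assuming NumPy and i.i.d. standard-normal data (neither comes from the thread; the function name `sample_cov_rank` is hypothetical). Because the mean is estimated from the same data, the d x d sample covariance has rank at most n-1, so it can only be full rank when n>=d+1, i.e. n>d.

```python
import numpy as np


def sample_cov_rank(n: int, d: int, seed: int = 0) -> int:
    """Numerical rank of the sample covariance of n i.i.d. standard-normal d-vectors.

    Hypothetical helper for illustration; assumes Gaussian data with identity covariance.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))      # rows are samples, columns are dimensions
    s = np.cov(x, rowvar=False)          # d x d covariance, sample mean subtracted
    return int(np.linalg.matrix_rank(s))  # rank via SVD with NumPy's default tolerance


if __name__ == "__main__":
    d = 10
    for n in (5, 10, 11, 50):
        # Expect rank n-1 while n <= d, and rank d once n > d.
        print(f"n={n:3d}  rank={sample_cov_rank(n, d)}  (d={d})")
```

Note that n=d samples still give a singular matrix; estimating the mean uses up one degree of freedom.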
You have n d-dimensional samples, so n*d observed numbers with which to
estimate about d*d/2 parameters (d(d+1)/2 exactly). If you think 10 observed
numbers per parameter is enough for a 'good' estimate, then n>=5d (more
precisely, n>=5(d+1)) would be enough.

Unfortunately, you have probably asked the wrong question! I'm assuming your
real interest is in classification, since you asked about that a couple of
weeks ago. For classification, assuming the classes are normally distributed,
the answer is much more complicated and less well understood, and it depends
on the unknown true covariances (and means). The answer is also very
different for a linear classifier (pooled covariance) and a quadratic
classifier (one covariance per class). Try searching for "Hughes phenomenon"
for more information.

-- 
Graham Jones  http://www.visiv.co.uk
Emails to [EMAIL PROTECTED] may be deleted as spam.
Please add a j just before the @ to ensure delivery.

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
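The "10 observed numbers per parameter" rule of thumb from the reply can be turned into arithmetic: n*d >= 10 * d(d+1)/2 gives n >= 5(d+1), essentially the n>=5d quoted. The sketch below computes that bound and, as an assumed test case not taken from the thread (true covariance equal to the identity, Gaussian data, NumPy), measures how far the sample covariance actually is from the truth at that n. Both function names are hypothetical. For this case the expected squared Frobenius error works out to d(d+1)/(n-1), so the relative error should shrink roughly like sqrt((d+1)/(n-1)), about 0.45 at n = 5(d+1). Whether that counts as "good" depends on what the estimate is used for, which is the reply's point about classification.

```python
import math

import numpy as np


def min_samples(d: int, obs_per_param: float = 10.0) -> int:
    """Smallest n such that n*d observed numbers >= obs_per_param * d(d+1)/2 parameters.

    Hypothetical helper; obs_per_param=10 is the rule of thumb from the reply.
    """
    n_params = d * (d + 1) / 2  # free entries of a symmetric d x d matrix
    return math.ceil(obs_per_param * n_params / d)  # = ceil(obs_per_param * (d+1) / 2)


def relative_cov_error(n: int, d: int, trials: int = 20, seed: int = 0) -> float:
    """Mean of ||S - I||_F / ||I||_F over `trials` draws, assuming true covariance I."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        x = rng.standard_normal((n, d))  # n samples from N(0, I_d)
        s = np.cov(x, rowvar=False)      # sample covariance (divides by n-1)
        errs.append(np.linalg.norm(s - np.eye(d), "fro") / math.sqrt(d))
    return float(np.mean(errs))


if __name__ == "__main__":
    for d in (10, 50, 100):
        n = min_samples(d)
        err = relative_cov_error(n, d)
        print(f"d={d:3d}  n={n:4d}  relative error ~ {err:.2f}")
```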
