Sidney Thomas <[EMAIL PROTECTED]> wrote in message 
news:<[EMAIL PROTECTED]>...
> srinivas wrote:
> > 
> > Hi,
> > 
> >   I have a problem identifying the right multivariate tools to
> > handle a dataset of dimension 100,000 x 500. The problem is further
> > complicated by a lot of missing data. Can anyone suggest a way to
> > reduce the data set and also to estimate the missing values? I need to
> > know which clustering tool is appropriate for grouping the
> > observations (based on the 500 variables).

One of the best ways to handle missing data is conditional mean
imputation: fill in each missing value with the mean computed from other
cases that share the same value on a related variable.  If I'm doing
psychological research and I am missing some depression-scale values for
certain individuals, I can look at, say, their reported locus of control
and impute the corresponding mean.  Let's say [a common finding] that I
find a pattern - individuals with a high locus of control report low
levels of depression - and locus of control is measured on a scale from
1 to 100.  If one case is missing its depression score and has a locus
of control of 75, I can take the mean depression level for all
individuals with a locus of control of 75 and impute that value for
every missing case whose listed locus of control is 75.  I'm not sure
why you'd want to reduce the size of the data set, since for the most
part the larger the "N" the better.
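If you want to try that kind of group-based mean imputation on a big
table, here is a minimal sketch in Python with pandas.  The column names
(locus_of_control, depression) and the simulated data are just
illustrative assumptions, not anything from the original poster's file.

    import numpy as np
    import pandas as pd

    # Toy data: locus_of_control on a 1-100 scale, depression scores
    # with roughly 10% of the values knocked out to mimic missing data.
    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "locus_of_control": rng.integers(1, 101, size=n),
        "depression": rng.normal(50, 10, size=n),
    })
    df.loc[rng.random(n) < 0.10, "depression"] = np.nan

    # Conditional (group) mean imputation: for each missing depression
    # score, fill in the mean depression of all cases that share the
    # same locus_of_control value.
    group_means = df.groupby("locus_of_control")["depression"].transform("mean")
    df["depression_imputed"] = df["depression"].fillna(group_means)

    print(df["depression"].isna().sum(), "missing before,",
          df["depression_imputed"].isna().sum(), "missing after")

Note that if every case in some locus_of_control group is missing, the
group mean is itself NaN and those cases stay unfilled; you would need a
fallback (e.g. the overall mean) for them.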


Tracey Continelli

