Re: multivariate techniques for large datasets

Art Kendall Mon, 18 Jun 2001 08:50:56 -0700
You might want to go to http://www.pitt.edu/~csna/
and then cross-post your question to CLASS-L.

The Classification Society meeting this weekend had a lot of discussion of
these topics.

My first question is whether you intend to interpret the clusters.

If so, what is the nature of the 500 variables?
What is the nature of your cases?
What does the set of cases represent?
How much data is missing? What kinds of missing data do you have?
What do you want to do with the cluster results?
Are you interested in a tree or a simple clustering?
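
The question about how much data is missing can be answered with a quick
tally before any clustering is attempted. Below is a minimal sketch in
Python with NumPy. It assumes the data sit in a numeric array with missing
entries coded as NaN; the function name and the toy array are made up for
illustration and are not from the original post.

```python
import numpy as np

def missingness_profile(X):
    """Return the fraction of missing (NaN) entries per variable and per case."""
    mask = np.isnan(X)
    per_variable = mask.mean(axis=0)  # one value per column (variable)
    per_case = mask.mean(axis=1)      # one value per row (case)
    return per_variable, per_case

# Toy stand-in for the real data: 1000 cases, 20 variables, with roughly
# 10% of the entries knocked out at random (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
X[rng.random(X.shape) < 0.1] = np.nan

per_variable, per_case = missingness_profile(X)
complete_cases = int((per_case == 0).sum())

print("worst variable, fraction missing:", per_variable.max())
print("cases with no missing values:", complete_cases)
```

Variables or cases that are mostly missing are candidates for being dropped
before imputation, and the count of complete cases shows how much would be
lost by listwise deletion.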


Many users of clustering use data reduction techniques such as factor
analysis to summarize the variability of the 500 variables with a smaller
number of dimensions.
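
As a rough illustration of the reduce-then-cluster idea, here is a sketch in
Python with NumPy. Several things in it are my assumptions rather than a
recommendation: it uses principal components (via SVD) as a stand-in for
factor analysis, fills missing values with column means (crude, and only
sensible if the data are missing more or less at random), and uses a plain
k-means for the clustering step. The function names, the number of
components, and the number of clusters are all hypothetical.

```python
import numpy as np

def reduce_dimensions(X, n_components):
    """Mean-impute, standardize, and project onto the leading principal components."""
    col_means = np.nanmean(X, axis=0)
    filled = np.where(np.isnan(X), col_means, X)   # crude mean imputation
    std = filled.std(axis=0)
    std[std == 0] = 1.0                            # guard against constant columns
    Z = (filled - filled.mean(axis=0)) / std
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / (S ** 2).sum()          # share of variance per component
    return scores, explained[:n_components]

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Toy stand-in for the real data: two shifted groups, 50 variables, 10% missing.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(1000, 50)),
               rng.normal(2.0, 1.0, size=(1000, 50))])
X[rng.random(X.shape) < 0.1] = np.nan

scores, explained = reduce_dimensions(X, n_components=5)
labels, centers = kmeans(scores, k=3)
print("variance share of the 5 components:", explained.sum())
print("cluster sizes:", np.bincount(labels, minlength=3))
```

On 100,000 cases the full SVD of a 100,000 x 500 matrix is still feasible in
memory, but the distance computation in the k-means step would need to be
done in batches. Whether the reduced dimensions are interpretable depends on
the answers to the questions above.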



srinivas wrote:

> Hi,
>
>   I have a problem in identifying the right multivariate tools to
> handle a dataset of dimension 1,00,000*500. The problem is further
> complicated by a lot of missing data. Can anyone suggest a way to
> reduce the data set and also to estimate the missing values? I need to
> know which clustering tool is appropriate for grouping the
> observations (based on 500 variables).


