I'm an MSc student and we've been given a data set that is a
contingency table cross-tabulating countries against traits that natives
of those countries are said to have (it's a survey of the opinions of EU
citizens).

The data look like this:
Country  Stylish  Arrogant  Sexy  Devious  Easygoing  Greedy  Cowardly  Boring  Efficient  Lazy  Hardworking  Clever  Courageous
France        37        29    21       19         10      10         8       8          6     6            5       2           1
Spain          7        14     8        9         27       7         3       7          3    23           12       1           3
Italy         30        12    19       10         20       7        12       6          5    13           10       1           2
UK             9        14     4        6         27      12         2      13         26    16           29       6          25
Ireland        1         7     1       16         30       3        10       9          5    11           22       2          27
Holland        5         4     2        2         15       2         0      13         24     1           28       4           6
Germany        4        48     1       12          3       9         2      11         41     1           38       8           8
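
To make the table easy to reuse, here is a minimal Python sketch that stores the counts above and computes each country's row profile (its counts divided by its row total), which is the quantity correspondence analysis and chi-square distances are built from. The names `TRAITS`, `COUNTS` and `row_profiles` are illustrative only, not from any package.

```python
# Counts from the table above: one row per country, one column per trait.
TRAITS = ["Stylish", "Arrogant", "Sexy", "Devious", "Easygoing", "Greedy",
          "Cowardly", "Boring", "Efficient", "Lazy", "Hardworking", "Clever",
          "Courageous"]
COUNTS = {
    "France":  [37, 29, 21, 19, 10, 10, 8, 8, 6, 6, 5, 2, 1],
    "Spain":   [7, 14, 8, 9, 27, 7, 3, 7, 3, 23, 12, 1, 3],
    "Italy":   [30, 12, 19, 10, 20, 7, 12, 6, 5, 13, 10, 1, 2],
    "UK":      [9, 14, 4, 6, 27, 12, 2, 13, 26, 16, 29, 6, 25],
    "Ireland": [1, 7, 1, 16, 30, 3, 10, 9, 5, 11, 22, 2, 27],
    "Holland": [5, 4, 2, 2, 15, 2, 0, 13, 24, 1, 28, 4, 6],
    "Germany": [4, 48, 1, 12, 3, 9, 2, 11, 41, 1, 38, 8, 8],
}


def row_profiles(counts):
    """Return each country's counts divided by its row total (rows sum to 1)."""
    return {country: [x / sum(row) for x in row] for country, row in counts.items()}


if __name__ == "__main__":
    for country, profile in row_profiles(COUNTS).items():
        print(f"{country:8s}", " ".join(f"{p:.2f}" for p in profile))
```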

I've done hierarchical cluster analysis, correspondence analysis and
multidimensional scaling (MDS), and I'm generally happy with the results.

My understanding of k-means is that it's meant for large-scale
problems and needs continuous data. The SPSS help seems to imply that
the only clustering SPSS can do with count data is hierarchical. I've
run k-means on the data anyway and it seems to give sensible results;
it was just the help file that got me thinking.
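
For what it's worth, k-means only needs numeric vectors it can average, and the row profiles (proportions) are continuous values even though the raw data are counts. Below is a minimal sketch of Lloyd's k-means algorithm in plain Python, clustering each country's row profile with ordinary Euclidean distance. The choice of k = 3 and of the France, UK and Germany profiles as starting centroids is an arbitrary assumption for illustration; k-means results depend on both, so in practice several starts would be compared.

```python
import math

# Counts from the table above: one row per country, one column per trait.
COUNTS = {
    "France":  [37, 29, 21, 19, 10, 10, 8, 8, 6, 6, 5, 2, 1],
    "Spain":   [7, 14, 8, 9, 27, 7, 3, 7, 3, 23, 12, 1, 3],
    "Italy":   [30, 12, 19, 10, 20, 7, 12, 6, 5, 13, 10, 1, 2],
    "UK":      [9, 14, 4, 6, 27, 12, 2, 13, 26, 16, 29, 6, 25],
    "Ireland": [1, 7, 1, 16, 30, 3, 10, 9, 5, 11, 22, 2, 27],
    "Holland": [5, 4, 2, 2, 15, 2, 0, 13, 24, 1, 28, 4, 6],
    "Germany": [4, 48, 1, 12, 3, 9, 2, 11, 41, 1, 38, 8, 8],
}


def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, init, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of its points, until labels stop changing."""
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = mean(members)
    return labels, centroids


if __name__ == "__main__":
    names = list(COUNTS)
    profiles = [[x / sum(row) for x in row] for row in COUNTS.values()]
    # Assumed starting centroids (k = 3): the France, UK and Germany profiles.
    init = [profiles[names.index(c)] for c in ("France", "UK", "Germany")]
    labels, _ = kmeans(profiles, init)
    for name, lab in zip(names, labels):
        print(name, lab)
```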

This is part of our coursework, so I hope you don't think I'm being
cheeky in asking for help. I'm just after an opinion, or a pointer to a
website or other resource that says whether k-means is valid for this
type of data.

I've been thinking about this a bit more today and wondering whether it
might be more useful to build a matrix of chi-square distances (which is
what correspondence analysis builds) and run k-means on that.
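
One point worth checking here: standard k-means needs coordinates rather than a distance matrix, because each step recomputes the centroids as means. A known way round this is to rescale the row profiles so that ordinary Euclidean distance between them equals the chi-square distance: divide each profile element by the square root of that trait's column mass (the trait's share of the grand total). The minimal Python sketch below builds the chi-square distance matrix between countries and, in the checks, confirms that Euclidean distances on the rescaled profiles reproduce it, so k-means run on those rescaled profiles would cluster on the chi-square metric. The function names are illustrative. Note that plain k-means gives every country equal weight, whereas correspondence analysis also weights each country by its row mass.

```python
import math

# Counts from the table above: one row per country, one column per trait.
COUNTS = {
    "France":  [37, 29, 21, 19, 10, 10, 8, 8, 6, 6, 5, 2, 1],
    "Spain":   [7, 14, 8, 9, 27, 7, 3, 7, 3, 23, 12, 1, 3],
    "Italy":   [30, 12, 19, 10, 20, 7, 12, 6, 5, 13, 10, 1, 2],
    "UK":      [9, 14, 4, 6, 27, 12, 2, 13, 26, 16, 29, 6, 25],
    "Ireland": [1, 7, 1, 16, 30, 3, 10, 9, 5, 11, 22, 2, 27],
    "Holland": [5, 4, 2, 2, 15, 2, 0, 13, 24, 1, 28, 4, 6],
    "Germany": [4, 48, 1, 12, 3, 9, 2, 11, 41, 1, 38, 8, 8],
}


def column_masses(rows):
    """Each trait's column total divided by the grand total."""
    grand = sum(sum(row) for row in rows)
    return [sum(col) / grand for col in zip(*rows)]


def chi2_distance(row_a, row_b, masses):
    """Chi-square distance between the row profiles of two countries."""
    pa = [x / sum(row_a) for x in row_a]
    pb = [x / sum(row_b) for x in row_b]
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(pa, pb, masses)))


def chi2_coordinates(rows, masses):
    """Row profiles rescaled by 1/sqrt(column mass); Euclidean distance between
    these vectors equals the chi-square distance between the profiles."""
    return [[(x / sum(row)) / math.sqrt(m) for x, m in zip(row, masses)]
            for row in rows]


if __name__ == "__main__":
    names, rows = list(COUNTS), list(COUNTS.values())
    masses = column_masses(rows)
    dist = [[chi2_distance(a, b, masses) for b in rows] for a in rows]
    for name, d in zip(names, dist):
        print(f"{name:8s}", " ".join(f"{v:.3f}" for v in d))
```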

The aim of the exercise is to see how consistent the solutions from
the various clustering methods, correspondence analysis and MDS are.
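
One common way to put a number on how consistent two clustering solutions are (an assumption here, not something the exercise specifies) is the adjusted Rand index: 1 means the two partitions of the countries are identical up to relabelling, and values near 0 mean no more agreement than chance. A minimal Python sketch follows; the label vectors in the usage lines are hypothetical placeholders, not results from this data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    # Pairs of items that fall in the same cell of the a-by-b cross-tabulation.
    together = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    pairs_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Only happens when both are all-singletons or both are one cluster.
        return 1.0
    return (together - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical cluster labels for the seven countries, in table order.
    hierarchical = [0, 0, 0, 1, 1, 2, 2]
    kmeans_labels = [1, 1, 1, 0, 0, 2, 2]  # same grouping, different label numbers
    print(adjusted_rand_index(hierarchical, kmeans_labels))  # prints 1.0
```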

On Sun, 14 Mar 2004 21:42:47 GMT, Art Kendall <[EMAIL PROTECTED]>
wrote:

>Please tell us more about what you are doing. One form of cluster
>analysis is the two-variable crosstab: each cell is a "cluster of cases".
>
>Art
>[EMAIL PROTECTED]
>Social Research Consultants
>University Park, MD  USA
>(301) 864-5570
>
>
>[EMAIL PROTECTED] wrote:
>
>> I'm trying to do a cluster analysis with a data set that is in the
>> form of a contingency table (i.e. cross tabulation of counts in
>> various categories). I wanted to use k-means but I'm not sure that
>> this is a valid thing to do. Has anyone got any opinions as to whether
>> I should use just hierarchical clustering or k-means?
>> 
>> Thanks
>> 
>> 
>> Keith

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================