I'm an MSc student and we've been given a data set that is a contingency table cross-tabulating countries and traits that natives of those countries are said to have (it's a survey of opinions of EU citizens).
The data look like this:

Country  Stylish  Arrogant  Sexy  Devious  Easygoing  Greedy  Cowardly  Boring  Efficient  Lazy  Hardworking  Clever  Courageous
France        37        29    21       19         10      10         8       8          6     6            5       2           1
Spain          7        14     8        9         27       7         3       7          3    23           12       1           3
Italy         30        12    19       10         20       7        12       6          5    13           10       1           2
UK             9        14     4        6         27      12         2      13         26    16           29       6          25
Ireland        1         7     1       16         30       3        10       9          5    11           22       2          27
Holland        5         4     2        2         15       2         0      13         24     1           28       4           6
Germany        4        48     1       12          3       9         2      11         41     1           38       8           8

I've done hierarchical cluster analysis, correspondence analysis and multidimensional scaling, and I'm generally happy with the results.

My understanding of k-means is that it's meant for use with large-scale problems and needs continuous data. The SPSS help seems to imply that the only clustering it can do with count data is hierarchical. I've run k-means against the data and it seems to give sensible results; it was just the help file that got me thinking.

This is part of our coursework, so I hope you don't think I'm being cheeky in asking for help. I'm just after an opinion, or a pointer to a web site or other resource, that says whether k-means is valid or invalid for this type of data.

So I've been thinking a bit more today and been wondering if it might be more useful to use a matrix of chi-square distances (which is what correspondence analysis builds) and do k-means on that. The aim of the exercise is to see how consistent the solutions from the various clustering, correspondence analysis and MDS methods are.

On Sun, 14 Mar 2004 21:42:47 GMT, Art Kendall <[EMAIL PROTECTED]> wrote:

>Please tell us more about what you are doing. One form of cluster
>analysis is the 2 variable crosstab each cell is a "cluster of cases".
>
>Art
>[EMAIL PROTECTED]
>Social Research Consultants
>University Park, MD USA
>(301) 864-5570
>
>
>[EMAIL PROTECTED] wrote:
>
>> I'm trying to do a cluster analysis with a data set that is in the
>> form of a contingency table (i.e. cross tabulation of counts in
>> various categories). I wanted to use k-means but I'm not sure that
>> this is a valid thing to do.
>> Has anyone got any opinions as to whether
>> I should use just hierarchical or k-means.
>>
>> Thanks
>>
>>
>> Keith
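
For what it's worth, here is a rough sketch of the chi-square distance idea raised in the post, written in Python rather than SPSS. It relies on one standard fact: if each row profile (row counts divided by the row total) is divided by the square root of the column masses, then ordinary Euclidean distance between the rescaled rows equals the chi-square distance that correspondence analysis uses. K-means can then be run on those coordinates instead of on a distance matrix. The function names `chi_square_coordinates` and `kmeans` are made up for illustration, the k-means is a bare Lloyd's algorithm seeded with the first k rows, the choice of k=3 is arbitrary, and the rows are not weighted by their masses as they would be in a full correspondence analysis.

```python
import math

countries = ["France", "Spain", "Italy", "UK", "Ireland", "Holland", "Germany"]
# Counts from the table in the post (rows = countries, columns = traits).
counts = [
    [37, 29, 21, 19, 10, 10, 8, 8, 6, 6, 5, 2, 1],
    [7, 14, 8, 9, 27, 7, 3, 7, 3, 23, 12, 1, 3],
    [30, 12, 19, 10, 20, 7, 12, 6, 5, 13, 10, 1, 2],
    [9, 14, 4, 6, 27, 12, 2, 13, 26, 16, 29, 6, 25],
    [1, 7, 1, 16, 30, 3, 10, 9, 5, 11, 22, 2, 27],
    [5, 4, 2, 2, 15, 2, 0, 13, 24, 1, 28, 4, 6],
    [4, 48, 1, 12, 3, 9, 2, 11, 41, 1, 38, 8, 8],
]


def chi_square_coordinates(table):
    """Rescale row profiles so Euclidean distance equals chi-square distance."""
    grand_total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_mass = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]
    coords = []
    for row in table:
        row_total = sum(row)
        coords.append([(x / row_total) / math.sqrt(m) for x, m in zip(row, col_mass)])
    return coords


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; seeding with the first k points is an arbitrary choice."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centres[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


coords = chi_square_coordinates(counts)
labels = kmeans(coords, k=3)  # k=3 is only an example value
for name, label in zip(countries, labels):
    print(name, label)
```

With only seven rows the result depends heavily on the starting centres and on k, so it is worth trying several of each before comparing the clusters with the hierarchical, correspondence analysis and MDS solutions.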
