Thank you Mattia for the great suggestion,
I will try the additional tests. I have just been experimenting with the e1071 package and the adjusted Rand index. It works perfectly. The only outstanding question is interpretation: is there a rule of thumb for the level of agreement that needs to be reached in order to say there is "High Agreement" or similar?

Thanks
Martin

On 11/17/2010 4:49 PM, Mattia Prosperi wrote:
Another useful measure to compare partitions is the adjusted Rand
index which is implemented in the library(e1071) within the
classAgreement function.
If the partitions to be compared are stored in matrix form
(where each column is a different partition), the syntax for columns i and j is
ARI<-classAgreement(table(data[,i],data[,j]))$crand
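
A minimal sketch of how the one-liner above could be extended to all pairs of
columns. The data frame and its column names are made up for illustration;
only classAgreement() and its $crand component come from e1071.

```r
library(e1071)  # provides classAgreement()

# Hypothetical data: a benchmark classification plus two automated ones.
set.seed(1)
benchmark <- rep(1:3, each = 20)
data <- data.frame(
  benchmark  = benchmark,
  relabelled = c(2, 3, 1)[benchmark],           # same partition, other labels
  random     = sample(1:3, 60, replace = TRUE)  # unrelated partition
)

# Adjusted Rand index for every pair of columns (diagonal is 1 by definition).
k <- ncol(data)
ari <- matrix(1, k, k, dimnames = list(names(data), names(data)))
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    ari[i, j] <- ari[j, i] <-
      classAgreement(table(data[, i], data[, j]))$crand
  }
}
print(round(ari, 3))
```

The index does not depend on how the classes are labelled, so the
"relabelled" column agrees perfectly with the benchmark, while an unrelated
partition should give a value near zero.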

Other useful measures of goodness-of-fit for clustering are the
silhouette index, the C-index, or the Goodman-Kruskal index, although
they generally evaluate inter/intra-cluster distance distributions.
For instance, you can maximise/minimise these indices to find the best
partition among a set of candidate ones.
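
As a rough illustration of picking the best partition by silhouette, here is
a sketch using silhouette() from the cluster package. The simulated data, the
"average" linkage, and the range of candidate cluster counts are assumptions
chosen only to keep the example small.

```r
library(cluster)  # provides silhouette()

set.seed(42)
# Hypothetical data: two well-separated groups in two dimensions.
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2))
d  <- dist(x)
hc <- hclust(d, method = "average")

# Candidate partitions: cut the tree into 2..5 clusters and score each one
# by its average silhouette width (higher is better).
ks <- 2:5
avg_width <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})
names(avg_width) <- ks

best_k <- ks[which.max(avg_width)]
print(round(avg_width, 3))
print(best_k)
```

Note that this scores each partition against the distances in the data, not
against a benchmark classification.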

Mattia Prosperi.


2010/11/17 Marc Schwartz<marc_schwa...@me.com>:
On Nov 17, 2010, at 7:33 AM, Martin Tomko wrote:

Dear all,
I am having a hard time figuring out a suitable test for the match between two 
nominal classifications of the same set of data.
I have used hierarchical clustering with multiple methods (ward, k-means,...) 
to classify my data into a set number of classes, and I would like to compare 
the resulting automated classification with the actual, objective benchmark 
one.
So in principle I have a data frame with n columns of nominal classifications, 
and I want to do a mutual comparison and test for significance in difference in 
classification between pairs of columns.

I just need to identify a suitable test, but I fail. I am currently exploring 
the possibility of using Cohen's Kappa, but I am open to other suggestions. 
Especially the fact that kappa seems to be mostly used on fallible, human 
annotators seems to bring in limitations that do not apply to my automatic 
classification.
Any help will be appreciated, especially if also followed by a pointer to an R 
package that implements it.

Thanks
Martin

In addition to Matt's comments, you might want to consider marginal homogeneity 
tests. There are extensions of the pairwise McNemar test to greater than two 
categories. Some online information is here:

  http://www.john-uebersax.com/stat/mcnemar.htm

and there is the ?mh_test implemented in the 'coin' package on CRAN.
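
A minimal sketch of calling mh_test() on a square contingency table of paired
classifications. The two classifications below are simulated purely for
illustration, and the example assumes both use the same class labels, which
for cluster output would normally require matching cluster numbers to
benchmark classes first.

```r
library(coin)  # provides mh_test() and pvalue()

# Hypothetical paired classifications of the same 90 items.
classes   <- c("a", "b", "c")
benchmark <- factor(rep(classes, each = 30), levels = classes)

set.seed(7)
automated <- benchmark
flip <- sample(length(automated), 20)            # items to reassign
automated[flip] <- sample(classes, 20, replace = TRUE)

# Square table; rows and columns must share the same levels.
tab <- table(benchmark = benchmark, automated = automated)
print(tab)

# Marginal homogeneity test: are the class proportions the same
# in both classifications?
mh <- mh_test(tab)
print(mh)
p <- as.numeric(pvalue(mh))
```

A large p-value here only says the marginal class frequencies do not differ
detectably; it is not by itself a measure of item-by-item agreement.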

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


