Hi, as already mentioned, sorting can be a pain.
My solution is to write my own "order" routine for a given language. The idea is to transform the UTF-8 string into ASCII in such a way that the built-in order routine outputs the desired result. This can be a rocky road, though.

Example for Spanish (please correct me if I'm wrong):
- accents are ignored
- ll is one single entity and comes after l (ludar comes before llave)
- ch is one single entity and comes after c

The only thing I do not know is whether a 'll' can ever be two separate letters rather than one entity (maybe as the result of combining two nouns). If so, the whole story becomes much more complicated.

Now the big question is how to delete all these accents in åàÿñü etc. to get aaynu. (Technically speaking: a decomposition of the Unicode string, normalization form NFKD, i.e. compatibility decomposition.) One possible way is to use a scripting language that can handle it. The only language I know that can do it out of the box is Python. For Ruby and Perl one has to install an additional library. On a Mac, Python is installed by default; on Windows it is not. If this ordering is also an issue for Windows users, they have to install Python beforehand.

Here is the code:

    orderES <- function(x) {
      # decompose all accented characters
      str <- NFKD(x)
      # all combining diacritics (U+0300 to U+036F)
      nonChars <- 768:879
      pattern <- paste("[", intToUtf8(as.integer(nonChars)), "]", sep = "")
      # delete all combining diacritics
      str <- gsub(pattern, "", str)
      # transform ll and ch to l{ and c{ ({ comes after z in ASCII;
      # this assumes order() compares in that byte order)
      str <- gsub("ll", "l{", gsub("ch", "c{", str))
      order(str)
    }

    NFKD <- function(x) {
      # pipes a small Python 2 script (print statement, unicode()) to the interpreter
      system(paste("echo -en '# coding=utf-8\nimport unicodedata\nfor i,v in enumerate([\"",
                   paste(x, collapse = "\", \""),
                   "\"]):print unicodedata.normalize(\"NFKD\",unicode(v, \"UTF-8\")).encode(\"UTF-8\")'|python -",
                   sep = ""),
             intern = TRUE)
    }

Notes on the NFKD routine:
- only works if R's environment is set to UTF-8!
- a Danish ø, for instance, won't be decomposed to o and / (such cases have to be handled manually)
- this routine is not very fast

Cheers,
--Hans

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
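
For reference, here is a minimal sketch of the same transformation done entirely in Python 3, without the round trip through echo. The names strip_accents and sort_key_es and the word list are made up for illustration and are not part of the code above. It assumes plain code-point string comparison, where '{' sorts after 'z', and it does no case folding.

```python
import unicodedata


def strip_accents(text):
    """Decompose with NFKD, then drop combining diacritics (U+0300..U+036F)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not 0x0300 <= ord(ch) <= 0x036F)


def sort_key_es(word):
    """Hypothetical sort key: 'ch' -> 'c{' and 'll' -> 'l{', as in the R routine."""
    plain = strip_accents(word)
    return plain.replace("ch", "c{").replace("ll", "l{")


# Illustrative word list (lower case only, since no case folding is done).
words = ["llave", "ludar", "chico", "cura", "árbol", "azul"]
print(sorted(words, key=sort_key_es))
# prints ['árbol', 'azul', 'cura', 'chico', 'ludar', 'llave']
```

As with the R version, characters such as ø have no decomposition and pass through unchanged, so they would still need manual handling.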