On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> Hello,
> I want to do a cluster analysis with my data. The problem is that the
> variables don't consist of single values; the entries are pairs of
> values.
> That looks like this:
>
>
> Variable 1:   Variable2:   Variable3:   . . .
> (1,2)         (1,5)        (4,2)
> (7,8)         (3,88)       (6,5)
> (4,7)         (12,4)       (4,4)
> . . .
> . . .
> . . .
> Is it possible to perform a cluster analysis with this kind of data in
> R?
> I don't even know how to get this data into a matrix or a data frame or
> anything like this.
Hi.

The data as they are may be read into R as character data. The exact way
depends on the format of the data in the file. The result may look like
the following.

  Var1 <- c("(1,2)", "(7,8)", "(4,7)")
  Var2 <- c("(1,5)", "(3,88)", "(12,4)")
  Var3 <- c("(4,2)", "(6,5)", "(4,4)")
  DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)

If you want to use a distance between pairs that depends on the numbers
(and not only on whether two pairs are equal or different), then the data
should be transformed to a numeric format, for example as follows.

  # remove the parentheses, split each "(a,b)" at the comma and
  # return an n x 2 numeric matrix (one row per observation)
  trans <- function(x) {
      y <- strsplit(gsub("[()]", "", x), ",")
      unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
  }

  DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
  DF

    Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
  1      1      2      1      5      4      2
  2      7      8      3     88      6      5
  3      4      7     12      4      4      4

Then, see library(help=cluster).

Hope this helps.

Petr Savicky.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
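Following the pointer to library(help=cluster), here is a minimal sketch of one possible next step once the pairs are numeric columns. It assumes that Euclidean distance over all six columns is a sensible dissimilarity, and it uses average linkage and k = 2 clusters purely for illustration; none of these choices come from the answer above. It uses the base functions dist(), hclust() and cutree() and pam() from package cluster (a recommended package distributed with R).

```r
library(cluster)  # provides pam()

# Same conversion as in the answer: "(a,b)" strings -> n x 2 numeric matrix
trans <- function(x) {
    y <- strsplit(gsub("[()]", "", x), ",")
    unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
}

Var1 <- c("(1,2)", "(7,8)", "(4,7)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)")
DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))

# Assumption: Euclidean distance between rows over all six numeric columns
d <- dist(DF)

# Hierarchical clustering (average linkage is an illustrative choice),
# then cut the tree into k = 2 groups
hc <- hclust(d, method="average")
groups_hc <- cutree(hc, k=2)

# Partitioning around medoids on the same dissimilarities, k = 2
pm <- pam(d, k=2)
groups_pam <- pm$clustering

groups_hc
groups_pam
```

With these three rows, both methods should put rows 1 and 3 together and row 2 alone, largely because of the value 88 in Var2. Since one large component can dominate a Euclidean distance like this, standardizing the columns first, e.g. dist(scale(DF)), may be worth considering for real data.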