Re: [R] cluster analysis with pairwise data

Petr Savicky Wed, 04 Apr 2012 09:14:27 -0700

On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> Hello,
> I want to do a cluster analysis with my data. The problem is that the 
> variables don't consist of single values; the entries are pairs of 
> values.
> That looks like this:
> 
> 
> Variable 1:    Variable2:      Variable3:  .    .    .
> (1,2)          (1,5)           (4,2)
> (7,8)          (3,88)          (6,5)
> (4,7)          (12,4)          (4,4)
> .               .              .
> .               .              .
> .               .              .
> Is it possible to perform a cluster-analysis with this kind of data in 
> R ?
> I don't even know how to get this data into a matrix or a data frame or 
> anything like this.


Hi.

The data as they are may be read into R as character data. The
exact way depends on the format of the data in the file. The
result may look like the following.

  Var1 <- c("(1,2)", "(7,8)", "(4,7)")
  Var2 <- c("(1,5)", "(3,88)", "(12,4)")
  Var3 <- c("(4,2)", "(6,5)", "(4,4)")
  DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
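As a minimal sketch of the reading step, assuming the file holds whitespace-separated columns with one "(a,b)" pair per cell (the actual file format is not given in the question), read.table() can keep every cell as a character string. The inline text and the name DF_chr are hypothetical and used only for illustration.

```r
# Hypothetical file contents: whitespace-separated columns, one pair per cell.
txt <- "(1,2)  (1,5)   (4,2)
(7,8)  (3,88)  (6,5)
(4,7)  (12,4)  (4,4)"

# colClasses = "character" keeps each "(a,b)" cell as a string
# instead of letting read.table() guess a type.
DF_chr <- read.table(text = txt, colClasses = "character",
                     col.names = c("Var1", "Var2", "Var3"))
str(DF_chr)
```

For a real file, the text= argument would be replaced by the file name. A different separator (for example tabs) would need a matching sep= argument.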

If you want to use a distance between pairs that depends on the
numbers (and not only on whether two pairs are equal or different),
then the data should be transformed to a numeric format. For example,
as follows:

  trans <- function(x)
  {
      y <- strsplit(gsub("[()]", "", x), ",")
      unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
  }

  DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
  DF

    Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
  1      1      2      1      5      4      2
  2      7      8      3     88      6      5
  3      4      7     12      4      4      4

Then see library(help=cluster) for the functions available in the
cluster package.
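One possible next step, sketched here with base R's dist(), hclust() and cutree() rather than the cluster package, is a hierarchical clustering of the rows. The Euclidean distance, the average linkage, and the cut into two groups are illustrative assumptions, not choices made in the original message. The matrix X is hypothetical and simply repeats the numeric values from the table above.

```r
# Numeric values from the transformed table: one row per observation,
# two coordinates for each original pair-valued variable.
X <- rbind(c(1, 2,  1,  5, 4, 2),
           c(7, 8,  3, 88, 6, 5),
           c(4, 7, 12,  4, 4, 4))

# Euclidean distances between rows (an assumed choice of metric).
d <- dist(X, method = "euclidean")

# Agglomerative clustering with average linkage (also an assumption).
hc <- hclust(d, method = "average")

# Cut the tree into two groups and show the group label of each row.
groups <- cutree(hc, k = 2)
print(groups)
```

Because the second coordinate of Var2 has a much larger range than the other columns, it dominates the unscaled distance. Standardizing the columns first, for example with scale(X), may be preferable, depending on the meaning of the data.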

Hope this helps.

Petr Savicky.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
