Re: [R] Extracting metadata information to corresponding dissimilarity matrix

David L Carlson Tue, 16 May 2017 09:45:06 -0700

Fixing a typo in the original, adding a simplification, and using dissimilarity 
instead of similarity:


set.seed(42)
dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
     age=sample.int(75, 7))
dsim <- dist(dta$age) # distance, already lower triangular
dsim

dta1 <- dta
names(dta1) <- paste0(names(dta), "1") # generalizes to more than 3 columns
dta2 <- dta
names(dta2) <- paste0(names(dta), "2")

dta12 <- merge(dta2, dta1) # order is important
dta12 <- dta12[dta12$ID1 < dta12$ID2, ] # get rid of duplicates

dta12 <- data.frame(dta12, dsim=as.vector(dsim)) # Typo was here
dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "dsim")]
dta12
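
A possible alternative to the merge step, offered only as a sketch: build the pairs from index combinations with combn(), whose pair order (1,2), (1,3), ..., (1,n), (2,3), ... matches the order in which dist() stores its values. The names idx, i, j and pairs are illustrative and do not appear in the code above. This avoids creating the full n x n cross product, which may matter with more than 100 observations.

```r
# Same example data as above
set.seed(42)
dta <- data.frame(ID = 1:7,
                  gender = sample(c("M", "F"), 7, replace = TRUE),
                  age = sample.int(75, 7))
dsim <- dist(dta$age)

# combn() returns a 2 x choose(n, 2) matrix of row-index pairs,
# in the same column-wise order that dist() uses for its values.
idx <- combn(nrow(dta), 2)
i <- idx[1, ]   # first member of each pair (smaller row index)
j <- idx[2, ]   # second member of each pair (larger row index)

# Look up the metadata for both members of each pair
pairs <- data.frame(ID1 = dta$ID[i], ID2 = dta$ID[j],
                    gender1 = dta$gender[i], gender2 = dta$gender[j],
                    age1 = dta$age[i], age2 = dta$age[j],
                    dsim = as.vector(dsim))
pairs
```

Because the distance here is just the absolute age difference, the alignment can be checked with all.equal(pairs$dsim, abs(pairs$age1 - pairs$age2)).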

David C


-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Tuesday, May 16, 2017 11:21 AM
To: Rune Grønseth <nielsenr...@me.com>; r-help@r-project.org
Subject: Re: [R] Extracting metadata information to corresponding dissimilarity 
matrix

I think this is what you are trying to do. I've created a data set with 7 rows 
and a similarity matrix based on age:

set.seed(42)
dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
     age=sample.int(75, 7))
sim <- max(dist(dta$age)) - dist(dta$age) # already lower triangular
sim

#    1  2  3  4  5  6
# 2 24               
# 3 21 59            
# 4 40 46 43         
# 5  0 38 41 22      
# 6  7 45 48 29 55   
# 7 55 31 28 47  7 14

# Now duplicate dta:
dta1 <- dta
names(dta1) <- c("ID1", "gender1", "age1")
dta2 <- dta
names(dta2) <- c("ID2", "gender2", "age2")

# Now merge and eliminate unneeded rows
dta12 <- merge(dta2, dta1) # order is important
dta12 <- dta12[dta12$ID1 < dta12$ID2, ]

# Finally combine the similarities with the combined data and rearrange
# the variable names
dta12 <- data.frame(dta12mod, sim=as.vector(sim))
dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "sim")]
dta12

#    ID1 ID2 gender1 gender2 age1 age2 sim
# 2    1   2       F       F   11   49  24
# 3    1   3       F       M   11   52  21
# 4    1   4       F       F   11   33  40
# 5    1   5       F       F   11   73   0
# 6    1   6       F       F   11   66   7
# 7    1   7       F       F   11   18  55
# 10   2   3       F       M   49   52  59
# 11   2   4       F       F   49   33  46
# 12   2   5       F       F   49   73  38
# 13   2   6       F       F   49   66  45
# 14   2   7       F       F   49   18  31
# 18   3   4       M       F   52   33  43
# 19   3   5       M       F   52   73  41
# 20   3   6       M       F   52   66  48
# 21   3   7       M       F   52   18  28
# 26   4   5       F       F   33   73  22
# 27   4   6       F       F   33   66  29
# 28   4   7       F       F   33   18  47
# 34   5   6       F       F   73   66  55
# 35   5   7       F       F   73   18   7
# 42   6   7       F       F   66   18  14
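
If the dissimilarities are held as a full symmetric matrix rather than a dist object, as described in the original question, the same table can probably be built from the lower-triangle indices. The following is a minimal sketch under stated assumptions: the data frame meta, its columns sample and gender, and the matrix m are made-up stand-ins, and the rows of meta are assumed to be in the same order as the rows and columns of m.

```r
# Hypothetical stand-in data: 14 samples with one metadata column
set.seed(1)
meta <- data.frame(sample = paste0("S", 1:14),
                   gender = sample(c("M", "F"), 14, replace = TRUE))

# Stand-in symmetric dissimilarity matrix (any symmetric matrix would do)
m <- as.matrix(dist(rnorm(14)))
dimnames(m) <- list(meta$sample, meta$sample)

# Row/column positions of the lower triangle, in column-major order
lt <- which(lower.tri(m), arr.ind = TRUE)

# One row per unique pair: metadata for both members plus the dissimilarity
res <- data.frame(sample1 = meta$sample[lt[, "col"]],
                  sample2 = meta$sample[lt[, "row"]],
                  gender1 = meta$gender[lt[, "col"]],
                  gender2 = meta$gender[lt[, "row"]],
                  dissim = m[lt])
nrow(res)   # choose(14, 2) pairs
```

Indexing m with the two-column matrix lt picks out one value per (row, col) pair, so the metadata and dissimilarities stay aligned without relying on merge order.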

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rune Grønseth
Sent: Tuesday, May 16, 2017 4:31 AM
To: r-help@r-project.org
Subject: [R] Extracting metadata information to corresponding dissimilarity 
matrix

Hi,
I am an R beginner. I've tried googling and reading, but this might be too simple 
to be found in the documentation. 

I have a dissimilarity index (symmetric matrix) from which I have extracted the 
unique values using the ecodist package function "lower". There are 14 
observations, so there are 91 unique comparisons.

After this I'd like to extract the corresponding metadata from a separate data 
frame (the 14 observations are organized in rows identified by a 
sample-number vector, with other variables such as gender, age, et cetera). The 
aim is to have a new data frame with 91 rows, with columns giving the value of 
the dissimilarity index and the gender of each of the two observations that are 
compared by the dissimilarity metric. So if I'm looking for gender differences, 
I need 5 vectors in the data frame: samplenumber1, samplenumber2, gender1, 
gender2 and the dissimilarity metric.

Does anyone have suggestions or experience in reformatting data in this 
manner? This is just a test data set. My full data set has more than 100 
observations, so I need more general code, if that is possible.

With great appreciation of any help.

Rune Grønseth 

---

Rune Grønseth, MD, PhD, postdoctoral fellow
Department of Thoracic Medicine
Haukeland University Hospital
N-5021 Bergen
Norway

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
