I may have misunderstood, but does this do what you want?

> df.mat <- as.matrix(df)
> same <- lapply(1:3, function(x) df.mat[grep(paste0("_", x),
+     rownames(df.mat)), grep(paste0("_", x), colnames(df.mat))])
> same
[[1]]
           HQ673618_1 HQ674317_1 EU686630_1
HQ673618_1         NA       90.8       89.8
HQ674317_1       90.8         NA       98.6
EU686630_1       89.8       98.6         NA

[[2]]
           EU686593_2 JN166322_2 EU491340_2
EU686593_2         NA       98.1       96.8
JN166322_2       98.1         NA       97.5
EU491340_2       96.8       97.5         NA

[[3]]
           AB694259_3 AB694258_3 AB694462_3
AB694259_3         NA       98.3       95.9
AB694258_3       98.3         NA       95.8
AB694462_3       95.9       95.8         NA

> Diff <- as.matrix(expand.grid(1:3, 1:3))
> Diff <- Diff[Diff[,1]<Diff[,2],]
> different <- lapply(seq_len(nrow(Diff)), function(x)
+     df.mat[grep(paste0("_", Diff[x,1]), rownames(df.mat)),
+     grep(paste0("_", Diff[x,2]), colnames(df.mat))])
> different
[[1]]
           EU686593_2 JN166322_2 EU491340_2
HQ673618_1       89.6       89.8       88.9
HQ674317_1       97.7       98.4       97.4
EU686630_1       98.4       98.9       97.7

[[2]]
           AB694259_3 AB694258_3 AB694462_3
HQ673618_1       87.8       88.2       88.3
HQ674317_1       94.9       96.2       95.1
EU686630_1       95.4       96.4       95.8

[[3]]
           AB694259_3 AB694258_3 AB694462_3
EU686593_2       94.4       95.6       94.8
JN166322_2       95.3       96.5       95.9
EU491340_2       96.5       97.7       96.0

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Monday, December 1, 2014 11:46 AM
To: Tim Richter-Heitmann
Cc: r-help@r-project.org
Subject: Re: [R] Creating submatrices from a dataframe, depending on factors in sample names

I do not have the patience to study your request carefully, but does
the following help?

> a <- 1:3
> x <- outer(a,a,paste,sep=".")
> x
     [,1]  [,2]  [,3]
[1,] "1.1" "1.2" "1.3"
[2,] "2.1" "2.2" "2.3"
[3,] "3.1" "3.2" "3.3"
> x[upper.tri(x)]
[1] "1.2" "1.3" "2.3"
> x[upper.tri(x,diag=TRUE)]
[1] "1.1" "1.2" "2.2" "1.3" "2.3" "3.3"

This gives you a vector of all possible pairs (including identical pairs
or not) of values of a, which you could then loop over as an index to do
what you want, I think.

If this is not what you want, just ignore without replying.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll

On Mon, Dec 1, 2014 at 8:47 AM, Tim Richter-Heitmann <trich...@uni-bremen.de> wrote:
> Hello there,
>
> this is a cross-post of a Stack Overflow question, which wasn't answered, but
> is very important for my work. Apologies for breaking any rules, but I do
> hope for some help from the list instead:
>
> I have a huge matrix of pairwise similarity percentages between different
> samples. The samples belong to groups. The groups are determined by
> the suffix "_n" in the row.names/header names.
> In the first step, I wanted to create submatrices consisting of all pairs
> within single groups (i.e. for all samples from "_1").
> However, I realized that I need to know all pairwise submatrices, between
> all combinations of groups. So, I want to create (a list of) vectors that are
> named "_n1 vs _n2" (or similar) for all combinations of n, as illustrated by
> the colored rectangles:
>
> http://i.stack.imgur.com/XMkxj.png
>
> Reproducible code, as provided by helpful Stack Overflow members, dealing
> with identical "_n"s.
>
>
> df <- structure(list(
>   HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8, 88.9, 87.8, 88.2, 88.3),
>   HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4, 97.4, 94.9, 96.2, 95.1),
>   EU686630_1 = c(89.8, 98.6, NA, 98.4, 98.9, 97.7, 95.4, 96.4, 95.8),
>   EU686593_2 = c(89.6, 97.7, 98.4, NA, 98.1, 96.8, 94.4, 95.6, 94.8),
>   JN166322_2 = c(89.8, 98.4, 98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9),
>   EU491340_2 = c(88.9, 97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96),
>   AB694259_3 = c(87.8, 94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9),
>   AB694258_3 = c(88.2, 96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8),
>   AB694462_3 = c(88.3, 95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)),
>   .Names = c("HQ673618_1", "HQ674317_1", "EU686630_1",
>              "EU686593_2", "JN166322_2", "EU491340_2",
>              "AB694259_3", "AB694258_3", "AB694462_3"),
>   class = "data.frame",
>   row.names = c("HQ673618_1", "HQ674317_1", "EU686630_1",
>                 "EU686593_2", "JN166322_2", "EU491340_2",
>                 "AB694259_3", "AB694258_3", "AB694462_3"))
>
>
> indx <- gsub(".*_", "", names(df))
> sub.matrices <- lapply(unique(indx), function(x) {
>   temp <- which(indx %in% x)
>   df[temp, temp]
> })
> unique_values <- lapply(sub.matrices, function(x) x[upper.tri(x)])
> names(unique_values) <- unique(indx)
>
> This code needs to be expanded to form sub.matrices for any combination of
> unique indices in temp.
>
>
> Thank you so much!
>
> --
> Tim Richter-Heitmann (M.Sc.)
> PhD Candidate
>
> International Max-Planck Research School for Marine Microbiology
> University of Bremen
> Microbial Ecophysiology Group (AG Friedrich)
> FB02 - Biologie/Chemie
> Leobener Straße (NW2 A2130)
> D-28359 Bremen
> Tel.: 0049(0)421 218-63062
> Fax: 0049(0)421 218-63069
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
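
For readers of the archive: the original question asks for a single list of vectors named "_n1 vs _n2" covering every combination of groups, within-group and between-group alike. The following is only a sketch of one way that might be done, not something posted in the thread. The helper name group_blocks is hypothetical, the matrix m is a four-sample subset of the data in the original post, and the code assumes that rows and columns carry the same sample names in the same order.

```r
# Four-sample subset of the similarity matrix from the original post.
ids <- c("HQ673618_1", "HQ674317_1", "EU686593_2", "JN166322_2")
m <- matrix(c(  NA, 90.8, 89.6, 89.8,
              90.8,   NA, 97.7, 98.4,
              89.6, 97.7,   NA, 98.1,
              89.8, 98.4, 98.1,   NA),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical helper (not from the thread): returns one vector of
# similarities per pair of groups, named "_a vs _b".
# Assumes rownames(m) and colnames(m) are identical and in the same order.
group_blocks <- function(m) {
  grp <- sub(".*_", "", rownames(m))   # group suffix of each sample
  g <- unique(grp)
  # All unordered pairs of groups, including each group with itself.
  pairs <- expand.grid(a = seq_along(g), b = seq_along(g))
  pairs <- pairs[pairs$a <= pairs$b, ]
  out <- lapply(seq_len(nrow(pairs)), function(i) {
    a <- g[pairs$a[i]]
    b <- g[pairs$b[i]]
    block <- m[grp == a, grp == b, drop = FALSE]
    # A within-group block is symmetric, so keep each pair only once.
    if (a == b) block[upper.tri(block)] else as.vector(block)
  })
  names(out) <- paste0("_", g[pairs$a], " vs _", g[pairs$b])
  out
}

blocks <- group_blocks(m)
```

Comparing the extracted suffixes with == instead of calling grep on "_1" is a deliberate choice in this sketch: a pattern such as "_1" would also match a suffix like "_10" once there are ten or more groups.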