[R] Help reading table rows into lists
Hi all,

I have a large table mapping thousands of COGs (groups of genes) to pathways, e.g.

COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh

I would like to combine this information into a big list such as:

COG2PATHWAY <- list(COG0001 = c("patha", "pathb", "pathc"),
                    COG0002 = c("pathd", "pathe"),
                    COG0003 = c("pathe", "pathf", "pathg", "pathh"))

I am stuck and have tried various methods involving (probably mangled) versions of lapply and loops. Any suggestions on the most efficient way to do this would be great.

Thanks,
Alison

Here is my latest attempt:

line_num <- length(scan(file = "/g/bork8/waller/test_COGtoPath.txt", what = "character", sep = "\n"))
COG2Path <- vector("list", line_num)
COG2Path <- lapply(1:(line_num - 1), function(x) scan(file = "/g/bork8/waller/test_COGtopath.txt", skip = x, nlines = 1, quiet = TRUE, what = "character", sep = "\t"))

I am getting an error:

Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :

But if I run scan on its own I don't get an error.

Then I suppose the easiest way to name the list elements is to use Unix to cut the first column out and read that in:

names(COG2Path) <- scan(file = "/g/bork8/waller/test_col_names.txt", sep = "\t", what = "character")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
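A minimal sketch of the attempt above, assuming a tab-separated file with the COG in the first field. Calling scan(file = ..., skip = x) reopens the file and re-reads the first x lines for every row, so the total work grows roughly quadratically with the number of rows. Opening one connection and letting scan() resume from its current position reads each line once, and the first field can supply the names directly, so the Unix cut step is not needed. Note also that the lapply() call names test_COGtopath.txt while the scan() call that works names test_COGtoPath.txt; on a case-sensitive file system those are different files, which is one plausible cause of the "cannot open the connection" error. The temporary file and its contents are illustrative only.

```r
# Illustrative data only: write a small tab-separated file to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c("COG0001\tpatha\tpathb\tpathc",
             "COG0002\tpathd\tpathe",
             "COG0003\tpathe\tpathf\tpathg\tpathh"), path)

n_lines <- length(readLines(path))

# Open the file once; each scan() on an already-open connection continues
# from where the previous call stopped, so nlines = 1 reads the next row.
con <- file(path, open = "r")
rows <- lapply(seq_len(n_lines), function(i)
  scan(con, what = "character", sep = "\t", nlines = 1, quiet = TRUE))
close(con)

# First field becomes the element name, the remaining fields the pathways.
COG2Path <- lapply(rows, function(r) r[-1])
names(COG2Path) <- vapply(rows, function(r) r[1], character(1))
print(COG2Path)
```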
Re: [R] Help reading table rows into lists
On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller alison.wal...@embl.de wrote:
> I would like to combine this information into a big list such as below
> COG2PATHWAY <- list(COG0001 = c("patha", "pathb", "pathc"), ...)
> Any suggestions on the most efficient way to do this would be great.

Try this:

Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh"
DF <- read.table(textConnection(Lines), header = FALSE, fill = TRUE, as.is = TRUE, na.strings = "")
library(reshape2)
m <- na.omit(melt(DF, 1))
result <- unstack(m, value ~ V1)

giving

> result
$COG0001
[1] "patha" "pathb" "pathc"

$COG0002
[1] "pathd" "pathe"

$COG0003
[1] "pathe" "pathf" "pathg" "pathh"

or

> acast(m, value ~ V1)
      COG0001 COG0002 COG0003
patha patha   NA      NA
pathb pathb   NA      NA
pathc pathc   NA      NA
pathd NA      pathd   NA
pathe NA      pathe   pathe
pathf NA      NA      pathf
pathg NA      NA      pathg
pathh NA      NA      pathh
Levels: patha pathb pathc pathd pathe pathf pathg pathh

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
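For readers without reshape2, a minimal base-R sketch of the same idea: turn the wide table into (COG, pathway) pairs, drop the NA padding, then group by COG. It swaps melt()/unstack() for as.vector() and split(), so it is an alternative technique rather than a translation of the code above; like the na.omit() version, a COG with no pathways drops out. It uses the text = argument that current versions of read.table() accept in place of textConnection().

```r
Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh"

# fill = TRUE pads short rows; na.strings = "" turns that padding into NA.
DF <- read.table(text = Lines, header = FALSE, fill = TRUE,
                 as.is = TRUE, na.strings = "")

# Long format: one (COG, pathway) pair per cell, taken column by column,
# so the COG ids are repeated once per pathway column.
vals <- as.matrix(DF[-1])
long <- data.frame(cog = rep(DF$V1, times = ncol(vals)),
                   pathway = as.vector(vals),
                   stringsAsFactors = FALSE)
long <- long[!is.na(long$pathway), ]

# Group pathways by COG; split() keeps the within-group order (V2, V3, ...).
COG2PATHWAY <- split(long$pathway, long$cog)
print(COG2PATHWAY)
```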
Re: [R] Help reading table rows into lists
To get just the list you wanted, Gabor's solution is more elegant, but here's another using the apply family. First, your data:

dat <- scan(file = "/g/bork8/waller/test_COGtoPath.txt", what = "character", sep = "\n")

I expect dat to be a vector of strings where each string is a line of values separated by tabs, which, judging by your other code, I think is what you get. Then:

sapply(dat, function(x) {
  tmp <- unlist(strsplit(x, "\t", fixed = TRUE))
  out <- list(tmp[seq_along(tmp)[-1]])
  names(out) <- tmp[1]
  out
}, USE.NAMES = FALSE)

The one difference between the two is that if you have a COG with no pathways (which might not be realistic or that big of a deal), this solution will include the COG name in the list with a value of character(0), whereas Gabor's will omit the COG completely. Again, probably not a big deal.

Cheers,

Jeff.
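The same split-on-tabs idea can be written without sapply() by passing the whole character vector to strsplit() in one call. A minimal sketch, assuming each element of dat is one tab-separated line with the COG first; the inline lines stand in for the file, and COG0004 is a hypothetical COG with no pathways. It shows the behaviour Jeff describes: such a COG is kept, with character(0) as its value.

```r
# Stand-in for dat <- scan(..., what = "character", sep = "\n");
# COG0004 is a made-up COG with no pathways.
dat <- c("COG0001\tpatha\tpathb\tpathc",
         "COG0002\tpathd\tpathe",
         "COG0003\tpathe\tpathf\tpathg\tpathh",
         "COG0004")

fields <- strsplit(dat, "\t", fixed = TRUE)   # one character vector per line

# Everything after the first field is the pathway vector; the first field is the name.
COG2PATHWAY <- lapply(fields, function(f) f[-1])
names(COG2PATHWAY) <- vapply(fields, function(f) f[1], character(1))

str(COG2PATHWAY)
```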
Re: [R] Help reading table rows into lists
On Sun, Oct 10, 2010 at 2:59 PM, Jeffrey Spies jsp...@virginia.edu wrote:
> The one difference between the two is that if you have a COG with no pathways (might not be realistic or that big of a deal), this solution will have the COG name in the list with a value of character(0) where Gabor's will omit the COG completely. Again, probably not a big deal.

If that is important then do it this way:

Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh
COG0004"
DF <- read.table(textConnection(Lines), header = FALSE, fill = TRUE, as.is = TRUE, na.strings = "")
library(reshape2)
m <- melt(DF, 1)
lapply(unstack(m, value ~ V1), function(x) x[complete.cases(x)])
acast(m, value ~ V1)
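A base-R sketch of the same goal, starting from the padded table that read.table() produces: take each row of the wide table, drop its NA padding, and name the result by the first column. Because every row yields one element, the hypothetical empty COG0004 comes back as character(0) instead of disappearing, and the COGs stay in file order. This is a swapped-in row-wise technique, not a translation of the reshape2 code.

```r
Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh
COG0004"

DF <- read.table(text = Lines, header = FALSE, fill = TRUE,
                 as.is = TRUE, na.strings = "")

vals <- as.matrix(DF[-1])   # pathway columns only; NA where a row was padded

# One list element per row, so a COG with no pathways is kept as character(0).
COG2PATHWAY <- lapply(seq_len(nrow(vals)), function(i) {
  row <- unname(vals[i, ])
  row[!is.na(row)]
})
names(COG2PATHWAY) <- DF$V1
str(COG2PATHWAY)
```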
Re: [R] Help reading table rows into lists
Thanks Gabor and Jeffrey, and thanks for explaining the differences. I think I'll go with Jeffrey's, as I think I want entries for COGs with no pathway.

Alison
Re: [R] Help reading table rows into lists
On Sun, Oct 10, 2010 at 3:29 PM, Alison Waller alison.wal...@embl.de wrote:
> Thanks Gabor and Jeffrey, and thanks for explaining the differences. I think I'll go with Jeffrey's, as I think I want entries for COGs with no pathway.

My second post does handle that case.