[R] Help reading table rows into lists

2010-10-10 Thread Alison Waller

Hi all,

I have a large table mapping thousands of COGs (groups of genes) to
pathways.

# Ex
COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh
##

I would like to combine this information into a big list such as below:
COG2PATHWAY <- list(COG0001=c("patha","pathb","pathc"),
                    COG0002=c("pathd","pathe"),
                    COG0003=c("pathe","pathf","pathg","pathh"))


I am stuck and have tried various methods involving (probably mangled)
versions of lapply and loops.


Any suggestions on the most efficient way to do this would be great.

Thanks,

Alison

Here is my latest attempt.

#

line_num <- length(scan(file="/g/bork8/waller/test_COGtoPath.txt",
                        what="character", sep="\n"))

COG2Path <- vector("list", line_num)
COG2Path <- lapply(1:(line_num-1), function(x)
    scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1,
         quiet=T, what='character', sep="\t"))


#

I am getting an error

#

COG2Path <- lapply(1:(line_num-1), function(x)
    scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1,
         quiet=T, what='character', sep="\t"))

Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :

But if I do scan alone I don't get an error
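
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the path given to the first scan() call ends in test_COGtoPath.txt, while the path inside lapply() ends in test_COGtopath.txt. On a case-sensitive file system these name two different files, which would be consistent with "cannot open the connection". The names path_a and path_b below are illustrative and do not appear in the original post.

```r
# The two paths as they appear in the attempt above.
path_a <- "/g/bork8/waller/test_COGtoPath.txt"  # used to count lines
path_b <- "/g/bork8/waller/test_COGtopath.txt"  # used inside lapply()

# The strings differ only in letter case.
identical(path_a, path_b)
identical(tolower(path_a), tolower(path_b))

# On the machine holding the data, this shows which spelling exists.
file.exists(c(path_a, path_b))
```

Separately, skip=x with x running over 1:(line_num-1) always skips at least one line, so the first row of the file is never read; 0:(line_num-1) would cover every row.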

# then I suppose it looks like the easiest way to name the list
variables is using unix to cut the first column out and then read that
in.
names(COG2Path) <- scan(file="/g/bork8/waller/test_col_names.txt",
                        sep="\t", what="character")


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help reading table rows into lists

2010-10-10 Thread Gabor Grothendieck
On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller alison.wal...@embl.de wrote:
> Hi all,
>
> I have a large table mapping thousands of COGs (groups of genes) to pathways.
> # Ex
> COG0001 patha   pathb   pathc
> COG0002 pathd   pathe
> COG0003 pathe   pathf   pathg   pathh
> ##
>
> I would like to combine this information into a big list such as below
> COG2PATHWAY <- list(COG0001=c("patha","pathb","pathc"),COG0002=c("pathd","pathe"),COG0003=c("pathe","pathf","pathg","pathh"))
>
> I am stuck and have tried various methods involving (probably mangled)
> versions of lapply and loops.
>
> Any suggestions on the most efficient way to do this would be great.


Try this:


Lines <- "COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh"
DF <- read.table(textConnection(Lines), header = FALSE,
 fill = TRUE, as.is = TRUE, na.strings = "")

library(reshape2)
m <- na.omit(melt(DF, 1))
result <- unstack(m, value ~ V1)

giving

> result
$COG0001
[1] "patha" "pathb" "pathc"

$COG0002
[1] "pathd" "pathe"

$COG0003
[1] "pathe" "pathf" "pathg" "pathh"


or

> acast(m, value ~ V1)
      COG0001 COG0002 COG0003
patha patha   NA      NA
pathb pathb   NA      NA
pathc pathc   NA      NA
pathd NA      pathd   NA
pathe NA      pathe   pathe
pathf NA      NA      pathf
pathg NA      NA      pathg
pathh NA      NA      pathh
Levels: patha pathb pathc pathd pathe pathf pathg pathh
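
For readers who would rather not load reshape2, the same named list can probably be built with base R alone. The following is a minimal sketch, assuming whitespace-separated input as in the example; the helper variable names (row, result) are illustrative, and the text= argument of read.table stands in for the textConnection() call used above.

```r
Lines <- "COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh"

# Read ragged rows; short rows are padded so every row has the same width.
DF <- read.table(text = Lines, header = FALSE, fill = TRUE,
                 as.is = TRUE, na.strings = "")

# For each row, keep the pathway columns that are neither NA nor empty.
result <- lapply(seq_len(nrow(DF)), function(i) {
  row <- unlist(DF[i, -1], use.names = FALSE)
  row[!is.na(row) & nzchar(row)]
})

# Name each list element after the COG in the first column.
names(result) <- DF$V1
str(result)
```

Unlike the na.omit() approach, this keeps a COG that has no pathways as an empty character vector, because the filtering happens per row rather than on the melted table.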

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Help reading table rows into lists

2010-10-10 Thread Jeffrey Spies
To get just the list you wanted, Gabor's solution is more elegant, but
here's another using the apply family.  First, your data:

dat <- scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n")

I expect dat to be a vector of strings where each string is a line of
values separated by tabs, which I think, by looking at your other
code, is what you get.

sapply(dat, function(x){
    tmp <- unlist(strsplit(x, '\t', fixed=T))
    out <- list(tmp[seq_along(tmp)[-1]])
    names(out) <- tmp[1]
    out
}, USE.NAMES=F)

The one difference between the two is that if you have a COG with no
pathways (might not be realistic or that big of a deal), this solution
will have the COG name in the list with a value of character(0) where
Gabor's will omit the COG completely. Again, probably not a big deal.
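
The empty-COG behaviour described above can be checked on a small in-memory example. This is a sketch under the assumption that the file is tab-separated; the vector lines stands in for the result of scan(), COG0004 is an invented row with no pathways, and the names fields and cog2path are illustrative.

```r
# Stand-in for the lines read from the file (tab-separated fields).
lines <- c("COG0001\tpatha\tpathb\tpathc",
           "COG0002\tpathd\tpathe",
           "COG0003\tpathe\tpathf\tpathg\tpathh",
           "COG0004")

# Split every line once, then separate the key from the values.
fields <- strsplit(lines, "\t", fixed = TRUE)
cog2path <- lapply(fields, function(f) f[-1])
names(cog2path) <- vapply(fields, function(f) f[1], character(1))

str(cog2path)
```

Dropping the first element with f[-1] leaves character(0) for a line that holds only a COG name, so COG0004 stays in the list with no pathways.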

Cheers,

Jeff.



Re: [R] Help reading table rows into lists

2010-10-10 Thread Gabor Grothendieck
On Sun, Oct 10, 2010 at 2:59 PM, Jeffrey Spies jsp...@virginia.edu wrote:
> To get just the list you wanted, Gabor's solution is more elegant, but
> here's another using the apply family.  First, your data:
>
> dat <- scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n")
>
> I expect dat to be a vector of strings where each string is a line of
> values separated by tabs, which I think, by looking at your other
> code, is what you get.
>
> sapply(dat, function(x){
>     tmp <- unlist(strsplit(x, '\t', fixed=T))
>     out <- list(tmp[seq_along(tmp)[-1]])
>     names(out) <- tmp[1]
>     out
> }, USE.NAMES=F)
>
> The one difference between the two is that if you have a COG with no
> pathways (might not be realistic or that big of a deal), this solution
> will have the COG name in the list with a value of character(0) where
> Gabor's will omit the COG completely. Again, probably not a big deal.

If that is important then do it this way:

Lines <- "COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh
COG0004"
DF <- read.table(textConnection(Lines), header = FALSE,
fill = TRUE, as.is = TRUE, na.strings = "")

library(reshape2)
m <- melt(DF, 1)
# keep only the non-missing pathways; COG0004 stays, with zero pathways
lapply(unstack(m, value ~ V1), function(x) x[complete.cases(x)])

acast(m, value ~ V1)




Re: [R] Help reading table rows into lists

2010-10-10 Thread Alison Waller

Thanks Gabor and Jeffrey,

and thanks for explaining the differences.  I think I'll go with
Jeffrey's as I think I want entries for COGs with no pathway.


Alison
On 10-Oct-10, at 8:59 PM, Jeffrey Spies wrote:


> sapply(dat, function(x){
>    tmp <- unlist(strsplit(x, '\t', fixed=T))
>    out <- list(tmp[seq_along(tmp)[-1]])
>    names(out) <- tmp[1]
>    out
> }, USE.NAMES=F)




Re: [R] Help reading table rows into lists

2010-10-10 Thread Gabor Grothendieck
On Sun, Oct 10, 2010 at 3:29 PM, Alison Waller alison.wal...@embl.de wrote:
> Thanks Gabor and Jeffrey,
>
> and thanks for explaining the differences.  I think I'll go with Jeffrey's
> as I think I want entries for COGs with no pathway.


My second post does handle that case.
