Re: [R] strsplit help

2012-04-12 Thread alison waller
57412.251850588", 
"457412.251848006",
"657314.locus_tag:CK5_17510", "657313.locus_tag:RTO_05370", 
"457412.251849359",
"471875.197297105", "657313.locus_tag:RTO_09820", 
"657323.locus_tag:CK1_25830",
"471875.197297130", "657314.locus_tag:CK5_09290", "457412.251848019",
"471875.197297928", "657314.locus_tag:CK5_14710", "411460.145847612",
"457412.251849367", "657314.locus_tag:CK5_20860", "471875.197297907",
"657321.locus_tag:RBR_07980"), count_Conser = c(7L, 1L, 2L, 1L,
3L, 0L, 1L, 0L, 4L, 0L, 3L, 4L, 1L, 3L, 0L, 5L, 2L, 2L, 1L, 0L,
0L, 2L, 3L, 0L, 2L, 1L, 1L, 4L, 0L, 0L, 0L, 1L, 1L, 5L, 0L, 0L,
2L, 0L, 1L, 1L, 2L, 0L, 1L, 1L, 1L, 3L, 1L, 2L, 0L, 0L, 0L, 1L,
0L, 0L, 2L, 1L, 1L, 0L, 1L, 4L, 0L, 1L, 1L, 4L, 0L, 7L, 0L, 4L,
1L, 1L, 2L, 0L, 1L, 0L, 0L, 2L, 3L, 0L, 4L, 0L, 1L, 0L, 1L, 4L,
1L, 0L, 5L, 4L, 0L, 6L, 2L, 1L, 3L, 1L, 0L, 2L, 3L, 0L, 1L, 12L,
1L, 1L, 2L, 0L, 0L, 2L, 1L, 2L, 1L, 3L, 2L, 0L, 2L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 3L, 0L, 2L, 0L, 1L, 0L, 2L, 1L, 1L, 1L, 1L,
0L, 2L, 0L, 2L, 2L, 5L, 2L, 18L, 0L, 4L, 2L, 0L, 3L, 0L, 1L,
0L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 2L, 0L, 1L, 0L, 1L, 0L, 2L, 0L,
0L, 1L, 1L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 1L, 1L, 2L, 3L, 2L, 0L,
0L, 0L, 3L, 3L, 1L, 1L, 0L, 0L, 3L, 1L, 1L, 0L, 0L, 1L, 0L, 6L,
0L, 3L, 8L, 1L, 3L, 0L, 0L, 3L, 5L, 0L, 1L, 0L, 0L, 1L, 0L, 4L,
3L, 1L, 2L, 0L, 0L, 0L, 4L, 0L, 6L, 6L, 0L, 1L, 2L, 0L, 2L, 3L,
1L, 3L, 0L, 2L, 4L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 2L, 2L, 2L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 4L, 0L,
0L, 3L, 3L, 1L, 0L, 1L, 1L, 2L, 0L, 0L, 1L, 3L, 0L, 2L, 5L, 0L,
0L, 1L, 0L, 8L, 1L, 8L, 2L, 0L, 1L), count_NonCons = c(5L, 4L,
4L, 0L, 0L, 2L, 0L, 2L, 0L, 2L, 4L, 0L, 0L, 2L, 1L, 1L, 2L, 0L,
0L, 0L, 3L, 1L, 1L, 2L, 1L, 0L, 0L, 4L, 1L, 0L, 4L, 2L, 2L, 15L,
2L, 0L, 2L, 0L, 1L, 0L, 1L, 0L, 3L, 0L, 0L, 8L, 0L, 0L, 0L, 0L,
1L, 2L, 4L, 0L, 0L, 0L, 1L, 3L, 5L, 2L, 0L, 0L, 6L, 0L, 2L, 1L,
1L, 4L, 1L, 4L, 1L, 8L, 5L, 1L, 6L, 1L, 5L, 0L, 11L, 0L, 0L,
0L, 2L, 1L, 0L, 0L, 6L, 1L, 0L, 10L, 2L, 1L, 0L, 1L, 1L, 3L,
2L, 1L, 3L, 4L, 1L, 0L, 12L, 0L, 0L, 1L, 3L, 15L, 9L, 4L, 12L,
2L, 4L, 2L, 0L, 0L, 0L, 2L, 2L, 3L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
5L, 0L, 0L, 1L, 0L, 3L, 4L, 1L, 1L, 2L, 0L, 0L, 0L, 1L, 3L, 9L,
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 10L, 2L, 0L, 12L, 0L, 1L,
1L, 2L, 0L, 1L, 1L, 3L, 3L, 1L, 4L, 0L, 2L, 1L, 1L, 4L, 0L, 2L,
5L, 5L, 4L, 0L, 0L, 0L, 2L, 0L, 3L, 0L, 2L, 3L, 2L, 3L, 1L, 4L,
2L, 2L, 0L, 6L, 2L, 1L, 2L, 3L, 0L, 7L, 0L, 0L, 6L, 2L, 2L, 1L,
2L, 0L, 6L, 0L, 0L, 3L, 0L, 0L, 0L, 2L, 2L, 1L, 0L, 2L, 2L, 0L,
0L, 4L, 0L, 2L, 1L, 3L, 2L, 0L, 1L, 0L, 1L, 0L, 6L, 1L, 1L, 1L,
2L, 2L, 4L, 1L, 0L, 0L, 2L, 3L, 2L, 0L, 1L, 0L, 0L, 0L, 1L, 2L,
1L, 0L, 16L, 1L, 3L, 0L, 5L, 10L, 1L, 2L, 4L, 0L, 6L, 0L, 0L,
0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 11L, 1L, 4L, 5L, 1L, 1L),
 count_ConsSubst = c(5, 3, 1, 1, 3, 1, 0, 1, 1, 0, 0, 2, 0,
 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 3, 0, 1, 0, 0,
 0, 6, 1, 1, 1, 0, 0, 0, 1, 2, 1, 0, 0, 4, 0, 0, 1, 0, 0,
 4, 1, 0, 0, 0, 0, 1, 0, 3, 0, 1, 0, 2, 1, 3, 0, 3, 0, 3,
 2, 0, 1, 1, 3, 4, 2, 0, 9, 0, 1, 1, 1, 0, 2, 0, 1, 1, 0,
 1, 1, 3, 0, 2, 0, 1, 0, 2, 2, 1, 3, 0, 6, 0, 0, 0, 2, 7,
 3, 1, 5, 1, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 1, 0,
 0, 0, 1, 0, 0, 3, 1, 0, 1, 1, 2, 0, 2, 0, 5, 2, 0, 0, 0,
 0, 2, 0, 2, 0, 0, 3, 0, 0, 2, 0, 2, 0, 2, 1, 1, 0, 2, 1,
     1, 1, 0, 0, 1, 1, 4, 0, 1, 0, 1, 5, 0, 0, 0, 5, 2, 1, 0,
 0, 1, 0, 0, 0, 4, 0, 2, 1, 1, 1, 2, 1, 1, 1, 4, 1, 2, 1,
 1, 2, 0, 0, 0, 1, 0, 1, 0, 0, 2, 0, 0, 1, 1, 0, 3, 1, 1,
 2, 2, 1, 1, 1, 1, 0, 2, 1, 1, 0, 0, 0, 1, 0, 0, 0, 3, 2,
 0, 1, 1, 0, 0, 0, 0, 2, 1, 1, 0, 0, 0, 0, 0, 3, 1, 0, 0,
 3, 4, 0, 5, 1, 0, 4, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 4,
 1, 4, 0, 0, 0), count_NCSubst = c(1, 0, 0, 0, 1, 1, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 0, 1, 0, 0,
 0, 1, 1, 1, 0, 0, 1, 3, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0,
 1, 0, 1, 0, 5, 0, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0,
 0, 1, 1, 1, 0, 2, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0,
 0, 1, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2,
 1, 0, 1, 0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
 0, 0, 0, 0, 0, 1, 1, 0, 0, 0)), .Names = c("geneid", "count_Conser",
"count_NonCons", "count_ConsSubst", "count_NCSubst"), class = 
"data.

[R] strsplit help

2012-04-11 Thread alison waller

Dear all,

I want to use string split to parse column names, however, I am having 
some errors that I don't understand.

I see a problem when I try to rbind the output from strsplit.

please let me know if I'm missing something obvious,

thanks,
alison

here are my commands:
> strsplit<-strsplit(as.character(Rumino_Reps_agreeWalign$geneid),"\\.")
> Rumino_Reps_agreeWalignTR<-transform(Rumino_Reps_agreeWalign,taxid=do.call(rbind, strsplit))

Warning message:
In function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg 1)


here is my data:

> head(Rumino_Reps_agreeWalign)
                      geneid count_Conser count_NonCons count_ConsSubst
1 657313.locus_tag:RTO_08940            7             5               5
2           457412.251848018            1             4               3
3 657314.locus_tag:CK5_20630            2             4               1
4 657323.locus_tag:CK1_33060            1             0               1
5 657313.locus_tag:RTO_09690            3             0               3
6           471875.197297106            0             2               1
  count_NCSubst
1 1
2 0
3 0
4 0
5 1
6 1

here are the results from strsplit:
> head(strsplit)
[[1]]
[1] "657313"  "locus_tag:RTO_08940"

[[2]]
[1] "457412"    "251848018"

[[3]]
[1] "657314"  "locus_tag:CK5_20630"

[[4]]
[1] "657323"  "locus_tag:CK1_33060"

[[5]]
[1] "657313"  "locus_tag:RTO_09690"

[[6]]
[1] "471875"    "197297106"
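
A possible reading of the warning, which the six rows shown cannot confirm: rbind() complains like this when the pieces it is given have different lengths, so some geneid further down the table may contain more than one period. The sketch below uses a few ids copied from the post, counts the pieces per id, and then splits on the first period only so that every row yields exactly two fields. The names n_pieces, taxid, locus and split_ids are made up for illustration.

```r
# A few ids taken from the post; the real column has many more rows.
geneid <- c("657313.locus_tag:RTO_08940", "457412.251848018",
            "657314.locus_tag:CK5_20630", "471875.197297106")

# Count the pieces per id; anything other than 2 would upset rbind().
parts    <- strsplit(geneid, ".", fixed = TRUE)
n_pieces <- vapply(parts, length, integer(1))
table(n_pieces)

# Split on the first period only, so each id gives exactly two fields.
taxid <- sub("\\..*$", "", geneid)      # text before the first period
locus <- sub("^[^.]*\\.", "", geneid)   # text after the first period

split_ids <- data.frame(geneid, taxid, locus, stringsAsFactors = FALSE)
```

Running table(n_pieces) on the full column should show at a glance whether any id splits into three or more pieces.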

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] can't install plotrix

2011-08-15 Thread Alison Waller
Hi all,

I'm having problems installing plotrix.  I tried installing it through 
install.packages, and from the unix command line, but each time it seems to 
stall when it is installing the help indices.

Has anyone had this same problem? Is this package still maintained?

any help?

thanks
> install.packages("plotrix")
> 
> I also tried using the source package
> R CMD INSTALL plotrix_3.2-3.tar.gz
> 
> both of them seemed to stall at installing help indices.
> I cancelled after about 45 minutes of waiting.
> 
> I tried to load the library in case the functions were loaded.  But
>> library(plotrix)
> Error in library(plotrix) : there is no package called 'plotrix'
> 
> Any knowledge about the status of this package or such errors would be
> great.
> 
> Alison



Re: [R] Summarizing For Values with Multiple categories

2010-10-23 Thread Alison Waller

Yes, I guess I should update.

> R.version.string
[1] "R version 2.9.0 (2009-04-17)"
On 24-Oct-10, at 1:12 AM, Gabor Grothendieck wrote:


R.version.string




Re: [R] Summarizing For Values with Multiple categories

2010-10-23 Thread Alison Waller

Thanks!

I tried reading the help for aggregate and can't figure out which form  
of the formula I am using, and therefore the syntax.


I'm getting the below error.

> aggregate(counts ~ ind, merge(stack(CAT2COG), df, by = 1), sum)
Error in as.data.frame.default(x) :
  cannot coerce class "formula" into a data.frame
> aggregate(counts ~ Cats, merge(stack(CAT2COG), df, by = 1), sum)
Error in as.data.frame.default(x) :
  cannot coerce class "formula" into a data.frame
> Cats
[1] A B C D E
Levels: A B C D E
> aggregate(counts ~ COGs, merge(stack(CAT2COG), df, by = 1), sum)
Error in as.data.frame.default(x) :
  cannot coerce class "formula" into a data.frame
On 24-Oct-10, at 12:50 AM, Gabor Grothendieck wrote:


aggregate(counts ~ ind, merge(stack(CAT2COG), df, by = 1), sum)
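
The "cannot coerce class "formula"" error suggests that the installed R does not have a formula method for aggregate(); to my knowledge that method arrived in R 2.11.0, which would fit the R 2.9.0 reported elsewhere in this thread. A minimal sketch of the same computation with the older x/by interface, using the data from the original post (long and totals are illustrative names):

```r
CAT2COG <- list(A = "COG1", B = c("COG1", "COG2"),
                C = c("COG1", "COG3"), D = c("COG2", "COG4"))
df <- data.frame(COGs = c("COG1", "COG2", "COG3", "COG4"),
                 counts = c(10, 20, 30, 40))

# One row per (category, COG) pair, joined to the counts by COG name.
long <- merge(stack(CAT2COG), df, by = 1)

# Non-formula interface: value vector, list of grouping variables, function.
totals <- aggregate(long$counts, by = list(ind = long$ind), FUN = sum)
totals
```

The result has one row per category, with the summed counts in the column named x.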




[R] Summarizing For Values with Multiple categories

2010-10-23 Thread Alison Waller

Hi all,

I have some data as follows.

Cat1 Cat2 Cat3  COG   Counts
A    B    C     COG1  10
B    D          COG2  20
C               COG3  30
D               COG4  40

I would like to sum all the counts for each category:
A   B   C   D
10  30  40  60

> CAT2COG<-list(A="COG1",B=c("COG1","COG2"),C=c("COG1","COG3"),D=c("COG2","COG4"))
> COG2CAT<-list(COG1=c("A","B","C"),COG2=c("B","D"),COG3=c("C"),COG4="D")
> df<-data.frame(COGs=c("COG1","COG2","COG3","COG4"),counts=c(10,20,30,40))



I've been trying various versions of apply as well as some crazy loops
(e.g. below).


Any help would be appreciated

Thanks,

Alison
> CATS<-names(CAT2COG)
> Catcounts<-rep(0,length(CATS))
> counter<-1
> for (i in CATS){
+ Catcounts[counter]<-CatCounts+df$counts[df[1,]=CAT2COG[i],]
Error: syntax error
> counter<-counter+1
> }
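
For what it is worth, the loop can probably be replaced by a single sapply() over the category list. This is only a sketch using the CAT2COG and df objects defined in the post; totals is an illustrative name.

```r
CAT2COG <- list(A = "COG1", B = c("COG1", "COG2"),
                C = c("COG1", "COG3"), D = c("COG2", "COG4"))
df <- data.frame(COGs = c("COG1", "COG2", "COG3", "COG4"),
                 counts = c(10, 20, 30, 40))

# For each category, sum the counts of the COGs that belong to it.
totals <- sapply(CAT2COG, function(cogs) sum(df$counts[df$COGs %in% cogs]))
totals
```

The names of CAT2COG carry over to the result, so no separate counter is needed.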



Re: [R] Help reading table rows into lists

2010-10-10 Thread Alison Waller

Thanks Gabor and Jeffrey,

and thanks for explaining the differences.  I think I'll go with
Jeffrey's as I think I want entries for COGs with no pathway.


Alison
On 10-Oct-10, at 8:59 PM, Jeffrey Spies wrote:


sapply(dat, function(x){
   tmp<-unlist(strsplit(x, '\t', fixed=T))
   out <- list(tmp[seq_along(tmp)[-1]])
   names(out) <- tmp[1]
   out
}, USE.NAMES=F)




[R] Help reading table rows into lists

2010-10-10 Thread Alison Waller

Hi all,

I have a large table mapping thousands of COGs(groups of genes) to  
pathways.

# Ex
COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh
##

I would like to combine this information into a big list such as below
COG2PATHWAY<-list(COG0001=c("patha","pathb","pathc"),COG0002=c("pathd","pathe"),COG0003=c("pathe","pathf","pathg","pathh"))


I am stuck and have tried various methods involving (probably mangled)
versions of lapply and loops.


Any suggestions on the most efficient way to do this would be great.

Thanks,

Alison

Here is my latest attempt.

#

line_num<-length(scan(file="/g/bork8/waller/test_COGtoPath.txt",what="character",sep="\n"))

COG2Path<-vector("list",line_num)
COG2Path<-lapply(1:(line_num-1),function(x) scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))


#

I am getting an error

#

>COG2Path<-lapply(1:(line_num-1),function(x) scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))

Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :

But if I do scan alone I don't get an error

# then I suppose it looks like the easiest way to name the list
# variables is using unix to cut the first column out and then read that
# in.
names(COG2Path)<-scan(file="/g/bork8/waller/test_col_names.txt",sep="\t",what="character")
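
Judging only from the text of the post, one thing that stands out is that the two scan() calls spell the file name with different case (test_COGtoPath.txt and test_COGtopath.txt), which on a case-sensitive Unix filesystem might be enough to explain "cannot open the connection". Separately, the whole job can probably be done in one pass without re-opening the file per line. The sketch below assumes tab-separated lines like the example, with an extra made-up COG0004 line to show a COG with no pathway; in practice the lines vector would come from readLines() on the real file.

```r
# In practice: lines <- readLines("test_COGtoPath.txt")  # placeholder path
lines <- c("COG0001\tpatha\tpathb\tpathc",
           "COG0002\tpathd\tpathe",
           "COG0003\tpathe\tpathf\tpathg\tpathh",
           "COG0004")                      # hypothetical COG without pathways

# Split every line on tabs; the first field is the COG id, the rest pathways.
fields <- strsplit(lines, "\t", fixed = TRUE)

COG2PATHWAY <- lapply(fields, function(x) x[-1])
names(COG2PATHWAY) <- vapply(fields, function(x) x[1], character(1))

str(COG2PATHWAY)
```

A COG with no pathway ends up as an empty character vector rather than being dropped, and the names come from the same pass, so no separate column-name file is needed.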




[R] colour of label points on a boxplot

2010-08-05 Thread alison waller
Hi all,

I have 6 datasets(dataframes Assem_ContigsLen7 through all_ContigsLen12)
containing 3 columns (contig_id, contig_length, read_count).

Each dataset is composed of 3 types of contigs (assemblies of genomic
fragments), 1- all Bacterial fragments, 2 - all Viral fragments, 3 -
mixed fragments.

I identified the type of contig through a merge with another table with
just contig_id and contig_type as below:

AssemViral_ContigsLen<-merge(Assem_ContigsLen,allViral_contigs,by.x="contig_id",by.y="X.Contid.ID",all.x=FALSE)
 
Below is a boxplot for

boxplot(Assem_ContigsLen7$length,Assem_ContigsLen8$length,Assem_ContigsLen9$length,Assem_ContigsLen10$length,Assem_ContigsLen11$length,Assem_ContigsLen12$length,main="100species_rep2",ylab="Contig_length")


All of the longer contigs in the sixth data set are allViral.

How can I colour or label these?
I tried overlaying 2 boxplots of different colours (using add=TRUE), but
the individual points of the whiskers aren't coloured (and I can't
figure out how to do so)
I experimented with using points, but there isn't a general function
that I can apply to all 6 datasets to identify the allViral contigs.

specific questions;
1 -how can I color the data points that represent the whiskers in a boxplot?
2 - Can I identify and colour subsets of datapoints within a boxplot?
3- any other suggestions?

Thank you,

Alison
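
A rough sketch of two options that might address questions 1 and 2, on simulated data since the real tables are not in the post. The points drawn beyond the whiskers are what boxplot() treats as outliers, and their colour and symbol can be set with the outcol and outpch graphical parameters. A subset such as the viral contigs can then be overplotted with points(), using the box position on the x axis. The data frame contigs, its columns, and the group labels are all invented for the example.

```r
pdf(NULL)  # null device so the sketch also runs non-interactively; drop to see the plot
set.seed(1)

# Simulated stand-in: one long table with a dataset label and a contig type.
contigs <- data.frame(
  dataset = factor(rep(c("Len7", "Len8"), each = 50)),
  length  = c(rexp(50, rate = 1 / 500), rexp(50, rate = 1 / 800)),
  type    = sample(c("Bacterial", "Viral", "Mixed"), 100, replace = TRUE)
)

# 1. Colour the outlier points of every box.
bp <- boxplot(length ~ dataset, data = contigs,
              outcol = "red", outpch = 19, ylab = "Contig_length")

# 2. Overplot one subset; box k sits at x = k, so the factor code is the x position.
viral <- contigs$type == "Viral"
points(as.integer(contigs$dataset)[viral], contigs$length[viral],
       col = "blue", pch = 1)

invisible(dev.off())
```

Stacking the six data frames into one long table with a dataset column, as assumed here, is what lets a single points() call cover all boxes.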





Re: [R] using sprintf to pass a variable to a RMySQL query

2010-03-11 Thread alison waller
Hi all,

I re-installed R and tcltk.  I find some of the documentation misleading
as it indicates that tcltk is included with R.  And when you type
library() it shows tcltk, even though it hasn't been installed.

Anyways, I've decided to go with sprintf.
I am having errors with my query criteria.

I have slightly changed my criteria, as I want to match 'MGi.' so that MG1.
and MG10. are kept apart (if I used %MGi%, wouldn't %MG1% match both MG1.
and MG10.?).
I tried to escape the period with a backslash, quotes, and a double period.
I think that R is fine with the syntax, but SQL doesn't like it.

Can anyone please help me with the syntax.

thank you,



## Error##
Error in mysqlExecStatement(conn, statement, ...) :
  RS-DBI driver: (could not run statement: You have an error in your SQL
syntax; check the manual that corresponds to your MySQL server version
for the right syntax to use near '1' at line 2
)
Calls: dbGetQuery ... .valueClassTest -> is -> is -> mysqlExecStatement
-> .Call
Execution halted
## Script ##


library(RMySQL)
mysql<-dbDriver("MySQL")
con<-dbConnect(mysql,username="u",host="g",password="s",port=,dbname="M")

i<-1
k<-0

while (k<=17) {
 while (i<=72) {

sqlcmd_ScaffLen<-sprintf('SELECT scaffold.length
FROM scaffold,scaffold2contig,contig2read
WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
scaffold2contig.contig_id=contig2read.contig_id AND
contig2read.read_id LIKE
\'%%MG%d..%%\'' ,i)

sqlcmd_contigs<-sprintf('SELECT length FROM contig WHERE external_id
 LIKE\'%%MG%d..%%\'',i)

sqlcmd_singletons<-paste('SELECT COUNT(*) FROM contig WHERE
read_count=1 AND external_id LIKE \'%%MG%d..%%\'',i)

MG_ScaffoldLen<-dbGetQuery(con,sqlcmd_ScaffLen)

MG_ContigsLen<-dbGetQuery(con,sqlcmd_contigs)

MG_SingletonsCount<-dbGetQuery(con,sqlcmd_singletons)

   
MG_ScaffoldLen_Summ<-as.data.frame(c(summary(MG_ScaffoldLen$length),MG_SingletonsCount))
MG_ContigsLen_Summ<-summary(MG_ContigsLen$length)

   
write.table(MG_ScaffoldLen_Summ,file="ScaffoldLen_SummStats.txt",append=TRUE,sep='\t')

   
write.table(MG_ContigsLen_Summ,file="ContigsLen_SummStats.txt",append=TRUE,sep='\t')

# Keep names for 4 of them so we can do summary plots for each treatment
# (ie combine all 4 reps)

MG_ScaffoldLen<-assign(paste('MG_ScaffoldLen',i,sep=''),MG_ScaffoldLen)

MG_ContigsLen<-assign(paste('MG_ContigsLen',i,sep=''),MGContigsLen)

i<-i+18
}
### Summary Plots For each Treatment ##

  jpeg(file=sprintf("Boxplots%dSanger_Virus.jpeg",k))
 
sprintf("boxplot(MG_ScaffoldLen(1+%d)$length,MG_ScaffoldLen(18+%d)$length,MG_ScaffoldLen(36+%d)$length,MG_ScaffoldLen(54+%d)$length)",k)
  dev.off()

  jpeg(file=sprintf("Scaffold_histograms%dSanger_Virus.jpeg",k))
  par(mfrow=c(1,3))
  sprintf("hist(MG_ScaffoldLen(1+%d)$length)",k)
  sprintf("hist(MG_ScaffoldLen(18+%d)$length)",k)
  sprintf("hist(MG_ScaffoldLen(36+%d)$length)",k)
  sprintf("hist(MG_ScaffoldLen(54+%d)$length)",k)
  dev.off()

  jpeg(file=sprintf("Contig_histograms%dSanger_Virus.jpeg",k))
  par(mfrow=c(1,3))
  sprintf("hist(MG_ContigsLen(1+%d)$length)",k)
  sprintf("hist(MG_ContigsLen(18+%d)$length)",k)
  sprintf("hist(MG_ContigsLen(36+%d)$length)",k)
  sprintf("hist(MG_ContigsLen(54+%d)$length)",k)
  dev.off()

  k<-k+1
  i<-1+k
  }
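
One thing that may explain the "near '1' at line 2" error, offered as a guess from the script alone: the singletons query is built with paste() rather than sprintf(), so the %% and %d are never processed and the value of i is simply appended after a space. Also, in SQL LIKE patterns the wildcards are % and _, so a period should already be literal and a single one should do. A small sketch of both effects (bad and sqlcmd_singletons are illustrative names):

```r
i <- 1L

# paste() leaves the format codes untouched and appends i at the end.
bad <- paste("SELECT COUNT(*) FROM contig WHERE read_count=1 AND external_id LIKE '%%MG%d..%%'", i)
bad

# sprintf(): %% becomes a literal %, %d takes the integer; the period needs no escape.
sqlcmd_singletons <- sprintf(
  "SELECT COUNT(*) FROM contig WHERE read_count=1 AND external_id LIKE '%%MG%d.%%'",
  i)
sqlcmd_singletons
```

Using double quotes for the R string also avoids having to backslash-escape the single quotes that SQL needs.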


On 03/11/10 16:01, Uwe Ligges wrote:
> On 10.03.2010 12:45, alison waller wrote:
>> Thanks Gabor,
>>
>> As I said I would like to use gsubfn, but I am having problems
>> installing it, which I assume are due to some conflict with the current
>> tcltk package
>>
>> Below is the error I got after issuing install.packages("gsubfn")
>>
>> Any advice?
>
>
> Re-install R including the tcltk package?
>
> Uwe Ligges
>
>
>> ###
>> * Installing *source* package 'gsubfn' ...
>> ** R
>> ** demo
>> ** inst
>> ** preparing package for lazy loading
>> Warning: S3 methods '$.tclvar', '$<-.tclvar', 'as.character.tclObj',
>> 'as.character.tclVar', 'as.double.tclObj', 'as.integer.tclObj',
>> 'as.logical.tclObj', 'print.tclObj', '[[.tclArray', '[[<-.tclArray',
>> '$.tclArray', '$<-.tclArray', 'names.tclArray', 'names<-.tclArray',
>> 'length.tclArray', 'length<-.tcl

Re: [R] using sprintf to pass a variable to a RMySQL query

2010-03-10 Thread alison waller
Thanks Gabor,

As I said I would like to use gsubfn, but I am having problems
installing it, which I assume are due to some conflict with the current
tcltk package

Below is the error I got after issuing install.packages("gsubfn")

Any advice?

###
* Installing *source* package 'gsubfn' ...
** R
** demo
** inst
** preparing package for lazy loading
Warning: S3 methods '$.tclvar', '$<-.tclvar', 'as.character.tclObj',
'as.character.tclVar', 'as.double.tclObj', 'as.integer.tclObj',
'as.logical.tclObj', 'print.tclObj', '[[.tclArray', '[[<-.tclArray',
'$.tclArray', '$<-.tclArray', 'names.tclArray', 'names<-.tclArray',
'length.tclArray', 'length<-.tclArray', 'tclObj.tclVar',
'tclObj<-.tclVar', 'tclvalue.default', 'tclvalue.tclObj',
'tclvalue.tclVar', 'tclvalue<-.default', 'tclvalue<-.tclVar' were
declared in NAMESPACE but not found
Error in namespaceExport(ns, exports) :
  undefined exports: addTclPath, as.tclObj, is.tclObj, is.tkwin
Error : package 'tcltk' could not be loaded
ERROR: lazy loading failed for package 'gsubfn'
* Removing '/g/bork3/x86_64/lib64/R/library/gsubfn'

The downloaded packages are in
'/tmp/RtmpkfvT5f/downloaded_packages'
Updating HTML index of packages in '.Library'
Warning message:
In install.packages("gsubfn", lib = "/g/bork3/x86_64/lib64/R/library") :
  installation of package 'gsubfn' had non-zero exit status

## this is the error when I tried to install tcltk#
install.packages("tcltk")
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package 'tcltk' is not available



On 03/09/10 16:26, Gabor Grothendieck wrote:
> On Tue, Mar 9, 2010 at 7:10 AM, alison waller  wrote:
>   
>> Hi all,
>>
>> Thanks for help with the paste and sprintf syntax.
>>
>> So I've decided to use paste and or sprintf.  'gsubfn' looks like a
>> great package but unfortunately I've had problems installing it, as I
>> don't think it likes the version of tcltk that is installed.  I'm
>> working on a few unix clusters with many computers and there seems to be
>> problems with different versions of R and different versions of the
>> packages on different computers.
>> 
> The fn$ functionality that I mentioned does not use the tcltk package
> so the version of tcltk should not matter.
>
> The only part of the package that uses tcltk is strapply, which is not
> used here, and even in that case there is R code to it as well if you
> use strapply(..., engine = "R") or use ostrapply.
>
> Also the older 0.3-9 version of the gsubfn package did not use tcltk at all.
>



Re: [R] using sprintf to pass a variable to a RMySQL query

2010-03-09 Thread alison waller
Hi all,

Thanks for help with the paste and sprintf syntax.

So I've decided to use paste and/or sprintf.  'gsubfn' looks like a
great package but unfortunately I've had problems installing it, as I
don't think it likes the version of tcltk that is installed.  I'm
working on a few unix clusters with many computers and there seems to be
problems with different versions of R and different versions of the
packages on different computers.

So, the other problem is that I want to rename the data.frames and names
of the output jpeg files resulting from the queries.  I've tried a few
different approaches but none seem to work, using sprintf and paste
turns the data frame into just a string of the name.

I have a complicated loop here as I'd like to do some summary output
after every 4 queries (ie. after MG1, MG 19, MG 37, MG 54) then I want
to start again and do for MG2, MG20 etc..

Here's my code below. There are probably errors in the loop structure
that I can work out, but I need help with renaming the data frames based
on the parameters i and j.

thanks


i<-1
j<-1

for (i<=72 and j<=4){{

sqlcmd_ScaffLen<- paste("SELECT scaffold.length
FROM scaffold, scaffold2contig, contig2read
WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
scaffold2contig.contig_id=contig2read.contig_id AND
contig2read.read_id LIKE '%MG", i ,"%'", sep='')

sqlcmd_contigs<-paste("SELECT length FROM contig WHERE external_id LIKE
'%MG",i,"%'",sep='' )

sqlcmd_singletons<-paste("SELECT COUNT(*) FROM contig WHERE
read_count=1 AND external_id LIKE '%MG",i,"%'",sep='')

MGi_ScaffoldLen<-dbGetQuery(con,sqlcmd_ScaffLen)
MGi_ContigsLen<-dbGetQuery(con,sqlcmd_contigs)
MGi_SingletonsCount<-dbGetQuery(con,sqlcmd_singletons)

MGi_ScaffoldLen_Summ<-as.data.frame(c(summary(MGi_ScaffoldLen$length),MGi_SingletonsCount))
MGi_ContigsLen_Summ<-summary(MGi_ContigsLen$length)

write.table(MGi_ScaffoldLen_Summ,file="ScaffoldLen_SummStats.txt",append=TRUE,sep='\t')

write.table(MGi_ContigsLen_Summ,file="ContigsLen_SummStats.txt",append=TRUE,sep='\t')

i<-i+18
j<-j+1

}

### Summary Plots For each Treatment ##

jpeg(file=sprintf("Boxplots_%d.jpeg",i)
boxplot(MGi_ScaffoldLen$length,MG(i+18*j)_ScaffoldLen$length,MG(i+_ScaffoldLen$length,MG59_ScaffoldLen$length,Main="400spec_10virus")
dev.off()

jpeg(file=sprintf("Scaffold_histograms_%d.jpeg",i)
hist(MGi_ScaffoldLen$length)
hist(MG(i+j*18)_ScaffoldLen$length)
hist(MG(i+j*18_ScaffoldLen$length)
hist(MG(i+j*18_ScaffoldLen$length)

dev.off()

jpeg(file=sprintf("Contig_histograms_%d.jpeg",i)
hist(MGi_ContigsLen$length)
hist(MG(i+j*18)_ContigsLen$length)
hist(MG(i+j*18_ContigsLen$length)
hist(MG(i+j*18_ContigsLen$length)

dev.off()

j<-1
i<-2
}
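
On the renaming question, one common alternative to building variable names is to keep the query results in a named list and index it with a constructed string. This is a sketch under assumptions: fetch_lengths() is a hypothetical stand-in for the dbGetQuery() call so that the example runs without a database, and the replicate spacing of 18 is taken from the i<-i+18 step in the code above.

```r
# Hypothetical stand-in for dbGetQuery(con, sqlcmd); returns a data frame with $length.
fetch_lengths <- function(i) data.frame(length = seq_len(i) * 100)

scaffold_len <- list()

for (k in 1:2) {                       # treatments; the post loops over more
  reps <- k + c(0, 18, 36, 54)         # the four replicates of treatment k
  for (i in reps) {
    # Store each result under a name such as "MG19" instead of creating a variable.
    scaffold_len[[paste("MG", i, sep = "")]] <- fetch_lengths(i)
  }
  # Pull the four replicates back out by name for the per-treatment summary.
  per_rep <- lapply(scaffold_len[paste("MG", reps, sep = "")],
                    function(d) d$length)
  medians <- sapply(per_rep, median)
  # boxplot(per_rep) here would draw one box per replicate.
}

names(scaffold_len)
```

Because per_rep is an ordinary named list, it can be passed straight to boxplot() or looped over for hist(), which avoids pasting code into strings.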


On 03/08/10 21:02, Don MacQueen wrote:
> I always use paste()
>
> i <- 1
> sqlcmd_ScaffLen <- paste("SELECT scaffold.length
> FROM scaffold, scaffold2contig, contig2read
> WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
> scaffold2contig.contig_id=contig2read.contig_id AND
> contig2read.read_id LIKE '%MG", i ,"%'", sep='')
>
> That should create bits like
>LIKE '%MG1%'
>LIKE '%MG2%'
> and so on.
>
> You just have to get the nesting of the single and double quotes
> correct - the SQL requires single quotes, so use double quotes for the
> fixed character strings inside paste(). That, and use sep='' to get
> rid of unwanted space characters.
>
> Using paste is also effective for constructs like
>   IN (3,4,5)
> or
>   IN ('a','b','c')
> though it can be necessary to nest one paste within another
>
> -Don
>
> At 2:06 PM +0100 3/8/10, alison waller wrote:
>> Hello,
>>
>> I am using RmySQL and would like to iterate through a few queries.
>>
>> I would like to use sprintf but I think I'm having problems mixing and
>> matching the sprintf syntax and the SQL regex.
>>
>> I have checked my sqlcmd and it works when I wan to match %MG1% but how
>> do I iterate for i 1-72?  Escape characters,?
>>
>> thanks in advance
>>
>> i<-1
>> sqlcmd_ScaffLen<-sprintf('SELECT scaffold.length
>> FROM scaffold,scaffold2contig,contig2read
>> WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
>> scaffold2contig.contig_id=contig2read.contig_id AND
>> contig2read.read_id LIKE
>> '%MG%s%' ,i)
>>
>> = Here is my vague error message
>>
>> Error: unexpected input in:
>>
>
>
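
Don's remark about nesting one paste() inside another for IN (...) lists can be sketched as follows. The vectors ids and labels are made-up values matching his two examples; the inner paste() builds the comma-separated list and the outer one wraps it.

```r
ids    <- c(3, 4, 5)
labels <- c("a", "b", "c")

# Numeric list: collapse the values with commas, then wrap in IN ( ).
in_num <- paste("IN (", paste(ids, collapse = ","), ")", sep = "")

# Character list: quote each value with single quotes first, then collapse.
in_chr <- paste("IN (",
                paste("'", labels, "'", sep = "", collapse = ","),
                ")", sep = "")

in_num
in_chr
```

This kind of string building is only reasonable for trusted values; anything coming from user input would need proper escaping.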



[R] using sprintf to pass a variable to a RMySQL query

2010-03-08 Thread alison waller
Hello,

I am using RMySQL and would like to iterate through a few queries.

I would like to use sprintf but I think I'm having problems mixing and
matching the sprintf syntax and the SQL regex.

I have checked my sqlcmd and it works when I want to match %MG1%, but how
do I iterate for i from 1 to 72?  Escape characters?

thanks in advance

i<-1
sqlcmd_ScaffLen<-sprintf('SELECT scaffold.length
FROM scaffold,scaffold2contig,contig2read
WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
scaffold2contig.contig_id=contig2read.contig_id AND contig2read.read_id LIKE
'%MG%s%' ,i)

= Here is my vague error message

Error: unexpected input in:



Re: [R] finding most highly transcribed genes - ranking, sorting and subsets?

2007-12-07 Thread alison waller
Thanks - great, should have thought of option b)


-----Original Message-----
From: Martin Morgan [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 07, 2007 12:52 PM
To: alison waller
Cc: [EMAIL PROTECTED]
Subject: Re: [R] finding most highly transcribed genes - ranking, sorting
and subsets?

Hi Alison --

It's a funny twist of terminology, isn't it? high rank (we're #1!)
corresponds to low value. Maybe a wimpy stats joke? Anyway, (a) if m
is assigned rownames (e.g., from the appropriate column of the 'genes'
data frame in the limma object, rownames(m) <- maList$genes$GeneName)
they'll be carried through the analysis and (b) if you've extracted m
from a limma MAList, then subsetting the MAList with hrow
(maList[hrow,]) will give you a new MAList with all the info carrying
through. This would be the better way to go.

Martin
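
Point (a) can be sketched on a plain matrix, leaving the limma objects aside. The gene and array names below are invented; the sketch also ranks the negated values so that rank 1 means the highest intensity, which sidesteps the high-rank/low-value confusion.

```r
set.seed(1)

# Small stand-in intensity matrix with made-up row and column names.
m <- matrix(runif(40), ncol = 4,
            dimnames = list(paste0("gene", 1:10), paste0("array", 1:4)))

# Rank within each column; negating makes rank 1 the largest value.
r <- apply(-m, 2, rank)

# Rows that are in the top 5 of every column.
hrows <- apply(r <= 5, 1, all)

# Logical subsetting keeps the row names; drop = FALSE keeps a matrix even for one row.
top <- m[hrows, , drop = FALSE]
rownames(top)
```

Because the row names travel with the subset, the selected rows can be matched back to genes without tracking row numbers.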

"alison waller" <[EMAIL PROTECTED]> writes:

> Thanks so much Martin,
>
> This method is definitely more straightforward.  And you are right I don't
> think I was doing anything wrong before. However, I thought that rank would
> rank the highest 1st; after looking at the results using your
> methods, I realized it ranks the lowest number 1.  So I modified it for
> rank>18500.  And now I'm getting 300 rows for which the intensity is
> consistently high.
>
> However, I am still lacking some information.  For the results I can get a
> matrix of 300 rows and the corresponding intensities (from m) or rank (from
> h), but what I really want is the name of the original row (which
> corresponds to a specific spot on the array).
>
> I did msubset<-m[hrows,] and as mentioned I just get the rows numbered
> 1-300, while I want to essentially pick out the 300 rows from the original
> 19,000 rows, maintaining the original row designation as it corresponds to a
> specific gene.
>
> Thanks again for any suggestions,
>
> Alison
>
> -----Original Message-----
> From: Martin Morgan [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, December 06, 2007 4:06 PM
> To: alison waller
> Subject: Re: [R] finding most highly transcribed genes - ranking, sorting
> and subsets?
>
> Hi Alison --
>
> I'm not sure where your problem is coming from, but R can help you to
> more efficiently do your task. Skipping the bioc terminology and data
> structures, you have a matrix
>
>> m <- matrix(runif(100000), ncol=10)
>
> you'd like to determine the rank of values in each column
>
>> r <- apply(m, 2, rank)
>
> identify those with high rank
>
>> h <- r < 500
>
> and find the rows for which the rank is always high
>
>> hrows <- apply(h, 1, all)
>
> you can then use hrows to subset your original matrix (m[hrows,]) or
> otherwise, e.g., how many rows with high rank
>
>> sum(hrows)
> [1] 0
>
> or perhaps the distribution of the number of columns in which high
> ranking genes occur.
>
>> table(apply(h, 1, sum))
>
>    0    1    2    3    4 
> 5996 3132  765  100    7 
>
> Martin
>
> "alison waller" <[EMAIL PROTECTED]> writes:
>
>> Hello,
>>
>> I am not only interested in finding out which genes are the most highly up-
>> or down-regulated (which I have done using the linear models and Bayesian
>> statistics in Limma), but I also want to know which genes are consistently
>> highly transcribed (i.e. they have a high intensity in the channel of
>> interest, e.g. Cy5 or Cy3, across the set of experiments).  I might have
>> missed a straightforward way to do this, or a valuable function, but I've
>> been using my own methods and going around in circles.
>>
>> So far I've normalized within and between arrays, then returned the RG
>> values using RG<-RG.MA, then I ranked each R and G values for each array as
>> below.
>>
>> rankRG<-RG
>>
>> rankRG$R[,1]<-rank(rankRG$R[,1])
>>
>> rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns (i.e. arrays, as
>> well as the G's)
>>
>> then I thought I could pull out a subset of rankRG using something like;
>>
>> topRG<-rankRG
>>
>> topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)
>>
>> However, this just returned me a matrix with one row of $R (the ranks were
>> <500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6).  However, I
>> can't believe that there is only one gene that is in the top 500 for R
>> intensity among those three arrays.
>>
>> Am I doing something wrong?

Re: [R] finding most highly transcribed genes - ranking, sorting and subsets?

2007-12-07 Thread alison waller
Thanks so much Martin,

This method is definitely more straightforward.  And you are right I don't
think I was doing anything wrong before. However, I thought that rank, would
rank the highest 1st, however after looking at the results using your
methods, I realized it ranks the lowest number 1.  So I modified it for
rank>18500.  And now I'm getting 300 rows for which the intensity is
consistenly high.

However, I am still lacking some information.  For the results I can get a
matrix of 300 rows and the corresponding intensities (from m) or rank (from
h), but what I really want is the name of the original row (which
corresponds to a specific spot on the array).

I did msubset<-m[hrows,] and, as mentioned, I just get the rows numbered
1-300, while I essentially want to pick out the 300 rows from the original
19,000 rows, maintaining the original row designation, as it corresponds to a
specific gene.
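
One possible way to keep track of the original rows is sketched below.  It
assumes a small made-up matrix whose row names ("spot1", "spot2", ...) and
cutoff of 10 are hypothetical.  Subsetting a matrix with a logical vector
keeps any row names the matrix has, and which() recovers the original row
positions when it has none.

```r
set.seed(1)  # arbitrary seed, only so the illustration is repeatable

# Hypothetical matrix: 20 spots (rows) by 3 arrays (columns), made-up names.
m <- matrix(runif(60), ncol = 3,
            dimnames = list(paste0("spot", 1:20), paste0("array", 1:3)))

# Rank within each column so that rank 1 is the highest intensity.
r <- apply(-m, 2, rank)

# Rows that fall in the top 10 in every column.
hrows <- apply(r <= 10, 1, all)

# Logical subsetting keeps the original row names; drop = FALSE keeps a
# matrix even if only one row is selected.
msubset <- m[hrows, , drop = FALSE]

# Original row positions and names of the selected rows.
idx <- which(hrows)
selected <- rownames(m)[hrows]
```

If the spot annotation is stored separately (say, in a hypothetical data frame
with one row per spot, in the same order as m), the same hrows or idx vector
could be used to index that table as well.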

Thanks again for any suggestions,

Alison

-Original Message-
From: Martin Morgan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 06, 2007 4:06 PM
To: alison waller
Subject: Re: [R] finding most highly transcribed genes - ranking, sorting
and subsets?

Hi Alison --

I'm not sure where your problem is coming from, but R can help you to
more efficiently do your task. Skipping the bioc terminology and data
structures, you have a matrix

> m <- matrix(runif(100000), ncol=10)

you'd like to determine the rank of values in each column

> r <- apply(m, 2, rank)

identify those with high rank

> h <- r < 500

and find the rows for which the rank is always high

> hrows <- apply(h, 1, all)

you can then use hrows to subset your original matrix (m[hrows,]) or
otherwise, e.g., how many rows with high rank

> sum(hrows)
[1] 0

or perhaps the distribution of the number of columns in which high
ranking genes occur.

> table(apply(h, 1, sum))

   0    1    2    3    4
5996 3132  765  100    7
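
A repeatable variant of the steps above might look like the following sketch.
It assumes a 10,000 x 10 matrix of uniform random numbers (consistent with
the row total of the table above), negates the matrix so that rank 1 is the
highest value, and uses rowSums() in place of apply(h, 1, sum).  The seed,
the variable names, and the choice of columns 1, 2 and 5 are illustrative
assumptions only.

```r
set.seed(42)  # arbitrary seed so the run is repeatable

n_rows <- 10000   # assumed number of spots
n_cols <- 10      # assumed number of arrays
top_n  <- 500     # rank cutoff used in the message

m <- matrix(runif(n_rows * n_cols), ncol = n_cols)

# Rank each column; negate so that rank 1 is the highest value.
r <- apply(-m, 2, rank)
h <- r <= top_n

# Number of columns in which each row is among the top `top_n`.
n_high <- rowSums(h)

# Rows in the top set for every column, or only for chosen columns.
all_cols  <- n_high == n_cols
some_cols <- rowSums(h[, c(1, 2, 5), drop = FALSE]) == 3

print(table(n_high))
print(sum(all_cols))
print(sum(some_cols))
```

With independent random columns very few rows are expected to be in the top
500 of several columns at once; real arrays are correlated, so the counts
there can be much larger.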

Martin

"alison waller" <[EMAIL PROTECTED]> writes:

> Hello,
>
> I am not only interested in finding out which genes are the most highly up-
> or down-regulated (which I have done using the linear models and Bayesian
> statistics in Limma), but I also want to know which genes are consistently
> highly transcribed (i.e. they have a high intensity in the channel of
> interest, e.g. Cy5 or Cy3, across the set of experiments).  I might have
> missed a straightforward way to do this, or a valuable function, but I've
> been using my own methods and going around in circles.
>
> So far I've normalized within and between arrays, then returned the RG
> values using RG<-RG.MA, then I ranked the R and G values for each array as
> below.
>
> rankRG<-RG
>
> rankRG$R[,1]<-rank(rankRG$R[,1])
>
> rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns (i.e. arrays, as
> well as the G's)
>
> then I thought I could pull out a subset of rankRG using something like:
>
> topRG<-rankRG
>
> topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)
>
> However, this just returned me a matrix with one row of $R (the ranks were
> <500 for columns 1, 2, and 5 and greater than 500 for 3, 4, and 6).  However,
> I can't believe that there is only one gene that is in the top 500 for R
> intensity among those three arrays.
>
> Am I doing something wrong?  Can someone think of a better way of doing
> this?
>
> Thanks
>
> Alison
>
> **
> Alison S. Waller  M.A.Sc.
> Doctoral Candidate
> [EMAIL PROTECTED]
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5

-- 
Dr. Martin Morgan, PhD
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] finding most highly transcribed genes - ranking, sorting and subsets?

2007-12-06 Thread alison waller
Hello,

I am not only interested in finding out which genes are the most highly up-
or down-regulated (which I have done using the linear models and Bayesian
statistics in Limma), but I also want to know which genes are consistently
highly transcribed (i.e. they have a high intensity in the channel of
interest, e.g. Cy5 or Cy3, across the set of experiments).  I might have
missed a straightforward way to do this, or a valuable function, but I've
been using my own methods and going around in circles.

So far I've normalized within and between arrays, then returned the RG
values using RG<-RG.MA, then I ranked the R and G values for each array as
below.

rankRG<-RG

rankRG$R[,1]<-rank(rankRG$R[,1])

rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns (i.e. arrays, as
well as the G's)

then I thought I could pull out a subset of rankRG using something like:

topRG<-rankRG

topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)

However, this just returned me a matrix with one row of $R (the ranks were
<500 for columns 1, 2, and 5 and greater than 500 for 3, 4, and 6).  However,
I can't believe that there is only one gene that is in the top 500 for R
intensity among those three arrays.

Am I doing something wrong?  Can someone think of a better way of doing
this?

Thanks

Alison

 

 
