Re: [R] strsplit help
57412.251850588", "457412.251848006", "657314.locus_tag:CK5_17510", "657313.locus_tag:RTO_05370", "457412.251849359", "471875.197297105", "657313.locus_tag:RTO_09820", "657323.locus_tag:CK1_25830", "471875.197297130", "657314.locus_tag:CK5_09290", "457412.251848019", "471875.197297928", "657314.locus_tag:CK5_14710", "411460.145847612", "457412.251849367", "657314.locus_tag:CK5_20860", "471875.197297907", "657321.locus_tag:RBR_07980"), count_Conser = c(7L, 1L, 2L, 1L, 3L, 0L, 1L, 0L, 4L, 0L, 3L, 4L, 1L, 3L, 0L, 5L, 2L, 2L, 1L, 0L, 0L, 2L, 3L, 0L, 2L, 1L, 1L, 4L, 0L, 0L, 0L, 1L, 1L, 5L, 0L, 0L, 2L, 0L, 1L, 1L, 2L, 0L, 1L, 1L, 1L, 3L, 1L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 2L, 1L, 1L, 0L, 1L, 4L, 0L, 1L, 1L, 4L, 0L, 7L, 0L, 4L, 1L, 1L, 2L, 0L, 1L, 0L, 0L, 2L, 3L, 0L, 4L, 0L, 1L, 0L, 1L, 4L, 1L, 0L, 5L, 4L, 0L, 6L, 2L, 1L, 3L, 1L, 0L, 2L, 3L, 0L, 1L, 12L, 1L, 1L, 2L, 0L, 0L, 2L, 1L, 2L, 1L, 3L, 2L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 2L, 0L, 1L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 2L, 0L, 2L, 2L, 5L, 2L, 18L, 0L, 4L, 2L, 0L, 3L, 0L, 1L, 0L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 2L, 0L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 1L, 1L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 1L, 1L, 2L, 3L, 2L, 0L, 0L, 0L, 3L, 3L, 1L, 1L, 0L, 0L, 3L, 1L, 1L, 0L, 0L, 1L, 0L, 6L, 0L, 3L, 8L, 1L, 3L, 0L, 0L, 3L, 5L, 0L, 1L, 0L, 0L, 1L, 0L, 4L, 3L, 1L, 2L, 0L, 0L, 0L, 4L, 0L, 6L, 6L, 0L, 1L, 2L, 0L, 2L, 3L, 1L, 3L, 0L, 2L, 4L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 2L, 2L, 2L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 4L, 0L, 0L, 3L, 3L, 1L, 0L, 1L, 1L, 2L, 0L, 0L, 1L, 3L, 0L, 2L, 5L, 0L, 0L, 1L, 0L, 8L, 1L, 8L, 2L, 0L, 1L), count_NonCons = c(5L, 4L, 4L, 0L, 0L, 2L, 0L, 2L, 0L, 2L, 4L, 0L, 0L, 2L, 1L, 1L, 2L, 0L, 0L, 0L, 3L, 1L, 1L, 2L, 1L, 0L, 0L, 4L, 1L, 0L, 4L, 2L, 2L, 15L, 2L, 0L, 2L, 0L, 1L, 0L, 1L, 0L, 3L, 0L, 0L, 8L, 0L, 0L, 0L, 0L, 1L, 2L, 4L, 0L, 0L, 0L, 1L, 3L, 5L, 2L, 0L, 0L, 6L, 0L, 2L, 1L, 1L, 4L, 1L, 4L, 1L, 8L, 5L, 1L, 6L, 1L, 5L, 0L, 11L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 6L, 1L, 0L, 10L, 2L, 1L, 0L, 1L, 1L, 3L, 
2L, 1L, 3L, 4L, 1L, 0L, 12L, 0L, 0L, 1L, 3L, 15L, 9L, 4L, 12L, 2L, 4L, 2L, 0L, 0L, 0L, 2L, 2L, 3L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 5L, 0L, 0L, 1L, 0L, 3L, 4L, 1L, 1L, 2L, 0L, 0L, 0L, 1L, 3L, 9L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 10L, 2L, 0L, 12L, 0L, 1L, 1L, 2L, 0L, 1L, 1L, 3L, 3L, 1L, 4L, 0L, 2L, 1L, 1L, 4L, 0L, 2L, 5L, 5L, 4L, 0L, 0L, 0L, 2L, 0L, 3L, 0L, 2L, 3L, 2L, 3L, 1L, 4L, 2L, 2L, 0L, 6L, 2L, 1L, 2L, 3L, 0L, 7L, 0L, 0L, 6L, 2L, 2L, 1L, 2L, 0L, 6L, 0L, 0L, 3L, 0L, 0L, 0L, 2L, 2L, 1L, 0L, 2L, 2L, 0L, 0L, 4L, 0L, 2L, 1L, 3L, 2L, 0L, 1L, 0L, 1L, 0L, 6L, 1L, 1L, 1L, 2L, 2L, 4L, 1L, 0L, 0L, 2L, 3L, 2L, 0L, 1L, 0L, 0L, 0L, 1L, 2L, 1L, 0L, 16L, 1L, 3L, 0L, 5L, 10L, 1L, 2L, 4L, 0L, 6L, 0L, 0L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 11L, 1L, 4L, 5L, 1L, 1L), count_ConsSubst = c(5, 3, 1, 1, 3, 1, 0, 1, 1, 0, 0, 2, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 3, 0, 1, 0, 0, 0, 6, 1, 1, 1, 0, 0, 0, 1, 2, 1, 0, 0, 4, 0, 0, 1, 0, 0, 4, 1, 0, 0, 0, 0, 1, 0, 3, 0, 1, 0, 2, 1, 3, 0, 3, 0, 3, 2, 0, 1, 1, 3, 4, 2, 0, 9, 0, 1, 1, 1, 0, 2, 0, 1, 1, 0, 1, 1, 3, 0, 2, 0, 1, 0, 2, 2, 1, 3, 0, 6, 0, 0, 0, 2, 7, 3, 1, 5, 1, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 1, 0, 0, 0, 1, 0, 0, 3, 1, 0, 1, 1, 2, 0, 2, 0, 5, 2, 0, 0, 0, 0, 2, 0, 2, 0, 0, 3, 0, 0, 2, 0, 2, 0, 2, 1, 1, 0, 2, 1, 1, 1, 0, 0, 1, 1, 4, 0, 1, 0, 1, 5, 0, 0, 0, 5, 2, 1, 0, 0, 1, 0, 0, 0, 4, 0, 2, 1, 1, 1, 2, 1, 1, 1, 4, 1, 2, 1, 1, 2, 0, 0, 0, 1, 0, 1, 0, 0, 2, 0, 0, 1, 1, 0, 3, 1, 1, 2, 2, 1, 1, 1, 1, 0, 2, 1, 1, 0, 0, 0, 1, 0, 0, 0, 3, 2, 0, 1, 1, 0, 0, 0, 0, 2, 1, 1, 0, 0, 0, 0, 0, 3, 1, 0, 0, 3, 4, 0, 5, 1, 0, 4, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 4, 1, 4, 0, 0, 0), count_NCSubst = c(1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 3, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 5, 0, 0, 3, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 1, 0, 2, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 1, 0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0)), .Names = c("geneid", "count_Conser", "count_NonCons", "count_ConsSubst", "count_NCSubst"), class = "data.
[R] strsplit help
Dear all,

I want to use strsplit to parse column names; however, I am getting some errors that I don't understand. The problem appears when I try to rbind the output from strsplit. Please let me know if I'm missing something obvious.

Thanks,
Alison

Here are my commands:

> strsplit<-strsplit(as.character(Rumino_Reps_agreeWalign$geneid),"\\.")
> Rumino_Reps_agreeWalignTR<-transform(Rumino_Reps_agreeWalign,taxid=do.call(rbind, strsplit))
Warning message:
In function (..., deparse.level = 1) :
  number of columns of result is not a multiple of vector length (arg 1)

Here is my data:

> head(Rumino_Reps_agreeWalign)
                      geneid count_Conser count_NonCons count_ConsSubst count_NCSubst
1 657313.locus_tag:RTO_08940            7             5               5             1
2           457412.251848018            1             4               3             0
3 657314.locus_tag:CK5_20630            2             4               1             0
4 657323.locus_tag:CK1_33060            1             0               1             0
5 657313.locus_tag:RTO_09690            3             0               3             1
6           471875.197297106            0             2               1             1

Here are the results from strsplit:

> head(strsplit)
[[1]]
[1] "657313"              "locus_tag:RTO_08940"

[[2]]
[1] "457412"    "251848018"

[[3]]
[1] "657314"              "locus_tag:CK5_20630"

[[4]]
[1] "657323"              "locus_tag:CK1_33060"

[[5]]
[1] "657313"              "locus_tag:RTO_09690"

[[6]]
[1] "471875"    "197297106"

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
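A minimal sketch, assuming the IDs look like the head() output above. The rbind() warning is what you get when the elements returned by strsplit() do not all have the same length (for example, an ID with no period, or with two), so it can help to count the pieces first and then keep only the first piece, which is the taxon ID. The short `geneid` vector is illustrative, not the full data; naming the result something other than `strsplit` also avoids confusion with the function itself.

```r
# Illustrative IDs copied from the head() output above (not the full data set)
geneid <- c("657313.locus_tag:RTO_08940", "457412.251848018",
            "657314.locus_tag:CK5_20630", "471875.197297106")

# fixed = TRUE splits on a literal "." without the "\\." regex escape
parts <- strsplit(geneid, ".", fixed = TRUE)

# How many pieces did each ID produce? Unequal counts make rbind() recycle and warn
n_parts <- vapply(parts, length, integer(1))
print(table(n_parts))

# Keep exactly one value per ID: the piece before the first period
taxid <- vapply(parts, `[`, character(1), 1)
print(taxid)

# Equivalent regex form: drop everything from the first period onward
taxid2 <- sub("\\..*$", "", geneid)
```

With the real data frame, something like `Rumino_Reps_agreeWalign$taxid <- sub("\\..*$", "", as.character(Rumino_Reps_agreeWalign$geneid))` would add the column without going through rbind() at all.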
[R] can't install plotrix
Hi all,

I'm having problems installing plotrix. I tried installing it through install.packages and from the Unix command line, but each time it seems to stall when it is installing the help indices. Has anyone had this same problem? Is this package still maintained? Any help?

Thanks

> install.packages("plotrix")
>
> I also tried using the source package
> R CMD INSTALL plotrix_3.2-3.tar.gz
>
> Both of them seemed to stall at installing help indices.
> I cancelled after about 45 minutes of waiting.
>
> I tried to load the library in case the functions were loaded. But
>> library(plotrix)
> Error in library(plotrix) : there is no package called 'plotrix'
>
> Any knowledge about the status of this package or such errors would be
> great.
>
> Alison
Re: [R] Summarizing For Values with Multiple categories
Yes, I guess I should update.

> R.version.string
[1] "R version 2.9.0 (2009-04-17)"

On 24-Oct-10, at 1:12 AM, Gabor Grothendieck wrote:

R.version.string
Re: [R] Summarizing For Values with Multiple categories
Thanks! I tried reading the help for aggregate, but I can't figure out which form of the formula I am using, and therefore which syntax applies. I'm getting the errors below.

> aggregate(counts ~ ind, merge(stack(CAT2COG), df, by = 1), sum)
Error in as.data.frame.default(x) :
  cannot coerce class "formula" into a data.frame
> aggregate(counts ~ Cats, merge(stack(CAT2COG), df, by = 1), sum)
Error in as.data.frame.default(x) :
  cannot coerce class "formula" into a data.frame
> Cats
[1] A B C D E
Levels: A B C D E
> aggregate(counts ~ COGs, merge(stack(CAT2COG), df, by = 1), sum)
Error in as.data.frame.default(x) :
  cannot coerce class "formula" into a data.frame

On 24-Oct-10, at 12:50 AM, Gabor Grothendieck wrote:

aggregate(counts ~ ind, merge(stack(CAT2COG), df, by = 1), sum)
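The `cannot coerce class "formula"` error suggests that this R version's aggregate() has no formula method (the follow-up in this thread reports R 2.9.0). A sketch under that assumption: keep the same stack()/merge() step, then call the data-frame form of aggregate(), which takes the grouping through `by =` instead of a formula. The object names come from the original post.

```r
CAT2COG <- list(A = "COG1", B = c("COG1", "COG2"),
                C = c("COG1", "COG3"), D = c("COG2", "COG4"))
df <- data.frame(COGs = c("COG1", "COG2", "COG3", "COG4"),
                 counts = c(10, 20, 30, 40), stringsAsFactors = FALSE)

# stack() turns the list into a long data frame: 'values' (COG) and 'ind' (category)
long <- stack(CAT2COG)

# by = 1 joins on the first column of each data frame ('values' with 'COGs')
m <- merge(long, df, by = 1)

# Data-frame method of aggregate(): no formula, grouping passed via 'by'
totals <- aggregate(m["counts"], by = list(category = m$ind), FUN = sum)
print(totals)
```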
[R] Summarizing For Values with Multiple categories
Hi all,

I have some data as follows:

Cat1 Cat2 Cat3 COG  Counts
A    B    C    COG1 10
B    D         COG2 20
C              COG3 30
D              COG4 40

I would like to sum all the counts for each category:

A  B  C  D
10 30 40 60

> CAT2COG<- list(A="COG1",B=c("COG1","COG2"),C=c("COG1","COG3"),D=c("COG2","COG4"))
> COG2CAT<- list(COG1=c("A","B","C"),COG2=c("B","D"),COG3=c("C"),COG4="D")
> df<- data.frame(COGs=c("COG1","COG2","COG3","COG4"),counts=c(10,20,30,40))

I've been trying various versions of apply as well as some crazy loops (e.g., below). Any help would be appreciated.

Thanks,
Alison

> CATS<-names(CAT2COG)
> Catcounts<-rep(0,length(CATS))
> counter<-1
> for (i in CATS){
+ Catcounts[counter]<-CatCounts+df$counts[df[1,]=CAT2COG[i],]
Error: syntax error
> counter<-counter+1
> }
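A loop-free sketch of the per-category sum, using the CAT2COG and df objects from the post: for each category, select the rows of df whose COG appears in that category's vector and add up their counts. The `%in%` test takes the place of the `df[1,]=CAT2COG[i]` comparison in the loop, which appears to be what R rejects as a syntax error (`=` inside `[ ]` is read as an argument name).

```r
CAT2COG <- list(A = "COG1", B = c("COG1", "COG2"),
                C = c("COG1", "COG3"), D = c("COG2", "COG4"))
df <- data.frame(COGs = c("COG1", "COG2", "COG3", "COG4"),
                 counts = c(10, 20, 30, 40), stringsAsFactors = FALSE)

# For each category, sum the counts of the COGs that belong to it;
# sapply() keeps the list names, so the result is named A, B, C, D
cat_counts <- sapply(CAT2COG, function(cogs) sum(df$counts[df$COGs %in% cogs]))
print(cat_counts)
#  A  B  C  D
# 10 30 40 60
```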
Re: [R] Help reading table rows into lists
Thanks Gabor and Jeffrey, and thanks for explaining the differences. I think I'll go with Jeffrey's, as I want entries for COGs with no pathway.

Alison

On 10-Oct-10, at 8:59 PM, Jeffrey Spies wrote:

sapply(dat, function(x){
  tmp<-unlist(strsplit(x, '\t', fixed=T))
  out <- list(tmp[seq_along(tmp)[-1]])
  names(out) <- tmp[1]
  out
}, USE.NAMES=F)
[R] Help reading table rows into lists
Hi all,

I have a large table mapping thousands of COGs (groups of genes) to pathways.

# Ex
COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh
##

I would like to combine this information into a big list such as:

COG2PATHWAY<- list(COG0001=c("patha","pathb","pathc"),COG0002=c("pathd","pathe"),COG0003=c("pathe","pathf","pathg","pathh"))

I am stuck and have tried various methods involving (probably mangled) versions of lapply and loops. Any suggestions on the most efficient way to do this would be great.

Thanks,
Alison

Here is my latest attempt.

#
line_num<-length(scan(file="/g/bork8/waller/test_COGtoPath.txt",what="character",sep="\n"))
COG2Path<-vector("list",line_num)
COG2Path<-lapply(1:(line_num-1),function(x) scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))
#

I am getting an error:

> COG2Path<-lapply(1:(line_num-1),function(x) scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :

But if I run scan alone, I don't get an error.

# Then I suppose the easiest way to name the list elements is to use Unix to cut the first column out and then read that in.
names(COG2Path)<-scan(file="/g/bork8/waller/test_col_names.txt",sep="\t",what="character")
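One way to build the named list in a single pass, sketched under the assumption that each line of the file is tab-separated with the COG ID first: read every line once with readLines(), split each on tabs, and use the first field as the element name. The inline `lines` vector stands in for the file. Separately, note that the lapply() call reads test_COGtopath.txt while the first scan() reads test_COGtoPath.txt; on a case-sensitive Unix file system that difference alone could produce "cannot open the connection".

```r
# Stand-in for: lines <- readLines("/g/bork8/waller/test_COGtoPath.txt")
lines <- c("COG0001\tpatha\tpathb\tpathc",
           "COG0002\tpathd\tpathe",
           "COG0003\tpathe\tpathf\tpathg\tpathh")

fields <- strsplit(lines, "\t", fixed = TRUE)

# Everything after the first field is that COG's pathway vector
# (a line holding only a COG ID gives character(0), so the entry is kept)
COG2PATHWAY <- lapply(fields, function(f) f[-1])

# The first field of each line becomes the element name
names(COG2PATHWAY) <- vapply(fields, function(f) f[1], character(1))
str(COG2PATHWAY)
```

Reading the file once also avoids re-opening it and skipping x lines for every row, which grows quadratically with the number of COGs.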
[R] colour of label points on a boxplot
Hi all,

I have 6 datasets (data frames Assem_ContigsLen7 through all_ContigsLen12), each containing 3 columns (contig_id, contig_length, read_count). Each dataset is composed of 3 types of contigs (assemblies of genomic fragments): 1 - all bacterial fragments, 2 - all viral fragments, 3 - mixed fragments. I identified the type of each contig through a merge with another table containing just contig_id and contig_type, as below:

AssemViral_ContigsLen<-merge(Assem_ContigsLen,allViral_contigs,by.x="contig_id",by.y="X.Contid.ID",all.x=FALSE)

Below is a boxplot for:

boxplot(Assem_ContigsLen7$length,Assem_ContigsLen8$length,Assem_ContigsLen9$length,Assem_ContigsLen10$length,Assem_ContigsLen11$length,Assem_ContigsLen12$length,main="100species_rep2",ylab="Contig_length")

All of the longer contigs in the sixth dataset are allViral. How can I colour or label these? I tried overlaying 2 boxplots of different colours (using add=TRUE), but the individual points beyond the whiskers aren't coloured (and I can't figure out how to do so). I experimented with using points, but there isn't a general function that I can apply to all 6 datasets to identify the allViral contigs.

Specific questions:
1 - How can I colour the individual data points beyond the whiskers in a boxplot?
2 - Can I identify and colour subsets of data points within a boxplot?
3 - Any other suggestions?

Thank you,
Alison
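A sketch of one approach, assuming each data frame has a `length` column plus the `contig_type` column obtained from the merge (the six data frames below are randomly generated placeholders, and the type labels are hypothetical). The boxplot is drawn without its own outlier points, and then every contig is overlaid with points(), coloured by type. Because the colouring is driven by the type column rather than by position in the plot, the same loop works for all six datasets.

```r
set.seed(1)
types <- c("allBacterial", "allViral", "mixed")

# Placeholder data standing in for Assem_ContigsLen7 ... Assem_ContigsLen12:
# a 'length' column plus the contig_type obtained from the merge
make_set <- function(n) {
  data.frame(length = round(rlnorm(n, meanlog = 7)),
             contig_type = sample(types, n, replace = TRUE),
             stringsAsFactors = FALSE)
}
sets <- lapply(c(50, 60, 70, 80, 90, 100), make_set)
names(sets) <- paste0("Len", 7:12)
lengths_list <- lapply(sets, function(d) d$length)

# outline = FALSE: boxplot() draws no outliers itself (points() adds every contig);
# ylim is set explicitly, otherwise the axis would stop at the whiskers
bp <- boxplot(lengths_list, outline = FALSE,
              ylim = range(unlist(lengths_list)),
              main = "100species_rep2", ylab = "Contig_length")

cols <- c(allBacterial = "grey50", allViral = "red", mixed = "blue")
for (k in seq_along(sets)) {
  d <- sets[[k]]
  # Spread the points horizontally around box k and colour them by contig type
  points(jitter(rep(k, nrow(d)), amount = 0.15), d$length,
         col = cols[d$contig_type], pch = 16, cex = 0.6)
}
legend("topleft", legend = names(cols), col = cols, pch = 16)
```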
Re: [R] using sprintf to pass a variable to a RMySQL query
Hi all,

I re-installed R and tcltk. I find some of the documentation misleading, as it indicates that tcltk is included with R, and when you type library() it lists tcltk even though it hasn't been installed.

Anyway, I've decided to go with sprintf. Now I am getting errors with my query criteria. I have slightly changed my criteria, as I want to match 'MGi.' so that, for example, MG1 does not also match MG10 (with %MGi%, wouldn't %MG1% match both MG1. and MG10.?). I tried to escape the period with a backslash, with quotes and with a double period. I think R is fine with the syntax, but SQL doesn't like it. Can anyone please help me with the syntax?

Thank you,

## Error ##
Error in mysqlExecStatement(conn, statement, ...) :
  RS-DBI driver: (could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1' at line 2 )
Calls: dbGetQuery ... .valueClassTest -> is -> is -> mysqlExecStatement -> .Call
Execution halted

Script #
library(RMySQL)
mysql<-dbDriver("MySQL")
con<-dbConnect(mysql,username="u",host="g",password="s",port=,dbname="M")
i<-1
k<-0
while (k<=17) {
  while (i<=72) {
    sqlcmd_ScaffLen<-sprintf('SELECT scaffold.length FROM scaffold,scaffold2contig,contig2read WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND scaffold2contig.contig_id=contig2read.contig_id AND contig2read.read_id LIKE \'%%MG%d..%%\'' ,i)
    sqlcmd_contigs<-sprintf('SELECT length FROM contig WHERE external_id LIKE\'%%MG%d..%%\'',i)
    sqlcmd_singletons<-paste('SELECT COUNT(*) FROM contig WHERE read_count=1 AND external_id LIKE \'%%MG%d..%%\'',i)
    MG_ScaffoldLen<-dbGetQuery(con,sqlcmd_ScaffLen)
    MG_ContigsLen<-dbGetQuery(con,sqlcmd_contigs)
    MG_SingletonsCount<-dbGetQuery(con,sqlcmd_singletons)
    MG_ScaffoldLen_Summ<-as.data.frame(c(summary(MG_ScaffoldLen$length),MG_SingletonsCount))
    MG_ContigsLen_Summ<-summary(MG_ContigsLen$length)
    write.table(MG_ScaffoldLen_Summ,file="ScaffoldLen_SummStats.txt",append=TRUE,sep='\t')
    write.table(MG_ContigsLen_Summ,file="ContigsLen_SummStats.txt",append=TRUE,sep='\t')
    # Keep names for 4 of them so we can do summary plots for each treatment
    # (ie combine all 4 reps)
    MG_ScaffoldLen<-assign(paste('MG_ScaffoldLen',i,sep=''),MG_ScaffoldLen)
    MG_ContigsLen<-assign(paste('MG_ContigsLen',i,sep=''),MGContigsLen)
    i<-i+18
  }
  ### Summary Plots For each Treatment ##
  jpeg(file=sprintf("Boxplots%dSanger_Virus.jpeg",k))
  sprintf("boxplot(MG_ScaffoldLen(1+%d)$length,MG_ScaffoldLen(18+%d)$length,MG_ScaffoldLen(36+%d)$length,MG_ScaffoldLen(54+%d)$length)",k)
  dev.off()
  jpeg(file=sprintf("Scaffold_histograms%dSanger_Virus.jpeg",k))
  par(mfrow=c(1,3))
  sprintf("hist(MG_ScaffoldLen(1+%d)$length)",k)
  sprintf("hist(MG_ScaffoldLen(18+%d)$length)",k)
  sprintf("hist(MG_ScaffoldLen(36+%d)$length)",k)
  sprintf("hist(MG_ScaffoldLen(54+%d)$length)",k)
  dev.off()
  jpeg(file=sprintf("Contig_histograms%dSanger_Virus.jpeg",k))
  par(mfrow=c(1,3))
  sprintf("hist(MG_ContigsLen(1+%d)$length)",k)
  sprintf("hist(MG_ContigsLen(18+%d)$length)",k)
  sprintf("hist(MG_ContigsLen(36+%d)$length)",k)
  sprintf("hist(MG_ContigsLen(54+%d)$length)",k)
  dev.off()
  k<-k+1
  i<-1+k
}

On 03/11/10 16:01, Uwe Ligges wrote:
> On 10.03.2010 12:45, alison waller wrote:
>> Thanks Gabor,
>>
>> As I said I would like to use gsubfn, but I am having problems
>> installing it, which I assume are due to some conflict with the current
>> tcltk package
>>
>> Below is the error I got after issuing install.packages("gsubfn")
>>
>> Any advice?
>
>
> Re-install R including the tcltk package?
>
> Uwe Ligges
>
>
>> ###
>> * Installing *source* package 'gsubfn' ...
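A likely cause of the "near '1'" error, offered as a diagnosis rather than a certainty: `sqlcmd_singletons` is built with paste() instead of sprintf(), so `%d` is never filled in and paste() appends the value of i after the closing quote. The sketch below compares the two on that query. It also uses a single `.` in the pattern: in a SQL LIKE pattern only `%` and `_` are wildcards, so a period is already literal and needs no escaping, whereas the original `..` asks for two consecutive periods.

```r
i <- 1
query_template <- "SELECT COUNT(*) FROM contig WHERE read_count=1 AND external_id LIKE '%%MG%d.%%'"

# paste() does no placeholder substitution: '%%' and '%d' stay as typed,
# and the value of i is appended after a space, outside the quoted pattern
bad <- paste(query_template, i)
print(bad)

# sprintf() turns '%%' into '%' and '%d' into the value of i
good <- sprintf(query_template, i)
print(good)
```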
Re: [R] using sprintf to pass a variable to a RMySQL query
Thanks Gabor,

As I said, I would like to use gsubfn, but I am having problems installing it, which I assume are due to some conflict with the current tcltk package.

Below is the error I got after issuing install.packages("gsubfn"). Any advice?

###
* Installing *source* package 'gsubfn' ...
** R
** demo
** inst
** preparing package for lazy loading
Warning: S3 methods '$.tclvar', '$<-.tclvar', 'as.character.tclObj', 'as.character.tclVar', 'as.double.tclObj', 'as.integer.tclObj', 'as.logical.tclObj', 'print.tclObj', '[[.tclArray', '[[<-.tclArray', '$.tclArray', '$<-.tclArray', 'names.tclArray', 'names<-.tclArray', 'length.tclArray', 'length<-.tclArray', 'tclObj.tclVar', 'tclObj<-.tclVar', 'tclvalue.default', 'tclvalue.tclObj', 'tclvalue.tclVar', 'tclvalue<-.default', 'tclvalue<-.tclVar' were declared in NAMESPACE but not found
Error in namespaceExport(ns, exports) :
  undefined exports: addTclPath, as.tclObj, is.tclObj, is.tkwin
Error : package 'tcltk' could not be loaded
ERROR: lazy loading failed for package 'gsubfn'
* Removing '/g/bork3/x86_64/lib64/R/library/gsubfn'

The downloaded packages are in
        '/tmp/RtmpkfvT5f/downloaded_packages'
Updating HTML index of packages in '.Library'
Warning message:
In install.packages("gsubfn", lib = "/g/bork3/x86_64/lib64/R/library") :
  installation of package 'gsubfn' had non-zero exit status

## this is the error when I tried to install tcltk #
install.packages("tcltk")
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package 'tcltk' is not available

On 03/09/10 16:26, Gabor Grothendieck wrote:
> On Tue, Mar 9, 2010 at 7:10 AM, alison waller wrote:
>
>> Hi all,
>>
>> Thanks for help with the paste and sprintf syntax.
>>
>> So I've decided to use paste and or sprintf. 'gsubfn' looks like a
>> great package but unfortunately I've had problems installing it, as I
>> don't think it likes the version of tcltk that is installed.
>> I'm working on a few unix clusters with many computers and there seems to be
>> problems with different versions of R and different versions of the
>> packages on different computers.
>>
> The fn$ functionality that I mentioned does not use the tcltk package
> so the version of tcltk should not matter.
>
> The only part of the package that uses tcltk is strapply, which is not
> used here, and even in that case there is R code to it as well if you
> use strapply(..., engine = "R") or use ostrapply.
>
> Also the older 0.3-9 version of the gsubfn package did not use tcltk at all.
Re: [R] using sprintf to pass a variable to a RMySQL query
Hi all,

Thanks for help with the paste and sprintf syntax.

So I've decided to use paste and/or sprintf. 'gsubfn' looks like a great package, but unfortunately I've had problems installing it, as I don't think it likes the version of tcltk that is installed. I'm working on a few Unix clusters with many computers, and there seem to be problems with different versions of R and different versions of the packages on different computers.

The other problem is that I want to name the data frames and the output JPEG files resulting from the queries after the loop parameters. I've tried a few different approaches, but none seem to work: using sprintf and paste just turns the data frame into a string containing its name.

I have a complicated loop here, as I'd like to do some summary output after every 4 queries (i.e., after MG1, MG19, MG37, MG55), and then start again for MG2, MG20, etc.

Here's my code below. There are probably errors in the loop structure that I can work out, but I need help with naming the data frames based on the parameters i and j.

Thanks

i<-1
j<-1
for (i<=72 and j<=4){{
sqlcmd_ScaffLen<- paste("SELECT scaffold.length FROM scaffold, scaffold2contig, contig2read WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND scaffold2contig.contig_id=contig2read.contig_id AND contig2read.read_id LIKE '%MG", i ,"%'", sep='')
sqlcmd_contigs<-paste("SELECT length FROM contig WHERE external_id LIKE '%MG",i,"%'",sep='' )
sqlcmd_singletons<-paste("SELECT COUNT(*) FROM contig WHERE read_count=1 AND external_id LIKE '%MG",i,"%'",sep='')
MGi_ScaffoldLen<-dbGetQuery(con,sqlcmd_ScaffLen)
MGi_ContigsLen<-dbGetQuery(con,sqlcmd_contigs)
MGi_SingletonsCount<-dbGetQuery(con,sqlcmd_singletons)
MGi_ScaffoldLen_Summ<-as.data.frame(c(summary(MGi_ScaffoldLen$length),MGi_SingletonsCount))
MGi_ContigsLen_Summ<-summary(MGi_ContigsLen$length)
write.table(MGi_ScaffoldLen_Summ,file="ScaffoldLen_SummStats.txt",append=TRUE,sep='\t')
write.table(MGi_ContigsLen_Summ,file="ContigsLen_SummStats.txt",append=TRUE,sep='\t')
i<-i+18
j<-j+1
}
### Summary Plots For each Treatment ##
jpeg(file=sprintf("Boxplots_%d.jpeg",i)
boxplot(MGi_ScaffoldLen$length,MG(i+18*j)_ScaffoldLen$length,MG(i+_ScaffoldLen$length,MG59_ScaffoldLen$length,Main="400spec_10virus")
dev.off()
jpeg(file=sprintf("Scaffold_histograms_%d.jpeg",i)
hist(MGi_ScaffoldLen$length)
hist(MG(i+j*18)_ScaffoldLen$length)
hist(MG(i+j*18_ScaffoldLen$length)
hist(MG(i+j*18_ScaffoldLen$length)
dev.off()
jpeg(file=sprintf("Contig_histograms_%d.jpeg",i)
hist(MGi_ContigsLen$length)
hist(MG(i+j*18)_ContigsLen$length)
hist(MG(i+j*18_ContigsLen$length)
hist(MG(i+j*18_ContigsLen$length)
dev.off()
j<-1
i<-2
}

On 03/08/10 21:02, Don MacQueen wrote:
> I always use paste()
>
> i <- 1
> sqlcmd_ScaffLen <- paste("SELECT scaffold.length
>    FROM scaffold, scaffold2contig, contig2read
>    WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
>    scaffold2contig.contig_id=contig2read.contig_id AND
>    contig2read.read_id LIKE '%MG", i ,"%'", sep='')
>
> That should create bits like
>    LIKE '%MG1%'
>    LIKE '%MG2%'
> and so on.
>
> You just have to get the nesting of the single and double quotes
> correct - the SQL requires single quotes, so use double quotes for the
> fixed character strings insidte paste(). That, and use sep='' to get
> rid of unwanted space characters.
>
> Using paste is also effective for constructs like
>    IN (3,4,5)
> or
>    IN ('a','b','c')
> though it can be necessary to nest one paste within another
>
> -Don
>
> At 2:06 PM +0100 3/8/10, alison waller wrote:
>> Hello,
>>
>> I am using RmySQL and would like to iterate through a few queries.
>>
>> I would like to use sprintf but I think I'm having problems mixing and
>> matching the sprintf syntax and the SQL regex.
>>
>> I have checked my sqlcmd and it works when I wan to match %MG1% but how
>> do I iterate for i 1-72? Escape characters,?
>>
>> thanks in advance
>>
>> i<-1
>> sqlcmd_ScaffLen<-sprintf('SELECT scaffold.length
>> FROM scaffold,scaffold2contig,contig2read
>> WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
>> scaffold2contig.contig_id=contig2read.contig_id AND
>> contig2read.read_id LIKE
>> '%MG%s%' ,i)
>>
>> = Here is my vague error message
>>
>> Error: unexpected input in:
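Rather than building object names such as MGi_ScaffoldLen or MG(i+18*j)_ScaffoldLen (which R cannot parse as names), a common pattern is to keep each query result in a named list and look it up by run. A sketch, assuming runs are grouped into treatments as k, k+18, k+36, k+54 (the grouping used in the later version of the script); `fake_query()` is a hypothetical offline stand-in for `dbGetQuery(con, sqlcmd)` so the example runs without a database.

```r
# Hypothetical offline stand-in for dbGetQuery(con, sqlcmd) for run i;
# it just returns a small data frame with a 'length' column
fake_query <- function(i) data.frame(length = i * 100 + seq_len(5))

# Treatment k groups runs k, k+18, k+36, k+54 (e.g. MG1, MG19, MG37, MG55)
treatment_runs <- function(k) k + c(0, 18, 36, 54)

results <- list()
for (k in 1:18) {
  runs <- treatment_runs(k)
  scaffolds <- lapply(runs, fake_query)        # one data frame per run
  names(scaffolds) <- paste0("MG", runs)       # list names replace assign()
  results[[sprintf("treatment%d", k)]] <- scaffolds
}

# Any run is now reachable by name, e.g. results$treatment1$MG19$length
lens <- lapply(results$treatment1, function(d) d$length)
boxplot(lens, main = "treatment1", ylab = "Scaffold length")
```

In the real script the plotting lines would sit between `jpeg(file = sprintf("Boxplots_%d.jpeg", k))` and `dev.off()`, as in the original loop; boxplot() and hist() then receive actual vectors instead of strings built by sprintf().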
[R] using sprintf to pass a variable to a RMySQL query
Hello,

I am using RMySQL and would like to iterate through a few queries.

I would like to use sprintf, but I think I'm having problems mixing and matching the sprintf syntax and the SQL regex.

I have checked my sqlcmd and it works when I want to match %MG1%, but how do I iterate for i in 1-72? Do I need escape characters?

Thanks in advance

i<-1
sqlcmd_ScaffLen<-sprintf('SELECT scaffold.length
FROM scaffold,scaffold2contig,contig2read
WHERE scaffold.scaffold_id=scaffold2contig.scaffold_id AND
scaffold2contig.contig_id=contig2read.contig_id AND
contig2read.read_id LIKE
'%MG%s%' ,i)

Here is my vague error message:

Error: unexpected input in:
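The "unexpected input" error most likely comes from the unescaped single quotes inside a single-quoted R string: R ends the string at `LIKE '` and then cannot parse `%MG%s%'`. A sketch of one fix: use double quotes for the R string so the SQL single quotes need no escaping, and write each literal `%` as `%%` in the sprintf() format. Because sprintf() is vectorised, it can build all 72 queries at once; `%d` is used for the integer run number.

```r
i <- 1:72

# Double quotes for the R string, so the SQL single quotes need no escaping.
# In a sprintf() format, '%%' produces a literal '%' and '%d' is replaced by i.
sqlcmd_ScaffLen <- sprintf(paste(
  "SELECT scaffold.length",
  "FROM scaffold, scaffold2contig, contig2read",
  "WHERE scaffold.scaffold_id = scaffold2contig.scaffold_id",
  "AND scaffold2contig.contig_id = contig2read.contig_id",
  "AND contig2read.read_id LIKE '%%MG%d%%'"), i)

length(sqlcmd_ScaffLen)   # one query string per value of i
sqlcmd_ScaffLen[1]
```

Each element could then be sent in turn, e.g. `for (q in sqlcmd_ScaffLen) res <- dbGetQuery(con, q)`. Note that a pattern such as '%MG1%' also matches MG10 to MG19, the issue raised in a later message of this thread.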
Re: [R] finding most highly transcribed genes - ranking, sorting and subsets?
Thanks - great, I should have thought of option (b).

-Original Message-
From: Martin Morgan [mailto:[EMAIL PROTECTED]
Sent: Friday, December 07, 2007 12:52 PM
To: alison waller
Cc: [EMAIL PROTECTED]
Subject: Re: [R] finding most highly transcribed genes - ranking, sorting and subsets?

Hi Alison --

It's a funny twist of terminology, isn't it? High rank (we're #1!) corresponds to low value. Maybe a wimpy stats joke?

Anyway, (a) if m is assigned rownames (e.g., from the appropriate column of the 'genes' data frame in the limma object, rownames(m) <- maList$genes$GeneName) they'll be carried through the analysis, and (b) if you've extracted m from a limma MAList, then subsetting the MAList with hrow (maList[hrow,]) will give you a new MAList with all the info carried through. This would be the better way to go.

Martin
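A small illustration of point (a) above, using made-up intensities and hypothetical gene IDs: once m has rownames, logical subsetting such as m[hrows, ] keeps them, so the selected rows can be traced back to specific spots. Without rownames, R prints the subset's rows as [1,], [2,], ..., which is why the original row designation seemed to be lost.

```r
# Made-up intensity matrix: 5 spots (rows) x 3 arrays (columns)
m <- cbind(a1 = c(5, 1, 9, 3, 8),
           a2 = c(4, 2, 10, 1, 7),
           a3 = c(6, 3, 8, 2, 9))
# Hypothetical IDs; with limma this could be maList$genes$GeneName
rownames(m) <- c("geneA", "geneB", "geneC", "geneD", "geneE")

r <- apply(m, 2, rank)               # rank 1 = lowest intensity in each column
hrows <- apply(r > 3, 1, all)        # in the top 2 of 5 in every array
msubset <- m[hrows, , drop = FALSE]  # drop = FALSE keeps a matrix even for one match
rownames(msubset)                    # the gene IDs travel with the rows
# "geneC" "geneE"
```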
Re: [R] finding most highly transcribed genes - ranking, sorting and subsets?
Thanks so much Martin, This method is definitely more straightforward. And you are right I don't think I was doing anything wrong before. However, I thought that rank, would rank the highest 1st, however after looking at the results using your methods, I realized it ranks the lowest number 1. So I modified it for rank>18500. And now I'm getting 300 rows for which the intensity is consistenly high. However, I am still laking some information. For the results I can get a matrix of 300 rows and the corresponding intensities (from m) or rank (from h), but what I really want is the name of the original row, which corresponds to a specific spot on the array). I did msubset<-m[hrows,] and as mentioned I just get the rows numbered 1-300, while I want to essentially pickout the 300 rows from the original 19,000 rows maintaing the original row designation as it corresponds to a specific gene. Thanks again for any suggestions, Alison -Original Message- From: Martin Morgan [mailto:[EMAIL PROTECTED] Sent: Thursday, December 06, 2007 4:06 PM To: alison waller Subject: Re: [R] finding most highly transcribed genes - ranking, sorting and subsets? Hi Alison -- I'm not sure where your problem is coming from, but R can help you to more efficiently do your task. Skipping the bioc terminology and data structures, you have a matrix > m <- matrix(runif(10), ncol=10) you'd like to determine the rank of values in each column > r <- apply(m, 2, rank) identfiy those with high rank > h <- r < 500 and find the rows for which the rank is always high > hrows <- apply(h, 1, all) you can then use hrows to subset your original matrix (m[hrows,]) or otherwise, e.g., how many rows with high rank > sum(hrows) [1] 0 or perhaps the distribution of the number of columns in which high ranking genes occur. 
> table(apply(h, 1, sum))

   0    1    2    3    4
5996 3132  765  100    7

Martin

"alison waller" <[EMAIL PROTECTED]> writes:

> Hello,
>
> I am not only interested in finding out which genes are the most highly up-
> or down-regulated (which I have done using the linear models and Bayesian
> statistics in Limma), but I also want to know which genes are consistently
> highly transcribed (ie. they have a high intensity in the channel of
> interest eg. Cy5 or Cy3 across the set of experiments). I might have missed
> a straight forward way to do this, or a valuable function, but I've been
> using my own methods and going around in circles.
>
> So far I've normalized within and between arrays, then returned the RG
> values using RG<-RG.MA, then I ranked each R and G values for each array as
> below.
>
> rankRG<-RG
> rankRG$R[,1]<-rank(rankRG$R[,1])
> rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns(ie. arrays, as
> well as the G's)
>
> then I thought I could pull out a subset of rankRG using something like;
>
> topRG<-rankRG
> topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)
>
> However, this just returned me a matrix with one row of $R (the ranks were
> <500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6). However, I
> can't believe that there is only one gene that is in the top 500 for R
> intensitiy among those three arrays.
>
> Am I doing something wrong? Can someone think of a better way of doing
> this?
>
> Thanks
>
> Alison
>
> **
> Alison S. Waller M.A.Sc.
> Doctoral Candidate
> [EMAIL PROTECTED]
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Dr. Martin Morgan, PhD
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
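A minimal sketch of one way to keep the original row identity, under stated assumptions: the matrix below is simulated (19000 hypothetical spots on 6 arrays, with made-up row names spot1, spot2, ...) and only stands in for the normalized intensities. which() turns the logical hrows into positions in the original matrix, so the link back to each spot survives the subsetting. Ranking the negated values gives rank 1 to the most intense spot; for 19000 rows without ties, rank(-x) <= 500 picks exactly the same spots as rank(x) > 18500.

```r
## Simulated stand-in (hypothetical): 19000 spots x 6 arrays, each spot with
## its own expression level shared across arrays plus a little noise.
set.seed(1)
n <- 19000
level <- rnorm(n, mean = 8, sd = 2)
m <- level + matrix(rnorm(n * 6, sd = 0.5), ncol = 6)
rownames(m) <- paste0("spot", seq_len(n))  # made-up spot identifiers

## rank() gives 1 to the smallest value; ranking -m gives 1 to the largest,
## so "<= 500" means "among the 500 most intense spots on that array".
r <- apply(-m, 2, rank)
h <- r <= 500

## TRUE for rows that are in the top 500 on every array
hrows <- apply(h, 1, all)

## Positions of those rows in the ORIGINAL matrix, and their identifiers
idx <- which(hrows)
ids <- rownames(m)[idx]

## Subsetting by position keeps the original row names; drop = FALSE keeps
## a matrix even if only one row qualifies
msubset <- m[idx, , drop = FALSE]
head(ids)
```

If the intensity matrix itself carries no row names, the positions in idx can instead index whatever annotation table lists the spots in the same order (in a limma RGList this is typically the genes component, e.g. RG$genes[idx, ]).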
[R] finding most highly transcribed genes - ranking, sorting and subsets?
Hello,

I am not only interested in finding out which genes are the most highly up- or down-regulated (which I have done using the linear models and Bayesian statistics in Limma), but I also want to know which genes are consistently highly transcribed (i.e. they have a high intensity in the channel of interest, e.g. Cy5 or Cy3, across the set of experiments). I might have missed a straightforward way to do this, or a valuable function, but I've been using my own methods and going around in circles.

So far I've normalized within and between arrays, then returned the RG values using RG<-RG.MA, then I ranked the R and G values for each array as below.

rankRG<-RG
rankRG$R[,1]<-rank(rankRG$R[,1])
rankRG$R[,2]<-rank(rankRG$R[,2])
.. and so on for 6 columns (i.e. arrays), as well as the G's

Then I thought I could pull out a subset of rankRG using something like:

topRG<-rankRG
topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)

However, this just returned me a matrix with one row of $R (the ranks were <500 for columns 1, 2, and 5 and greater than 500 for 3, 4, and 6). But I can't believe that there is only one gene that is in the top 500 for R intensity among those three arrays.

Am I doing something wrong? Can someone think of a better way of doing this?

Thanks

Alison

Alison S. Waller M.A.Sc.
Doctoral Candidate
[EMAIL PROTECTED]
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College St.
Toronto, ON
M5S 3E5
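A minimal sketch of a vectorized version of the ranking step, under assumptions: Rmat is a simulated stand-in for the normalized red-channel matrix (RG$R), with 19000 hypothetical spots on 6 arrays and made-up values. Note that rank() assigns rank 1 to the smallest value, so a filter such as rank < 500 keeps the least intense spots on each array; ranking the negated intensities makes rank 1 the most intense spot. apply() ranks all columns in one call instead of one assignment per array.

```r
## Simulated stand-in (hypothetical) for the normalized red-channel matrix:
## 19000 spots x 6 arrays, each spot with a level shared across arrays.
set.seed(2)
n <- 19000
level <- rnorm(n, mean = 8, sd = 2)
Rmat <- level + matrix(rnorm(n * 6, sd = 0.5), ncol = 6)

## Rank every array (column) in one call. rank() gives 1 to the SMALLEST
## value, so rank the negated intensities to give 1 to the most intense spot.
rankR <- apply(-Rmat, 2, rank)

## Spots among the 500 most intense on arrays 1, 2 and 5 at the same time
top <- rankR[, 1] <= 500 & rankR[, 2] <= 500 & rankR[, 5] <= 500
topR <- Rmat[top, , drop = FALSE]
nrow(topR)
```

The same filter can be written as rowSums(rankR[, c(1, 2, 5)] <= 500) == 3, which is easier to adapt to a different set of arrays.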