Hi Sonia,

Could you tell me in detail how you generated the annotation file Annots/UCSC_knownGene.hg19.bed?

Yours sincerely,

Jianhong Ou
jianhong...@umassmed.edu

On Aug 31, 2011, at 4:17 PM, Zhu, Lihua (Julie) wrote:

>
> ------ Forwarded Message
> From: Sonia Leach <sonia.le...@gmail.com>
> Date: Wed, 31 Aug 2011 15:52:43 -0400
> To: "bioc-sig-sequencing@r-project.org" <bioc-sig-sequencing@r-project.org>
> Subject: [Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch problems in
> R2.13/ChIPpeakAnno_1.8.0 and R2.13/ChIPpeakAnno_2.0.2
>
> I had a problem with the original ChIPpeakAnno distribution
> ChIPpeakAnno_1.8.0 for R2.13 where depending on the number of spaces
> in the RangedData Annotation object sent to annotatePeakInBatch, I
> would get the error:
> Error in FUN(1L[[1L]], ...) : object 'r' not found
> (see Problem 1 below) which went away when I downloaded the
> development version R2.13/ChIPpeakAnno_2.0.2
>
> However, then I had the problem that calling annotatePeakInBatch(...,
> output="overlapping", multiple=FALSE) returned the same number of
> answers as annotatePeakInBatch(..., output="overlapping",
> multiple=TRUE) (see Problem 2 below). Obviously, the work around is to
> take one hit from among the multiples returned but this should be
> fixed.
>
> The annotation file I used is just a bed6 dump from UCSC goldenpath.
>
> ============ problem 1:
> library(ChIPpeakAnno)
>
> myPeak = RangedData(IRanges(start = c(17208381), end = c(17208381),
>     names = c("Site1")), space = c("chr1"), strand = c('+'))
>
> ## This object has 25 spaces for chr1..22,X,Y,M
> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
> UCSC_rangeD = RangedData(IRanges(start= UCSC[,2], end= UCSC[,3],
>     names=UCSC[,4]), space=as.character(UCSC[,1]),strand=UCSC[,6])
>
> ## This object has just 1 space but the same data as UCSC_rangedD[868,]
> feature = RangedData(IRanges(start = c(17066767), end = c(17267729),
>     names = c("Site1")), space = c("chr1"), strand = c('+'))
>
> ## with UCSC_rangeD[868,], gives error in R2.13/ChIPpeakAnno_1.8.0
> ## Error in FUN(1L[[1L]], ...) : object 'r' not found
> annotation = annotatePeakInBatch(myPeak, AnnotationData=UCSC_rangeD[868,],
>     output="overlapping", maxgap=0, multiple=FALSE)
>
> ## with 1-space feature, no error
> annotation = annotatePeakInBatch(myPeak, AnnotationData=feature,
>     output="overlapping", maxgap=0, multiple=FALSE)
>
> <sorry, I no longer have the session info for this run - but it is the
> basic R2.13 install plus biocLite(ChIPpeakAnno), and should have the
> same versions as the session info shown for problem 2 below, minus the
> new dev version for ChIPpeakAnno (i.e. everything the same as below,
> except ChIPpeakAnno_2.0.2.tar.gz, gplots_2.8.0.tar.gz,
> caTools_1.12.tar.gz, gdata_2.8.2.tar.gz, gtools_2.6.2.tar.gz)
>>
>
> ======== Problem 2
> R version 2.13.0 (2011-04-13)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
>> library(ChIPpeakAnno)
> Warning message:
> replacing previous import 'space' when loading 'IRanges'
>> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
>> UCSC_rangeD = RangedData(IRanges(start= UCSC[,2], end= UCSC[,3],
> names=UCSC[,4]), space=as.character(UCSC[,1]),strand=UCSC[,6])
>> data = unique(read.table(file[i], sep="\t", header=FALSE))
>> ids = sub("ID=(\\d+);.+", "ID\\1", data[,9], perl=TRUE)
>> data_rangeD = RangedData(IRanges(start=data$V4, end=data$V5,
> names=paste(ids,data$V3, sep="_")), space=data$V1, strand="+")
>> dim(data_rangeD)
> [1] 19501 1
>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
> output="overlapping", maxgap=0, multiple=FALSE)
>> dim(annotationU)
> [1] 16777 9
>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
> output="overlapping", maxgap=0, multiple=TRUE)
>> dim(annotationU)
> [1] 16777 9
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] ChIPpeakAnno_2.0.2 gplots_2.8.0
> [3] caTools_1.12 bitops_1.0-4.1
> [5] gdata_2.8.2 gtools_2.6.2
> [7] limma_3.8.3 org.Hs.eg.db_2.5.0
> [9] GO.db_2.5.0 RSQLite_0.9-4
> [11] DBI_0.2-5 AnnotationDbi_1.14.1
> [13] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.20.0
> [15] GenomicRanges_1.4.8 Biostrings_2.20.2
> [17] IRanges_1.10.6 multtest_2.8.0
> [19] Biobase_2.12.2 biomaRt_2.8.1
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-12 RCurl_1.6-9 splines_2.13.0 survival_2.36-5
> [5] XML_3.4-2
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> ------ End of Forwarded Message
>