On Tue, May 31, 2011 at 2:33 PM, Cook, Malcolm <m...@stowers.org> wrote:
> Hi Michael, > > > > Regarding your workarounds: > > > > I find that passing trackLine=FALSE does not work as you suggest when the > file contains a track line. I get this error > > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, > : > > scan() expected 'an integer', got 'name=B52_merged.peak.ES' > > > Thanks, fixed in devel. > However, I find that importing with asRangedData=FALSE does work. I can > convert to RangedData after the import. > > > > Regarding the issue of the presence of track lines being UCSC specific: > > > > I can give you one example where track lines are NOT specific to UCSC, > namely IGV, which reads .bed files. IGV also interprets the track lines (to > some degree). See http://www.broadinstitute.org/software/igv/TrackLine > > > > IGVmakes no requirements that the chromosome names follow the naming > convention wherein the chromosomes must begin with chr. > > > > Does this convince you? > > > Yea, I guess. IGB is the same way. I just wish there were more formal definitions of the file formats. > How about this argument: There are genomes hosted at ucsc that are not > assembled to the chromosome level, and whose identifiers dont begin with > chr, such as elephant. > http://genome.ucsc.edu/cgi-bin/hgGateway?hgsid=196975635&clade=mammal&org=Elephant&db=0 > > > This helps and would argue against any sort of coercion of the names. It seems reasonable to just leave it up to the user. > Regarding rtracklayer code: > > > > Im pretty sure Ive been using import.bed with track files having > tracklines and using asRangedData=TRUE without experiencing a renaming of > chromosomes. > > > > Can you please confirm my suspicion that renaming the chromosomes on import > is a new behavior of rtracklayer since the new bioconductor release? > > > Maybe. The coercion of the names has been there forever when coercing to UCSCData (for upload to the UCSC browser). Maybe it has somehow come to apply to the import of tracks, as well, but I can't track down when that happened, due to the complexity of the code. The logic involved in importing a track is far more complex than one might imagine. Not sure how it got that way. > If the behavior is indeed new, would you consider recoding to be backward > compatible by providing a new option, something like recodeToUCSCStandards > (or something), with a default of FALSE. > > > > I dont have allot of code that depends on backward compatibility and > indeed your second workaround will work for all my cases. But, I might not > be alone, and it might save you some grief. > > > > Also, I think you might argue yourself into agreeing with me that renaming > like this should really by default be OFF (if done at all). > > > I think I'm convinced and will go ahead and disable the renaming in all cases. It's hard to think of situations where the user cannot simply perform the renaming (it's hardly ever necessary anyway), and making it optional would just increase the complexity further. > At the very least, the renaming should be documented somewhere. Excuse me > if I missed it.... I did look for it. > > > > Thanks for everything! > > > > I have a related issue regarding how to alter the seqnames in a > BSgenome.... I swear I had a working approach for this that longer works > with the recent bioC update, giving me unable to find an inherited method > for function "seqinfo<-", for signature "BSgenome" . Ill open another > thread.... > > > > ~Malcolm > > > > *From:* Michael Lawrence [mailto:lawrence.mich...@gene.com] > *Sent:* Tuesday, May 31, 2011 3:54 PM > *To:* Cook, Malcolm > *Cc:* bioc-sig-sequencing@r-project.org; Michael Lawrence > *Subject:* Re: rtracklayer BUG!? import.bed pastes/prefixes 'chr' during > import of SOME records > > > > Hi, > > Your BED file contains a "track line" which is something that is defined by > the UCSC Genome Browser. I view (and others might disagree, in fact I often > disagree with myself about this) that the track lines are specific to UCSC > and not in the core BED format. Thus when rtracklayer sees a track line, it > by default attempts to make the chromosomes conform to the UCSC convention. > > This can be avoided in a number of ways. First, passing trackLine=FALSE > will cause the track line to be ignored. Second, passing asRangedData=FALSE > will cause the return value to be a GRanges, in which case it cannot hold > the track line data (at least not in any formal way -- it could go into the > metadata) and thus the track line is ignored. > > So those are workarounds. I could be convinced though that presence of a > track line does NOT imply conformance to UCSC conventions. > > I fixed the incomplete conversion in devel. Please let me know if it needs > to be back-ported. > > Michael > > On Tue, May 31, 2011 at 12:53 PM, Cook, Malcolm <m...@stowers.org> wrote: > > I find that rtracklayer's import.bed function is pasting the string 'chr' > to SOME of my chromosome names in my bed file ( f, attached) > > It is as if it is trying to rename (some) of them to (partially) agree with > ucsc's naming convention. > > I would expect import.bed to preserve column 1 as the name of the 'space', > but it does not. > > But... some of the spaces have had a 'chr' prefix added! > > Look at the session transcipt below! > > I _think_ this behavior is NEW since I recently upgraded. > > I hope not to have to revert to previous version of R / bioconductor to > address this issue. > > Any suggestions? > > Thanks! > > Malcolm Cook > Computational Biology - Stowers Institute for Medical Research > > > > > > library(rtracklayer) > Loading required package: RCurl > Loading required package: bitops > > unique(read.table('f.bed',sep="\t",skip=1)[,1]) > [1] YHet dmel_mitochondrion_genome > [3] 2L X > [5] 3L 4 > [7] 2R 3R > [9] Uextra 2RHet > [11] 2LHet 3LHet > [13] 3RHet U > [15] XHet > 15 Levels: 2L 2LHet 2R 2RHet 3L 3LHet 3R 3RHet 4 U Uextra X XHet ... > dmel_mitochondrion_genome > > unique(space(import.bed ('f.bed'))) > [1] chr2L 2LHet > [3] chr2R 2RHet > [5] chr3L 3LHet > [7] chr3R 3RHet > [9] chr4 chrU > [11] Uextra chrX > [13] XHet YHet > [15] dmel_mitochondrion_genome > 15 Levels: chr2L 2LHet chr2R 2RHet chr3L 3LHet chr3R 3RHet chr4 chrU ... > dmel_mitochondrion_genome > > sessionInfo() > R version 2.13.0 (2011-04-13) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rtracklayer_1.12.2 RCurl_1.5-0 bitops_1.0-4.1 > > loaded via a namespace (and not attached): > [1] BSgenome_1.20.0 Biostrings_2.20.1 GenomicRanges_1.4.6 > [4] IRanges_1.10.4 XML_3.4-0 > > > > > [[alternative HTML version deleted]]
_______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing