On Tue, May 31, 2011 at 2:33 PM, Cook, Malcolm <m...@stowers.org> wrote:

> Hi Michael,
>
>
>
> Regarding your workarounds:
>
>
>
> I find that passing trackLine=FALSE does not work as you suggest when the
> file contains a track line.  I get this error
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> :
>
>   scan() expected 'an integer', got 'name=B52_merged.peak.ES'
>
>
>

Thanks, fixed in devel.


> However, I find that importing with asRangedData=FALSE does work.  I can
> convert to RangedData after the import.
>
>
>
> Regarding the issue of the presence of track lines being UCSC specific:
>
>
>
> I can give you one example where track lines are NOT specific to UCSC,
> namely IGV, which reads .bed files.  IGV also interprets the track lines (to
> some degree).  See http://www.broadinstitute.org/software/igv/TrackLine
>
>
>
> IGVmakes no requirements that the chromosome names follow the naming
> convention wherein the chromosomes must begin with ‘chr’.
>
>
>
> Does this convince you?
>
>
>

Yea, I guess. IGB is the same way. I just wish there were more formal
definitions of the file formats.


> How about this argument: There are genomes hosted at ucsc that are not
> assembled to the chromosome level, and whose identifiers don’t begin with
> ‘chr’, such as elephant.
> http://genome.ucsc.edu/cgi-bin/hgGateway?hgsid=196975635&clade=mammal&org=Elephant&db=0
>
>
>

This helps and would argue against any sort of coercion of the names. It
seems reasonable to just leave it up to the user.


> Regarding rtracklayer code:
>
>
>
> I’m pretty sure I’ve been using import.bed with track files having
> tracklines and using asRangedData=TRUE without experiencing a renaming of
> chromosomes.
>
>
>
> Can you please confirm my suspicion that renaming the chromosomes on import
> is a new behavior of rtracklayer since the new bioconductor release?
>
>
>

Maybe. The coercion of the names has been there forever when coercing to
UCSCData (for upload to the UCSC browser). Maybe it has somehow come to
apply to the import of tracks, as well, but I can't track down when that
happened, due to the complexity of the code. The logic involved in importing
a track is far more complex than one might imagine. Not sure how it got that
way.


> If the behavior is indeed new, would you consider recoding to be backward
> compatible by providing a new option, something like “recodeToUCSCStandards”
> (or something), with a default of FALSE.
>
>
>
> I don’t have allot of code that depends on backward compatibility and
> indeed your second workaround will work for all my cases.  But, I might not
> be alone, and it might save you some grief.
>
>
>
> Also, I think you might argue yourself into agreeing with me that renaming
> like this should really by default be OFF (if done at all).
>
>
>

I think I'm convinced and will go ahead and disable the renaming in all
cases. It's hard to think of situations where the user cannot simply perform
the renaming (it's hardly ever necessary anyway), and making it optional
would just increase the complexity further.


> At the very least, the renaming should be documented somewhere.  Excuse me
> if I missed it.... I did look for it.
>
>
>
> Thanks for everything!
>
>
>
> I have a related issue regarding how to alter the seqnames in a
> BSgenome.... I swear I had a working approach for this that longer works
> with the recent bioC update, giving me “  unable to find an inherited method
> for function "seqinfo<-", for signature "BSgenome"” .  I’ll open another
> thread....
>
>
>
> ~Malcolm
>
>
>
> *From:* Michael Lawrence [mailto:lawrence.mich...@gene.com]
> *Sent:* Tuesday, May 31, 2011 3:54 PM
> *To:* Cook, Malcolm
> *Cc:* bioc-sig-sequencing@r-project.org; Michael Lawrence
> *Subject:* Re: rtracklayer BUG!? import.bed pastes/prefixes 'chr' during
> import of SOME records
>
>
>
> Hi,
>
> Your BED file contains a "track line" which is something that is defined by
> the UCSC Genome Browser. I view (and others might disagree, in fact I often
> disagree with myself about this) that the track lines are specific to UCSC
> and not in the core BED format. Thus when rtracklayer sees a track line, it
> by default attempts to make the chromosomes conform to the UCSC convention.
>
> This can be avoided in a number of ways. First, passing trackLine=FALSE
> will cause the track line to be ignored. Second, passing asRangedData=FALSE
> will cause the return value to be a GRanges, in which case it cannot hold
> the track line data (at least not in any formal way -- it could go into the
> metadata) and thus the track line is ignored.
>
> So those are workarounds. I could be convinced though that presence of a
> track line does NOT imply conformance to UCSC conventions.
>
> I fixed the incomplete conversion in devel. Please let me know if it needs
> to be back-ported.
>
> Michael
>
> On Tue, May 31, 2011 at 12:53 PM, Cook, Malcolm <m...@stowers.org> wrote:
>
> I find that rtracklayer's import.bed function is pasting the string 'chr'
> to SOME of my chromosome names in my bed file ( f, attached)
>
> It is as if it is trying to rename (some) of them to (partially) agree with
> ucsc's naming convention.
>
> I would expect import.bed  to preserve column 1 as the name of the 'space',
> but it does not.
>
> But... some of the spaces have had a 'chr' prefix added!
>
> Look at the session transcipt below!
>
> I _think_ this behavior is NEW since I recently upgraded.
>
> I hope not to have to revert to previous version of R / bioconductor to
> address this issue.
>
> Any suggestions?
>
> Thanks!
>
> Malcolm Cook
> Computational Biology - Stowers Institute for Medical Research
>
>
>
>
> > library(rtracklayer)
> Loading required package: RCurl
> Loading required package: bitops
> > unique(read.table('f.bed',sep="\t",skip=1)[,1])
>  [1] YHet                      dmel_mitochondrion_genome
>  [3] 2L                        X
>  [5] 3L                        4
>  [7] 2R                        3R
>  [9] Uextra                    2RHet
> [11] 2LHet                     3LHet
> [13] 3RHet                     U
> [15] XHet
> 15 Levels: 2L 2LHet 2R 2RHet 3L 3LHet 3R 3RHet 4 U Uextra X XHet ...
> dmel_mitochondrion_genome
> > unique(space(import.bed ('f.bed')))
>  [1] chr2L                     2LHet
>  [3] chr2R                     2RHet
>  [5] chr3L                     3LHet
>  [7] chr3R                     3RHet
>  [9] chr4                      chrU
> [11] Uextra                    chrX
> [13] XHet                      YHet
> [15] dmel_mitochondrion_genome
> 15 Levels: chr2L 2LHet chr2R 2RHet chr3L 3LHet chr3R 3RHet chr4 chrU ...
> dmel_mitochondrion_genome
> > sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rtracklayer_1.12.2 RCurl_1.5-0        bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.20.0     Biostrings_2.20.1   GenomicRanges_1.4.6
> [4] IRanges_1.10.4      XML_3.4-0
> >
>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to