Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

Hervé Pagès Mon, 08 Sep 2014 16:46:08 -0700

On 09/08/2014 02:28 PM, Peter Hickey wrote:

Just a vote for still allowing for multiple genomes in a Seqinfo object (in a 
GRanges object). My use case is in bisulfite-sequencing experiments where there 
is often a spike-in of a lambda phage genome along with the genome of interest 
(human or mouse). It's often useful to keep all data from a single library 
together in the same objet but process according to genome(x) for each seqlevel.


Note taken. Thanks Pete! It's always great to know about concrete use
cases.


FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show 
method.


Or what about displaying the genome next to the seqlevel it's
associated with? Like e.g.:

  > gr
  GRanges with 3 ranges and 0 metadata columns:
        seqnames               ranges strand
           <Rle>            <IRanges>  <Rle>
    [1]    chr14 [19069583, 19069654]      +
    [2]    chr14 [19363738, 19363809]      +
    [3]    chr14 [19363755, 19363826]      -
    [4]    chr14 [19369799, 19369870]      +
    ---
    seqinfo:
      seqlevels             seqlengths isCircular genome
      chr1                   249250621       <NA>   hg19
      chr10                  135534747       <NA>   hg19
      chr11                  135006516       <NA>   hg19
      ...                          ...        ...    ...
      chrUn_gl000249             38502       <NA>   hg19
      chrX                   155270560       <NA>   hg19
      chrY                    59373566       <NA>   hg19

That way, we also raise awareness about the isCircular field.
The current choice to only display the seqlengths pre-dates the
existence of the seqinfo slot but might be a little bit misleading
those days since it only exposes some arbitrary seqinfo fields.

H.


Cheers,
Pete

I might have requested the genome annotation, but I'm pretty sure it wasn't
me who decide on tracking it on a per-sequence basis. I could imagine use
cases for that though, e.g., when diagnosing sequencing contamination (like
human vs. mouse). But most other tools and file formats expect a single
genome per "track", so, for example, rtracklayer has an internal function
singleGenome() to take care of this.

On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s <hpa...@fhcrc.org> wrote:

Hi Vince,

Yes it would make sense to have the "show" method report the genome
when genome(x) contains a unique non-NA value. I think the main
use case for having the genome defined at the sequence level instead
of the whole object level is metagenomics. Maybe Michael has some other
good use cases to share since IIRC he requested the addition of the
genome field a couple of years ago and made the case for having it
defined at the sequence level.

Cheers,
H.


On 09/08/2014 07:21 AM, Vincent Carey wrote:

For GRanges x, my naive expectation is that genome(x) returns a length-

one tag identifying the genome to which chromosomal coordinates

correspond.  The genome() method seems to have sequence-specific

semantics, which makes sense, but when we identify sequence

with chromosome, it seems too complicated.  Is there a use case for

a GRanges with sequences from several different genomes?


One reason I am inquiring is that I feel it would be nice to have the
GRanges show() method report, prominently, the genome in use (or NA

if unspecified).  This could be accomplished by reporting
unique(genome(x)), and perhaps that would be satisfactory.

after example(genome) :

seqinfo(txdb)


Seqinfo of length 15

seqnames seqlengths isCircular genome

CH2L       23011544      FALSE    dm3

CH2R       21146708      FALSE    dm3

CH3L       24543557      FALSE    dm3

CH3R       27905053      FALSE    dm3

CH4         1351857      FALSE    dm3

...             ...        ...    ...

CH3LHet     2555491      FALSE    dm3

CH3RHet     2517507      FALSE    dm3

CHXHet       204112      FALSE    dm3

CHYHet       347038      FALSE    dm3

CHUextra   29004656      FALSE    dm3

genome(seqinfo(txdb))


     CH2L     CH2R     CH3L     CH3R      CH4      CHX      CHU        M

    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"

  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra

    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Herv? Pag?s

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--------------------------------
Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au

______________________________________________________________________
The information in this email is confidential and inte...{{dropped:21}}


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

Reply via email to