Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-09 Thread Hervé Pagès

OK, so let's go for a one-line summary of the seqinfo.
Vince, is it OK if we keep this at the bottom of the display?
That way it will always be visible, even when the object requires
more than one screen to display (e.g. when it has a lot of metadata
columns). It will look something like:

  > gr
  GRanges with 3 ranges and 0 metadata columns:
        seqnames               ranges strand
           <Rle>            <IRanges>  <Rle>
    [1]    chr14 [19069583, 19069654]      +
    [2]    chr14 [19363738, 19363809]      +
    [3]    chr14 [19363755, 19363826]      -
    [4]    chr14 [19369799, 19369870]      +
  seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); no seqlengths
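
For illustration, here is a minimal base-R sketch of how that one-line summary could be assembled. It assumes the seqinfo fields are already available as plain vectors (on a real GRanges they would come from seqlevels(), seqlengths(), isCircular() and genome()); summarize_seqinfo() is a hypothetical helper, not part of GenomicRanges, and the toy input is made up.

```r
# Hypothetical helper (not a GenomicRanges function): builds a one-line
# seqinfo summary from plain vectors standing in for seqlevels(),
# seqlengths(), isCircular() and genome().
summarize_seqinfo <- function(seqlevels, seqlengths, is_circular, genomes) {
  n <- length(seqlevels)
  n_circ <- sum(is_circular, na.rm = TRUE)   # NA (unknown) is not counted
  g <- unique(genomes[!is.na(genomes)])      # distinct, known genome tags
  genome_part <- if (length(g) == 0L) {
    "no genome"
  } else {
    sprintf("%d genome%s (%s)", length(g),
            if (length(g) == 1L) "" else "s", paste(g, collapse = ", "))
  }
  n_na <- sum(is.na(seqlengths))
  length_part <- if (n_na == n) {
    "no seqlengths"
  } else if (n_na == 0L) {
    "all seqlengths known"
  } else {
    sprintf("%d 'NA' seqlengths", n_na)
  }
  sprintf("seqinfo: %d seqlevels (%d circular) on %s; %s",
          n, n_circ, genome_part, length_part)
}

# Toy input: three hg19 sequences (chrM circular) plus one mm10 sequence.
cat(summarize_seqinfo(
  seqlevels   = c("chr1", "chr14", "chrM", "chr1_mm10"),
  seqlengths  = rep(NA_integer_, 4),
  is_circular = c(FALSE, FALSE, TRUE, NA),
  genomes     = c("hg19", "hg19", "hg19", "mm10")
), "\n")
```

When every length is NA the sketch prints "no seqlengths" as in the proposal above; otherwise it counts the NA entries, which is closer to Martin's variant quoted below.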


Thanks,
H.

On 09/09/2014 06:38 AM, Michael Lawrence wrote:

Agreed, that looks a lot nicer.

On Tue, Sep 9, 2014 at 4:42 AM, Martin Morgan  wrote:


On 09/09/2014 04:02 AM, Michael Lawrence wrote:


I'm in favor of this display. The seqinfo output at the bottom has always
been annoying (over-emphasized).



The fact that the lengths are 'NA' can be a helpful prompt to do something
about it, e.g., add seqinfo when inputting the data. The lengths are also
helpful when one is told that seqlengths are incompatible during, e.g.,
findOverlaps. But I like the idea of a terser but more informative display
of seqinfo, along the lines suggested by Vince.

seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); 60 'NA' seqlengths

Martin



On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey <st...@channing.harvard.edu> wrote:




On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès  wrote:

  On 09/08/2014 06:42 PM, Michael Lawrence wrote:


  Instead of printing out multiple lines of a table that is rarely of
interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.



I don't know either. I'm worried that this would make the seqinfo
stuff look like a named vector and that the user would expect
hg19:chr1, hg19:chr2, etc... to be valid names.

With the table-like layout, some screen real estate can always be
saved by printing fewer lines:


  What I had in mind was



  > gr
  GRanges with 3 ranges and 0 metadata columns:

    genome: hg19

        seqnames               ranges strand
           <Rle>            <IRanges>  <Rle>
    [1]    chr14 [19069583, 19069654]      +
    [2]    chr14 [19363738, 19363809]      +
    [3]    chr14 [19363755, 19363826]      -
    [4]    chr14 [19369799, 19369870]      +




You could then probably dispense with the seqlengths. I have
never found them very useful except as a key to the genome.

If there are multiple genomes, we would have something like

genomes: hg19, mm9

The point is to make it prominent, particularly at a time of transition.



  --- seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10) ---
  seqlevels seqlengths isCircular genome
  chr1       249250621              hg19
  chr10      135534747              hg19
  ...              ...        ...    ...
  chrX       155270560              hg19
  chrY        59373566              hg19

I agree that the exact content of the seqinfo table itself is rarely
of interest so printing only 3 or 4 lines is OK. IMO it's important
to make the user aware of the existence of this hidden table and to
display it as what it really is (i.e. a table). Also displaying the
column names is a well-established tradition and serves the purpose
of providing a quick summary of the accessors that are available to
access those fields.

H.




On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hic...@wehi.edu.au> wrote:

  Perhaps it might be useful to have some way of highlighting if any
  of the chromosomes are circular or highlighting if there are
  multiple genomes present? Otherwise this information might be hidden
  in the "…"

  Cheers,
  Pete


  On 09/09/2014, at 9:44 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

   > On 09/08/2014 02:28 PM, Peter Hickey wrote:
   >> Just a vote for still allowing for multiple genomes in a Seqinfo
   >> object (in a GRanges object). My use case is in bisulfite-sequencing
   >> experiments where there is often a spike-in of a lambda phage genome
   >> along with the genome of interest (human or mouse). It's often
   >> useful to keep all data from a single library together in the same
   >> object but process according to genome(x) for each seqlevel.
   >
   > Note taken. Thanks Pete! It's always great to know about concrete use
   > cases.
   >
   >>
   >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
   >> in the show method.
   >
   > Or what about displaying the genome next to the seqlevel it's
   > associated with? Like e.g.:
   >
   >  > 

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-09 Thread Michael Lawrence
Agreed, that looks a lot nicer.

On Tue, Sep 9, 2014 at 4:42 AM, Martin Morgan  wrote:

> On 09/09/2014 04:02 AM, Michael Lawrence wrote:
>
>> I'm in favor of this display. The seqinfo output at the bottom has always
>> been annoying (over-emphasized).
>>
>
> The fact that the lengths are 'NA' can be a helpful prompt to do something
> about it, e.g., add seqinfo when inputting the data. The lengths are also
> helpful when one is told that seqlengths are incompatible during, e.g.,
> findOverlaps. But I like the idea of a terser but more informative display
> of seqinfo, along the lines suggested by Vince.
>
> seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); 60 'NA' seqlengths
>
> Martin
>
>
>> On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey <st...@channing.harvard.edu> wrote:
>>
>>
>>>
>>> On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès  wrote:
>>>
>>>  On 09/08/2014 06:42 PM, Michael Lawrence wrote:

> Instead of printing out multiple lines of a table that is rarely of
> interest, could we develop Peter's idea toward something like:
>
> hg19:chr1  hg19:chr2 ...
> [lengths ...]
>
> Not sure what condensed notation would be useful for circularity.
>
>
 I don't know either. I'm worried that this would make the seqinfo
 stuff look like a named vector and that the user would expect
 hg19:chr1, hg19:chr2, etc... to be valid names.

 With the table-like layout, some screen real estate can always be
 saved by printing fewer lines:


  What I had in mind was
>>>
>>>
>>>   > gr
>>>   GRanges with 3 ranges and 0 metadata columns:
>>>
>>>     genome: hg19
>>>
>>>         seqnames               ranges strand
>>>            <Rle>            <IRanges>  <Rle>
>>>     [1]    chr14 [19069583, 19069654]      +
>>>     [2]    chr14 [19363738, 19363809]      +
>>>     [3]    chr14 [19363755, 19363826]      -
>>>     [4]    chr14 [19369799, 19369870]      +


>>>
>>> You could then probably dispense with the seqlengths. I have
>>> never found them very useful except as a key to the genome.
>>>
>>> If there are multiple genomes, we would have something like
>>>
>>> genomes: hg19, mm9
>>>
>>> The point is to make it prominent, particularly at a time of transition.
>>>
>>>
>>>
>>>  --- seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10) ---
>>>  seqlevels seqlengths isCircular genome
>>>  chr1       249250621              hg19
>>>  chr10      135534747              hg19
>>>  ...              ...        ...    ...
>>>  chrX       155270560              hg19
>>>  chrY        59373566              hg19

 I agree that the exact content of the seqinfo table itself is rarely
 of interest so printing only 3 or 4 lines is OK. IMO it's important
 to make the user aware of the existence of this hidden table and to
 display it as what it really is (i.e. a table). Also displaying the
 column names is a well-established tradition and serves the purpose
 of providing a quick summary of the accessors that are available to
 access those fields.

 H.



> On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey wrote:
>
>  Perhaps it might be useful to have some way of highlighting if any
>  of the chromosomes are circular or highlighting if there are
>  multiple genomes present? Otherwise this information might be hidden
>  in the "…"
>
>  Cheers,
>  Pete
>
>
>  On 09/09/2014, at 9:44 AM, Hervé Pagès wrote:
>
>   > On 09/08/2014 02:28 PM, Peter Hickey wrote:
>   >> Just a vote for still allowing for multiple genomes in a Seqinfo
>   >> object (in a GRanges object). My use case is in bisulfite-sequencing
>   >> experiments where there is often a spike-in of a lambda phage genome
>   >> along with the genome of interest (human or mouse). It's often
>   >> useful to keep all data from a single library together in the same
>   >> object but process according to genome(x) for each seqlevel.
>   >
>   > Note taken. Thanks Pete! It's always great to know about concrete use
>   > cases.
>   >
>   >>
>   >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
>   >> in the show method.
>   >
>   > Or what about displaying the genome next to the seqlevel it's
>   > associated with? Like e.g.:
>   >
>   >  > gr
>   >  GRanges with 3 ranges and 0 metadata columns:
>   >        seqnames               ranges strand
>   >           <Rle>            <IRanges>  <Rle>
>

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-09 Thread Martin Morgan

On 09/09/2014 04:02 AM, Michael Lawrence wrote:

I'm in favor of this display. The seqinfo output at the bottom has always
been annoying (over-emphasized).


The fact that the lengths are 'NA' can be a helpful prompt to do something about
it, e.g., add seqinfo when inputting the data. The lengths are also helpful when
one is told that seqlengths are incompatible during, e.g., findOverlaps. But I
like the idea of a terser but more informative display of seqinfo, along the
lines suggested by Vince.


seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); 60 'NA' seqlengths
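
A base-R sketch of the two uses of seqlengths described above, assuming they are held as integer vectors named by seqlevel (the shape returned by seqlengths()). fill_seqlengths() and incompatible_seqlengths() are hypothetical helpers; they do not reproduce the actual check that findOverlaps performs, and the chr1/chr10 values are the hg19 lengths quoted in this thread.

```r
# Hypothetical helpers; seqlengths are integer vectors named by seqlevel.

# Fill NA lengths from a reference (e.g. lengths for the genome the data were
# aligned to); seqlevels the reference does not know about stay NA.
fill_seqlengths <- function(x, reference) {
  todo <- is.na(x) & names(x) %in% names(reference)
  x[todo] <- reference[names(x)[todo]]
  x
}

# Shared seqlevels whose lengths are both known but disagree: the situation
# in which a "seqlengths are incompatible" error is informative.
incompatible_seqlengths <- function(x, y) {
  shared <- intersect(names(x), names(y))
  lx <- x[shared]
  ly <- y[shared]
  known <- !is.na(lx) & !is.na(ly)
  shared[known & lx != ly]
}

ref  <- c(chr1 = 249250621L, chr10 = 135534747L)   # hg19 values from this thread
mine <- c(chr1 = NA_integer_, chr10 = NA_integer_, chrZ = NA_integer_)
filled <- fill_seqlengths(mine, ref)
print(filled)                                       # chrZ stays NA
cat(sum(is.na(filled)), "'NA' seqlengths left\n")

other <- c(chr1 = 1L, chr10 = 135534747L)           # deliberately conflicting chr1
print(incompatible_seqlengths(filled, other))
```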

Martin



On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey wrote:




On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès  wrote:


On 09/08/2014 06:42 PM, Michael Lawrence wrote:


Instead of printing out multiple lines of a table that is rarely of
interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.



I don't know either. I'm worried that this would make the seqinfo
stuff look like a named vector and that the user would expect
hg19:chr1, hg19:chr2, etc... to be valid names.

With the table-like layout, some screen real estate can always be
saved by printing fewer lines:



What I had in mind was



  > gr
  GRanges with 3 ranges and 0 metadata columns:

    genome: hg19

        seqnames               ranges strand
           <Rle>            <IRanges>  <Rle>
    [1]    chr14 [19069583, 19069654]      +
    [2]    chr14 [19363738, 19363809]      +
    [3]    chr14 [19363755, 19363826]      -
    [4]    chr14 [19369799, 19369870]      +




You could then probably dispense with the seqlengths. I have
never found them very useful except as a key to the genome.

If there are multiple genomes, we would have something like

genomes: hg19, mm9

The point is to make it prominent, particularly at a time of transition.




  --- seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10) ---
  seqlevels seqlengths isCircular genome
  chr1       249250621              hg19
  chr10      135534747              hg19
  ...              ...        ...    ...
  chrX       155270560              hg19
  chrY        59373566              hg19

I agree that the exact content of the seqinfo table itself is rarely
of interest so printing only 3 or 4 lines is OK. IMO it's important
to make the user aware of the existence of this hidden table and to
display it as what it really is (i.e. a table). Also displaying the
column names is a well-established tradition and serves the purpose
of providing a quick summary of the accessors that are available to
access those fields.

H.




On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hic...@wehi.edu.au> wrote:

 Perhaps it might be useful to have some way of highlighting if any
 of the chromosomes are circular or highlighting if there are
 multiple genomes present? Otherwise this information might be hidden
 in the "…"

 Cheers,
 Pete


 On 09/09/2014, at 9:44 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

  > On 09/08/2014 02:28 PM, Peter Hickey wrote:
  >> Just a vote for still allowing for multiple genomes in a Seqinfo
  >> object (in a GRanges object). My use case is in bisulfite-sequencing
  >> experiments where there is often a spike-in of a lambda phage genome
  >> along with the genome of interest (human or mouse). It's often
  >> useful to keep all data from a single library together in the same
  >> object but process according to genome(x) for each seqlevel.
  >
  > Note taken. Thanks Pete! It's always great to know about concrete use
  > cases.
  >
  >>
  >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
  >> in the show method.
  >
  > Or what about displaying the genome next to the seqlevel it's
  > associated with? Like e.g.:
  >
  >  > gr
  >  GRanges with 3 ranges and 0 metadata columns:
  >        seqnames               ranges strand
  >           <Rle>            <IRanges>  <Rle>
  >    [1]    chr14 [19069583, 19069654]      +
  >    [2]    chr14 [19363738, 19363809]      +
  >    [3]    chr14 [19363755, 19363826]      -
  >    [4]    chr14 [19369799, 19369870]      +
  >    ---
  >    seqinfo:
  >      seqlevels      seqlengths isCircular genome
  >      chr1            249250621              hg19
  >      chr10           135534747              hg19
  >      chr11           135006516              hg19
  >      ...                   ...        ...    ...
  >      chrUn_gl000249      38502              hg19
  >      chrX            155270560              hg19
  >      chrY             59373566              hg19
  >
  > That way

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-09 Thread Michael Lawrence
I'm in favor of this display. The seqinfo output at the bottom has always
been annoying (over-emphasized).

On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey wrote:

>
>
> On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès  wrote:
>
>> On 09/08/2014 06:42 PM, Michael Lawrence wrote:
>>
>>> Instead of printing out multiple lines of a table that is rarely of
>>> interest, could we develop Peter's idea toward something like:
>>>
>>> hg19:chr1  hg19:chr2 ...
>>> [lengths ...]
>>>
>>> Not sure what condensed notation would be useful for circularity.
>>>
>>
>> I don't know either. I'm worried that this would make the seqinfo
>> stuff look like a named vector and that the user would expect
>> hg19:chr1, hg19:chr2, etc... to be valid names.
>>
>> With the table-like layout, some screen real estate can always be
>> saved by printing fewer lines:
>>
>>
> What I had in mind was
>
>
>>   > gr
>>   GRanges with 3 ranges and 0 metadata columns:
>>
>   genome: hg19
>
>>         seqnames               ranges strand
>>            <Rle>            <IRanges>  <Rle>
>>     [1]    chr14 [19069583, 19069654]      +
>>     [2]    chr14 [19363738, 19363809]      +
>>     [3]    chr14 [19363755, 19363826]      -
>>     [4]    chr14 [19369799, 19369870]      +
>>
>
>
> You could then probably dispense with the seqlengths. I have
> never found them very useful except as a key to the genome.
>
> If there are multiple genomes, we would have something like
>
> genomes: hg19, mm9
>
> The point is to make it prominent, particularly at a time of transition.
>
>
>
>>  --- seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10) ---
>>  seqlevels seqlengths isCircular genome
>>  chr1       249250621              hg19
>>  chr10      135534747              hg19
>>  ...              ...        ...    ...
>>  chrX       155270560              hg19
>>  chrY        59373566              hg19
>>
>> I agree that the exact content of the seqinfo table itself is rarely
>> of interest so printing only 3 or 4 lines is OK. IMO it's important
>> to make the user aware of the existence of this hidden table and to
>> display it as what it really is (i.e. a table). Also displaying the
>> column names is a well-established tradition and serves the purpose
>> of providing a quick summary of the accessors that are available to
>> access those fields.
>>
>> H.
>>
>>
>>>
>>> On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey wrote:
>>>
>>> Perhaps it might be useful to have some way of highlighting if any
>>> of the chromosomes are circular or highlighting if there are
>>> multiple genomes present? Otherwise this information might be hidden
>>> in the "…"
>>>
>>> Cheers,
>>> Pete
>>>
>>>
>>> On 09/09/2014, at 9:44 AM, Hervé Pagès wrote:
>>>
>>>  > On 09/08/2014 02:28 PM, Peter Hickey wrote:
>>>  >> Just a vote for still allowing for multiple genomes in a Seqinfo
>>>  >> object (in a GRanges object). My use case is in bisulfite-sequencing
>>>  >> experiments where there is often a spike-in of a lambda phage genome
>>>  >> along with the genome of interest (human or mouse). It's often
>>>  >> useful to keep all data from a single library together in the same
>>>  >> object but process according to genome(x) for each seqlevel.
>>>  >
>>>  > Note taken. Thanks Pete! It's always great to know about concrete use
>>>  > cases.
>>>  >
>>>  >>
>>>  >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
>>>  >> in the show method.
>>>  >
>>>  > Or what about displaying the genome next to the seqlevel it's
>>>  > associated with? Like e.g.:
>>>  >
>>>  >  > gr
>>>  >  GRanges with 3 ranges and 0 metadata columns:
>>>  >        seqnames               ranges strand
>>>  >           <Rle>            <IRanges>  <Rle>
>>>  >    [1]    chr14 [19069583, 19069654]      +
>>>  >    [2]    chr14 [19363738, 19363809]      +
>>>  >    [3]    chr14 [19363755, 19363826]      -
>>>  >    [4]    chr14 [19369799, 19369870]      +
>>>  >    ---
>>>  >    seqinfo:
>>>  >      seqlevels      seqlengths isCircular genome
>>>  >      chr1            249250621              hg19
>>>  >      chr10           135534747              hg19
>>>  >      chr11           135006516              hg19
>>>  >      ...                   ...        ...    ...
>>>  >      chrUn_gl000249      38502              hg19
>>>  >      chrX            155270560              hg19
>>>  >      chrY             59373566              hg19
>>>  >
>>>  > That way, we also raise awareness about the isCircular field.
>>>  > The current choice to only display the seqlengths pre-dates the
>>>  > existence of the seqinfo sl

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Vincent Carey
On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès  wrote:

> On 09/08/2014 06:42 PM, Michael Lawrence wrote:
>
>> Instead of printing out multiple lines of a table that is rarely of
>> interest, could we develop Peter's idea toward something like:
>>
>> hg19:chr1  hg19:chr2 ...
>> [lengths ...]
>>
>> Not sure what condensed notation would be useful for circularity.
>>
>
> I don't know either. I'm worried that this would make the seqinfo
> stuff look like a named vector and that the user would expect
> hg19:chr1, hg19:chr2, etc... to be valid names.
>
> With the table-like layout, some screen real estate can always be
> saved by printing fewer lines:
>
>
What I had in mind was


>   > gr
>   GRanges with 3 ranges and 0 metadata columns:
>
  genome: hg19

>         seqnames               ranges strand
>            <Rle>            <IRanges>  <Rle>
>     [1]    chr14 [19069583, 19069654]      +
>     [2]    chr14 [19363738, 19363809]      +
>     [3]    chr14 [19363755, 19363826]      -
>     [4]    chr14 [19369799, 19369870]      +
>


You could then probably dispense with the seqlengths. I have
never found them very useful except as a key to the genome.

If there are multiple genomes, we would have something like

genomes: hg19, mm9

The point is to make it prominent, particularly at a time of transition.
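
A minimal base-R sketch of the header line described above, assuming the input is the per-seqlevel vector of genome tags that genome(x) returns. genome_header() is a hypothetical name; an unset genome is shown as NA, as the original question in this thread suggested.

```r
# Hypothetical helper: turns per-seqlevel genome tags (what genome(x)
# returns) into a one-line "genome:" / "genomes:" header.
genome_header <- function(genomes) {
  g <- unique(as.character(genomes))   # as.character() also drops the names
  if (length(g) == 0L) g <- NA_character_   # no seqlevels at all: report NA
  g[is.na(g)] <- "NA"                        # unset genome is shown as NA
  if (length(g) == 1L) {
    paste("genome:", g)
  } else {
    paste("genomes:", paste(g, collapse = ", "))
  }
}

cat(genome_header(c(chr1 = "hg19", chr14 = "hg19")), "\n")   # genome: hg19
cat(genome_header(c("hg19", "mm9", "hg19")), "\n")           # genomes: hg19, mm9
cat(genome_header(c(NA, NA)), "\n")                          # genome: NA
```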



>  --- seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10) ---
>  seqlevels seqlengths isCircular genome
>  chr1       249250621              hg19
>  chr10      135534747              hg19
>  ...              ...        ...    ...
>  chrX       155270560              hg19
>  chrY        59373566              hg19
>
> I agree that the exact content of the seqinfo table itself is rarely
> of interest so printing only 3 or 4 lines is OK. IMO it's important
> to make the user aware of the existence of this hidden table and to
> display it as what it really is (i.e. a table). Also displaying the
> column names is a well-established tradition and serves the purpose
> of providing a quick summary of the accessors that are available to
> access those fields.
>
> H.
>
>
>>
>> On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey wrote:
>>
>> Perhaps it might be useful to have some way of highlighting if any
>> of the chromosomes are circular or highlighting if there are
>> multiple genomes present? Otherwise this information might be hidden
>> in the "…"
>>
>> Cheers,
>> Pete
>>
>>
>> On 09/09/2014, at 9:44 AM, Hervé Pagès wrote:
>>
>>  > On 09/08/2014 02:28 PM, Peter Hickey wrote:
>>  >> Just a vote for still allowing for multiple genomes in a Seqinfo
>>  >> object (in a GRanges object). My use case is in bisulfite-sequencing
>>  >> experiments where there is often a spike-in of a lambda phage genome
>>  >> along with the genome of interest (human or mouse). It's often
>>  >> useful to keep all data from a single library together in the same
>>  >> object but process according to genome(x) for each seqlevel.
>>  >
>>  > Note taken. Thanks Pete! It's always great to know about concrete use
>>  > cases.
>>  >
>>  >>
>>  >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
>>  >> in the show method.
>>  >
>>  > Or what about displaying the genome next to the seqlevel it's
>>  > associated with? Like e.g.:
>>  >
>>  >  > gr
>>  >  GRanges with 3 ranges and 0 metadata columns:
>>  >        seqnames               ranges strand
>>  >           <Rle>            <IRanges>  <Rle>
>>  >    [1]    chr14 [19069583, 19069654]      +
>>  >    [2]    chr14 [19363738, 19363809]      +
>>  >    [3]    chr14 [19363755, 19363826]      -
>>  >    [4]    chr14 [19369799, 19369870]      +
>>  >    ---
>>  >    seqinfo:
>>  >      seqlevels      seqlengths isCircular genome
>>  >      chr1            249250621              hg19
>>  >      chr10           135534747              hg19
>>  >      chr11           135006516              hg19
>>  >      ...                   ...        ...    ...
>>  >      chrUn_gl000249      38502              hg19
>>  >      chrX            155270560              hg19
>>  >      chrY             59373566              hg19
>>  >
>>  > That way, we also raise awareness about the isCircular field.
>>  > The current choice to only display the seqlengths pre-dates the
>>  > existence of the seqinfo slot but might be a little bit misleading
>>  > these days since it only exposes some arbitrary seqinfo fields.
>>  >
>>  > H.
>>  >
>>  >>
>>  >> Cheers,
>>  >> Pete
>>  >>
>>  >>
>>  >>> I might have requested the genome annotation, but I'm pretty sure i

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

On 09/08/2014 06:42 PM, Michael Lawrence wrote:

Instead of printing out multiple lines of a table that is rarely of
interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.


I don't know either. I'm worried that this would make the seqinfo
stuff look like a named vector and that the user would expect
hg19:chr1, hg19:chr2, etc... to be valid names.

With the table-like layout, some screen real estate can always be
saved by printing fewer lines:

  > gr
  GRanges with 3 ranges and 0 metadata columns:
        seqnames               ranges strand
           <Rle>            <IRanges>  <Rle>
    [1]    chr14 [19069583, 19069654]      +
    [2]    chr14 [19363738, 19363809]      +
    [3]    chr14 [19363755, 19363826]      -
    [4]    chr14 [19369799, 19369870]      +
  --- seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10) ---
  seqlevels seqlengths isCircular genome
  chr1       249250621              hg19
  chr10      135534747              hg19
  ...              ...        ...    ...
  chrX       155270560              hg19
  chrY        59373566              hg19

I agree that the exact content of the seqinfo table itself is rarely
of interest so printing only 3 or 4 lines is OK. IMO it's important
to make the user aware of the existence of this hidden table and to
display it as what it really is (i.e. a table). Also displaying the
column names is a well-established tradition and serves the purpose
of providing a quick summary of the accessors that are available to
access those fields.

H.
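
Keeping the column names visible while eliding most rows can be sketched in base R as follows. It assumes the seqinfo table is available as a data.frame (built by hand here from the hg19 values quoted in this thread); head_tail_table() is a hypothetical helper, not the Seqinfo show method.

```r
# Hypothetical helper: keep the first and last rows of a table with a "..."
# row in between, so the column names stay visible in just a few lines.
head_tail_table <- function(df, n_head = 2L, n_tail = 2L) {
  df[] <- lapply(df, format)       # every column becomes character strings
  n <- nrow(df)
  if (n <= n_head + n_tail) return(df)
  dots <- df[1L, , drop = FALSE]
  dots[1L, ] <- "..."              # placeholder row for the elided part
  out <- rbind(df[seq_len(n_head), , drop = FALSE], dots,
               df[seq.int(n - n_tail + 1L, n), , drop = FALSE])
  rownames(out) <- NULL
  out
}

si <- data.frame(
  seqlevels  = c("chr1", "chr10", "chr11", "chrUn_gl000249", "chrX", "chrY"),
  seqlengths = c(249250621L, 135534747L, 135006516L, 38502L, 155270560L, 59373566L),
  isCircular = NA,
  genome     = "hg19",
  stringsAsFactors = FALSE
)
print(head_tail_table(si), row.names = FALSE)
```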




On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hic...@wehi.edu.au> wrote:

Perhaps it might be useful to have some way of highlighting if any
of the chromosomes are circular or highlighting if there are
multiple genomes present? Otherwise this information might be hidden
in the "…"

Cheers,
Pete


On 09/09/2014, at 9:44 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

 > On 09/08/2014 02:28 PM, Peter Hickey wrote:
 >> Just a vote for still allowing for multiple genomes in a Seqinfo
 >> object (in a GRanges object). My use case is in bisulfite-sequencing
 >> experiments where there is often a spike-in of a lambda phage genome
 >> along with the genome of interest (human or mouse). It's often
 >> useful to keep all data from a single library together in the same
 >> object but process according to genome(x) for each seqlevel.
 >
 > Note taken. Thanks Pete! It's always great to know about concrete use
 > cases.
 >
 >>
 >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
 >> in the show method.
 >
 > Or what about displaying the genome next to the seqlevel it's
 > associated with? Like e.g.:
 >
 >  > gr
 >  GRanges with 3 ranges and 0 metadata columns:
 >        seqnames               ranges strand
 >           <Rle>            <IRanges>  <Rle>
 >    [1]    chr14 [19069583, 19069654]      +
 >    [2]    chr14 [19363738, 19363809]      +
 >    [3]    chr14 [19363755, 19363826]      -
 >    [4]    chr14 [19369799, 19369870]      +
 >    ---
 >    seqinfo:
 >      seqlevels      seqlengths isCircular genome
 >      chr1            249250621              hg19
 >      chr10           135534747              hg19
 >      chr11           135006516              hg19
 >      ...                   ...        ...    ...
 >      chrUn_gl000249      38502              hg19
 >      chrX            155270560              hg19
 >      chrY             59373566              hg19
 >
 > That way, we also raise awareness about the isCircular field.
 > The current choice to only display the seqlengths pre-dates the
 > existence of the seqinfo slot but might be a little bit misleading
 > these days since it only exposes some arbitrary seqinfo fields.
 >
 > H.
 >
 >>
 >> Cheers,
 >> Pete
 >>
 >>
 >>> I might have requested the genome annotation, but I'm pretty sure it wasn't
 >>> me who decided on tracking it on a per-sequence basis. I could imagine use
 >>> cases for that though, e.g., when diagnosing sequencing contamination (like
 >>> human vs. mouse). But most other tools and file formats expect a single
 >>> genome per "track", so, for example, rtracklayer has an internal function
 >>> singleGenome() to take care of this.
 >>>
 >>> On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
 >>>
  Hi Vince,
 
  Yes it would make sense to have the "show" method report the genome
  when genome(x) contains a unique n

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Michael Lawrence
Instead of printing out multiple lines of a table that is rarely of
interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.
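
A base-R sketch of this condensed notation, combined with a selectSome()-style elision. condensed_seqinfo() and select_some() are hypothetical helpers; select_some() only mimics the idea of Biobase's selectSome() and is not claimed to match its exact output. As Hervé's reply points out, labels like hg19:chr1 could be mistaken for valid names.

```r
# Hypothetical helper in the spirit of Biobase::selectSome(): keep the first
# and last elements and replace the middle with "...", max_to_show items total.
select_some <- function(x, max_to_show = 5L) {
  x <- as.character(x)
  if (length(x) <= max_to_show) return(x)
  n_head <- ceiling((max_to_show - 1) / 2)
  n_tail <- max_to_show - 1 - n_head
  c(head(x, n_head), "...", tail(x, n_tail))
}

# Hypothetical helper: "genome:seqlevel" labels; seqlevels without a genome
# tag are shown bare.
condensed_seqinfo <- function(seqlevels, genomes, max_to_show = 5L) {
  genomes <- rep_len(genomes, length(seqlevels))
  labels <- ifelse(is.na(genomes), seqlevels, paste(genomes, seqlevels, sep = ":"))
  paste(select_some(labels, max_to_show), collapse = "  ")
}

cat(condensed_seqinfo(c("chr1", "chr10", "chr11", "chrX", "chrY", "chrM"), "hg19"), "\n")
```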


On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey  wrote:

> Perhaps it might be useful to have some way of highlighting if any of the
> chromosomes are circular or highlighting if there are multiple genomes
> present? Otherwise this information might be hidden in the "…"
>
> Cheers,
> Pete
>
>
> On 09/09/2014, at 9:44 AM, Hervé Pagès  wrote:
>
> > On 09/08/2014 02:28 PM, Peter Hickey wrote:
> >> Just a vote for still allowing for multiple genomes in a Seqinfo object
> >> (in a GRanges object). My use case is in bisulfite-sequencing experiments
> >> where there is often a spike-in of a lambda phage genome along with the
> >> genome of interest (human or mouse). It's often useful to keep all data
> >> from a single library together in the same object but process according to
> >> genome(x) for each seqlevel.
> >
> > Note taken. Thanks Pete! It's always great to know about concrete use
> > cases.
> >
> >>
> >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the
> >> show method.
> >
> > Or what about displaying the genome next to the seqlevel it's
> > associated with? Like e.g.:
> >
> >  > gr
> >  GRanges with 3 ranges and 0 metadata columns:
> >        seqnames               ranges strand
> >           <Rle>            <IRanges>  <Rle>
> >    [1]    chr14 [19069583, 19069654]      +
> >    [2]    chr14 [19363738, 19363809]      +
> >    [3]    chr14 [19363755, 19363826]      -
> >    [4]    chr14 [19369799, 19369870]      +
> >    ---
> >    seqinfo:
> >      seqlevels      seqlengths isCircular genome
> >      chr1            249250621              hg19
> >      chr10           135534747              hg19
> >      chr11           135006516              hg19
> >      ...                   ...        ...    ...
> >      chrUn_gl000249      38502              hg19
> >      chrX            155270560              hg19
> >      chrY             59373566              hg19
> >
> > That way, we also raise awareness about the isCircular field.
> > The current choice to only display the seqlengths pre-dates the
> > existence of the seqinfo slot but might be a little bit misleading
> > these days since it only exposes some arbitrary seqinfo fields.
> >
> > H.
> >
> >>
> >> Cheers,
> >> Pete
> >>
> >>
> >>> I might have requested the genome annotation, but I'm pretty sure it
> >>> wasn't me who decided on tracking it on a per-sequence basis. I could
> >>> imagine use cases for that though, e.g., when diagnosing sequencing
> >>> contamination (like human vs. mouse). But most other tools and file
> >>> formats expect a single genome per "track", so, for example, rtracklayer
> >>> has an internal function singleGenome() to take care of this.
> >>>
> >>> On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès wrote:
> >>>
>  Hi Vince,
> 
>  Yes it would make sense to have the "show" method report the genome
>  when genome(x) contains a unique non-NA value. I think the main
>  use case for having the genome defined at the sequence level instead
>  of the whole object level is metagenomics. Maybe Michael has some other
>  good use cases to share since IIRC he requested the addition of the
>  genome field a couple of years ago and made the case for having it
>  defined at the sequence level.
> 
>  Cheers,
>  H.
> 
> 
>  On 09/08/2014 07:21 AM, Vincent Carey wrote:
> 
> > For GRanges x, my naive expectation is that genome(x) returns a
> > length-one tag identifying the genome to which chromosomal coordinates
> > correspond.  The genome() method seems to have sequence-specific
> > semantics, which makes sense, but when we identify sequence
> > with chromosome, it seems too complicated.  Is there a use case for
> > a GRanges with sequences from several different genomes?
> >
> > One reason I am inquiring is that I feel it would be nice to have the
> > GRanges show() method report, prominently, the genome in use (or NA
> > if unspecified).  This could be accomplished by reporting
> > unique(genome(x)), and perhaps that would be satisfactory.
> >
> > after example(genome) :
> >
> > > seqinfo(txdb)
> > Seqinfo of length 15
> > seqnames seqlengths isCircular genome
> > CH2L       23011544      FALSE    dm3
> > CH2R       21146708      FALSE    dm3
> > CH3L       24543557      FALSE    dm3
> > CH3R       27905053      FALSE    dm3
> > CH4         1351857      FALSE    dm3
> > ...             ...        ...    ...
> > CH3LHet     2555491      FALSE    dm3
> > CH3RHet     2517507      FAL

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Peter Hickey
Perhaps it might be useful to have some way of highlighting if any of the 
chromosomes are circular or highlighting if there are multiple genomes present? 
Otherwise this information might be hidden in the "…"

Cheers,
Pete
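
The bisulfite-sequencing use case quoted below keeps a lambda phage spike-in in the same object as the human or mouse data and processes each seqlevel according to genome(x). A toy base-R sketch of that split, assuming a plain data.frame of ranges and a named vector genome_of standing in for genome(x); the "lambda" genome tag, the coordinates and split_by_genome() are all hypothetical.

```r
# Toy stand-ins: a few ranges from one library, and one genome tag per
# seqlevel (named by seqlevel), mirroring the shape genome(x) returns.
ranges <- data.frame(
  seqnames = c("chr1", "chr1", "lambda", "chr2", "lambda"),
  start    = c(100L, 500L, 10L, 2000L, 300L),
  end      = c(150L, 560L, 60L, 2050L, 350L),
  stringsAsFactors = FALSE
)
genome_of <- c(chr1 = "hg19", chr2 = "hg19", lambda = "lambda")  # hypothetical tags

# Hypothetical helper: split the ranges by the genome of their seqlevel so each
# genome can be processed separately. Ranges on seqlevels without a genome tag
# get NA and are dropped by split().
split_by_genome <- function(ranges, genome_of) {
  g <- unname(genome_of[ranges$seqnames])
  split(ranges, g)
}

parts <- split_by_genome(ranges, genome_of)
print(sapply(parts, nrow))   # hg19: 3 ranges, lambda: 2 ranges
```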


On 09/09/2014, at 9:44 AM, Hervé Pagès wrote:

> On 09/08/2014 02:28 PM, Peter Hickey wrote:
>> Just a vote for still allowing for multiple genomes in a Seqinfo object (in 
>> a GRanges object). My use case is in bisulfite-sequencing experiments where 
>> there is often a spike-in of a lambda phage genome along with the genome of 
>> interest (human or mouse). It's often useful to keep all data from a single 
>> library together in the same object but process according to genome(x) for 
>> each seqlevel.
> 
> Note taken. Thanks Pete! It's always great to know about concrete use
> cases.
> 
>> 
>> FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show 
>> method.
> 
> Or what about displaying the genome next to the seqlevel it's
> associated with? Like e.g.:
> 
>  > gr
>  GRanges with 3 ranges and 0 metadata columns:
>        seqnames               ranges strand
>           <Rle>            <IRanges>  <Rle>
>    [1]    chr14 [19069583, 19069654]      +
>    [2]    chr14 [19363738, 19363809]      +
>    [3]    chr14 [19363755, 19363826]      -
>    [4]    chr14 [19369799, 19369870]      +
>    ---
>    seqinfo:
>      seqlevels      seqlengths isCircular genome
>      chr1            249250621              hg19
>      chr10           135534747              hg19
>      chr11           135006516              hg19
>      ...                   ...        ...    ...
>      chrUn_gl000249      38502              hg19
>      chrX            155270560              hg19
>      chrY             59373566              hg19
> 
> That way, we also raise awareness about the isCircular field.
> The current choice to only display the seqlengths pre-dates the
> existence of the seqinfo slot but might be a little bit misleading
> these days since it only exposes some arbitrary seqinfo fields.
> 
> H.
> 
>> 
>> Cheers,
>> Pete
>> 
>> 
>>> I might have requested the genome annotation, but I'm pretty sure it wasn't
>>> me who decided on tracking it on a per-sequence basis. I could imagine use
>>> cases for that though, e.g., when diagnosing sequencing contamination (like
>>> human vs. mouse). But most other tools and file formats expect a single
>>> genome per "track", so, for example, rtracklayer has an internal function
>>> singleGenome() to take care of this.
>>> 
>>> On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s  wrote:
>>> 
 Hi Vince,
 
 Yes it would make sense to have the "show" method report the genome
 when genome(x) contains a unique non-NA value. I think the main
 use case for having the genome defined at the sequence level instead
 of the whole object level is metagenomics. Maybe Michael has some other
 good use cases to share since IIRC he requested the addition of the
 genome field a couple of years ago and made the case for having it
 defined at the sequence level.
 
 Cheers,
 H.
 
 
 On 09/08/2014 07:21 AM, Vincent Carey wrote:
 
> For GRanges x, my naive expectation is that genome(x) returns a length-
> 
> one tag identifying the genome to which chromosomal coordinates
> 
> correspond.  The genome() method seems to have sequence-specific
> 
> semantics, which makes sense, but when we identify sequence
> 
> with chromosome, it seems too complicated.  Is there a use case for
> 
> a GRanges with sequences from several different genomes?
> 
> 
> One reason I am inquiring is that I feel it would be nice to have the
> GRanges show() method report, prominently, the genome in use (or NA
> 
> if unspecified).  This could be accomplished by reporting
> unique(genome(x)), and perhaps that would be satisfactory.
> 
> after example(genome) :
> 
> seqinfo(txdb)
>> 
> 
> Seqinfo of length 15
> 
> seqnames seqlengths isCircular genome
> 
> CH2L   23011544  FALSEdm3
> 
> CH2R   21146708  FALSEdm3
> 
> CH3L   24543557  FALSEdm3
> 
> CH3R   27905053  FALSEdm3
> 
> CH4 1351857  FALSEdm3
> 
> ... .........
> 
> CH3LHet 2555491  FALSEdm3
> 
> CH3RHet 2517507  FALSEdm3
> 
> CHXHet   204112  FALSEdm3
> 
> CHYHet   347038  FALSEdm3
> 
> CHUextra   29004656  FALSEdm3
> 
> genome(seqinfo(txdb))
>> 
> 
> CH2L CH2R CH3L CH3R  CH4  CHX  CHUM
> 
>"dm3""dm3""dm3""dm3""dm3""dm3""dm3""dm3"
> 
>  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra
> 
>"dm3""dm3""dm3"

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

On 09/08/2014 02:28 PM, Peter Hickey wrote:

Just a vote for still allowing for multiple genomes in a Seqinfo object (in a 
GRanges object). My use case is in bisulfite-sequencing experiments where there 
is often a spike-in of a lambda phage genome along with the genome of interest 
(human or mouse). It's often useful to keep all data from a single library 
together in the same object but process according to genome(x) for each seqlevel.


Note taken. Thanks Pete! It's always great to know about concrete use
cases.



FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show 
method.


Or what about displaying the genome next to the seqlevel it's
associated with? Like e.g.:

  > gr
  GRanges with 3 ranges and 0 metadata columns:
        seqnames               ranges strand

  [1]    chr14 [19069583, 19069654]      +
  [2]    chr14 [19363738, 19363809]      +
  [3]    chr14 [19363755, 19363826]      -
  [4]    chr14 [19369799, 19369870]      +
  ---
  seqinfo:
    seqlevels      seqlengths isCircular genome
    chr1            249250621            hg19
    chr10           135534747            hg19
    chr11           135006516            hg19
    ...                   ...            ...
    chrUn_gl000249      38502            hg19
    chrX            155270560            hg19
    chrY             59373566            hg19

That way, we also raise awareness about the isCircular field.
The current choice to only display the seqlengths pre-dates the
existence of the seqinfo slot but might be a little bit misleading
these days since it only exposes some arbitrary seqinfo fields.
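A minimal base-R sketch of that layout (every seqinfo field kept, middle rows elided), assuming the fields sit in an ordinary data.frame. The helper elide_rows() is hypothetical and is not the GenomicRanges show() code; isCircular is set to NA only because the example above leaves it blank.

```r
# Toy seqinfo-like table: lengths copied from the hg19 rows above;
# isCircular is NA here (an assumption, the example leaves it blank).
si <- data.frame(
  seqlevels  = c("chr1", "chr10", "chr11", "chrUn_gl000249", "chrX", "chrY"),
  seqlengths = c(249250621, 135534747, 135006516, 38502, 155270560, 59373566),
  isCircular = NA,
  genome     = "hg19",
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first and last rows of every field,
# with a "..." row in between, so all seqinfo columns stay visible.
elide_rows <- function(df, n_head = 3L, n_tail = 3L) {
  df[] <- lapply(df, as.character)
  if (nrow(df) <= n_head + n_tail) return(df)
  dots <- df[1L, , drop = FALSE]
  dots[1L, ] <- "..."
  out <- rbind(utils::head(df, n_head), dots, utils::tail(df, n_tail))
  rownames(out) <- NULL
  out
}

print(elide_rows(si, n_head = 2L, n_tail = 2L), row.names = FALSE)
```

A real show() method would also right-align the numeric columns; the sketch only illustrates which rows and fields are kept.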

H.




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319






Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.

[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Peter Hickey
Just a vote for still allowing for multiple genomes in a Seqinfo object (in a 
GRanges object). My use case is in bisulfite-sequencing experiments where there 
is often a spike-in of a lambda phage genome along with the genome of interest 
(human or mouse). It's often useful to keep all data from a single library 
together in the same object but process according to genome(x) for each seqlevel.
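A minimal base-R sketch of processing one library according to genome(x) for each seqlevel, assuming the per-seqlevel genome tags are available as a named character vector (the shape genome(x) returns). The counts, the "lambda" label and the per-genome summary are all hypothetical.

```r
# Hypothetical per-seqlevel genome tags, shaped like the named character
# vector returned by genome(x); "lambda" is a made-up label for the spike-in.
genomes <- c(chr1 = "hg19", chr2 = "hg19", lambda = "lambda")

# Hypothetical methylation counts from a single bisulfite-seq library
calls <- data.frame(
  seqname = c("chr1", "lambda", "chr2", "lambda"),
  meth    = c(10L, 1L, 7L, 0L),
  total   = c(12L, 50L, 9L, 48L),
  stringsAsFactors = FALSE
)

# Look up each call's genome through its seqlevel, then summarize per genome
calls$genome <- unname(genomes[calls$seqname])
per_genome <- split(calls, calls$genome)
meth_rate <- vapply(per_genome,
                    function(d) sum(d$meth) / sum(d$total),
                    numeric(1))
meth_rate
```

On a real GRanges, genome(x)[as.character(seqnames(x))] should play the role of the lookup, since genome(x) is named by seqlevel.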

FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show 
method.

Cheers,
Pete




Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au



Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

On 09/08/2014 11:41 AM, Michael Lawrence wrote:

I might have requested the genome annotation, but I'm pretty sure it
wasn't me who decided on tracking it on a per-sequence basis.


OK, maybe. Don't trust my memory too much on this. No regrets though.
I think it was the right thing to do ;-) The fact that the SAM/BAM
format does it is reason enough for us to do it too. According
to the SAM spec, the header can look like:

  @HD   VN:1.3          SO:coordinate
  @SQ   SN:chr1_hg19    LN:45   AS:hg19
  @SQ   SN:chr1_mm10    LN:42   AS:mm10

The only problem is that seqinfo(BamFile("test.bam")) seems to ignore
the AS tag (genome assembly identifier) at the moment:

  > seqinfo(BamFile("test.bam"))
  Seqinfo of length 2
  seqnames  seqlengths isCircular genome
  chr1_hg19         45         NA
  chr1_mm10         42         NA

Hopefully that can be addressed. But that's a separate issue...
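For illustration, a minimal base-R sketch that builds a per-sequence genome vector from the AS tag of @SQ header lines, assuming the TAB-separated field layout of the SAM spec. The helper get_tag() is hypothetical; this is not how Rsamtools reads BAM headers.

```r
# The header from the example above; real SAM header fields are TAB-separated.
header <- c(
  "@HD\tVN:1.3\tSO:coordinate",
  "@SQ\tSN:chr1_hg19\tLN:45\tAS:hg19",
  "@SQ\tSN:chr1_mm10\tLN:42\tAS:mm10"
)

# Hypothetical helper: value of a TAG:value field, or NA if the tag is absent
get_tag <- function(fields, tag) {
  hit <- fields[startsWith(fields, paste0(tag, ":"))]
  if (length(hit) == 0L) NA_character_ else substring(hit[1L], nchar(tag) + 2L)
}

# Split each @SQ line into fields and pull out SN, LN and AS
sq <- strsplit(header[startsWith(header, "@SQ")], "\t", fixed = TRUE)
seqnames   <- vapply(sq, get_tag, character(1), tag = "SN")
seqlengths <- as.integer(vapply(sq, get_tag, character(1), tag = "LN"))
genome     <- setNames(vapply(sq, get_tag, character(1), tag = "AS"), seqnames)
genome
```

The three vectors line up with the seqnames, seqlengths and genome arguments of the Seqinfo() constructor in GenomeInfoDb.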

Cheers,
H.



I could imagine use cases for that though, e.g., when diagnosing sequencing
contamination (like human vs. mouse). But most other tools and file
formats expect a single genome per "track", so, for example, rtracklayer
has an internal function singleGenome() to take care of this.



Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Michael Lawrence
I might have requested the genome annotation, but I'm pretty sure it wasn't
me who decided on tracking it on a per-sequence basis. I could imagine use
cases for that though, e.g., when diagnosing sequencing contamination (like
human vs. mouse). But most other tools and file formats expect a single
genome per "track", so, for example, rtracklayer has an internal function
singleGenome() to take care of this.
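A minimal sketch of the kind of check such a helper performs: collapse a per-sequence genome vector to one tag, or fail loudly when several genomes are present. single_genome() is a hypothetical name; this is not rtracklayer's actual singleGenome() code.

```r
# Hypothetical helper in the spirit of rtracklayer's internal singleGenome();
# not its actual implementation.
single_genome <- function(genomes) {
  g <- unique(unname(genomes))
  if (length(g) != 1L) {
    stop("expected exactly one genome, found: ",
         paste(g, collapse = ", "))
  }
  g  # may be NA if the genome was never set
}

single_genome(c(chr1 = "dm3", chr2 = "dm3"))
```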



Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Vincent Carey
On Mon, Sep 8, 2014 at 1:50 PM, Hervé Pagès  wrote:

> Hi Vince,
>
> Yes it would make sense to have the "show" method report the genome
> when genome(x) contains a unique non-NA value. I think the main
>

I would propose that it show selectSome(unique(genome(x))); that seems
consistent with multiple genomes being in use when relevant.

If NA, IMHO that is an important lack of metadata and people should
see that as well.
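A base-R sketch of that proposal, assuming genome(x) is available as a named character vector. select_some() is a hypothetical stand-in that only imitates the head/"..."/tail idea of selectSome(); its truncation rule is an assumption. NA is printed as "NA" so that missing metadata stays visible.

```r
# Hypothetical stand-in for selectSome(): show the first and last few
# elements, with "..." in between, when there are too many to show.
select_some <- function(x, max_to_show = 5L) {
  x <- as.character(x)
  if (length(x) <= max_to_show) return(x)
  n_head <- ceiling((max_to_show - 1L) / 2)
  n_tail <- (max_to_show - 1L) - n_head
  c(utils::head(x, n_head), "...", utils::tail(x, n_tail))
}

# One-line genome report for a show() method; NA is kept so that a
# missing genome is visible rather than silently dropped.
genome_report <- function(genomes, max_to_show = 5L) {
  g <- unique(unname(as.character(genomes)))
  g[is.na(g)] <- "NA"
  paste0("genome: ", paste(select_some(g, max_to_show), collapse = ", "))
}

genome_report(c(chr1 = "hg19", lambda = "lambda"))  # "genome: hg19, lambda"
genome_report(c(chr1 = NA, chr2 = NA))              # "genome: NA"
genome_report(paste0("g", 1:8))                     # "genome: g1, g2, ..., g7, g8"
```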



> use case for having the genome defined at the sequence level instead
> of the whole object level is metagenomics. Maybe Michael has some other
> good use cases to share since IIRC he requested the addition of the
> genome field a couple of years ago and made the case for having it
> defined at the sequence level.
>
> Cheers,
> H.
>
>


Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Hervé Pagès

Hi Vince,

Yes it would make sense to have the "show" method report the genome
when genome(x) contains a unique non-NA value. I think the main
use case for having the genome defined at the sequence level instead
of the whole object level is metagenomics. Maybe Michael has some other
good use cases to share since IIRC he requested the addition of the
genome field a couple of years ago and made the case for having it
defined at the sequence level.

Cheers,
H.



[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

2014-09-08 Thread Vincent Carey
For GRanges x, my naive expectation is that genome(x) returns a length-one
tag identifying the genome to which chromosomal coordinates correspond.
The genome() method seems to have sequence-specific semantics, which makes
sense, but when we identify sequence with chromosome, it seems too
complicated.  Is there a use case for a GRanges with sequences from
several different genomes?


One reason I am inquiring is that I feel it would be nice to have the
GRanges show() method report, prominently, the genome in use (or NA if
unspecified).  This could be accomplished by reporting
unique(genome(x)), and perhaps that would be satisfactory.

After example(genome):

> seqinfo(txdb)
Seqinfo of length 15
seqnames seqlengths isCircular genome
CH2L       23011544      FALSE dm3
CH2R       21146708      FALSE dm3
CH3L       24543557      FALSE dm3
CH3R       27905053      FALSE dm3
CH4         1351857      FALSE dm3
...             ...        ... ...
CH3LHet     2555491      FALSE dm3
CH3RHet     2517507      FALSE dm3
CHXHet       204112      FALSE dm3
CHYHet       347038      FALSE dm3
CHUextra   29004656      FALSE dm3

> genome(seqinfo(txdb))
    CH2L     CH2R     CH3L     CH3R      CH4      CHX     CHUM
   "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
 CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra
   "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
