Agreed, that looks a lot nicer. On Tue, Sep 9, 2014 at 4:42 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> On 09/09/2014 04:02 AM, Michael Lawrence wrote: > >> I'm in favor of this display. The seqinfo output at the bottom has always >> been annoying (over-emphasized). >> > > the fact that the lengths are 'NA' can be a helpful prompt to do something > about it, e.g., add seqinfo when inputting the data. Also they are helpful > when one is told that seqlengths are incompatible during, e.g., > findOverlaps. But I like the idea of less but more informative display of > seqinfo, along the lines suggested by Vince. > > seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); 60 'NA' > seqlengths > > Martin > > >> On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey < >> st...@channing.harvard.edu> >> wrote: >> >> >>> >>> On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès <hpa...@fhcrc.org> wrote: >>> >>> On 09/08/2014 06:42 PM, Michael Lawrence wrote: >>>> >>>> Instead of printing out multiple lines of a table that is rarely of >>>>> interest, could we develop Peter's idea toward something like: >>>>> >>>>> hg19:chr1 hg19:chr2 ... >>>>> [lengths ...] >>>>> >>>>> Not sure what condensed notation would be useful for circularity. >>>>> >>>>> >>>> I don't know either. I'm worried that this would make the seqinfo >>>> stuff look like a named vector and that the user would expect >>>> hg19:chr1, hg19:chr2, etc... to be valid names. >>>> >>>> With the table-like layout, some screen real estate can always be >>>> saved by printing less lines: >>>> >>>> >>>> What I had in mind was >>> >>> >>> > gr >>>> GRanges with 3 ranges and 0 metadata columns: >>>> >>>> genome: hg19 >>> >>> seqnames ranges strand >>>> <Rle> <IRanges> <Rle> >>>> [1] chr14 [19069583, 19069654] + >>>> [2] chr14 [19363738, 19363809] + >>>> [3] chr14 [19363755, 19363826] - >>>> [4] chr14 [19369799, 19369870] + >>>> >>>> >>> >>> you could then probably dispense with the seqlengths. i have >>> never found them too useful except as a key to the genome. >>> >>> if there are multiple genomes, we have something like >>> >>> genomes: hg19, mm9 >>> >>> the point is to make it prominent, particularly at a time of transition. >>> >>> >>> >>> --- seqinfo: 60 seqlevels (2 circulars) on 2 genomes (hg19, mm10) --- >>>> seqlevels seqlengths isCircular >>>> genome >>>> chr1 249250621 <NA> >>>> hg19 >>>> chr10 135534747 <NA> >>>> hg19 >>>> ... ... ... >>>> ... >>>> chrX 155270560 <NA> >>>> hg19 >>>> chrY 59373566 <NA> >>>> hg19 >>>> >>>> I agree that the exact content of the seqinfo table itself is rarely >>>> of interest so printing only 3 or 4 lines is OK. IMO it's important >>>> to make the user aware of the existence of this hidden table and to >>>> display it like what it really is (i.e. a table). Also displaying the >>>> column names is a well established tradition and serves the purpose >>>> of providing a quick summary of the accessors that are available to >>>> access those fields. >>>> >>>> H. >>>> >>>> >>>> >>>>> On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hic...@wehi.edu.au >>>>> <mailto:hic...@wehi.edu.au>> wrote: >>>>> >>>>> Perhaps it might be useful to have some way of highlighting if any >>>>> of the chromosomes are circular or highlighting if there are >>>>> multiple genomes present? Otherwise this information might be >>>>> hidden >>>>> in the "…" >>>>> >>>>> Cheers, >>>>> Pete >>>>> >>>>> >>>>> On 09/09/2014, at 9:44 AM, Hervé Pagès <hpa...@fhcrc.org >>>>> <mailto:hpa...@fhcrc.org>> wrote: >>>>> >>>>> > On 09/08/2014 02:28 PM, Peter Hickey wrote: >>>>> >> Just a vote for still allowing for multiple genomes in a >>>>> Seqinfo >>>>> object (in a GRanges object). My use case is in >>>>> bisulfite-sequencing >>>>> experiments where there is often a spike-in of a lambda phage >>>>> genome >>>>> along with the genome of interest (human or mouse). It's often >>>>> useful to keep all data from a single library together in the same >>>>> objet but process according to genome(x) for each seqlevel. >>>>> > >>>>> > Note taken. Thanks Pete! It's always great to know about >>>>> concrete >>>>> use >>>>> > cases. >>>>> > >>>>> >> >>>>> >> FWIW, I like Vincent's proposal of >>>>> selectSome(unique(genome(x))) >>>>> in the show method. >>>>> > >>>>> > Or what about displaying the genome next to the seqlevel it's >>>>> > associated with? Like e.g.: >>>>> > >>>>> > > gr >>>>> > GRanges with 3 ranges and 0 metadata columns: >>>>> > seqnames ranges strand >>>>> > <Rle> <IRanges> <Rle> >>>>> > [1] chr14 [19069583, 19069654] + >>>>> > [2] chr14 [19363738, 19363809] + >>>>> > [3] chr14 [19363755, 19363826] - >>>>> > [4] chr14 [19369799, 19369870] + >>>>> > --- >>>>> > seqinfo: >>>>> > seqlevels seqlengths isCircular genome >>>>> > chr1 249250621 <NA> hg19 >>>>> > chr10 135534747 <NA> hg19 >>>>> > chr11 135006516 <NA> hg19 >>>>> > ... ... ... ... >>>>> > chrUn_gl000249 38502 <NA> hg19 >>>>> > chrX 155270560 <NA> hg19 >>>>> > chrY 59373566 <NA> hg19 >>>>> > >>>>> > That way, we also raise awareness about the isCircular field. >>>>> > The current choice to only display the seqlengths pre-dates the >>>>> > existence of the seqinfo slot but might be a little bit >>>>> misleading >>>>> > those days since it only exposes some arbitrary seqinfo fields. >>>>> > >>>>> > H. >>>>> > >>>>> >> >>>>> >> Cheers, >>>>> >> Pete >>>>> >> >>>>> >> >>>>> >>> I might have requested the genome annotation, but I'm pretty >>>>> sure it wasn't >>>>> >>> me who decide on tracking it on a per-sequence basis. I could >>>>> imagine use >>>>> >>> cases for that though, e.g., when diagnosing sequencing >>>>> contamination (like >>>>> >>> human vs. mouse). But most other tools and file formats >>>>> expect >>>>> a single >>>>> >>> genome per "track", so, for example, rtracklayer has an >>>>> internal function >>>>> >>> singleGenome() to take care of this. >>>>> >>> >>>>> >>> On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s < >>>>> hpa...@fhcrc.org >>>>> <mailto:hpa...@fhcrc.org>> wrote: >>>>> >>> >>>>> >>>> Hi Vince, >>>>> >>>> >>>>> >>>> Yes it would make sense to have the "show" method report the >>>>> genome >>>>> >>>> when genome(x) contains a unique non-NA value. I think the >>>>> main >>>>> >>>> use case for having the genome defined at the sequence level >>>>> instead >>>>> >>>> of the whole object level is metagenomics. Maybe Michael has >>>>> some other >>>>> >>>> good use cases to share since IIRC he requested the addition >>>>> of the >>>>> >>>> genome field a couple of years ago and made the case for >>>>> having it >>>>> >>>> defined at the sequence level. >>>>> >>>> >>>>> >>>> Cheers, >>>>> >>>> H. >>>>> >>>> >>>>> >>>> >>>>> >>>> On 09/08/2014 07:21 AM, Vincent Carey wrote: >>>>> >>>> >>>>> >>>>> For GRanges x, my naive expectation is that genome(x) >>>>> returns >>>>> a length- >>>>> >>>>> >>>>> >>>>> one tag identifying the genome to which chromosomal >>>>> coordinates >>>>> >>>>> >>>>> >>>>> correspond. The genome() method seems to have >>>>> sequence-specific >>>>> >>>>> >>>>> >>>>> semantics, which makes sense, but when we identify sequence >>>>> >>>>> >>>>> >>>>> with chromosome, it seems too complicated. Is there a use >>>>> case for >>>>> >>>>> >>>>> >>>>> a GRanges with sequences from several different genomes? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> One reason I am inquiring is that I feel it would be nice >>>>> to >>>>> have the >>>>> >>>>> GRanges show() method report, prominently, the genome in >>>>> use >>>>> (or NA >>>>> >>>>> >>>>> >>>>> if unspecified). This could be accomplished by reporting >>>>> >>>>> unique(genome(x)), and perhaps that would be satisfactory. >>>>> >>>>> >>>>> >>>>> after example(genome) : >>>>> >>>>> >>>>> >>>>> seqinfo(txdb) >>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> Seqinfo of length 15 >>>>> >>>>> >>>>> >>>>> seqnames seqlengths isCircular genome >>>>> >>>>> >>>>> >>>>> CH2L 23011544 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CH2R 21146708 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CH3L 24543557 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CH3R 27905053 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CH4 1351857 FALSE dm3 >>>>> >>>>> >>>>> >>>>> ... ... ... ... >>>>> >>>>> >>>>> >>>>> CH3LHet 2555491 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CH3RHet 2517507 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CHXHet 204112 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CHYHet 347038 FALSE dm3 >>>>> >>>>> >>>>> >>>>> CHUextra 29004656 FALSE dm3 >>>>> >>>>> >>>>> >>>>> genome(seqinfo(txdb)) >>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> CH2L CH2R CH3L CH3R CH4 CHX >>>>> CHU M >>>>> >>>>> >>>>> >>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" >>>>> "dm3" "dm3" >>>>> >>>>> >>>>> >>>>> CH2LHet CH2RHet CH3LHet CH3RHet CHXHet CHYHet >>>>> CHUextra >>>>> >>>>> >>>>> >>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" >>>>> "dm3" >>>>> >>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> >>>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> >>>>> mailing list >>>>> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>> >>>>> >>>>> >>>>> >>>>> >>>> -- >>>>> >>>> Herv? Pag?s >>>>> >>>> >>>>> >>>> Program in Computational Biology >>>>> >>>> Division of Public Health Sciences >>>>> >>>> Fred Hutchinson Cancer Research Center >>>>> >>>> 1100 Fairview Ave. N, M1-B514 >>>>> >>>> P.O. Box 19024 >>>>> >>>> Seattle, WA 98109-1024 >>>>> >>>> >>>>> >>>> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >>>>> >>>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >>>>> >>>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >>>>> >>>> >>>>> >>>> >>>>> >>>> _______________________________________________ >>>>> >>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> >>>>> mailing list >>>>> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>> >>>> >>>>> >> >>>>> >> -------------------------------- >>>>> >> Peter Hickey, >>>>> >> PhD Student/Research Assistant, >>>>> >> Bioinformatics Division, >>>>> >> Walter and Eliza Hall Institute of Medical Research, >>>>> >> 1G Royal Parade, Parkville, Vic 3052, Australia. >>>>> >> Ph: +613 9345 2324 <tel:%2B613%209345%202324> >>>>> >> >>>>> >> hic...@wehi.edu.au <mailto:hic...@wehi.edu.au> >>>>> >> http://www.wehi.edu.au >>>>> >> >>>>> >> >>>>> ____________________________________________________________ >>>>> __________ >>>>> >> The information in this email is confidential and >>>>> intend...{{dropped:6}} >>>>> >> >>>>> >> _______________________________________________ >>>>> >>Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> >>>>> mailing list >>>>> >>https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>> >> >>>>> > >>>>> > -- >>>>> > Hervé Pagès >>>>> > >>>>> > Program in Computational Biology >>>>> > Division of Public Health Sciences >>>>> > Fred Hutchinson Cancer Research Center >>>>> > 1100 Fairview Ave. N, M1-B514 >>>>> > P.O. Box 19024 >>>>> > Seattle, WA 98109-1024 >>>>> > >>>>> > E-mail:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >>>>> > Phone:(206) 667-5791 <tel:%28206%29%20667-5791> >>>>> > Fax:(206) 667-1319 <tel:%28206%29%20667-1319> >>>>> >>>>> -------------------------------- >>>>> Peter Hickey, >>>>> PhD Student/Research Assistant, >>>>> Bioinformatics Division, >>>>> Walter and Eliza Hall Institute of Medical Research, >>>>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>>>> Ph: +613 9345 2324 <tel:%2B613%209345%202324> >>>>> >>>>> hic...@wehi.edu.au <mailto:hic...@wehi.edu.au> >>>>> http://www.wehi.edu.au >>>>> >>>>> >>>>> ____________________________________________________________ >>>>> __________ >>>>> The information in this email is confidential and >>>>> intend...{{dropped:8}} >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> >>>>> mailing >>>>> list >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>> >>>>> >>>>> >>>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpa...@fhcrc.org >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>>> >>>> _______________________________________________ >>>> Bioc-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>> >>>> >>> >>> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel