[Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Peter Haverty
Would people be interested in having this:

setMethod("as.character", "GenomicRanges",
  function(x) {
    paste0(seqnames(x), ":", start(x), "-", end(x))
  })

?

I find myself doing that a lot to make unique names or for output that
goes to collaborators.  I suppose we might want to tack on the strand if it
isn't "*".  I have some code for going the other direction too, if there is
interest.



Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

Hi Pete,

Excellent idea. That will make things like table() work out-of-the-box
on GenomicRanges objects. I'll add that.

Thanks,
H.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Michael Lawrence
It is a great idea, but I'm not sure I would use it to implement table().
Allocating those strings will be costly. Don't we already have the 4-way
int hash? Of course, my intuition might be completely off here.




Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Michael Lawrence
Sorry, one more concern: if you're thinking of using this as a range key, you
will need the strand, but many use cases might not want the strand on
there, such as pasting into a genome browser.



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

On 04/24/2015 10:21 AM, Michael Lawrence wrote:

> Sorry, one more concern, if you're thinking of using as a range key, you
> will need the strand, but many use cases might not want the strand on
> there. Like for pasting into a genome browser.


What about appending the strand only for GRanges objects that
have at least one range that is not on *?

setMethod("as.character", "GenomicRanges",
    function(x)
    {
        if (length(x) == 0L)
            return(character(0))
        ans <- paste0(seqnames(x), ":", start(x), "-", end(x))
        if (any(strand(x) != "*"))
            ans <- paste0(ans, ":", strand(x))
        ans
    }
)

> as.character(gr)
 [1] "chr1:1-10"  "chr2:2-10"  "chr2:3-10"  "chr2:4-10"  "chr1:5-10"
 [6] "chr1:6-10"  "chr3:7-10"  "chr3:8-10"  "chr3:9-10"  "chr3:10-10"

> strand(gr)[2:3] <- c("-", "+")
> as.character(gr)
 [1] "chr1:1-10:*"  "chr2:2-10:-"  "chr2:3-10:+"  "chr2:4-10:*"  "chr1:5-10:*"
 [6] "chr1:6-10:*"  "chr3:7-10:*"  "chr3:8-10:*"  "chr3:9-10:*"  "chr3:10-10:*"


H.





Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

On 04/24/2015 10:18 AM, Michael Lawrence wrote:

> It is a great idea, but I'm not sure I would use it to implement
> table(). Allocating those strings will be costly. Don't we already have
> the 4-way int hash? Of course, my intuition might be completely off here.


It does use the 4-way int hash internally. as.character() is only used
at the very end to stick the names on the returned table object.
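
To illustrate the idea of counting on integers and building strings only at the end, here is a rough base-R sketch. It is not the GenomicRanges implementation: it groups identical tuples by sorting integer columns instead of hashing them, and the function name count_ranges, its arguments, and the integer encoding of seqnames and strand are all assumptions made for this example.

```r
# Hypothetical helper (not GenomicRanges code): count identical
# (seqname, start, end, strand) tuples using integer comparisons only,
# then build label strings for the unique tuples.
count_ranges <- function(seq_id, start, end, strand_id, seq_levels,
                         strand_levels = c("+", "-", "*")) {
  n <- length(seq_id)
  if (n == 0L)
    return(setNames(integer(0), character(0)))
  ord <- order(seq_id, start, end, strand_id)
  s <- seq_id[ord]
  a <- start[ord]
  e <- end[ord]
  d <- strand_id[ord]
  # TRUE where a sorted tuple differs from its predecessor
  is_new <- c(TRUE, s[-1] != s[-n] | a[-1] != a[-n] |
                    e[-1] != e[-n] | d[-1] != d[-n])
  counts <- tabulate(cumsum(is_new))
  first <- which(is_new)
  # Strings are allocated once per unique tuple, not once per range.
  labels <- paste0(seq_levels[s[first]], ":", a[first], "-", e[first],
                   ":", strand_levels[d[first]])
  setNames(counts, labels)
}

counts <- count_ranges(seq_id = c(1L, 2L, 1L, 1L),
                       start = c(5L, 7L, 5L, 5L),
                       end = c(10L, 10L, 10L, 10L),
                       strand_id = c(3L, 1L, 3L, 2L),
                       seq_levels = c("chr1", "chr2"))
print(counts)
```

With four input ranges and three distinct tuples, only three label strings are created, which is the saving being discussed.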

H.






Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Peter Haverty
Good catch. We'll want the strand in case we need to go back to a GRanges.
I would make the strand addition optional with the default of FALSE. It's
nice to have a column of strings you can paste right into a genome browser
(sorry Michael :-) ).  I often pass my bench collaborators a spreadsheet
with such a column.

Pete




Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

On 04/24/2015 11:08 AM, Peter Haverty wrote:

> Good catch. We'll want the strand in case we need to go back to a GRanges.
> I would make the strand addition optional with the default of FALSE.
> It's nice to have a column of strings you can paste right into a genome
> browser (sorry Michael :-) ).  I often pass my bench collaborators a
> spreadsheet with such a column.


as.character(unstrand(gr)) ?

3 reasons I'm not too keen about 'ignore.strand=TRUE' being the default:

(1) Many functions and methods in GenomicRanges/GenomicAlignments
    have an 'ignore.strand' argument. For consistency, the default
    value has been set to FALSE everywhere. Note that this was done
    even if this default doesn't reflect the most common use case
    (e.g. summarizeOverlaps).

(2) I think it's good to have the default behavior of as.character()
    allow going back and forth between GRanges and character vector
    without losing the strand information.

(3) The "table" method for Vector would break if as.character was
    ignoring the strand by default. Can be worked-around by
    implementing a method for GenomicRanges objects but...

Hope that makes sense.
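
A small base-R sketch of point (3), under the assumption that ranges are held in plain vectors instead of a GRanges. The function format_ranges and its signature are hypothetical stand-ins for the method discussed here; the sketch only shows why labels that drop the strand can no longer tell apart two ranges that differ by strand alone.

```r
# Hypothetical stand-in for the proposed method, working on plain vectors.
format_ranges <- function(seqnames, start, end, strand,
                          ignore.strand = FALSE) {
  ans <- paste0(seqnames, ":", start, "-", end)
  if (!ignore.strand && any(strand != "*"))
    ans <- paste0(ans, ":", strand)
  ans
}

seqnames <- c("chr1", "chr1")
start <- c(5L, 5L)
end <- c(10L, 10L)
strand <- c("+", "-")

with_strand <- format_ranges(seqnames, start, end, strand)
without_strand <- format_ranges(seqnames, start, end, strand,
                                ignore.strand = TRUE)

# With the strand the two ranges keep distinct labels;
# without it both collapse to the same string.
print(with_strand)
print(without_strand)
```

A table keyed on without_strand would report one entry where there are two distinct ranges, which is why keeping the strand by default is the safer choice for labels.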

H.




Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Peter Haverty
Going the other way can look like this:

##' Parse one or more location strings and return them as a GRanges
##'
##' Parse one or more location strings and return them as a GRanges. The
##' GRanges will get its names from the names of location.string.
##' @param location.string character
##' @export
##' @return GRanges
##' @family location strings
locstring2GRanges <- function(location.string) {
  # Take location strings such as "chr11:123-127" and extract the
  # chromosome, start and end of each. The "11:123..456 +" style is not
  # handled yet (see the commented-out line below).
  # gsub() rather than sub() so that every space and comma is removed.
  location.string <- gsub("\\s+", "", location.string)
  location.string <- gsub(",", "", location.string)
  # location.string <- sub("\\.\\.", "-", location.string)  # TWU style location strings
  if (any(!grepl("^(chr){0,1}.+:\\d+-\\d+$", location.string))) {
    stop("Some location strings do not look like chr1:123-456.")
  }
  # Named pos1/pos2 so that start() and stop() are not masked.
  pos1 <- as.integer(sub("^.+:(\\d+)-.+$", "\\1", location.string))
  pos2 <- as.integer(sub("^.+-(\\d+)$", "\\1", location.string))
  gr <- GRanges(
    # "(chr)?" makes the whole prefix optional; the original "chr{0,1}"
    # only made the "r" optional, so "11:123-456" did not match.
    seqnames = sub("^(chr)?(.*):.*$", "\\2", location.string),
    ranges = IRanges(start = pmin(pos1, pos2),
                     end = pmax(pos1, pos2),
                     names = names(location.string)))
  return(gr)
}

Surprisingly the repeated subs are faster than splitting.  Some people,
such as GSNAP author Tom Wu, use the format "chr1:1234..1235", which we
might want to support. The pmin/pmax stuff handles cases where the negative
strand is expressed by flipping start and stop. We might not need that.
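
As a rough illustration of how the ".." separator and a trailing strand could be supported with the same repeated-sub approach, here is a base-R sketch. It returns a plain data.frame instead of a GRanges so it runs without Bioconductor, it keeps the "chr" prefix as given, and the function name parse_locstrings and its output columns are hypothetical.

```r
# Hypothetical base-R parser; the name and the returned columns are
# illustrative, not an existing API.
parse_locstrings <- function(x) {
  x <- gsub("[[:space:],]", "", x)    # drop whitespace and thousands separators
  x <- sub("..", "-", x, fixed = TRUE)  # accept "123..456" as "123-456"
  ok <- grepl("^.+:[0-9]+-[0-9]+(:?[-+*])?$", x)
  if (!all(ok))
    stop("Some location strings do not look like chr1:123-456.")
  # An optional trailing "+", "-" or "*" (with or without ":") is the strand.
  has_strand <- grepl("[-+*]$", x)
  strand <- rep("*", length(x))
  strand[has_strand] <- substring(x[has_strand], nchar(x[has_strand]))
  core <- sub(":?[-+*]$", "", x)
  pos1 <- as.integer(sub("^.+:([0-9]+)-[0-9]+$", "\\1", core))
  pos2 <- as.integer(sub("^.+-([0-9]+)$", "\\1", core))
  data.frame(seqnames = sub("^(.+):[0-9]+-[0-9]+$", "\\1", core),
             start = pmin(pos1, pos2),   # tolerate flipped coordinates
             end = pmax(pos1, pos2),
             strand = strand,
             stringsAsFactors = FALSE)
}

parsed <- parse_locstrings(c("chr1:1-10", "11:123..456 +",
                             "chr2:2-10:-", "chr3:1,500-1,000"))
print(parsed)
```

The resulting columns could then be handed to GRanges() if a GRanges is wanted.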



Pete



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Peter Haverty
Those are all good reasons for keeping the strand by default.  I'm on board.

Pete



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Michael Lawrence
Taking this a bit off topic but it would be nice if we could get the
GRanges equivalent of as.data.frame(table(x)), i.e., unique(x) with a count
mcol. Should be easy to support but what should the API be like?



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-27 Thread Hervé Pagès

On 04/24/2015 11:41 AM, Michael Lawrence wrote:

> Taking this a bit off topic but it would be nice if we could get the
> GRanges equivalent of as.data.frame(table(x)), i.e., unique(x) with a
> count mcol. Should be easy to support but what should the API be like?


This was actually the motivating use case for introducing
findMatches/countMatches a couple of years ago:

  ux <- unique(x)
  mcols(ux)$Freq <- countMatches(ux, x)

Don't know what a good API would be to make this even more
straightforward though. Maybe via some extra argument to unique()
e.g. 'with.freq'? This is kind of similar to the 'with.revmap'
argument of reduce(). Note that unique() could also support the
'with.revmap' arg. Once it does, the 'with.freq' arg can also
be implemented by just calling elementLengths() on the "revmap"
metadata column.
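
The relationship between a "revmap" and the frequencies can be sketched in base R, with string keys standing in for ranges. This is only an analogy under that assumption: the revmap here is a plain list built with split() and match(), and base lengths() plays the role that elementLengths() would play on a real "revmap" metadata column.

```r
# Base-R analogy: string keys stand in for ranges.
keys <- c("chr1:1-10", "chr2:2-10", "chr1:1-10", "chr1:1-10", "chr3:7-10")

ukeys <- unique(keys)
# For each unique key, the positions in 'keys' that map to it.
revmap <- unname(split(seq_along(keys), match(keys, ukeys)))
# The frequencies fall out of the revmap as its element lengths.
freq <- lengths(revmap)

result <- data.frame(range = ukeys, Freq = freq, stringsAsFactors = FALSE)
print(result)
```

Once the revmap exists, the count column costs nothing extra to derive, which is the argument for supporting 'with.revmap' first.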

H.





Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-27 Thread Michael Lawrence
It would be nice to have a single function call that would hide these
details. It could probably be made more efficient also by avoiding multiple
matching, unnecessary revmap lists, etc. tableAsGRanges() is not a good
name but it conveys what I mean (does that make it actually good?).
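
For what it's worth, such a wrapper could look roughly like the sketch below, assuming GenomicRanges is installed. It only packages the unique()/countMatches() idiom from the quoted reply further down, so it does nothing about the efficiency concern. The function body, the `Freq` column name, and the toy data are assumptions for illustration; no function called tableAsGRanges() exists in GenomicRanges.

```r
library(GenomicRanges)

# Hypothetical wrapper (not part of any package): unique ranges plus a count column.
tableAsGRanges <- function(x) {
  ux <- unique(x)
  # For each unique range, the number of elements of x that match it.
  mcols(ux)$Freq <- countMatches(ux, x)
  ux
}

# Toy data: the first two ranges are identical.
x <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(10, 10, 50), end = c(20, 20, 60)),
  strand   = c("+", "+", "-")
)

tx <- tableAsGRanges(x)
mcols(tx)$Freq
```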

On Mon, Apr 27, 2015 at 12:23 PM, Hervé Pagès  wrote:

> On 04/24/2015 11:41 AM, Michael Lawrence wrote:
>
>> Taking this a bit off topic but it would be nice if we could get the
>> GRanges equivalent of as.data.frame(table(x)), i.e., unique(x) with a
>> count mcol. Should be easy to support but what should the API be like?
>>
>
> This was actually the motivating use case for introducing
> findMatches/countMatches a couple of years ago:
>
>   ux <- unique(x)
>   mcols(ux)$Freq <- countMatches(ux, x)
>
> Don't know what a good API would be to make this even more
> straightforward though. Maybe via some extra argument to unique()
> e.g. 'with.freq'? This is kind of similar to the 'with.revmap'
> argument of reduce(). Note that unique() could also support the
> 'with.revmap' arg. Once it does, the 'with.freq' arg can also
> be implemented by just calling elementLengths() on the "revmap"
> metadata column.
>
> H.

Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-27 Thread Hervé Pagès

On 04/27/2015 02:15 PM, Michael Lawrence wrote:

It would be nice to have a single function call that would hide these
details. It could probably be made more efficient also by avoiding
multiple matching, unnecessary revmap lists, etc. tableAsGRanges() is
not a good name but it conveys what I mean (does that make it actually
good?).


There is nothing specific to GRanges here. We're just reporting the
frequency of unique elements in a metadata column so this belongs to
the "extended" Vector API in the same way that findMatches/countMatches
do.

H.




Re: [Bioc-devel] as.character method for GenomicRanges?

2015-09-15 Thread Michael Lawrence
Did this as.character method ever get added? It was a good idea, and we
should add it even though we haven't figured out the table stuff yet. It's
fine if it appends the strand whenever there is at least one range with +/-.

Michael
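
To pin down the rule described above, here is a minimal sketch, assuming GenomicRanges is installed. The helper name rangesToStrings() and the ":" separator before the strand are assumptions made here for illustration, not necessarily what the eventual as.character() method does.

```r
library(GenomicRanges)

# Hypothetical helper, written as a plain function rather than an as.character() method.
rangesToStrings <- function(x) {
  ans <- paste0(as.character(seqnames(x)), ":", start(x), "-", end(x))
  s <- as.character(strand(x))
  # Append the strand to every string as soon as at least one range is stranded.
  if (any(s != "*")) {
    ans <- paste0(ans, ":", s)
  }
  ans
}

unstranded <- GRanges(c("chr1", "chr1"),
                      IRanges(start = c(1, 30), end = c(10, 40)))
stranded   <- GRanges(c("chr1", "chr2"),
                      IRanges(start = c(1, 30), end = c(10, 40)),
                      strand = c("+", "*"))

rangesToStrings(unstranded)
rangesToStrings(stranded)
```

Appending the strand to all strings or to none keeps the output format uniform within one vector, which should make it easier to parse back.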

On Mon, Apr 27, 2015 at 2:23 PM, Hervé Pagès  wrote:

> On 04/27/2015 02:15 PM, Michael Lawrence wrote:
>
>> It would be nice to have a single function call that would hide these
>> details. It could probably be made more efficient also by avoiding
>> multiple matching, unnecessary revmap lists, etc. tableAsGRanges() is
>> not a good name but it conveys what I mean (does that make it actually
>> good?).
>>
>
> There is nothing specific to GRanges here. We're just reporting the
> frequency of unique elements in a metadata column so this belongs to
> the "extended" Vector API in the same way that findMatches/countMatches
> do.
>
> H.

Re: [Bioc-devel] as.character method for GenomicRanges?

2015-09-15 Thread Michael Love
+1

I was in need of this function yesterday and generally about once a week,
when looking up ranges in IGV or UCSC.

On Tue, Sep 15, 2015 at 12:59 PM, Michael Lawrence
<lawrence.mich...@gene.com> wrote:

> Did this as.character method ever get added? It was a good idea, and we
> should add it even though we haven't figured out the table stuff yet. It's
> fine if it appends the strand whenever there is at least one range with
> +/-.
>
> Michael

Re: [Bioc-devel] as.character method for GenomicRanges?

2015-09-15 Thread Hervé Pagès

Hi Michael and Michael,

It's on its way. Probably before the end of the week. Thanks for the
reminder!

H.

On 09/15/2015 10:07 AM, Michael Love wrote:

+1

I was in need of this function yesterday and generally about once a
week, when looking up ranges in IGV or UCSC.


Re: [Bioc-devel] as.character method for GenomicRanges?

2015-09-15 Thread Tim Triche, Jr.
rad, I was just about to say, "hey, I've written that function in 3
different places, I think I could send in a patch..." and then of course
Herve is on it.

I need to remember to leave things to the professionals...

--t

On Tue, Sep 15, 2015 at 10:38 AM, Hervé Pagès  wrote:

> Hi Michael and Michael,
>
> It's on its way. Probably before the end of the week. Thanks for the
> reminder!
>
> H.

Re: [Bioc-devel] as.character method for GenomicRanges?

2015-09-17 Thread Hervé Pagès

Done. (see Coercion section in ?GRanges)

Thanks for trusting the professionals.

H.

On 09/15/2015 10:51 AM, Tim Triche, Jr. wrote:

rad, I was just about to say, "hey, I've written that function in 3
different places, I think I could send in a patch..." and then of course
Herve is on it.

I need to remember to leave things to the professionals...

--t
