Going the other way can look like this:

##' Parse one or more location strings and return as a GRanges
##'
##' Parse one or more location strings and return as a GRanges. GRanges
##' will get the names from the location.strings.
##'
##' @param location.string character
##' @export
##' @return GRanges
##' @family location strings
locstring2GRanges <- function(location.string) {
  ## Take location strings such as "chr11:123-127" and return a GRanges.
  ## Strip all whitespace and thousands separators before parsing.
  location.string = gsub("\\s+", "", location.string)
  location.string = gsub(",", "", location.string)
  #location.string = sub("\\.\\.","-",location.string)  # TWU style location strings

  if (any(! grepl("^(chr){0,1}.+:\\d+-\\d+$", location.string))) {
    stop("Some location strings do not look like chr1:123-456.")
  }
  start = as.integer(sub("^.+:(\\d+)-.+$", "\\1", location.string))
  stop = as.integer(sub("^.+-(\\d+)$", "\\1", location.string))
  ## pmin/pmax cope with strings where start and stop are flipped.
  gr = GRanges(
    seqnames=sub("^(chr){0,1}(.*):.*$", "\\2", location.string),
    ranges=IRanges(start=pmin(start, stop), end=pmax(start, stop)) )
  names(gr) = location.string
  return(gr)
}

Surprisingly the repeated subs are faster than splitting.  Some people,
such as GSNAP author Tom Wu, use the format "chr1:1234..1235", which we
might want to support. The pmin/pmax stuff handles cases where the negative
strand is expressed by flipping start and stop. We might not need that.
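
For what it's worth, here is a rough base-R sketch of how the ".." separator and an optional trailing strand ("chr2:2-10:-" or "11:123..456 +") could be accepted. The name parseLocString and the choice to return a data.frame are just assumptions for illustration, not anything in GenomicRanges, and it uses regexec rather than the repeated subs above, so it may well be slower.

```r
# Hypothetical helper (not part of any package): parse location strings
# into a data.frame of seqnames, start, end and strand using base R only.
parseLocString <- function(location.string) {
  if (length(location.string) == 0L) {
    return(data.frame(seqnames = character(0), start = integer(0),
                      end = integer(0), strand = character(0),
                      stringsAsFactors = FALSE))
  }
  x <- gsub("[[:space:],]", "", location.string)  # drop whitespace and commas
  x <- sub("..", "-", x, fixed = TRUE)            # "123..456" becomes "123-456"
  # Groups: 1 = optional "chr", 2 = seqname, 3 = start, 4 = end,
  #         5 = optional strand suffix, 6 = the strand character itself
  pattern <- "^(chr)?([^:]+):([0-9]+)-([0-9]+)(:?([-+*]))?$"
  m <- regmatches(x, regexec(pattern, x))
  if (any(lengths(m) == 0L)) {
    stop("Some location strings do not look like chr1:123-456.")
  }
  mat <- do.call(rbind, m)        # column 1 holds the full match
  start <- as.integer(mat[, 4])
  end <- as.integer(mat[, 5])
  strand <- mat[, 7]
  strand[strand == ""] <- "*"     # no strand given means unstranded
  data.frame(seqnames = mat[, 3],
             start = pmin(start, end),  # tolerate flipped coordinates
             end = pmax(start, end),
             strand = strand,
             stringsAsFactors = FALSE)
}

res <- parseLocString(c("chr11:123-127", "11:123..456 +", "chr2:2-10:-"))
```

The columns of res could then be handed to GRanges(seqnames=, ranges=IRanges(start=, end=), strand=) if a GRanges is wanted.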


Peter M. Haverty, Ph.D.
Genentech, Inc.

On Fri, Apr 24, 2015 at 11:08 AM, Peter Haverty <phave...@gene.com> wrote:

> Good catch. We'll want the strand in case we need to go back to a GRanges.
> I would make the strand addition optional with the default of FALSE. It's
> nice to have a column of strings you can paste right into a genome browser
> (sorry Michael :-) ).  I often pass my bench collaborators a spreadsheet
> with such a column.
> Pete
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
> On Fri, Apr 24, 2015 at 10:50 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>> On 04/24/2015 10:21 AM, Michael Lawrence wrote:
>>> Sorry, one more concern, if you're thinking of using as a range key, you
>>> will need the strand, but many use cases might not want the strand on
>>> there. Like for pasting into a genome browser.
>> What about appending the strand only for GRanges objects that
>> have at least one range that is not on *?
>> setMethod("as.character", "GenomicRanges",
>>     function(x)
>>     {
>>         if (length(x) == 0L)
>>             return(character(0))
>>         ans <- paste0(seqnames(x), ":", start(x), "-", end(x))
>>         if (any(strand(x) != "*"))
>>               ans <- paste0(ans, ":", strand(x))
>>         ans
>>     }
>> )
>> > as.character(gr)
>>  [1] "chr1:1-10"  "chr2:2-10"  "chr2:3-10"  "chr2:4-10"  "chr1:5-10"
>>  [6] "chr1:6-10"  "chr3:7-10"  "chr3:8-10"  "chr3:9-10"  "chr3:10-10"
>> > strand(gr)[2:3] <- c("-", "+")
>> > as.character(gr)
>>  [1] "chr1:1-10:*"  "chr2:2-10:-"  "chr2:3-10:+"  "chr2:4-10:*"  "chr1:5-10:*"
>>  [6] "chr1:6-10:*"  "chr3:7-10:*"  "chr3:8-10:*"  "chr3:9-10:*"  "chr3:10-10:*"
>> H.
>>> On Fri, Apr 24, 2015 at 10:18 AM, Michael Lawrence <micha...@gene.com> wrote:
>>>     It is a great idea, but I'm not sure I would use it to implement
>>>     table(). Allocating those strings will be costly. Don't we already
>>>     have the 4-way int hash? Of course, my intuition might be completely
>>>     off here.
>>>     On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>>         Hi Pete,
>>>         Excellent idea. That will make things like table() work
>>>         out-of-the-box on GenomicRanges objects. I'll add that.
>>>         Thanks,
>>>         H.
>>>         On 04/24/2015 09:43 AM, Peter Haverty wrote:
>>>             Would people be interested in having this:
>>>             setMethod("as.character", "GenomicRanges",
>>>                         function(x) {
>>>                             paste0(seqnames(x), ":", start(x), "-", end(x))
>>>                         })
>>>             ?
>>>             I find myself doing that a lot to make unique names or for
>>>             output that goes to collaborators.  I suppose we might want
>>>             to tack on the strand if it isn't "*".  I have some code for
>>>             going the other direction too, if there is interest.
>>>             Pete
>>>             ____________________
>>>             Peter M. Haverty, Ph.D.
>>>             Genentech, Inc.
>>>             phave...@gene.com
>>>                      [[alternative HTML version deleted]]
>>>             _______________________________________________
>>>             Bioc-devel@r-project.org mailing list
>>>             https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>         --
>>>         Hervé Pagès
>>>         Program in Computational Biology
>>>         Division of Public Health Sciences
>>>         Fred Hutchinson Cancer Research Center
>>>         1100 Fairview Ave. N, M1-B514
>>>         P.O. Box 19024
>>>         Seattle, WA 98109-1024
>>>         E-mail: hpa...@fredhutch.org
>>>         Phone: (206) 667-5791
>>>         Fax: (206) 667-1319
>> --
>> Hervé Pagès
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>> E-mail: hpa...@fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
