Going the other way can look like this:

##' Parse one or more location strings and return as a GRanges
##'
##' Parse one or more location strings and return as a GRanges. GRanges will get the names from the location.strings.
##' @param location.string character
##' @export
##' @return GRanges
##' @family location strings
locstring2GRanges <- function(location.string) {
    ## Take location strings such as "chr11:123-127" and return a GRanges.
    ## Strip all whitespace and thousands separators before parsing.
    location.string = gsub("\\s+", "", location.string)
    location.string = gsub(",", "", location.string)
    #location.string = sub("\\.\\.","-",location.string)  # TWU style location strings
    if (any(! grepl("^(chr){0,1}.+:\\d+-\\d+$", location.string))) {
        stop("Some location strings do not look like chr1:123-456.")
    }
    start = as.integer(sub("^.+:(\\d+)-.+$", "\\1", location.string))
    stop = as.integer(sub("^.+-(\\d+)", "\\1", location.string))
    gr = GRanges(
        IRanges(
            ## pmin/pmax put flipped coordinates back in start <= end order
            start=pmin(start, stop),
            end=pmax(start, stop),
            names=names(location.string)),
        ## Drop an optional leading "chr" and keep the rest as the seqname
        seqnames=sub("^(chr){0,1}(.*):.*$", "\\2", location.string)
    )
    return(gr)
}

Surprisingly, the repeated sub() calls are faster than splitting. Some people, such as GSNAP author Tom Wu, use the format "chr1:1234..1235", which we might want to support. The pmin/pmax stuff handles cases where the negative strand is expressed by flipping start and stop. We might not need that.

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Apr 24, 2015 at 11:08 AM, Peter Haverty <phave...@gene.com> wrote:

> Good catch. We'll want the strand in case we need to go back to a GRanges.
> I would make the strand addition optional with the default of FALSE. It's
> nice to have a column of strings you can paste right into a genome browser
> (sorry Michael :-) ). I often pass my bench collaborators a spreadsheet
> with such a column.
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> On Fri, Apr 24, 2015 at 10:50 AM, Hervé Pagès <hpa...@fredhutch.org>
> wrote:
>
>> On 04/24/2015 10:21 AM, Michael Lawrence wrote:
>>
>>> Sorry, one more concern, if you're thinking of using as a range key, you
>>> will need the strand, but many use cases might not want the strand on
>>> there. Like for pasting into a genome browser.
>>>
>>
>> What about appending the strand only for GRanges objects that
>> have at least one range that is not on *?
>>
>>   setMethod("as.character", "GenomicRanges",
>>       function(x)
>>       {
>>           if (length(x) == 0L)
>>               return(character(0))
>>           ans <- paste0(seqnames(x), ":", start(x), "-", end(x))
>>           if (any(strand(x) != "*"))
>>               ans <- paste0(ans, ":", strand(x))
>>           ans
>>       }
>>   )
>>
>>   > as.character(gr)
>>   [1] "chr1:1-10" "chr2:2-10" "chr2:3-10" "chr2:4-10" "chr1:5-10"
>>   [6] "chr1:6-10" "chr3:7-10" "chr3:8-10" "chr3:9-10" "chr3:10-10"
>>
>>   > strand(gr)[2:3] <- c("-", "+")
>>   > as.character(gr)
>>   [1] "chr1:1-10:*" "chr2:2-10:-" "chr2:3-10:+" "chr2:4-10:*" "chr1:5-10:*"
>>   [6] "chr1:6-10:*" "chr3:7-10:*" "chr3:8-10:*" "chr3:9-10:*" "chr3:10-10:*"
>>
>> H.
>>
>>
>>> On Fri, Apr 24, 2015 at 10:18 AM, Michael Lawrence <micha...@gene.com>
>>> wrote:
>>>
>>> It is a great idea, but I'm not sure I would use it to implement
>>> table(). Allocating those strings will be costly. Don't we already
>>> have the 4-way int hash? Of course, my intuition might be completely
>>> off here.
>>>
>>>
>>> On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès <hpa...@fredhutch.org>
>>> wrote:
>>>
>>> Hi Pete,
>>>
>>> Excellent idea. That will make things like table() work
>>> out-of-the-box on GenomicRanges objects. I'll add that.
>>>
>>> Thanks,
>>> H.
>>>
>>>
>>> On 04/24/2015 09:43 AM, Peter Haverty wrote:
>>>
>>> Would people be interested in having this:
>>>
>>>   setMethod("as.character", "GenomicRanges",
>>>       function(x) {
>>>           paste0(seqnames(x), ":", start(x), "-", end(x))
>>>       })
>>>
>>> ?
>>>
>>> I find myself doing that a lot to make unique names or for output that
>>> goes to collaborators. I suppose we might want to tack on the strand
>>> if it isn't "*". I have some code for going the other direction too,
>>> if there is interest.
>>>
>>> Pete
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpa...@fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
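
For the two open points in the thread (accepting the "chr1:1234..1235" style, and making the strand suffix optional with a default of FALSE), here is a minimal sketch of how both directions could fit together. It is an illustration only: parse_locstring() and format_locstring() are hypothetical names, not functions in GenomicRanges, and the sketch works on a plain data.frame so that it runs in base R without Bioconductor. It also assumes the seqname is kept exactly as written, with no stripping of "chr".

```r
# Hypothetical helpers for illustration; not part of GenomicRanges.

# Shared pattern: seqname, two integers, and an optional strand suffix
# written either as ":+" / ":-" / ":*" or as a bare trailing sign.
locstring_pattern <- "^([^:]+):([0-9]+)-([0-9]+)(:?([-+*]))?$"

parse_locstring <- function(x) {
  # Drop all whitespace and thousands separators.
  x <- gsub("[[:space:],]", "", x)
  # Accept the "chr1:1234..1235" style by normalising ".." to "-".
  x <- sub("..", "-", x, fixed = TRUE)
  if (!all(grepl(locstring_pattern, x))) {
    stop("Some location strings do not look like chr1:123-456.")
  }
  a <- as.integer(sub(locstring_pattern, "\\2", x))
  b <- as.integer(sub(locstring_pattern, "\\3", x))
  strand <- sub(locstring_pattern, "\\5", x)
  strand[strand == ""] <- "*"  # no suffix means unstranded
  data.frame(
    chr = sub(locstring_pattern, "\\1", x),
    start = pmin(a, b),  # tolerate flipped coordinates
    end = pmax(a, b),
    strand = strand,
    stringsAsFactors = FALSE
  )
}

format_locstring <- function(loc, with.strand = FALSE) {
  if (nrow(loc) == 0L) {
    return(character(0))
  }
  ans <- paste0(loc$chr, ":", loc$start, "-", loc$end)
  # Strand is opt-in so the default output can be pasted into a browser.
  if (with.strand) {
    ans <- paste0(ans, ":", loc$strand)
  }
  ans
}

loc <- parse_locstring(c("chr11:123-127", "11:456..123 +", "chr2:1,000-2,000:-"))
browser_strings <- format_locstring(loc)
key_strings <- format_locstring(loc, with.strand = TRUE)
```

Keeping the strand off by default gives strings that paste straight into a genome browser, while with.strand = TRUE gives strings that can serve as a range key and be parsed back without losing the strand.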
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel