Re: [Bioc-devel] Interoperability between DataFrame and dplyr?
Lists are vectors, but as long as they aren't doing too much in C++, or asserting specific types, it should work. Functions like length(), names(), '[', etc. dispatch to both S3 and S4.

On Thu, Apr 23, 2015 at 6:11 PM, Ryan Thompson r...@thompsonclan.org wrote:
> It looks like dplyr has support for base R lists as columns in data frames: http://cran.r-project.org/web/packages/dplyr/vignettes/data_frames.html
>
> Hopefully that feature is flexible enough to accommodate anything that looks sufficiently like a vector. I'd love to give this a try, but I can't find anything on what specifically needs to be implemented for a new dplyr backend.
>
>> In theory, probably not that hard. DataFrame implements methods on primitive and S3 generics, so even the darkest shadows of the S3 world will dispatch correctly on those. One potential roadblock is that dplyr may assume that all columns are base R vectors, which would obviously fail for stuff like Rle. Ideally, the data.frame implementation of dplyr has restricted itself to some subset of the base R API, without diving down into C++, except through dispatch. But that might be too idealistic.
>>
>> Good luck, I'm interested to see what you come up with.
>>
>> Michael
>>
>> On Thu, Apr 23, 2015 at 4:06 PM, Ryan C. Thompson r...@thompsonclan.org wrote:
>>> Hi all,
>>>
>>> So, dplyr is a pretty cool thing, but it currently works with data.frame and data.table, but not S4Vectors::DataFrame. I'd like to change that if possible, and I assume that this would simply involve writing some glue code. However, I'm not really sure where to start, and I expect things might be complicated because dplyr uses S3 and S4Vectors uses S4. Can anyone offer any pointers?
>>>
>>> -Ryan

___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Interoperability between DataFrame and dplyr?
Sure, but the way DataFrame is flexible is by relying on two abstractions in base R: just length() and '['. If dplyr does the same thing, which seems totally reasonable, everything should work the same.

On Thu, Apr 23, 2015 at 4:32 PM, Vincent Carey st...@channing.harvard.edu wrote:
> Seems to me that DataFrame is too flexible -- you can have very complex objects in the columns (anything that inherits from Vector) with which, in its current state, dplyr would not work too naturally. You would wind up doing a fair amount of coercion of such entities, so it seems to me that arranging a coercion of DataFrames satisfying specific conditions to data.frame would be a path of low resistance. Ready to be corrected of course.
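The "two abstractions" argument can be sketched with base R and the methods package alone. This is only an illustration: `ToyRle` and `take_last` are hypothetical names invented here (they are not part of S4Vectors or dplyr), and the class is a deliberately naive stand-in for something like Rle. The point is that a function written purely against length() and '[' never needs to know what kind of column it was handed.

```r
library(methods)

# Hypothetical run-length vector; NOT the S4Vectors Rle class.
setClass("ToyRle", representation(values = "character", lengths = "integer"))

# length() is a primitive, so an S4 method on it is picked up everywhere.
setMethod("length", "ToyRle", function(x) sum(x@lengths))

# Naive subsetting: expand, subset, then re-compress into runs.
setMethod("[", "ToyRle", function(x, i, j, ..., drop = TRUE) {
  expanded <- rep(x@values, x@lengths)[i]
  runs <- rle(expanded)
  new("ToyRle", values = runs$values, lengths = runs$lengths)
})

# Written only in terms of length() and `[`; it never inspects the column type.
take_last <- function(column, n) {
  len <- length(column)
  column[seq.int(max(1L, len - n + 1L), len)]
}

take_last(letters, 2)   # ordinary character vector
tail_runs <- take_last(new("ToyRle", values = c("a", "b"), lengths = c(3L, 2L)), 3)
```

Code that instead reaches into the object from C++, or checks for specific base types, would bypass these methods, which is the roadblock raised earlier in the thread.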
Re: [Bioc-devel] as.character method for GenomicRanges?
Good catch. We'll want the strand in case we need to go back to a GRanges. I would make the strand addition optional with the default of FALSE. It's nice to have a column of strings you can paste right into a genome browser (sorry Michael :-) ). I often pass my bench collaborators a spreadsheet with such a column.

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Apr 24, 2015 at 10:50 AM, Hervé Pagès hpa...@fredhutch.org wrote:
> On 04/24/2015 10:21 AM, Michael Lawrence wrote:
>> Sorry, one more concern, if you're thinking of using as a range key, you will need the strand, but many use cases might not want the strand on there. Like for pasting into a genome browser.
>
> What about appending the strand only for GRanges objects that have at least one range that is not on *?
>
>     setMethod("as.character", "GenomicRanges",
>         function(x)
>         {
>             if (length(x) == 0L)
>                 return(character(0))
>             ans <- paste0(seqnames(x), ":", start(x), "-", end(x))
>             if (any(strand(x) != "*"))
>                 ans <- paste0(ans, ":", strand(x))
>             ans
>         }
>     )
>
>     as.character(gr)
>      [1] "chr1:1-10"  "chr2:2-10"  "chr2:3-10"  "chr2:4-10"  "chr1:5-10"
>      [6] "chr1:6-10"  "chr3:7-10"  "chr3:8-10"  "chr3:9-10"  "chr3:10-10"
>
>     strand(gr)[2:3] <- c("-", "+")
>     as.character(gr)
>      [1] "chr1:1-10:*"  "chr2:2-10:-"  "chr2:3-10:+"  "chr2:4-10:*"  "chr1:5-10:*"
>      [6] "chr1:6-10:*"  "chr3:7-10:*"  "chr3:8-10:*"  "chr3:9-10:*"  "chr3:10-10:*"
>
> H.
Re: [Bioc-devel] as.character method for GenomicRanges?
On 04/24/2015 11:08 AM, Peter Haverty wrote:
> Good catch. We'll want the strand in case we need to go back to a GRanges. I would make the strand addition optional with the default of FALSE. It's nice to have a column of strings you can paste right into a genome browser (sorry Michael :-) ). I often pass my bench collaborators a spreadsheet with such a column.

as.character(unstrand(gr)) ?

3 reasons I'm not too keen about 'ignore.strand=TRUE' being the default:

(1) Many functions and methods in GenomicRanges/GenomicAlignments have an 'ignore.strand' argument. For consistency, the default value has been set to FALSE everywhere. Note that this was done even if this default doesn't reflect the most common use case (e.g. summarizeOverlaps).

(2) I think it's good to have the default behavior of as.character() allow going back and forth between GRanges and character vector without losing the strand information.

(3) The table method for Vector would break if as.character was ignoring the strand by default. Can be worked-around by implementing a method for GenomicRanges objects but...

Hope that makes sense.

H.
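To make the 'ignore.strand' trade-off concrete, here is a small base-R sketch. It works on plain vectors rather than a GenomicRanges object so that it runs without Bioconductor installed, and `format_ranges` is a hypothetical helper name, not a function from GenomicRanges. It combines the conditional strand suffix proposed earlier in the thread with an `ignore.strand` argument defaulting to FALSE, as argued in the three points above.

```r
# Hypothetical helper operating on plain vectors; not the GenomicRanges method.
format_ranges <- function(seqnames, start, end, strand = "*",
                          ignore.strand = FALSE) {
  if (length(seqnames) == 0L)
    return(character(0))
  ans <- paste0(seqnames, ":", start, "-", end)
  # Keep the strand by default so the strings can be converted back losslessly;
  # it is only appended when at least one range is actually stranded.
  if (!ignore.strand && any(strand != "*"))
    ans <- paste0(ans, ":", strand)
  ans
}

chrom <- c("chr1", "chr2")
with_strand    <- format_ranges(chrom, c(1L, 2L), c(10L, 10L), c("*", "-"))
browser_ready  <- format_ranges(chrom, c(1L, 2L), c(10L, 10L), c("*", "-"),
                                ignore.strand = TRUE)
```

Here `with_strand` is `c("chr1:1-10:*", "chr2:2-10:-")` and `browser_ready` is `c("chr1:1-10", "chr2:2-10")`, the latter being the form one would paste into a genome browser.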
Re: [Bioc-devel] as.character method for GenomicRanges?
Going the other way can look like this:

    library(GenomicRanges)

    ##' Parse one or more location strings and return as a GRanges
    ##'
    ##' Parse one or more location strings and return as a GRanges. GRanges will
    ##' get the names from the location.strings.
    ##' @param location.string character
    ##' @export
    ##' @return GRanges
    ##' @family location strings
    locstring2GRanges <- function(location.string) {
        # Take location strings such as "chr11:123-127" and return a GRanges.
        # gsub (not sub) so that every space and every comma is removed.
        location.string = gsub("\\s+", "", location.string)
        location.string = gsub(",", "", location.string)
        #location.string = sub("\\.\\.", "-", location.string)  # TWU style location strings
        if (any(! grepl("^(chr){0,1}.+:\\d+-\\d+$", location.string))) {
            stop("Some location strings do not look like chr1:123-456.")
        }
        start = as.integer(sub("^.+:(\\d+)-.+$", "\\1", location.string))
        # Named 'end' rather than 'stop' to avoid shadowing base::stop()
        end = as.integer(sub("^.+-(\\d+)$", "\\1", location.string))
        gr = GRanges(
            IRanges(start=pmin(start, end), end=pmax(start, end),
                    names=names(location.string)),
            # Drop an optional leading "chr"; the group must be optional as a
            # whole, otherwise strings like "11:123-456" are left unparsed.
            seqnames=sub("^(chr){0,1}(.*):.*$", "\\2", location.string)
        )
        return(gr)
    }

Surprisingly the repeated subs are faster than splitting. Some people, such as GSNAP author Tom Wu, use the format chr1:1234..1235, which we might want to support. The pmin/pmax stuff handles cases where the negative strand is expressed by flipping start and stop. We might not need that.

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com
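For the round trip discussed in this thread, a parser would also need to accept the optional ":strand" suffix and, possibly, the chr1:1234..1235 style. The following is a rough base-R sketch under those assumptions. `parse_locstrings` is a hypothetical name; it returns a data.frame instead of a GRanges so it runs without Bioconductor, and unlike the function above it keeps any "chr" prefix so the output matches what as.character() would have produced.

```r
# Hypothetical base-R parser; returns a data.frame, not a GRanges.
parse_locstrings <- function(x) {
  x <- gsub("[[:space:],]", "", x)        # drop whitespace and thousands separators
  x <- sub("..", "-", x, fixed = TRUE)    # accept the chr1:1234..1235 style
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)(:([-+*]))?$"
  m <- regmatches(x, regexec(pattern, x))
  bad <- lengths(m) == 0L
  if (any(bad))
    stop("Unparseable location string(s): ", paste(x[bad], collapse = ", "))
  parts <- do.call(rbind, m)              # one row per string: full match + 5 groups
  first  <- as.integer(parts[, 3])
  second <- as.integer(parts[, 4])
  strand <- parts[, 6]
  strand[strand == ""] <- "*"             # no suffix means unstranded
  data.frame(seqnames = parts[, 2],
             start = pmin(first, second), # tolerate flipped coordinates
             end = pmax(first, second),
             strand = strand,
             stringsAsFactors = FALSE)
}

parsed <- parse_locstrings(c("chr1:1-10", "chr2:2-10:-", "chr3:1,234..1,000"))
```

A single regexec() call replaces the repeated sub() calls; whether that is faster was not measured here, so the timing remark above still applies to the original version only.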
Re: [Bioc-devel] as.character method for GenomicRanges?
Those are all good reasons for keeping the strand by default. I'm on board.

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com
Re: [Bioc-devel] as.character method for GenomicRanges?
Taking this a bit off topic but it would be nice if we could get the GRanges equivalent of as.data.frame(table(x)), i.e., unique(x) with a count mcol. Should be easy to support but what should the API be like?
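One possible shape for such an API, sketched in base R on a plain data.frame of ranges: return the unique rows in order of first appearance, with an extra `count` column. `count_unique_ranges` is a hypothetical name, and the column names are assumptions made for this example. Note that this sketch builds string keys, which is exactly the allocation cost raised earlier in the thread; a real GRanges implementation would presumably reuse the integer hash instead.

```r
# Hypothetical helper: unique ranges plus a 'count' column.
count_unique_ranges <- function(df) {
  # String keys are used here only for brevity.
  key <- paste(df$seqnames, df$start, df$end, df$strand, sep = "\r")
  first <- !duplicated(key)
  out <- df[first, , drop = FALSE]
  # Map every row onto its first occurrence, then tally.
  out$count <- tabulate(match(key, key[first]), nbins = sum(first))
  rownames(out) <- NULL
  out
}

ranges <- data.frame(seqnames = c("chr1", "chr2", "chr1"),
                     start = c(1L, 2L, 1L),
                     end = c(10L, 10L, 10L),
                     strand = c("*", "-", "*"),
                     stringsAsFactors = FALSE)
counted <- count_unique_ranges(ranges)
```

In this example `counted` has two rows, with counts 2 and 1.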
Re: [Bioc-devel] Interoperability between DataFrame and dplyr?
dplyr internally converts all `data.frame` objects to its `tbl_df` class and most dplyr methods operate on the `tbl` superclass, see (https://github.com/hadley/dplyr/blob/master/R/tbl-df.r, https://github.com/hadley/dplyr/blob/master/R/tbl.r).

The most direct route to getting DataFrame objects working would be just to provide a method that converts the `DataFrame` objects to `data.frame`, then call `tbl_df()` on that.

However this would copy the data multiple times, so probably the best option would be to create a new `tbl_DF` class to handle `DataFrame` objects directly. You can look in the various tbl-*.r files at (https://github.com/hadley/dplyr/blob/master/R/) to see what methods should be implemented.
Re: [Bioc-devel] Interoperability between DataFrame and dplyr?
On Fri, Apr 24, 2015 at 7:42 AM, Jim Hester james.f.hes...@gmail.com wrote:
> dplyr internally converts all `data.frame` objects to its `tbl_df` class and most dplyr methods operate on the `tbl` superclass, see (https://github.com/hadley/dplyr/blob/master/R/tbl-df.r, https://github.com/hadley/dplyr/blob/master/R/tbl.r).

I hope you're speaking only of the data frame implementation here.

> The most direct route to getting DataFrame objects working would be just to provide a method that converts the `DataFrame` objects to `data.frame`, then call `tbl_df()` on that.

That coercion already exists, of course, and it's via the S3 as.data.frame, so it should work already.

> However this would copy the data multiple times, so probably the best option would be to create a new `tbl_DF` class to handle `DataFrame` objects directly.

It doesn't copy the data, outside of the list of pointers (so it's pretty much instantaneous), but yeah, I agree a new implementation is the way to go.

> You can look in the various tbl-*.r files at (https://github.com/hadley/dplyr/blob/master/R/) to see what methods should be implemented.
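The remark that the coercion goes "via the S3 as.data.frame" relies on S3 generics dispatching on S4 instances. A minimal sketch of that mechanism, using only base R and the methods package: `ToyFrame` is a hypothetical stand-in invented for this example, not S4Vectors::DataFrame, and no dplyr call is made here.

```r
library(methods)

# Hypothetical S4 container holding a named list of equal-length columns.
setClass("ToyFrame", representation(cols = "list"))

# An ordinary S3 method; UseMethod() finds it through the S4 class name.
as.data.frame.ToyFrame <- function(x, ...) {
  as.data.frame(x@cols, stringsAsFactors = FALSE)
}

tf <- new("ToyFrame", cols = list(id = 1:3, label = c("a", "b", "c")))
df <- as.data.frame(tf)   # a plain data.frame that data.frame-based code can take
```

With a real DataFrame the resulting data.frame could then be handed to dplyr, which is the "direct route" described above; the columns that survive are only those that have a sensible base R representation.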
[Bioc-devel] BioC 2015 Conference Posters
Poster registration is now open. Visit the web site http://www.bioconductor.org/help/course-materials/2015/BioC2015/ and registration page https://register.bioconductor.org/BioC2015/poster_submit.php for more information. Posters can be from any area of computational biology, medicine, computer science, mathematics and statistics. Purely experimental work or package overviews are welcome. They will be up for viewing during Tuesday's (and probably Wednesday's) social hour. Deadline is July 15 so we can estimate the number of display boards. Max size is 48 x 36 inches. For questions please contact voben...@fredhutch.org.
Re: [Bioc-devel] Warnings from ls() in AnnotationDbi
On 04/24/2015 09:07 AM, James W. MacDonald wrote:
> A poster on the support site reported some warnings issued when running hyperGTest() from GOstats. I tracked this down to the ls() function from AnnotationDbi, when it dispatches on AnnDbBimap objects. The method is:
>
>     setMethod("ls", signature(name="Bimap"),
>         function(name, pos, envir, all.names, pattern){
>             if (!missing(pos))
>                 warning("ignoring 'pos' argument")
>             if (!missing(envir))
>                 warning("ignoring 'envir' argument")
>             if (!missing(all.names))
>                 warning("ignoring 'all.names' argument")
>             .ls(name, pos, envir, all.names, pattern)
>         }
>     )
>
> And as noted, everything but 'name' is ignored in .ls(). This seemingly hasn't changed in years. In R-3.1.1 the method appears as
>
>     showMethods(ls, class = "AnnDbBimap", includeDefs = T)
>     Function: ls (package base)
>     name="AnnDbBimap"
>     function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
>         pattern)
>     {
>         if (!missing(pos))
>             warning("ignoring 'pos' argument")
>         if (!missing(envir))
>             warning("ignoring 'envir' argument")
>         if (!missing(all.names))
>             warning("ignoring 'all.names' argument")
>         .ls(name, pos, envir, all.names, pattern)
>     }
>
> And now in R-3.2.0, the method appears as
>
>     showMethods(ls, class = "Bimap", includeDefs = TRUE)
>     Function: ls (package base)
>     name="Bimap"
>     function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
>         pattern, sorted = TRUE)
>     {
>         .local <- function (name, pos, envir, all.names, pattern)
>         {
>             if (!missing(pos))
>                 warning("ignoring 'pos' argument")
>             if (!missing(envir))
>                 warning("ignoring 'envir' argument")
>             if (!missing(all.names))
>                 warning("ignoring 'all.names' argument")
>             .ls(name, pos, envir, all.names, pattern)
>         }
>         .local(name, pos, envir, all.names, pattern)
>     }
>
> Where everything gets wrapped in a call to .local(). Prior to this, we never saw the warnings, but now we do:
>
>     z <- ls(annotate:::getAnnMap("BPPARENTS", chip = "GO"))
>     Warning messages:
>     1: In .local(name, pos, envir, all.names, pattern) :
>       ignoring 'pos' argument
>     2: In .local(name, pos, envir, all.names, pattern) :
>       ignoring 'envir' argument
>     3: In .local(name, pos, envir, all.names, pattern) :
>       ignoring 'all.names' argument
>
> This is going to be a consistent issue going forward, so should all those warnings be stripped out of the method for ls() in AnnotationDbi?

I changed the signature of the method to match the signature of the generic, so the missing-ness of the arguments, and hence warnings, are assessed appropriately. In devel, I supported sorted=..., whereas in release it is ignored.

Thanks for pointing this out.

Martin

> Best,
>
> Jim

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
Re: [Bioc-devel] as.character method for GenomicRanges?
Sorry, one more concern: if you're thinking of using it as a range key, you will need the strand, but many use cases might not want the strand on there, like pasting into a genome browser.
Re: [Bioc-devel] as.character method for GenomicRanges?
It is a great idea, but I'm not sure I would use it to implement table(). Allocating those strings will be costly. Don't we already have the 4-way int hash? Of course, my intuition might be completely off here.

On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès hpa...@fredhutch.org wrote:
> Hi Pete,
>
> Excellent idea. That will make things like table() work out-of-the-box on GenomicRanges objects. I'll add that.
>
> Thanks,
> H.
>
> On 04/24/2015 09:43 AM, Peter Haverty wrote:
>> Would people be interested in having this:
>>
>>     setMethod("as.character", "GenomicRanges",
>>         function(x) {
>>             paste0(seqnames(x), ":", start(x), "-", end(x))
>>         })
>>
>> ?
>>
>> I find myself doing that a lot to make unique names or for output that goes to collaborators. I suppose we might want to tack on the strand if it isn't "*". I have some code for going the other direction too, if there is interest.
>>
>> Pete
>>
>> Peter M. Haverty, Ph.D.
>> Genentech, Inc.
>> phave...@gene.com
Re: [Bioc-devel] as.character method for GenomicRanges?
On 04/24/2015 10:18 AM, Michael Lawrence wrote:
> It is a great idea, but I'm not sure I would use it to implement table(). Allocating those strings will be costly. Don't we already have the 4-way int hash? Of course, my intuition might be completely off here.

It does use the 4-way int hash internally. as.character() is only used at the very end to stick the names on the returned table object.

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319