Re: [Bioc-devel] Interoperability between DataFrame and dplyr?

2015-04-24 Thread Michael Lawrence
Lists are vectors, but as long as they (dplyr) aren't doing too much in C++
or asserting specific types, it should work. Functions like length(),
names(), [, etc. dispatch to both S3 and S4 methods.


On Thu, Apr 23, 2015 at 6:11 PM, Ryan Thompson r...@thompsonclan.org wrote:

 It looks like dplyr has support for base R lists as columns in data
 frames:
 http://cran.r-project.org/web/packages/dplyr/vignettes/data_frames.html

 Hopefully that feature is flexible enough to accommodate anything that
 looks sufficiently like a vector. I'd love to give this a try, but I can't
 find anything on what specifically needs to be implemented for a new dplyr
 backend.
 In theory, probably not that hard. DataFrame implements methods on
 primitive and S3 generics, so even the darkest shadows of the S3 world will
 dispatch correctly on those. One potential roadblock is that dplyr may
 assume that all columns are base R vectors, which would obviously fail for
 stuff like Rle.  Ideally, the data.frame implementation of dplyr has
 restricted itself to some subset of the base R API, without diving down
 into C++, except through dispatch.  But that might be too idealistic. Good
 luck, I'm interested to see what you come up with.

 Michael


 On Thu, Apr 23, 2015 at 4:06 PM, Ryan C. Thompson r...@thompsonclan.org
 wrote:

 Hi all,

 So, dplyr is a pretty cool thing, but it currently works with data.frame
 and data.table, but not S4Vectors::DataFrame. I'd like to change that if
 possible, and I assume that this would simply involve writing some glue
 code. However, I'm not really sure where to start, and I expect things
 might be complicated because dplyr uses S3 and S4Vectors uses S4. Can
 anyone offer any pointers?

 -Ryan

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel






Re: [Bioc-devel] Interoperability between DataFrame and dplyr?

2015-04-24 Thread Michael Lawrence
Sure, but the way DataFrame is flexible is by relying on two abstractions
in base R. Just length() and '['. If dplyr does the same thing, which seems
totally reasonable, everything should work the same.
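
To make the "just length() and '['" point concrete, here is a minimal sketch in base R. The MiniRle class and the first_n() helper are hypothetical names invented for this illustration (they are not part of S4Vectors or dplyr); the sketch only assumes that client code restricts itself to length() and "[".

```r
library(methods)

# Hypothetical run-length-encoded vector, loosely in the spirit of Rle.
setClass("MiniRle", slots = c(values = "character", lengths = "integer"))

# length() is a primitive; the S4 method is picked up by internal dispatch.
setMethod("length", "MiniRle", function(x) sum(x@lengths))

# "[" decodes, subsets, and re-encodes, so the result is again a MiniRle.
setMethod("[", "MiniRle", function(x, i, j, ..., drop = TRUE) {
  decoded <- rep(x@values, x@lengths)[i]
  runs <- rle(decoded)
  new("MiniRle", values = runs$values, lengths = runs$lengths)
})

# Generic client code that only assumes length() and "[".
first_n <- function(v, n) v[seq_len(min(n, length(v)))]

x <- new("MiniRle", values = c("a", "b"), lengths = c(3L, 2L))
plain   <- first_n(c("a", "a", "a", "b", "b"), 4)  # base character vector
compact <- first_n(x, 4)                           # S4 object, same code path
```

Code written like first_n() never needs to know which kind of column it was given; anything that reaches into the object's internals (for example from C++) would not have that property.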

On Thu, Apr 23, 2015 at 4:32 PM, Vincent Carey st...@channing.harvard.edu
wrote:

 Seems to me that DataFrame is too flexible -- you can have very complex
 objects in the columns (anything that inherits from Vector) with which, in
 its current state, dplyr would not work too naturally.  You would wind up
 doing a fair amount of coercion of such entities, so it seems to me that
 arranging a coercion of DataFrames satisfying specific conditions to
 data.frame would be a path of low resistance.

 Ready to be corrected of course.




Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Peter Haverty
Good catch. We'll want the strand in case we need to go back to a GRanges.
I would make the strand addition optional with the default of FALSE. It's
nice to have a column of strings you can paste right into a genome browser
(sorry Michael :-) ).  I often pass my bench collaborators a spreadsheet
with such a column.

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Apr 24, 2015 at 10:50 AM, Hervé Pagès hpa...@fredhutch.org wrote:

 On 04/24/2015 10:21 AM, Michael Lawrence wrote:

 Sorry, one more concern, if you're thinking of using as a range key, you
 will need the strand, but many use cases might not want the strand on
 there. Like for pasting into a genome browser.


 What about appending the strand only for GRanges objects that
 have at least one range that is not on *?

 setMethod("as.character", "GenomicRanges",
     function(x)
     {
         if (length(x) == 0L)
             return(character(0))
         ans <- paste0(seqnames(x), ":", start(x), "-", end(x))
         if (any(strand(x) != "*"))
             ans <- paste0(ans, ":", strand(x))
         ans
     }
 )

 > as.character(gr)
  [1] "chr1:1-10"  "chr2:2-10"  "chr2:3-10"  "chr2:4-10"  "chr1:5-10"
  [6] "chr1:6-10"  "chr3:7-10"  "chr3:8-10"  "chr3:9-10"  "chr3:10-10"

 > strand(gr)[2:3] <- c("-", "+")
 > as.character(gr)
  [1] "chr1:1-10:*"  "chr2:2-10:-"  "chr2:3-10:+"  "chr2:4-10:*"  "chr1:5-10:*"
  [6] "chr1:6-10:*"  "chr3:7-10:*"  "chr3:8-10:*"  "chr3:9-10:*"  "chr3:10-10:*"

 H.


 On Fri, Apr 24, 2015 at 10:18 AM, Michael Lawrence micha...@gene.com wrote:

 It is a great idea, but I'm not sure I would use it to implement
 table(). Allocating those strings will be costly. Don't we already
 have the 4-way int hash? Of course, my intuition might be completely
 off here.


 On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès hpa...@fredhutch.org wrote:

 Hi Pete,

 Excellent idea. That will make things like table() work out-of-the-box
 on GenomicRanges objects. I'll add that.

 Thanks,
 H.



 On 04/24/2015 09:43 AM, Peter Haverty wrote:

 Would people be interested in having this:

 setMethod("as.character", "GenomicRanges",
     function(x) {
         paste0(seqnames(x), ":", start(x), "-", end(x))
     })

 ?

 I find myself doing that a lot to make unique names or for output that
 goes to collaborators.  I suppose we might want to tack on the strand if it
 isn't *.  I have some code for going the other direction too, if there is
 interest.



 Pete

 


 --
 Hervé Pagès

 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024

 E-mail: hpa...@fredhutch.org
 Phone: (206) 667-5791
 Fax: (206) 667-1319




Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

On 04/24/2015 11:08 AM, Peter Haverty wrote:

Good catch. We'll want the strand in case we need to go back to a GRanges.
I would make the strand addition optional with the default of FALSE.
It's nice to have a column of strings you can paste right into a genome
browser (sorry Michael :-) ).  I often pass my bench collaborators a
spreadsheet with such a column.


as.character(unstrand(gr)) ?

3 reasons I'm not too keen about 'ignore.strand=TRUE' being the default:

(1) Many functions and methods in GenomicRanges/GenomicAlignments
have an 'ignore.strand' argument. For consistency, the default
value has been set to FALSE everywhere. Note that this was done
even if this default doesn't reflect the most common use case
(e.g. summarizeOverlaps).

(2) I think it's good to have the default behavior of as.character()
allow going back and forth between GRanges and character vector
without losing the strand information.

(3) The table method for Vector would break if as.character was
ignoring the strand by default. Can be worked-around by
implementing a method for GenomicRanges objects but...

Hope that makes sense.

H.
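
As a rough illustration of the behavior being argued for here (strand kept by default, dropped only on request), the following base-R sketch formats plain vectors instead of a GRanges. The function name format_locstrings() and its arguments are hypothetical and only mirror the 'ignore.strand' convention mentioned above; this is not the method that ended up in GenomicRanges.

```r
# Hypothetical stand-in working on plain vectors rather than a GRanges.
format_locstrings <- function(seqnames, start, end, strand = "*",
                              ignore.strand = FALSE) {
  if (length(seqnames) == 0L)
    return(character(0))
  ans <- paste0(seqnames, ":", start, "-", end)
  # Append the strand only if asked to keep it and at least one range
  # is not on "*", as in the proposal quoted in this thread.
  if (!ignore.strand && any(strand != "*"))
    ans <- paste0(ans, ":", strand)
  ans
}

unstranded <- format_locstrings(c("chr1", "chr2"), c(1L, 2L), c(10L, 10L))
stranded   <- format_locstrings(c("chr1", "chr2"), c(1L, 2L), c(10L, 10L),
                                strand = c("*", "-"))
browser    <- format_locstrings(c("chr1", "chr2"), c(1L, 2L), c(10L, 10L),
                                strand = c("*", "-"), ignore.strand = TRUE)
```

With FALSE as the default, the string form can still be turned back into stranded ranges, while a genome-browser-friendly column stays one argument away.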





Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Peter Haverty
Going the other way can look like this:

library(GenomicRanges)

##' Parse one or more location strings and return as a GRanges
##'
##' Parse one or more location strings and return as a GRanges. GRanges
##' will get the names from the location.strings.
##' @param location.string character
##' @export
##' @return GRanges
##' @family location strings
locstring2GRanges <- function(location.string) {
  # Take location strings such as "chr11:123-127" and return a GRanges.
  # Strip all whitespace and thousands separators (gsub, not just the first hit).
  cleaned = gsub("\\s+", "", location.string)
  cleaned = gsub(",", "", cleaned)
  #cleaned = sub("\\.\\.", "-", cleaned)  # TWU style location strings
  if (any(! grepl("^(chr){0,1}.+:\\d+-\\d+$", cleaned))) {
    stop("Some location strings do not look like chr1:123-456.") }
  # Named start.pos/end.pos to avoid shadowing start(), end() and stop().
  start.pos = as.integer(sub("^.+:(\\d+)-.+$", "\\1", cleaned))
  end.pos = as.integer(sub("^.+-(\\d+)$", "\\1", cleaned))
  gr = GRanges(
    # Group "chr" as a whole so strings without the prefix also match;
    # as in the original, the prefix is dropped from the seqnames.
    seqnames=sub("^(chr){0,1}(.*):.*$", "\\2", cleaned),
    ranges=IRanges(
      start=pmin(start.pos, end.pos),
      end=pmax(start.pos, end.pos),
      names=names(location.string)) )
  return(gr)
}

Surprisingly the repeated subs are faster than splitting.  Some people,
such as GSNAP author Tom Wu, use the format chr1:1234..1235, which we
might want to support. The pmin/pmax stuff handles cases where the negative
strand is expressed by flipping start and stop. We might not need that.
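
For the "chr1:1234..1235" style and an optional strand suffix, one possible direction is a single pattern that accepts either separator. The sketch below is base R only and returns a plain data.frame; parse_locstrings() is a hypothetical name, and unlike the function above it keeps the "chr" prefix. Both are assumptions made for illustration, not something agreed on in this thread.

```r
# Hypothetical helper: parse "chr1:123-456" or "chr1:123..456", optionally
# followed by ":+", ":-" or ":*", into a plain data.frame.
parse_locstrings <- function(location.string) {
  # Drop whitespace and thousands separators everywhere in the string.
  s <- gsub("[[:space:],]", "", location.string)
  pattern <- "^([^:]+):([0-9]+)(-|\\.\\.)([0-9]+)(:([-+*]))?$"
  ok <- grepl(pattern, s)
  if (!all(ok))
    stop("Malformed location string(s): ",
         paste(location.string[!ok], collapse = ", "))
  a <- as.integer(sub(pattern, "\\2", s))
  b <- as.integer(sub(pattern, "\\4", s))
  strand <- sub(pattern, "\\6", s)
  strand[strand == ""] <- "*"   # no suffix means unstranded
  data.frame(seqnames = sub(pattern, "\\1", s),
             start = pmin(a, b),   # tolerate flipped coordinates
             end = pmax(a, b),
             strand = strand,
             stringsAsFactors = FALSE)
}

locs <- parse_locstrings(c("chr1:1,000-2,000", "chr2:50..40:-", "chr3:7-10:+"))
```

A data.frame with these column names should be convertible with GenomicRanges::makeGRangesFromDataFrame(), though that step is left out here to keep the sketch free of Bioconductor dependencies.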



Pete



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Peter Haverty
Those are all good reasons for keeping the strand by default.  I'm on board.

Pete



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Michael Lawrence
Taking this a bit off topic but it would be nice if we could get the
GRanges equivalent of as.data.frame(table(x)), i.e., unique(x) with a count
mcol. Should be easy to support but what should the API be like?
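
To pin down what "unique(x) with a count mcol" would return, here is a base-R sketch on a data.frame stand-in for a GRanges (one row per range). The unique_with_counts() name is hypothetical and the string keys are only for brevity; a real implementation would presumably reuse the integer hashing discussed elsewhere in this thread rather than paste().

```r
# Hypothetical stand-in: one row per range instead of a GRanges.
ranges <- data.frame(
  seqnames = c("chr1", "chr2", "chr1", "chr1", "chr2", "chr1"),
  start    = c(1L, 5L, 1L, 1L, 5L, 1L),
  end      = c(10L, 20L, 10L, 10L, 20L, 10L),
  strand   = c("*", "+", "*", "-", "+", "*"),
  stringsAsFactors = FALSE
)

unique_with_counts <- function(x) {
  uniq <- x[!duplicated(x), , drop = FALSE]   # first-appearance order
  # Map every row to its unique representative, then tabulate.
  key_all  <- do.call(paste, c(x, sep = "\r"))
  key_uniq <- do.call(paste, c(uniq, sep = "\r"))
  uniq$count <- tabulate(match(key_all, key_uniq), nbins = nrow(uniq))
  rownames(uniq) <- NULL
  uniq
}

counted <- unique_with_counts(ranges)
```

The result has one row per distinct range, in the order unique() would give, with the multiplicity carried along as an extra column.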

On Fri, Apr 24, 2015 at 10:54 AM, Hervé Pagès hpa...@fredhutch.org wrote:

 On 04/24/2015 10:18 AM, Michael Lawrence wrote:

 It is a great idea, but I'm not sure I would use it to implement
 table(). Allocating those strings will be costly. Don't we already have
 the 4-way int hash? Of course, my intuition might be completely off here.


 It does use the 4-way int hash internally. as.character() is only used
 at the very-end to stick the names on the returned table object.

 H.





Re: [Bioc-devel] Interoperability between DataFrame and dplyr?

2015-04-24 Thread Jim Hester
dplyr internally converts all `data.frame` objects to its `tbl_df` class
and most dplyr methods operate on the `tbl` superclass,  see (
https://github.com/hadley/dplyr/blob/master/R/tbl-df.r,
https://github.com/hadley/dplyr/blob/master/R/tbl.r).

The most direct route to getting DataFrame objects working would be to just
provide a method that converts the `DataFrame` objects to `data.frame`,
then call `tbl_df()` on that.

However this would copy the data multiple times, so probably the best
option would be to create a new `tbl_DF` class to handle `DataFrame`
objects directly.  You can look in the various tbl-*.r files at (
https://github.com/hadley/dplyr/blob/master/R/) to see what methods should
be implemented.



Re: [Bioc-devel] Interoperability between DataFrame and dplyr?

2015-04-24 Thread Michael Lawrence
On Fri, Apr 24, 2015 at 7:42 AM, Jim Hester james.f.hes...@gmail.com
wrote:

 dplyr internally converts all `data.frame` objects to its `tbl_df` class
 and most dplyr methods operate on the `tbl` superclass,  see (
 https://github.com/hadley/dplyr/blob/master/R/tbl-df.r,
 https://github.com/hadley/dplyr/blob/master/R/tbl.r).


I hope you're speaking only of the data frame implementation here.


 The most direct route to getting DataFrame objects working would be to
 just provide a method that converts the `DataFrame` objects to
 `data.frame`, then call `tbl_df()` on that.


That coercion already exists, of course, and it's via the S3 as.data.frame,
so it should work already.
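
The mechanism relied on here, an S3 generic dispatching on an S4 instance, can be seen in a small base-R sketch. MiniFrame is a hypothetical toy class invented for this example; it is not the real S4Vectors::DataFrame and says nothing about how that package implements its coercion.

```r
library(methods)

# Hypothetical toy S4 container holding its columns in a list.
setClass("MiniFrame", slots = c(columns = "list"))

# Plain S3 method for the base generic as.data.frame(); UseMethod()
# dispatches on the class of an S4 instance as well.
as.data.frame.MiniFrame <- function(x, row.names = NULL, optional = FALSE, ...) {
  as.data.frame(x@columns, row.names = row.names, optional = optional, ...)
}

mf <- new("MiniFrame", columns = list(id = 1:3, score = c(0.5, 1.5, 2.5)))
df <- as.data.frame(mf)   # reaches as.data.frame.MiniFrame
```

Code that calls as.data.frame() on its input therefore works on such an object without knowing anything about S4.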


 However this would copy the data multiple times, so probably the best
 option would be to create a new `tbl_DF` class to handle `DataFrame`
 objects directly.


It doesn't copy the data, outside of the list of pointers (so it's pretty
much instantaneous), but yea, I agree a new implementation is the way to go.


 You can look in the various tbl-*.r files at (
 https://github.com/hadley/dplyr/blob/master/R/) to see what methods
 should be implemented.



[Bioc-devel] BioC 2015 Conference Posters

2015-04-24 Thread Valerie Obenchain

Poster registration is now open. Visit the web site

  http://www.bioconductor.org/help/course-materials/2015/BioC2015/

and registration page

  https://register.bioconductor.org/BioC2015/poster_submit.php

for more information.

Posters can be from any area of computational biology, medicine, 
computer science, mathematics and statistics. Purely experimental work 
or package overviews are welcome. They will be up for viewing during 
Tuesday's (and probably Wednesday's) social hour.


Deadline is July 15 so we can estimate the number of display boards. Max 
size is 48 x 36 inches.


For questions please contact voben...@fredhutch.org.



Re: [Bioc-devel] Warnings from ls() in AnnotationDbi

2015-04-24 Thread Martin Morgan

On 04/24/2015 09:07 AM, James W. MacDonald wrote:

A poster on the support site reported some warnings issued when running
hyperGTest() from GOstats. I tracked this down to the ls() function from
AnnotationDbi, when it dispatches on AnnDbBimap objects. The method is:

setMethod("ls", signature(name="Bimap"),
    function(name, pos, envir, all.names, pattern){
        if (!missing(pos))
            warning("ignoring 'pos' argument")
        if (!missing(envir))
            warning("ignoring 'envir' argument")
        if (!missing(all.names))
            warning("ignoring 'all.names' argument")
        .ls(name, pos, envir, all.names, pattern)
    }
)

And as noted, everything but 'name' is ignored in .ls().

This seemingly hasn't changed in years. In R-3.1.1 the method appears as


> showMethods(ls, class = "AnnDbBimap", includeDefs = T)

Function: ls (package base)
name="AnnDbBimap"
function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
    pattern)
{
    if (!missing(pos))
        warning("ignoring 'pos' argument")
    if (!missing(envir))
        warning("ignoring 'envir' argument")
    if (!missing(all.names))
        warning("ignoring 'all.names' argument")
    .ls(name, pos, envir, all.names, pattern)
}

And now in R-3.2.0, the method appears as


> showMethods(ls, class = "Bimap", includeDefs = TRUE)

Function: ls (package base)
name="Bimap"
function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
    pattern, sorted = TRUE)
{
    .local <- function (name, pos, envir, all.names, pattern)
    {
        if (!missing(pos))
            warning("ignoring 'pos' argument")
        if (!missing(envir))
            warning("ignoring 'envir' argument")
        if (!missing(all.names))
            warning("ignoring 'all.names' argument")
        .ls(name, pos, envir, all.names, pattern)
    }
    .local(name, pos, envir, all.names, pattern)
}

Where everything gets wrapped in a call to .local(). Prior to this, we
never saw the warnings, but now we do:

> z <- ls(annotate:::getAnnMap("BPPARENTS", chip = "GO"))
Warning messages:
1: In .local(name, pos, envir, all.names, pattern) :
   ignoring 'pos' argument
2: In .local(name, pos, envir, all.names, pattern) :
   ignoring 'envir' argument
3: In .local(name, pos, envir, all.names, pattern) :
   ignoring 'all.names' argument

This is going to be a consistent issue going forward, so should all those
warnings be stripped out of the method for ls() in AnnotationDbi?


I changed the signature of the method to match the signature of the generic, so 
the missing-ness of the arguments, and hence warnings, are assessed appropriately.


In devel, I supported sorted=..., whereas in release it is ignored.
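
Why the .local() wrapper changes what missing() reports can be reproduced without any S4 machinery. The two functions below are hypothetical reductions of the method bodies shown above: one evaluates missing() directly, the other forwards a defaulted argument to an inner function first.

```r
# Method body called directly: 'pos' has a default and was not supplied.
direct <- function(name, pos = -1L) missing(pos)

# Same body wrapped the way a re-matched method is: the outer function
# forwards its (defaulted) 'pos' explicitly to the inner .local().
wrapped <- function(name, pos = -1L) {
  .local <- function(name, pos) missing(pos)
  .local(name, pos)
}

direct("x")   # TRUE: no 'pos' was supplied by the caller
wrapped("x")  # FALSE: inside .local(), 'pos' was supplied by the wrapper
```

Once the method's formals match the generic's, no wrapper is generated and missing() is again evaluated against what the user actually passed, which is consistent with the fix described above.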

Thanks for pointing this out.

Martin



Best,

Jim








--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Michael Lawrence
Sorry, one more concern: if you're thinking of using it as a range key, you
will need the strand, but many use cases, like pasting into a genome
browser, might not want the strand on there.
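
To make the trade-off concrete, here is a rough sketch in base R that
works on plain vectors rather than a GenomicRanges object. The function
name format_ranges() and its ignore.strand argument are assumptions made
for illustration, not an existing or proposed API.

```r
# Build "seqname:start-end" strings; append ":strand" only when the
# strand is informative and the caller has not asked to drop it.
format_ranges <- function(seqnames, start, end, strand = "*",
                          ignore.strand = FALSE) {
  key <- paste0(seqnames, ":", start, "-", end)
  if (!ignore.strand) {
    strand <- rep_len(as.character(strand), length(key))
    has_strand <- strand != "*"
    key[has_strand] <- paste0(key[has_strand], ":", strand[has_strand])
  }
  key
}

# Two ranges that differ only by strand.
as_key     <- format_ranges(c("chr1", "chr1"), c(100L, 100L),
                            c(200L, 200L), c("+", "-"))
as_browser <- format_ranges(c("chr1", "chr1"), c(100L, 100L),
                            c(200L, 200L), c("+", "-"),
                            ignore.strand = TRUE)
print(as_key)      # distinct strings, usable as a key
print(as_browser)  # identical strings, suitable for a genome browser
```

With the strand attached the two strings stay distinct; without it they
collapse to the same string, which is fine for display but not for a key.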

On Fri, Apr 24, 2015 at 10:18 AM, Michael Lawrence micha...@gene.com
wrote:

 It is a great idea, but I'm not sure I would use it to implement table().
 Allocating those strings will be costly. Don't we already have the 4-way
 int hash? Of course, my intuition might be completely off here.


 On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès hpa...@fredhutch.org wrote:

 Hi Pete,

 Excellent idea. That will make things like table() work out-of-the-box
 on GenomicRanges objects. I'll add that.

 Thanks,
 H.



 On 04/24/2015 09:43 AM, Peter Haverty wrote:

 Would people be interested in having this:

 setMethod("as.character", "GenomicRanges",
     function(x) {
         paste0(seqnames(x), ":", start(x), "-", end(x))
     })

 ?

 I find myself doing that a lot to make unique names or for output that
 goes to collaborators.  I suppose we might want to tack on the strand if
 it isn't *.  I have some code for going the other direction too, if there
 is interest.
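
For the other direction, a possible sketch in base R (this is not Pete's
code, which is not shown in the thread). The helper name parse_ranges()
is hypothetical, it returns a plain data.frame rather than a
GenomicRanges object, and it assumes strings of the form
"seqname:start-end" with no strand suffix.

```r
# Split "seqname:start-end" strings into their three components.
parse_ranges <- function(x) {
  m <- regmatches(x, regexec("^([^:]+):([0-9]+)-([0-9]+)$", x))
  # Each successful match has 4 elements: the full match plus 3 groups.
  bad <- lengths(m) != 4L
  if (any(bad))
    stop("cannot parse: ", paste(x[bad], collapse = ", "))
  parts <- do.call(rbind, m)
  data.frame(seqnames = parts[, 2],
             start = as.integer(parts[, 3]),
             end = as.integer(parts[, 4]),
             stringsAsFactors = FALSE)
}

parsed <- parse_ranges(c("chr1:100-200", "chrX:5-9"))
print(parsed)
```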



 Pete

 
 Peter M. Haverty, Ph.D.
 Genentech, Inc.
 phave...@gene.com



 --
 Hervé Pagès

 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024

 E-mail: hpa...@fredhutch.org
 Phone:  (206) 667-5791
 Fax:(206) 667-1319




Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Michael Lawrence
It is a great idea, but I'm not sure I would use it to implement table().
Allocating those strings will be costly. Don't we already have the 4-way
int hash? Of course, my intuition might be completely off here.


On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès hpa...@fredhutch.org wrote:

 Hi Pete,

 Excellent idea. That will make things like table() work out-of-the-box
 on GenomicRanges objects. I'll add that.

 Thanks,
 H.



 On 04/24/2015 09:43 AM, Peter Haverty wrote:

 Would people be interested in having this:

 setMethod("as.character", "GenomicRanges",
     function(x) {
         paste0(seqnames(x), ":", start(x), "-", end(x))
     })

 ?

 I find myself doing that a lot to make unique names or for output that
 goes to collaborators.  I suppose we might want to tack on the strand if
 it isn't *.  I have some code for going the other direction too, if there
 is interest.



 Pete

 


Re: [Bioc-devel] as.character method for GenomicRanges?

2015-04-24 Thread Hervé Pagès

On 04/24/2015 10:18 AM, Michael Lawrence wrote:

It is a great idea, but I'm not sure I would use it to implement
table(). Allocating those strings will be costly. Don't we already have
the 4-way int hash? Of course, my intuition might be completely off here.


It does use the 4-way int hash internally. as.character() is only used
at the very end to stick the names on the returned table object.
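
As a rough illustration of that division of labour (not the actual
implementation, which uses the hash mentioned above), the sketch below
groups ranges using integer vectors only and builds strings just for the
unique ranges at the end. It uses sorting instead of hashing, and all
variable names and data are invented for the example.

```r
# Toy ranges as four parallel integer vectors (hypothetical data).
seqlev <- c("chr1", "chr2")
seqid  <- c(1L, 1L, 2L, 1L)     # index into seqlev
start  <- c(10L, 10L, 5L, 20L)
end    <- c(20L, 20L, 9L, 30L)
strand <- c(1L, 1L, 3L, 1L)     # integer codes for +, -, *

# Sort on the four integer keys so identical ranges become adjacent.
ord <- order(seqid, start, end, strand)
s <- cbind(seqid, start, end, strand)[ord, , drop = FALSE]

# A row starts a new group when it differs from the previous row.
is_new <- c(TRUE,
            rowSums(s[-1, , drop = FALSE] !=
                    s[-nrow(s), , drop = FALSE]) > 0)
counts <- tabulate(cumsum(is_new))

# Strings are built only now, once per unique range.
uniq <- s[is_new, , drop = FALSE]
names(counts) <- paste0(seqlev[uniq[, "seqid"]], ":",
                        uniq[, "start"], "-", uniq[, "end"])
print(counts)
```

The string allocation cost then scales with the number of distinct
ranges rather than with the total number of ranges.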

H.




On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès hpa...@fredhutch.org wrote:

Hi Pete,

Excellent idea. That will make things like table() work out-of-the-box
on GenomicRanges objects. I'll add that.

Thanks,
H.



On 04/24/2015 09:43 AM, Peter Haverty wrote:

Would people be interested in having this:

setMethod("as.character", "GenomicRanges",
    function(x) {
        paste0(seqnames(x), ":", start(x), "-", end(x))
    })

?

I find myself doing that a lot to make unique names or for output that
goes to collaborators.  I suppose we might want to tack on the strand if
it isn't *.  I have some code for going the other direction too, if there
is interest.



Pete






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319
