Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

Martin Morgan Fri, 07 Feb 2020 02:27:48 -0800

Yes, absolutely. A common pattern might be to implement a generic,

    setGeneric("foo", function(x, ...) standardGeneric("foo"))


an ‘internal’ function that implements the method on base R data types

    .foo <- function(x) {
        stopifnot("'x' must be a matrix" = is.matrix(x))
        t(x)
    }

and methods that act as a facade to the implementation

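    # tibble method: convert to a matrix, compute, convert back
    # (assumes the tibble package is attached)
    setMethod("foo", "tbl_df", function(x) {
        x <- as.matrix(x)
        result <- .foo(x)
        as_tibble(result)
    })

    # SummarizedExperiment method: compute on the first assay and store the
    # result as an additional assay. Note that t() is only a placeholder here;
    # every assay of a SummarizedExperiment must share the object's dimensions,
    # so a real .foo() would need to preserve them.
    setMethod("foo", "SummarizedExperiment", function(x) {
        result <- .foo(assay(x))
        assays(x)[["foo"]] <- result
        x
    })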

One would expect the vignette and examples to primarily emphasize the use of 
the interoperable (SummarizedExperiment) version.
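
For readers without Bioconductor installed, the same layering (generic, internal function, facade methods) can be tried with base R classes only. This is a minimal sketch rather than the code above: the `matrix` and `data.frame` methods are stand-ins (an assumption made for illustration) for the SummarizedExperiment and tbl_df methods, and `t()` remains a placeholder computation.

```r
library(methods)

# Generic: the single public entry point.
setGeneric("foo", function(x, ...) standardGeneric("foo"))

# Internal implementation, written once against a base R type.
.foo <- function(x) {
    if (!is.matrix(x)) stop("'x' must be a matrix")
    t(x)
}

# Facade for matrices: call the implementation directly.
setMethod("foo", "matrix", function(x, ...) .foo(x))

# Facade for data frames (stand-in for tbl_df): convert, compute, convert back.
setMethod("foo", "data.frame", function(x, ...) {
    result <- .foo(as.matrix(x))
    as.data.frame(result)
})

m <- matrix(1:6, nrow = 2)
df <- data.frame(a = 1:2, b = 3:4)

res_m <- foo(m)    # matrix in, matrix out
res_df <- foo(df)  # data.frame in, data.frame out
```

Each facade returns the same kind of container it received, while the computation itself lives in one place.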

Martin Morgan

From: stefano <mangiolastef...@gmail.com>
Date: Friday, February 7, 2020 at 12:31 AM
To: Michael Lawrence <lawrence.mich...@gene.com>
Cc: Martin Morgan <mtmorgan.b...@gmail.com>, "bioc-devel@r-project.org" 
<bioc-devel@r-project.org>
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 
classes/methods

Would this scenario satisfy "make the package _directly_ compatible with 
standard Bioconductor data structures"?

If the input is a SummarizedExperiment, return a SummarizedExperiment; if the 
input is a tbl_df or ttBulk, return a ttBulk (?) 


Best wishes.
Stefano 
 
Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter and Eliza Hall Institute of Medical Research
+61 (0)466452544


On Fri, 7 Feb 2020 at 16:15, Michael Lawrence 
<mailto:lawrence.mich...@gene.com> wrote:
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles); for one thing, automatic conversion is
obviously not possible for return values.
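
The wrapper-method idea can be sketched with a plain S3 generic. The following is a minimal illustration under stated assumptions: `scale_counts()` and its methods are hypothetical names (not part of ttBulk or Bioconductor), and a base `matrix` stands in for a Bioconductor container so that the example runs without extra packages.

```r
# Hypothetical S3 generic, used only for illustration.
scale_counts <- function(x, ...) UseMethod("scale_counts")

# Core method on the formal structure: divide each column by its total.
scale_counts.matrix <- function(x, ...) {
    sweep(x, 2, colSums(x), "/")
}

# Wrapper method: converts the input automatically, delegates to the core
# method, and converts the result back explicitly.
scale_counts.data.frame <- function(x, ...) {
    result <- scale_counts(as.matrix(x), ...)
    as.data.frame(result)
}

counts <- data.frame(s1 = c(1, 3), s2 = c(2, 2))
scaled <- scale_counts(counts)  # data.frame in, data.frame out
```

Only the input conversion happens automatically here; the wrapper still has to decide its return type explicitly.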

Michael

On Thu, Feb 6, 2020 at 5:34 PM stefano <mailto:mangiolastef...@gmail.com> wrote:
>
> Thanks Michael,
>
> Yes, in a sense, ttBulk and SummarizedExperiment can be considered as two 
> interfaces. Would it be fair enough to create a function that converts from 
> one to the other, although the default would be ttBulk?
>
> > I'm not sure the tidyverse is a great answer to the user interface, because 
> > it lacks domain semantics
>
> Would it be fair to say that the ttBulk class could be considered a tibble 
> with specific semantics? In the sense that it holds information about key 
> column names (.sample, .transcript, .abundance, .normalised_abundance, etc.), 
> and has a validator (that is triggered at every ttBulk function call).
>
> I think at the moment, given (i) the S3 problem, and (ii) the lack of a formal 
> foundation for the SummarizedExperiment interface (that maybe would require an 
> S4 technology itself, where SummarizedExperiment could be a slot?), my package 
> would belong more to CRAN, until those two issues have been resolved.
>
> I imagine there are not many cases where a CRAN package migrated to 
> Bioconductor after complying with the ecosystem policies.
>
> Thanks a lot.
>
> Best wishes.
>
> Stefano
>
>
>
> Stefano Mangiola | Postdoctoral fellow
>
> Papenfuss Laboratory
>
> The Walter and Eliza Hall Institute of Medical Research
>
> +61 (0)466452544
>
>
>
> On Fri, 7 Feb 2020 at 12:12, Michael Lawrence 
> <mailto:lawrence.mich...@gene.com> wrote:
>>
>> There's a difference between implementing software, where one wants
>> formal data structures, and providing a convenient user interface.
>> Software needs to interface with other software, so a package could
>> provide both types of interfaces, one based on rich (S4) data
>> structures, another on simpler structures with an API more amenable to
>> analysis. I'm not sure the tidyverse is a great answer to the user
>> interface, because it lacks domain semantics. This is still an active
>> area of research (see Stuart Lee's plyranges, for example). I hope you
>> can find a reasonable compromise that enables you to integrate ttBulk
>> into Bioconductor, so that it can take advantage of the synergies the
>> ecosystem provides.
>>
>> PS: There is no simple fix for your example.
>>
>> Michael
>>
>> On Thu, Feb 6, 2020 at 4:12 PM stefano <mailto:mangiolastef...@gmail.com> 
>> wrote:
>> >
>> > Thanks a lot for your comment Martin and Michael,
>> >
>> > Here I reply to Martin's comment. Michael, I will try to implement your
>> > solution!
>> >
>> > I think a key point from
>> > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
>> > (that I was overlooking) is
>> >
>> > *>> "So to sum up: if you submit a package to Bioconductor, there is an
>> > expectation that your package can work seamlessly with other Bioconductor
>> > packages, and your implementation should support that. The safest and
>> > easiest way to do that is to use Bioconductor data structures"*
>> >
>> > In this case my package would not be suited, as I do not use pre-existing
>> > Bioconductor data structures, but instead I see value in using a simple
>> > tibble, for the reasons in part explained in the README
>> > https://github.com/stemangiola/ttBulk (harnessing the power of tidyverse
>> > and friends for bulk transcriptomic analyses).
>> >
>> > *>> "with the minimum standard of being able to accept such objects even if
>> > you do not rely on them internally (though you should)"*
>> >
>> > With this I can comply, in the sense that I can build converters to and from
>> > SummarizedExperiment (for example).
>> >
>> > * >> "If you don't want to do that, then that's a shame, but it would
>> > suggest that Bioconductor would not be the right place to host this
>> > package."*
>> >
>> > Well said.
>> >
>> > In summary, I do not rely on Bioconductor data structures, as I am proposing
>> > another paradigm, but my back end is largely made of Bioconductor analysis
>> > packages that I would like to interface with tidyverse. So
>> >
>> > 1) Should I build converters to Bioc. data structures, and force the use of
>> > S3 objects (needed for tidyverse to work), or
>> > 2) Submit to CRAN
>> >
>> > I don't have strong feelings for either, although I think Bioconductor would
>> > be a good fit. Community, please give me your honest opinions; I will take
>> > them seriously and proceed.
>> >
>> >
>> >
>> > Best wishes.
>> >
>> > *Stefano *
>> >
>> >
>> >
>> > Stefano Mangiola | Postdoctoral fellow
>> >
>> > Papenfuss Laboratory
>> >
>> > The Walter and Eliza Hall Institute of Medical Research
>> >
>> > +61 (0)466452544
>> >
>> >
>> > On Fri, 7 Feb 2020 at 10:46, Martin Morgan <
>> > mailto:mtmorgan.b...@gmail.com> wrote:
>> >
>> > > The idea isn't to use S4 at any cost, but to 'play well' with the
>> > > Bioconductor ecosystem, including writing robust and maintainable code.
>> > >
>> > > This comment
>> > > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
>> > > provides some motivation; there was also an interesting exchange on the
>> > > Bioconductor community slack about this (join at
>> > > https://bioc-community.herokuapp.com/; discussion starting with
>> > > https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800).
>> > > The plyranges package http://bioconductor.org/packages/plyranges and
>> > > recently accepted fluentGenomics workflow
>> > > https://github.com/Bioconductor/Contributions/issues/1350 provide
>> > > illustrations.
>> > >
>> > > In your domain it's really surprising that your package does not use
>> > > (Import or Depend on) SummarizedExperiment or GenomicRanges packages.
>> > > From a superficial look at your package, it seems like something like
>> > > `reduce_dimensions()` could be defined to take & return a
>> > > SummarizedExperiment and hence benefit from some of the points in the
>> > > github issue comment mentioned above.
>> > >
>> > > Certainly there is a useful transition, both 'on the way in' to a
>> > > SummarizedExperiment, and after leaving the more specialized
>> > > bioinformatic computations to, e.g., display a pairs plot of the
>> > > reduced dimensions, where one might re-shape the data to a tidy format
>> > > and use 'plain old' tibbles; the fluentGenomics workflow might provide
>> > > some guidance.
>> > >
>> > > At the end of the day it would not be surprising for Bioconductor
>> > > packages to make use of tidy concepts and data structures, particularly
>> > > in the vignette, and it would be a mistake for Bioconductor to exclude
>> > > well-motivated 'tidy' representations.
>> > >
>> > > Martin Morgan
>> > >
>> > > On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" <
>> > > mailto:bioc-devel-boun...@r-project.org on behalf of 
>> > > mailto:mangiolastef...@gmail.com>
>> > > wrote:
>> > >
>> > >     Hello,
>> > >
>> > >     I have a package (ttBulk) under review. I have been told to replace
>> > >     the S3 system with S4. My package is based on the class tbl_df and
>> > >     must be fully compatible with tidyverse methods (inheritance). After
>> > >     some tests and research I understood that the tidyverse ecosystem is
>> > >     not compatible with S4 classes.
>> > >
>> > >     For example, several methods apparently do not handle S4 objects
>> > >     based on S3 tbl_df:
>> > >
>> > >     ```
>> > >     library(tidyverse)
>> > >     setOldClass("tbl_df")
>> > >     setClass("test2", contains = "tbl_df")
>> > >     my <- new("test2",  tibble(a = 1))
>> > >     my %>%  mutate(b = 3)
>> > >
>> > >        a b
>> > >     1 1 3
>> > >     ```
>> > >
>> > >     ```
>> > >     my <- new("test2",  tibble(a = rnorm(100), b = 1))
>> > >     my %>% nest(data = -b)
>> > >     Error: `x` must be a vector, not a `test2` object
>> > >     Run `rlang::last_error()` to see where the error occurred.
>> > >     ```
>> > >
>> > >     Could you please advise whether a tidyverse-based package can be
>> > >     hosted on Bioconductor, and whether S4 classes are really mandatory?
>> > >     I need to understand if I am forced to submit to CRAN instead
>> > >     (although Bioconductor would be a good fit, since I try to interface
>> > >     transcriptional analysis tools with the tidyverse).
>> > >
>> > >
>> > >     Thanks a lot.
>> > >     Stefano
>> > >
>> > >         [[alternative HTML version deleted]]
>> > >
>> > >     _______________________________________________
>> > >     mailto:Bioc-devel@r-project.org mailing list
>> > >     https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Michael Lawrence
>> Senior Scientist, Bioinformatics and Computational Biology
>> Genentech, A Member of the Roche Group
>> Office +1 (650) 225-7760
>> mailto:micha...@gene.com
>>
>> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
mailto:micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
