Re: [Bioc-devel] plotPCA for BiocGenerics

Thomas Lin Pedersen Fri, 31 Oct 2014 14:37:34 -0700

With regard to abstraction - I would personally much rather read and write 
code that contained plotScores() and plotScree() etc., where the intent of the 
code is clearly communicated, instead of relying on a plot() function whose 
result is only known from experience. Trying to squeeze every kind of visual 
output into the same plot generic seems artificial and constraining to me. On 
the other hand, I totally agree with the critique of plotPCA…
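
To make the point concrete, here is a rough sketch of what such intent-revealing functions could look like. Only the names plotScores() and plotScree() come from this message; the bodies are my own assumption and simply build on a result from stats::prcomp(), so treat this as an illustration rather than a proposed implementation.

```r
# Hypothetical helpers: the bodies are illustrative only and assume
# a PCA result as returned by stats::prcomp().
plotScores <- function(pca, pcs = c(1, 2), ...) {
  scores <- pca$x[, pcs, drop = FALSE]
  plot(scores, xlab = colnames(scores)[1], ylab = colnames(scores)[2], ...)
  invisible(scores)
}

plotScree <- function(pca, ...) {
  # Proportion of the total variance carried by each component
  varExplained <- pca$sdev^2 / sum(pca$sdev^2)
  barplot(varExplained, names.arg = colnames(pca$x),
          ylab = "Proportion of variance", ...)
  invisible(varExplained)
}

pca <- prcomp(USArrests, scale. = TRUE)
scores <- plotScores(pca)   # samples in the PC1/PC2 plane
varExp <- plotScree(pca)    # variance explained per component
```

Reading the last two lines, it is clear which picture each call produces without knowing what plot() happens to do for this class.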


Thomas


> On 31 Oct 2014, at 22:09, Michael Lawrence <lawrence.mich...@gene.com> wrote:
> 
> I strongly agree with Kevin's position. plotPCA() represents two separate 
> concerns in its very name: the computation and the rendering. Those need to 
> be separated, at least behind the scenes. The syntax of plot(pca(x)) is 
> preferable to plotPCA, because the structure of the operation is represented 
> in the expression itself, not just in a non-computable function name.
> 
> With regard to how a plot,PCA should behave: there is always a tension 
> between high-level and low-level APIs. In the end, we need multiple levels of 
> abstraction.  While high-level APIs sacrifice flexibility, we need them 
> because they communicate the high-level *intent* of the user in the code 
> itself (self-documenting code), and they enable reusability, which not only 
> reduces redundant effort but also ensures consistency. Once our brains no 
> longer need to parse low-level code, we can focus our mental power on 
> correctness and efficiency. To design a high-level API, one needs to 
> carefully analyze user requirements, i.e., the use cases. To choose the 
> default behavior, one needs to rate the use cases by their prevalence, and by 
> how closely they match the intuition-based expectations of the user.
> 
> The fact that at least 9 packages are performing such a similar task seems to 
> indicate that a common abstraction is warranted, but I am not sure if 
> BiocGenerics is the appropriate place.
> 
> Michael
> 
> On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen <thomas...@gmail.com> wrote:
> While I tend to agree with you that PCA is too big an operation to be hidden 
> within a plotting function (MDS is an edge-case I would say), I can’t see how 
> we can ever reach a point where there is only one generic plot function. In 
> the case of PCA there are a number of different plot types that can all lay 
> claim to the plot function of a PCA class, for instance scoreplot, 
> scatterplot matrix of all scores, biplot, screeplot, accumulated R^2 barplot, 
> leverage vs. distance-to-model… (you get the idea). So even with some very 
> well-thought-out classes for very common result types such as PCA, such a 
> class would still need a lot of different plot methods such as plotScores, 
> plotScree etc. (or plot(…, type=‘score’), but I don’t find that very 
> appealing). Expanding beyond PCA only muddies the waters even more - there are 
> very few interesting data structures that only have one visual representation 
> to-rule-them-all…
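
For comparison, the plot(…, type=‘score’) alternative mentioned in passing could be sketched roughly as below. The subclass name "multiPCA" and the set of type values are assumptions made up for this example; the rendering is delegated to the existing stats functions screeplot() and biplot().

```r
# Sketch only: several views routed through a single plot method via 'type'.
# "multiPCA" is a hypothetical subclass, used so that the plot method that
# base R already provides for prcomp objects is left untouched.
plot.multiPCA <- function(x, type = c("score", "scree", "biplot"), ...) {
  type <- match.arg(type)   # rejects anything outside the listed choices
  switch(type,
    score  = plot(x$x[, 1:2], ...),
    scree  = screeplot(x, ...),
    biplot = biplot(x, ...)
  )
  invisible(type)
}

fit <- prcomp(USArrests, scale. = TRUE)
class(fit) <- c("multiPCA", class(fit))

plot(fit)                   # defaults to the score plot
plot(fit, type = "scree")   # same object, different view
```

The cost is visible in the method itself: every new view means another branch, and the set of valid type strings is only discoverable from the documentation.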
> 
> just my 2c
> 
> best
> Thomas
> 
> 
> > Date: Mon, 20 Oct 2014 18:50:48 -0400
> > From: Kevin Coombes <kevin.r.coom...@gmail.com>
> >
> > Well. I have two responses to that.
> >
> > First, I think it would be a lot better/easier for users if (most)
> > developers could make use of the same plot function for "basic" classes
> > like PCA.
> >
> > Second, if you think the basic PCA plotting routine needs enhancements,
> > you still have two options.  On the one hand, you could (as you said)
> > try to convince the maintainer of PCA to add what you want.  If it's
> > generally valuable, then he'd probably do it --- and other classes that
> > use it would benefit.  On the other hand, if it really is a special
> > enhancement that only makes sense for your class, then you can derive a
> > class from the basic PCA class
> >     setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
> >  and implement your own version of the "plot" generic for this class.
> > And you could tweak the "as.PCA" function so it returns an object of the
> > mySpecialPCA class. And the user could still just "plot" the result
> > without having to care what's happening behind the scenes.
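
A runnable sketch of this derive-and-override idea, under some assumptions: the base "PCA" class below is a hypothetical stand-in (no such shared class exists in the thread), and its slots, the extra "groups" slot, and the colouring are invented for illustration.

```r
library(methods)

# Hypothetical shared base class and its basic plot method.
setClass("PCA", slots = c(scores = "matrix", sdev = "numeric"))
setGeneric("plot")
setMethod("plot", signature(x = "PCA", y = "missing"),
  function(x, y, ...) {
    plot(x@scores[, 1], x@scores[, 2], xlab = "PC1", ylab = "PC2", ...)
    invisible(x)
  })

# Derived class carrying the "other stuff": here, a grouping factor.
setClass("mySpecialPCA", contains = "PCA", slots = c(groups = "factor"))
setMethod("plot", signature(x = "mySpecialPCA", y = "missing"),
  function(x, y, ...) {
    # Reuse the inherited rendering; add only the class-specific colouring
    callNextMethod(x, col = as.integer(x@groups), pch = 19, ...)
  })

fit <- prcomp(iris[, 1:4], scale. = TRUE)
special <- new("mySpecialPCA", scores = fit$x, sdev = fit$sdev,
               groups = iris$Species)
plot(special)   # the user still just calls plot()
```

The special method contains no drawing code of its own, so any improvement to the base method is picked up automatically.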
> >
> > On 10/20/2014 5:59 PM, Michael Love wrote:
> >> Ah, I see now. Personally, I don't think Bioconductor developers
> >> should have to agree on single plotting functions for basic classes
> >> like 'PCA' (because this logic applies equally to the situation of all
> >> Bioconductor developers agreeing on single MA-plot, a single
> >> variance-mean plot, etc). I think letting developers define their
> >> plotPCA makes contributions easier (I don't have to ask the owner of
> >> plot.PCA to incorporate something), even though it means we have a
> >> growing list of generics.
> >>
> >> Still you have a good point about splitting computation and plotting.
> >> In practice, we subset the rows so PCA is not laborious.
> >>
> >>
> >> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
> >> <kevin.r.coom...@gmail.com> wrote:
> >>
> >>    Hi,
> >>
> >>    I don't see how it needs more functions (as long as you can get
> >>    developers to agree).  Suppose that someone can define a reusable
> >>    PCA class.  This will contain a single "plot" generic function,
> >>    defined once and reused by other classes. The existing "plotPCA"
> >>    interface can also be implemented just once, in this class, as
> >>
> >>        plotPCA <- function(object, ...) plot(as.PCA(object), ...)
> >>
> >>    This can be exposed to users of your class through namespaces.
> >>    Then the only thing a developer needs to implement in his own
> >>    class is the single "as.PCA" function.  And he/she would have
> >>    already been required to implement this as part of the old
> >>    "plotPCA" function.  So it can be extracted from that, and the
> >>    developer doesn't have to reimplement the visualization code from
> >>    the PCA class.
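
Spelled out as a sketch, the division of labour might look like the following. Everything except the one-line plotPCA definition is an assumption: the S3 "PCA" class, the "CountData" class, the log transform and the row filter are all hypothetical and only meant to show which code is shared and which is package-specific.

```r
# Shared code, written once (hypothetical S3 "PCA" class)
as.PCA <- function(object, ...) UseMethod("as.PCA")

plot.PCA <- function(x, ...) {
  plot(x$scores[, 1], x$scores[, 2], xlab = "PC1", ylab = "PC2", ...)
  invisible(x)
}

plotPCA <- function(object, ...) plot(as.PCA(object), ...)

# Package-specific code: only the conversion (hypothetical "CountData" class)
as.PCA.CountData <- function(object, ntop = 500, ...) {
  logCounts <- log2(object$counts + 1)
  # Keep the most variable rows (an illustrative choice, not a recommendation)
  keep <- head(order(apply(logCounts, 1, var), decreasing = TRUE), ntop)
  fit <- prcomp(t(logCounts[keep, , drop = FALSE]))
  structure(list(scores = fit$x, sdev = fit$sdev), class = "PCA")
}

# Simulated counts: 200 features by 6 samples
set.seed(1)
counts <- matrix(rpois(200 * 6, lambda = 50), nrow = 200,
                 dimnames = list(NULL, paste0("sample", 1:6)))
obj <- structure(list(counts = counts), class = "CountData")

res <- plotPCA(obj, pch = 19)   # old interface, new internals
```

Existing user code that calls plotPCA() keeps working, while plot(as.PCA(obj)) is available to anyone who wants to keep the intermediate result.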
> >>
> >>    Best,
> >>      Kevin
> >>
> >>
> >>    On 10/20/2014 5:15 PM, davide risso wrote:
> >>>    Hi Kevin,
> >>>
> >>>    I see your points and I agree (especially for the specific case
> >>>    of plotPCA that involves some non trivial computations).
> >>>
> >>>    On the other hand, having a wrapper function that starting from
> >>>    the "raw" data gives you a pretty picture (with virtually zero
> >>>    effort by the user) using a sensible choice of parameters that
> >>>    are more or less OK for RNA-seq data is useful for practitioners
> >>>    that just want to look for patterns in the data.
> >>>
> >>>    I guess it would be the same to have a PCA method for each of the
> >>>    objects and then using the plot method on those new objects, but
> >>>    that would just create a lot more objects and functions than the
> >>>    current approach (like Mike was saying).
> >>>
> >>>    Your "as.pca" or "performPCA" approach would be definitely better
> >>>    if all the different methods would create objects of the *same*
> >>>    PCA class, but since we are talking about different packages, I
> >>>    don't know how easy it would be to coordinate. But perhaps this
> >>>    is the way we should go.
> >>>
> >>>    Best,
> >>>    davide
> >>>
> >>>
> >>>
> >>>    On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
> >>>    <kevin.r.coom...@gmail.com> wrote:
> >>>
> >>>        Hi,
> >>>
> >>>        It depends.
> >>>
> >>>        The "traditional" R approach to these matters is that you (a)
> >>>        first perform some sort of an analysis and save the results
> >>>        as an object and then (b) show or plot what you got.  It is
> >>>        part (b) that tends to be really generic, and (in my opinion)
> >>>        should have really generic names -- like "show" or "plot" or
> >>>        "hist" or "image".
> >>>
> >>>        With PCA in particular, you usually have to perform a bunch
> >>>        of computations in order to get the principal components from
> >>>        some part of the data.  As I understand it now, these
> >>>        computations are performed along the way as part of the
> >>>        various "plotPCA" functions.  The "R way" to do this would be
> >>>        something like
> >>>            pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
> >>>            plot(pca) # to get the scatter plot
> >>>        This approach has the user-friendly advantage that you can
> >>>        tweak the plot (in terms of colors, symbols, ranges, titles,
> >>>        etc) without having to recompute the principal components
> >>>        every time. (I often find myself re-plotting the same PCA
> >>>        several times, with different colors or symbols for different
> >>>        factors associated with the samples.) In addition, you could
> >>>        then also do something like
> >>>            screeplot(pca)
> >>>        to get a plot of the percentages of variance explained.
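
The same two-step workflow can be tried with base R alone. In this sketch stats::prcomp() stands in for the hypothetical performPCA(), and the scores are plotted explicitly, because base plot() on a prcomp object draws a scree plot rather than a scatter plot.

```r
x <- iris[, 1:4]
pca <- prcomp(x, scale. = TRUE)   # (a) compute once and keep the result

# (b) re-plot as often as needed, without recomputing the components
plot(pca$x[, 1:2], col = iris$Species, pch = 19)
plot(pca$x[, 1:2], pch = as.integer(iris$Species), main = "By species")

screeplot(pca)                    # variances of the components

# Percentage of variance explained by each component
pctVar <- 100 * pca$sdev^2 / sum(pca$sdev^2)
```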
> >>>
> >>>        My own feeling is that if the object doesn't know what to do
> >>>        when you tell it to "plot" itself, then you haven't got the
> >>>        right abstraction.
> >>>
> >>>        You may still end up needing generics for each kind of
> >>>        computation you want to perform (PCA, RLE, MA, etc), which is
> >>>        why I suggested an "as.PCA" function.  After all, "as" is
> >>>        already pretty generic.  In the long run, this would help
> >>>        Bioconductor developers, since they wouldn't all have to
> >>>        reimplement the visualization code; they would just have to
> >>>        figure out how to convert their own object into a PCA or RLE
> >>>        or MA object.
> >>>
> >>>        And I know that this "plotWhatever" approach is used
> >>>        elsewhere in Bioconductor, and it has always bothered me. It
> >>>        just seemed that a post suggesting a new generic function
> >>>        provided a reasonable opportunity to point out that there
> >>>        might be a better way.
> >>>
> >>>        Best,
> >>>          Kevin
> >>>
> >>>        PS: My own "ClassDiscovery" package, which is available from
> >>>        R-Forge via
> >>>            install.packages("ClassDiscovery",
> >>>                             repos="http://R-Forge.R-project.org")
> >>>        includes a "SamplePCA" class that does something roughly
> >>>        similar to this for microarrays.
> >>>
> >>>        PPS (off-topic): The worst offender in base R -- because it
> >>>        doesn't use this "typical" approach -- is the "heatmap"
> >>>        function.  Having tried to teach this function in several
> >>>        different classes, I have come to the conclusion that it is
> >>>        basically unusable by mortals.  And I think the problem is
> >>>        that it tries to combine too many steps -- clustering rows,
> >>>        clustering columns, scaling, visualization -- all in a single
> >>>        function.
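
As a rough illustration of what pulling those steps apart could look like, the sketch below runs each one separately with standard functions. The data set and the choice to scale columns are arbitrary, and the result is not meant to reproduce what heatmap() draws.

```r
x <- scale(as.matrix(mtcars))          # scaling (of the columns, here)
rowOrder <- hclust(dist(x))$order      # clustering rows
colOrder <- hclust(dist(t(x)))$order   # clustering columns

# Visualization of the reordered matrix
image(t(x[rowOrder, colOrder]), axes = FALSE)
```

Each intermediate (the scaled matrix and the two orderings) can be inspected or swapped out before anything is drawn.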
> >>>
> >>>
> >>>        On 10/20/2014 3:47 PM, davide risso wrote:
> >>>>        Hi Kevin,
> >>>>
> >>>>        I don't agree. In the case of EDASeq (as I suppose it is the
> >>>>        case for DESeq/DESeq2) plotting the principal components of
> >>>>        the count matrix is only one of possible exploratory plots
> >>>>        (RLE plots, MA plots, etc.).
> >>>>        So, in my opinion, it makes more sense from an object
> >>>>        oriented point of view to have multiple plotting methods for
> >>>>        a single "RNA-seq experiment" object.
> >>>>
> >>>>        In addition, this is the same strategy adopted elsewhere in
> >>>>        Bioconductor, e.g., for the plotMA method.
> >>>>
> >>>>        Just my two cents.
> >>>>
> >>>>        Best,
> >>>>        davide
> >>>>
> >>>>        On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
> >>>>        <kevin.r.coom...@gmail.com> wrote:
> >>>>
> >>>>            I understand that breaking code is a problem, and that
> >>>>            is admittedly the main reason not to immediately adopt
> >>>>            my suggestion.
> >>>>
> >>>>            But as a purely logical exercise, creating a "PCA"
> >>>>            object X or something similar and using either
> >>>>                plot(X)
> >>>>            or
> >>>>                plot(as.PCA(mySpecialObject))
> >>>>            is a much more sensible use of object-oriented
> >>>>            programming/design. This requires no new generics (to
> >>>>            write or to learn).
> >>>>
> >>>>            And you could use it to transition away from the current
> >>>>            system by convincing the various package maintainers to
> >>>>            re-implement plotPCA as follows:
> >>>>
> >>>>            plotPCA <- function(object, ...) {
> >>>>              plot(as.PCA(object), ...)
> >>>>            }
> >>>>
> >>>>            This would be relatively easy to eventually deprecate
> >>>>            and teach users to switch to the alternative.
> >>>>
> >>>>
> >>>>            On 10/20/2014 1:07 PM, Michael Love wrote:
> >>>>>            hi Kevin,
> >>>>>
> >>>>>            that would imply there is only one way to plot an
> >>>>>            object of a given class. Additionally, it would break a
> >>>>>            lot of code.
> >>>>>
> >>>>>            best,
> >>>>>
> >>>>>            Mike
> >>>>>
> >>>>>            On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
> >>>>>            <kevin.r.coom...@gmail.com> wrote:
> >>>>>
> >>>>>                But shouldn't they all really just be named "plot"
> >>>>>                for the appropriate objects?  In which case, there
> >>>>>                would already be a perfectly good generic....
> >>>>>
> >>>>>                On Oct 20, 2014 10:27 AM, "Michael Love"
> >>>>>                <michaelisaiahl...@gmail.com> wrote:
> >>>>>
> >>>>>                    I noticed that 'plotPCA' functions are defined
> >>>>>                    in EDASeq, DESeq2, DESeq,
> >>>>>                    affycoretools, Rcade, facopy, CopyNumber450k,
> >>>>>                    netresponse, MAIT (maybe
> >>>>>                    more).
> >>>>>
> >>>>>                    Sounds like a case for BiocGenerics.
> >>>>>
> >>>>>                    best,
> >>>>>
> >>>>>                    Mike
> >>>>>
> >>>>>                    _______________________________________________
> >>>>>                    Bioc-devel@r-project.org mailing list
> >>>>>                    https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>        --
> >>>>        Davide Risso, PhD
> >>>>        Post Doctoral Scholar
> >>>>        Division of Biostatistics
> >>>>        School of Public Health
> >>>>        University of California, Berkeley
> >>>>        344 Li Ka Shing Center, #3370
> >>>>        Berkeley, CA 94720-3370
> >>>>        E-mail: davide.ri...@berkeley.edu
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> > End of Bioc-devel Digest, Vol 127, Issue 43
> > *******************************************
> 
