With regard to abstraction: I would personally much rather read and write code that contains plotScores(), plotScree(), etc., where the intent of the code is clearly communicated, instead of relying on a plot() function whose result is only known from experience. Trying to squeeze every kind of visual output into the same plot generic seems artificial and constraining to me. On the other hand, I totally agree with the critique of plotPCA…
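To make the point about intent-revealing names concrete, here is a minimal sketch in base R. It assumes the PCA result is a plain prcomp object and uses the built-in USArrests data purely for illustration; plotScores() and plotScree() are the hypothetical names from the paragraph above, not functions from any existing package.

```r
# Hypothetical helpers named after the functions mentioned above.
# Both take an already computed PCA (here: a stats::prcomp result).

plotScores <- function(pca, comps = c(1, 2), ...) {
  stopifnot(inherits(pca, "prcomp"), length(comps) == 2)
  scores <- pca$x[, comps, drop = FALSE]
  # Scatter plot of the two requested components
  plot(scores[, 1], scores[, 2],
       xlab = colnames(scores)[1], ylab = colnames(scores)[2], ...)
  invisible(scores)
}

plotScree <- function(pca, ...) {
  stopifnot(inherits(pca, "prcomp"))
  # Proportion of total variance carried by each component
  explained <- pca$sdev^2 / sum(pca$sdev^2)
  barplot(explained, names.arg = colnames(pca$x),
          ylab = "Proportion of variance explained", ...)
  invisible(explained)
}

pca <- prcomp(USArrests, scale. = TRUE)  # computation happens once
plotScores(pca, pch = 19)                # the call says what is drawn
plotScree(pca)
```

Read at the call site, each line states which view of the PCA is produced, which is the property argued for above; whether that outweighs a single plot() method is the design question under discussion.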
Thomas

> On 31 Oct 2014, at 22:09, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
> I strongly agree with Kevin's position. plotPCA() represents two separate
> concerns in its very name: the computation and the rendering. Those need to
> be separated, at least behind the scenes. The syntax of plot(pca(x)) is
> preferable to plotPCA, because the structure of the operation is represented
> in the expression itself, not just in a non-computable function name.
>
> With regard to how a plot,PCA should behave: there is always a tension
> between high-level and low-level APIs. In the end, we need multiple levels of
> abstraction. While high-level APIs sacrifice flexibility, we need them
> because they communicate the high-level *intent* of the user in the code
> itself (self-documenting code), and they enable reusability, which not only
> reduces redundant effort but also ensures consistency. Once our brains no
> longer need to parse low-level code, we can focus our mental power on
> correctness and efficiency. To design a high-level API, one needs to
> carefully analyze user requirements, i.e., the use cases. To choose the
> default behavior, one needs to rate the use cases by their prevalence, and by
> how closely they match the intuition-based expectations of the user.
>
> The fact that at least 9 packages are performing such a similar task seems to
> indicate that a common abstraction is warranted, but I am not sure if
> BiocGenerics is the appropriate place.
>
> Michael
>
> On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen <thomas...@gmail.com>
> wrote:
> While I tend to agree with you that PCA is too big an operation to be hidden
> within a plotting function (MDS is an edge-case I would say), I can’t see how
> we can ever reach a point where there is only one generic plot function.
> In the case of PCA there are a number of different plot types that can all
> lay claim to the plot function of a PCA class, for instance scoreplot,
> scatterplot matrix of all scores, biplot, screeplot, accumulated R^2 barplot,
> leverage vs. distance-to-model… (you get the idea). So even with some very
> well-thought-out classes for very common result types such as PCA, such a
> class would still need a lot of different plot methods such as plotScores,
> plotScree etc. (or plot(…, type=‘score’), but I don’t find that very
> appealing). Expanding beyond PCA only muddies the waters even more - there
> are very few interesting data structures that only have one visual
> representation to-rule-them-all…
>
> just my 2c
>
> best
> Thomas
>
> > Date: Mon, 20 Oct 2014 18:50:48 -0400
> > From: Kevin Coombes <kevin.r.coom...@gmail.com>
> >
> > Well. I have two responses to that.
> >
> > First, I think it would be a lot better/easier for users if (most)
> > developers could make use of the same plot function for "basic" classes
> > like PCA.
> >
> > Second, if you think the basic PCA plotting routine needs enhancements,
> > you still have two options. On the one hand, you could (as you said)
> > try to convince the maintainer of PCA to add what you want. If it's
> > generally valuable, then he'd probably do it --- and other classes that
> > use it would benefit. On the other hand, if it really is a special
> > enhancement that only makes sense for your class, then you can derive a
> > class from the basic PCA class
> >     setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
> > and implement your own version of the "plot" generic for this class.
> > And you could tweak the "as.PCA" function so it returns an object of the
> > mySpecialPCA class. And the user could still just "plot" the result
> > without having to care what's happening behind the scenes.
> >
> > On 10/20/2014 5:59 PM, Michael Love wrote:
> >> Ah, I see now.
> >> Personally, I don't think Bioconductor developers
> >> should have to agree on single plotting functions for basic classes
> >> like 'PCA' (because this logic applies equally to the situation of all
> >> Bioconductor developers agreeing on a single MA-plot, a single
> >> variance-mean plot, etc). I think letting developers define their
> >> plotPCA makes contributions easier (I don't have to ask the owner of
> >> plot.PCA to incorporate something), even though it means we have a
> >> growing list of generics.
> >>
> >> Still you have a good point about splitting computation and plotting.
> >> In practice, we subset the rows so PCA is not laborious.
> >>
> >>
> >> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
> >> <kevin.r.coom...@gmail.com> wrote:
> >>
> >>     Hi,
> >>
> >>     I don't see how it needs more functions (as long as you can get
> >>     developers to agree). Suppose that someone can define a reusable
> >>     PCA class. This will contain a single "plot" generic function,
> >>     defined once and reused by other classes. The existing "plotPCA"
> >>     interface can also be implemented just once, in this class, as
> >>
> >>         plotPCA <- function(object, ...) plot(as.PCA(object), ...)
> >>
> >>     This can be exposed to users of your class through namespaces.
> >>     Then the only thing a developer needs to implement in his own
> >>     class is the single "as.PCA" function. And he/she would have
> >>     already been required to implement this as part of the old
> >>     "plotPCA" function. So it can be extracted from that, and the
> >>     developer doesn't have to reimplement the visualization code from
> >>     the PCA class.
> >>
> >>     Best,
> >>       Kevin
> >>
> >>
> >>     On 10/20/2014 5:15 PM, davide risso wrote:
> >>>     Hi Kevin,
> >>>
> >>>     I see your points and I agree (especially for the specific case
> >>>     of plotPCA that involves some non-trivial computations).
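For reference, the proposal quoted above (a reusable PCA class, one as.PCA conversion per package, plotPCA as a thin wrapper, and an optional derived class) might look roughly like the S4 sketch below. The slot names, the matrix method built on prcomp(), and the extra reference lines drawn for mySpecialPCA are assumptions made for illustration; none of this is taken from an existing package.

```r
library(methods)

# Hypothetical reusable result class (slot names are an assumption).
setClass("PCA", slots = c(scores = "matrix", variance = "numeric"))

# Hypothetical conversion generic; each package would add its own method.
setGeneric("as.PCA", function(object, ...) standardGeneric("as.PCA"))

# Example method for a plain matrix with samples in rows.
setMethod("as.PCA", "matrix", function(object, ...) {
  fit <- prcomp(object, ...)
  new("PCA", scores = fit$x, variance = fit$sdev^2)
})

# The visualization code is written once, for the PCA class.
setMethod("plot", signature(x = "PCA", y = "missing"), function(x, y, ...) {
  plot(x@scores[, 1], x@scores[, 2], xlab = "PC1", ylab = "PC2", ...)
})

# The existing interface becomes a one-line wrapper, as in the quoted message.
plotPCA <- function(object, ...) plot(as.PCA(object), ...)

# A package-specific enhancement derives from the base class and
# overrides only the plot method.
setClass("mySpecialPCA", contains = "PCA")
setMethod("plot", signature(x = "mySpecialPCA", y = "missing"),
          function(x, y, ...) {
  plot(as(x, "PCA"), ...)        # reuse the base drawing code
  abline(h = 0, v = 0, lty = 2)  # illustrative extra: axes through the origin
})

m <- scale(as.matrix(USArrests))
plotPCA(m, pch = 19)                          # old interface still works
special <- new("mySpecialPCA", as.PCA(m))
plot(special, pch = 19)                       # user still just calls plot()
```

Under these assumptions a package author writes only the as.PCA method; the cost, as raised elsewhere in the thread, is that every package has to agree on the same PCA class.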
> >>>
> >>>     On the other hand, having a wrapper function that starting from
> >>>     the "raw" data gives you a pretty picture (with virtually zero
> >>>     effort by the user) using a sensible choice of parameters that
> >>>     are more or less OK for RNA-seq data is useful for practitioners
> >>>     that just want to look for patterns in the data.
> >>>
> >>>     I guess it would be the same to have a PCA method for each of the
> >>>     objects and then using the plot method on those new objects, but
> >>>     that would just create a lot more objects and functions than the
> >>>     current approach (like Mike was saying).
> >>>
> >>>     Your "as.pca" or "performPCA" approach would be definitely better
> >>>     if all the different methods would create objects of the *same*
> >>>     PCA class, but since we are talking about different packages, I
> >>>     don't know how easy it would be to coordinate. But perhaps this
> >>>     is the way we should go.
> >>>
> >>>     Best,
> >>>     davide
> >>>
> >>>
> >>>     On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
> >>>     <kevin.r.coom...@gmail.com> wrote:
> >>>
> >>>         Hi,
> >>>
> >>>         It depends.
> >>>
> >>>         The "traditional" R approach to these matters is that you (a)
> >>>         first perform some sort of an analysis and save the results
> >>>         as an object and then (b) show or plot what you got. It is
> >>>         part (b) that tends to be really generic, and (in my opinion)
> >>>         should have really generic names -- like "show" or "plot" or
> >>>         "hist" or "image".
> >>>
> >>>         With PCA in particular, you usually have to perform a bunch
> >>>         of computations in order to get the principal components from
> >>>         some part of the data. As I understand it now, these
> >>>         computations are performed along the way as part of the
> >>>         various "plotPCA" functions.
> >>>         The "R way" to do this would be
> >>>         something like
> >>>             pca <- performPCA(mySpecialObject) # or as.PCA(mySpecialObject)
> >>>             plot(pca) # to get the scatter plot
> >>>         This approach has the user-friendly advantage that you can
> >>>         tweak the plot (in terms of colors, symbols, ranges, titles,
> >>>         etc) without having to recompute the principal components
> >>>         every time. (I often find myself re-plotting the same PCA
> >>>         several times, with different colors or symbols for different
> >>>         factors associated with the samples.) In addition, you could
> >>>         then also do something like
> >>>             screeplot(pca)
> >>>         to get a plot of the percentages of variance explained.
> >>>
> >>>         My own feeling is that if the object doesn't know what to do
> >>>         when you tell it to "plot" itself, then you haven't got the
> >>>         right abstraction.
> >>>
> >>>         You may still end up needing generics for each kind of
> >>>         computation you want to perform (PCA, RLE, MA, etc), which is
> >>>         why I suggested an "as.PCA" function. After all, "as" is
> >>>         already pretty generic. In the long run, this would help
> >>>         Bioconductor developers, since they wouldn't all have to
> >>>         reimplement the visualization code; they would just have to
> >>>         figure out how to convert their own object into a PCA or RLE
> >>>         or MA object.
> >>>
> >>>         And I know that this "plotWhatever" approach is used
> >>>         elsewhere in Bioconductor, and it has always bothered me. It
> >>>         just seemed that a post suggesting a new generic function
> >>>         provided a reasonable opportunity to point out that there
> >>>         might be a better way.
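The compute-once, plot-many workflow described in the quoted message can be tried with base R alone. In the sketch below prcomp() stands in for the hypothetical performPCA()/as.PCA() step, and the built-in iris data and the colouring choices are assumptions made purely for illustration.

```r
# prcomp() stands in for the hypothetical performPCA() / as.PCA() step.
pca <- prcomp(iris[, 1:4], scale. = TRUE)   # principal components computed once

# Re-plot the same scores with different annotations; nothing is recomputed.
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19,
     main = "Scores coloured by species")
plot(pca$x[, 1:2],
     col = ifelse(iris$Sepal.Length > 5.8, "red", "grey40"), pch = 17,
     main = "Scores coloured by sepal length")

# stats::screeplot() has a method for prcomp objects; it draws the
# component variances (not percentages) as a bar chart.
screeplot(pca, main = "Variance per component")

# Percentages of variance explained, if those are wanted instead.
round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
```

Only the first line does any numerical work, so changing colours, symbols or titles costs nothing more than a redraw.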
> >>>
> >>>         Best,
> >>>           Kevin
> >>>
> >>>         PS: My own "ClassDiscovery" package, which is available from
> >>>         R-Forge via
> >>>             install.packages("ClassDiscovery",
> >>>                              repos="http://R-Forge.R-project.org")
> >>>         includes a "SamplePCA" class that does something roughly
> >>>         similar to this for microarrays.
> >>>
> >>>         PPS (off-topic): The worst offender in base R -- because it
> >>>         doesn't use this "typical" approach -- is the "heatmap"
> >>>         function. Having tried to teach this function in several
> >>>         different classes, I have come to the conclusion that it is
> >>>         basically unusable by mortals. And I think the problem is
> >>>         that it tries to combine too many steps -- clustering rows,
> >>>         clustering columns, scaling, visualization -- all in a single
> >>>         function.
> >>>
> >>>
> >>>         On 10/20/2014 3:47 PM, davide risso wrote:
> >>>>         Hi Kevin,
> >>>>
> >>>>         I don't agree. In the case of EDASeq (as I suppose it is the
> >>>>         case for DESeq/DESeq2) plotting the principal components of
> >>>>         the count matrix is only one of the possible exploratory plots
> >>>>         (RLE plots, MA plots, etc.).
> >>>>         So, in my opinion, it makes more sense from an object
> >>>>         oriented point of view to have multiple plotting methods for
> >>>>         a single "RNA-seq experiment" object.
> >>>>
> >>>>         In addition, this is the same strategy adopted elsewhere in
> >>>>         Bioconductor, e.g., for the plotMA method.
> >>>>
> >>>>         Just my two cents.
> >>>>
> >>>>         Best,
> >>>>         davide
> >>>>
> >>>>         On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
> >>>>         <kevin.r.coom...@gmail.com> wrote:
> >>>>
> >>>>             I understand that breaking code is a problem, and that
> >>>>             is admittedly the main reason not to immediately adopt
> >>>>             my suggestion.
> >>>>
> >>>>             But as a purely logical exercise, creating a "PCA"
> >>>>             object X or something similar and using either
> >>>>                 plot(X)
> >>>>             or
> >>>>                 plot(as.PCA(mySpecialObject))
> >>>>             is a much more sensible use of object-oriented
> >>>>             programming/design. This requires no new generics (to
> >>>>             write or to learn).
> >>>>
> >>>>             And you could use it to transition away from the current
> >>>>             system by convincing the various package maintainers to
> >>>>             re-implement plotPCA as follows:
> >>>>
> >>>>                 plotPCA <- function(object, ...) {
> >>>>                     plot(as.PCA(object), ...)
> >>>>                 }
> >>>>
> >>>>             This would be relatively easy to eventually deprecate
> >>>>             and teach users to switch to the alternative.
> >>>>
> >>>>
> >>>>             On 10/20/2014 1:07 PM, Michael Love wrote:
> >>>>>             hi Kevin,
> >>>>>
> >>>>>             that would imply there is only one way to plot an
> >>>>>             object of a given class. Additionally, it would break a
> >>>>>             lot of code.
> >>>>>
> >>>>>             best,
> >>>>>
> >>>>>             Mike
> >>>>>
> >>>>>             On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
> >>>>>             <kevin.r.coom...@gmail.com> wrote:
> >>>>>
> >>>>>                 But shouldn't they all really just be named "plot"
> >>>>>                 for the appropriate objects? In which case, there
> >>>>>                 would already be a perfectly good generic....
> >>>>>
> >>>>>                 On Oct 20, 2014 10:27 AM, "Michael Love"
> >>>>>                 <michaelisaiahl...@gmail.com> wrote:
> >>>>>
> >>>>>                     I noticed that 'plotPCA' functions are defined
> >>>>>                     in EDASeq, DESeq2, DESeq,
> >>>>>                     affycoretools, Rcade, facopy, CopyNumber450k,
> >>>>>                     netresponse, MAIT (maybe
> >>>>>                     more).
> >>>>>
> >>>>>                     Sounds like a case for BiocGenerics.
> >>>>>
> >>>>>                     best,
> >>>>>
> >>>>>                     Mike
> >>>>>
> >>>>>                     _______________________________________________
> >>>>>                     Bioc-devel@r-project.org mailing list
> >>>>>                     https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>>>         --
> >>>>         Davide Risso, PhD
> >>>>         Post Doctoral Scholar
> >>>>         Division of Biostatistics
> >>>>         School of Public Health
> >>>>         University of California, Berkeley
> >>>>         344 Li Ka Shing Center, #3370
> >>>>         Berkeley, CA 94720-3370
> >>>>         E-mail: davide.ri...@berkeley.edu
> >
> > End of Bioc-devel Digest, Vol 127, Issue 43
> > *******************************************

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel