Re: [Bioc-devel] plotPCA for BiocGenerics
hi Kevin, that would imply there is only one way to plot an object of a given class. Additionally, it would break a lot of code. best, Mike On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes kevin.r.coom...@gmail.com wrote: But shouldn't they all really just be named plot for the appropriate objects? In which case, there would already be a perfectly good generic On Oct 20, 2014 10:27 AM, Michael Love michaelisaiahl...@gmail.com wrote: I noticed that 'plotPCA' functions are defined in EDASeq, DESeq2, DESeq, affycoretools, Rcade, facopy, CopyNumber450k, netresponse, MAIT (maybe more). Sounds like a case for BiocGenerics. best, Mike [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] plotPCA for BiocGenerics
I agree it depends on programming style. But to continue with the example, it would have to be: plot(as.mySpecialObjectPCA(mySpecialObject)) because we have many packages here with their own functions. Wouldn't this just proliferate the number of classes instead of functions? For every type of plot in my package, I would have to add a special class branching off of mySpecialObject. Users would have to remember to convert from mySpecialObject to mySpecialObjectPCA or mySpecialObjectMAPlot or mySpecialObjectVarianceMeanPlot, etc. [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] plotPCA for BiocGenerics
Hi, It depends. The traditional R approach to these matters is that you (a) first perform some sort of an analysis and save the results as an object and then (b) show or plot what you got. It is part (b) that tends to be really generic, and (in my opinion) should have really generic names -- like show or plot or hist or image. With PCA in particular, you usually have to perform a bunch of computations in order to get the principal components from some part of the data. As I understand it now, these computations are performed along the way as part of the various plotPCA functions. The R way to do this would be something like pca - performPCA(mySpecialObject) # or as.PCA(mySpecialObject) plot(pca) # to get the scatter plot This apporach has the user-friendly advantage that you can tweak the plot (in terms of colors, symbols, ranges, titles, etc) without having to recompute the principal components every time. (I often find myself re-plotting the same PCA several times, with different colors or symbols for different factrors associated with the samples.) In addition, you could then also do something like screeplot(pca) to get a plot of the percentages of variance explained. My own feeling is that if the object doesn't know what to do when you tell it to plot itself, then you haven't got the right abstraction. You may still end up needing generics for each kind of computation you want to perform (PCA, RLE, MA, etc), which is why I suggested an as.PCA function. After all, as is already pretty generic. In the long run, l this would herlp BioConductor developers, since they wouldn't all have to reimplement the visualization code; they would just have to figure out how to convert their own object into a PCA or RLE or MA object. And I know that this plotWhatever approach is used elsewhere in BioConductor, and it has always bothered me. It just seemed that a post suggesting a new generic function provided a reasonable opportunity to point out that there might be a better way. Best, Kevin PS: My own ClassDicsovery package, which is available from RForge via **|install.packages(ClassDiscovery, repos=http://R-Forge.R-project.org;)|** includes a SamplePCA class that does something roughly similar to this for microarrays. PPS (off-topic): The worst offender in base R -- because it doesn't use this typical approch -- is the heatmap function. Having tried to teach this function in several different classes, I have come to the conclusion that it is basically unusable by mortals. And I think the problem is that it tries to combine too many steps -- clustering rows, clustering columns, scaling, visualization -- all in a single fiunction On 10/20/2014 3:47 PM, davide risso wrote: Hi Kevin, I don't agree. In the case of EDASeq (as I suppose it is the case for DESeq/DESeq2) plotting the principal components of the count matrix is only one of possible exploratory plots (RLE plots, MA plots, etc.). So, in my opinion, it makes more sense from an object oriented point of view to have multiple plotting methods for a single RNA-seq experiment object. In addition, this is the same strategy adopted elsewhere in Bioconductor, e.g., for the plotMA method. Just my two cents. Best, davide On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote: I understand that breaking code is a problem, and that is admittedly the main reason not to immediately adopt my suggestion. But as a purely logical exercise, creating a PCA object X or something similar and using either plot(X) or plot(as.PCA(mySpecialObject)) is a much more sensible use of object-oriented programming/design. This requires no new generics (to write or to learn). And you could use it to transition away from the current system by convincing the various package maintainers to re-implement plotPCA as follows: plotPCA - function(object, ...) { plot(as.PCA(object), ...) } This would be relatively easy to eventually deprecate and teach users to switch to the alternative. On 10/20/2014 1:07 PM, Michael Love wrote: hi Kevin, that would imply there is only one way to plot an object of a given class. Additionally, it would break a lot of code. best, Mike On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote: But shouldn't they all really just be named plot for the appropriate objects? In which case, there would already be a perfectly good generic On Oct 20, 2014 10:27 AM, Michael Love michaelisaiahl...@gmail.com mailto:michaelisaiahl...@gmail.com wrote: I noticed that 'plotPCA' functions are defined in EDASeq, DESeq2, DESeq, affycoretools, Rcade,
Re: [Bioc-devel] plotPCA for BiocGenerics
Hi, I don't see how it needs more functions (as long as you can get developers to agree). Suppose that someone can define a reusable PCA class. This will contain a single plot generic function, defined once and reused by other classes. The existing plotPCA interface can also be implemented just once, in this class, as plotPCA - function(object, ...) plot(as.PCA(object), ...) This can be exposed to users of your class through namespaces. Then the only thing a developer needs to implement in his own class is the single as.PCA function. And he/she would have already been rquired to implement this as part of the old plotPCA function. So it can be extracted from that, and the developer doesn't have to reimplement the visualization code from the PCA class. Best, Kevin On 10/20/2014 5:15 PM, davide risso wrote: Hi Kevin, I see your points and I agree (especially for the specific case of plotPCA that involves some non trivial computations). On the other hand, having a wrapper function that starting from the raw data gives you a pretty picture (with virtually zero effort by the user) using a sensible choice of parameters that are more or less OK for RNA-seq data is useful for practitioners that just want to look for patterns in the data. I guess it would be the same to have a PCA method for each of the objects and then using the plot method on those new objects, but that would just create a lot more objects and functions than the current approach (like Mike was saying). Your as.pca or performPCA approach would be definitely better if all the different methods would create objects of the *same* PCA class, but since we are talking about different packages, I don't know how easy it would be to coordinate. But perhaps this is the way we should go. Best, davide On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote: Hi, It depends. The traditional R approach to these matters is that you (a) first perform some sort of an analysis and save the results as an object and then (b) show or plot what you got. It is part (b) that tends to be really generic, and (in my opinion) should have really generic names -- like show or plot or hist or image. With PCA in particular, you usually have to perform a bunch of computations in order to get the principal components from some part of the data. As I understand it now, these computations are performed along the way as part of the various plotPCA functions. The R way to do this would be something like pca - performPCA(mySpecialObject)# or as.PCA(mySpecialObject) plot(pca) # to get the scatter plot This apporach has the user-friendly advantage that you can tweak the plot (in terms of colors, symbols, ranges, titles, etc) without having to recompute the principal components every time. (I often find myself re-plotting the same PCA several times, with different colors or symbols for different factrors associated with the samples.) In addition, you could then also do something like screeplot(pca) to get a plot of the percentages of variance explained. My own feeling is that if the object doesn't know what to do when you tell it to plot itself, then you haven't got the right abstraction. You may still end up needing generics for each kind of computation you want to perform (PCA, RLE, MA, etc), which is why I suggested an as.PCA function. After all, as is already pretty generic. In the long run, l this would herlp BioConductor developers, since they wouldn't all have to reimplement the visualization code; they would just have to figure out how to convert their own object into a PCA or RLE or MA object. And I know that this plotWhatever approach is used elsewhere in BioConductor, and it has always bothered me. It just seemed that a post suggesting a new generic function provided a reasonable opportunity to point out that there might be a better way. Best, Kevin PS: My own ClassDicsovery package, which is available from RForge via **|install.packages(ClassDiscovery, repos=http://R-Forge.R-project.org; http://R-Forge.R-project.org)|** includes a SamplePCA class that does something roughly similar to this for microarrays. PPS (off-topic): The worst offender in base R -- because it doesn't use this typical approch -- is the heatmap function. Having tried to teach this function in several different classes, I have come to the conclusion that it is basically unusable by mortals. And I think the problem is that it tries to combine too many steps -- clustering rows, clustering columns, scaling, visualization -- all in a single fiunction On 10/20/2014 3:47 PM, davide risso wrote:
Re: [Bioc-devel] Why be default 'D' is not dropped from coverage?
Hi Leo, On 10/18/2014 10:50 AM, Leonardo Collado Torres wrote: Hi, A collaborator of mine is working on a new software and we while we were doing a sanity check to compare the base-level coverage from BAM files and bigWig files generated from his software we realized that by default bases corresponding to a 'D' on the CIGAR string get counted when reading coverage from BAM files. That is: getMethod('coverage', 'GAlignments') Method Definition: function (x, shift = 0L, width = NULL, weight = 1L, ...) { .local - function (x, shift = 0L, width = NULL, weight = 1L, method = c(auto, sort, hash), drop.D.ranges = FALSE) coverage(grglist(x, drop.D.ranges = drop.D.ranges), shift = shift, width = width, weight = weight, method = method) .local(x, shift, width, weight, ...) } environment: namespace:GenomicAlignments Signatures: x target GAlignments defined GAlignments packageVersion('GenomicAlignments') [1] ‘1.2.0’ You can see that this is default elsewhere, for example: extractAlignmentRangesOnReference function (cigar, pos = 1L, drop.D.ranges = FALSE, f = NULL) { if (!isTRUEorFALSE(drop.D.ranges)) stop('drop.D.ranges' must be TRUE or FALSE) if (drop.D.ranges) { ops - c(M, =, X, I) } else { ops - c(M, =, X, I, D) } cigarRangesAlongReferenceSpace(cigar, flag = NULL, pos = pos, f = f, ops = ops, drop.empty.ranges = FALSE, reduce.ranges = TRUE) } environment: namespace:GenomicAlignments What is the rationale for setting `drop.D.ranges` by default to FALSE? Because last time I checked (but that was 3 or 4 years ago), that's what Samtools pileup was doing. Cheers, H. Thanks, Leo -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fhcrc.org Phone: (206) 667-5791 Fax:(206) 667-1319 ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] plotPCA for BiocGenerics
Ah, I see now. Personally, I don't think Bioconductor developers should have to agree on single plotting functions for basic classes like 'PCA' (because this logic applies equally to the situation of all Bioconductor developers agreeing on single MA-plot, a single variance-mean plot, etc). I think letting developers define their plotPCA makes contributions easier (I don't have to ask the owner of plot.PCA to incorporate something), even though it means we have a growing list of generics. Still you have a good point about splitting computation and plotting. In practice, we subset the rows so PCA is not laborious. On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes kevin.r.coom...@gmail.com wrote: Hi, I don't see how it needs more functions (as long as you can get developers to agree). Suppose that someone can define a reusable PCA class. This will contain a single plot generic function, defined once and reused by other classes. The existing plotPCA interface can also be implemented just once, in this class, as plotPCA - function(object, ...) plot(as.PCA(object), ...) This can be exposed to users of your class through namespaces. Then the only thing a developer needs to implement in his own class is the single as.PCA function. And he/she would have already been rquired to implement this as part of the old plotPCA function. So it can be extracted from that, and the developer doesn't have to reimplement the visualization code from the PCA class. Best, Kevin On 10/20/2014 5:15 PM, davide risso wrote: Hi Kevin, I see your points and I agree (especially for the specific case of plotPCA that involves some non trivial computations). On the other hand, having a wrapper function that starting from the raw data gives you a pretty picture (with virtually zero effort by the user) using a sensible choice of parameters that are more or less OK for RNA-seq data is useful for practitioners that just want to look for patterns in the data. I guess it would be the same to have a PCA method for each of the objects and then using the plot method on those new objects, but that would just create a lot more objects and functions than the current approach (like Mike was saying). Your as.pca or performPCA approach would be definitely better if all the different methods would create objects of the *same* PCA class, but since we are talking about different packages, I don't know how easy it would be to coordinate. But perhaps this is the way we should go. Best, davide On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes kevin.r.coom...@gmail.com wrote: Hi, It depends. The traditional R approach to these matters is that you (a) first perform some sort of an analysis and save the results as an object and then (b) show or plot what you got. It is part (b) that tends to be really generic, and (in my opinion) should have really generic names -- like show or plot or hist or image. With PCA in particular, you usually have to perform a bunch of computations in order to get the principal components from some part of the data. As I understand it now, these computations are performed along the way as part of the various plotPCA functions. The R way to do this would be something like pca - performPCA(mySpecialObject) # or as.PCA(mySpecialObject) plot(pca) # to get the scatter plot This apporach has the user-friendly advantage that you can tweak the plot (in terms of colors, symbols, ranges, titles, etc) without having to recompute the principal components every time. (I often find myself re-plotting the same PCA several times, with different colors or symbols for different factrors associated with the samples.) In addition, you could then also do something like screeplot(pca) to get a plot of the percentages of variance explained. My own feeling is that if the object doesn't know what to do when you tell it to plot itself, then you haven't got the right abstraction. You may still end up needing generics for each kind of computation you want to perform (PCA, RLE, MA, etc), which is why I suggested an as.PCA function. After all, as is already pretty generic. In the long run, l this would herlp BioConductor developers, since they wouldn't all have to reimplement the visualization code; they would just have to figure out how to convert their own object into a PCA or RLE or MA object. And I know that this plotWhatever approach is used elsewhere in BioConductor, and it has always bothered me. It just seemed that a post suggesting a new generic function provided a reasonable opportunity to point out that there might be a better way. Best, Kevin PS: My own ClassDicsovery package, which is available from RForge via *install.packages(ClassDiscovery, repos=http://R-Forge.R-project.org; http://R-Forge.R-project.org)* includes a SamplePCA class that does something roughly similar to this for microarrays. PPS (off-topic):
Re: [Bioc-devel] plotPCA for BiocGenerics
Well. I have two responses to that. First, I think it would be a lot better/easier for users if (most) developers could make use of the same plot function for basic classes like PCA. Second, if you think the basic PCA plotting routine needs enhancements, you still have two options. On the one hand, you could (as you said) try to convince the maintainer of PCA to add what you want. If it's generally valuable, then he'd probably do it --- and other classes that use it would benefit. On the other hand, if it really is a special enhancement that only makes sense for your class, then you can derive a class from the basic PCA class setClass(mySpecialPCA, contains=c(PCA), *other stuff here*) and implement your own version of the plot generic for this class. And you could tweak the as.PCA function so it returns an object of the mySpecialPCA class. And the user could still just plot the result without hacving to care what's happening behind the scenes. On 10/20/2014 5:59 PM, Michael Love wrote: Ah, I see now. Personally, I don't think Bioconductor developers should have to agree on single plotting functions for basic classes like 'PCA' (because this logic applies equally to the situation of all Bioconductor developers agreeing on single MA-plot, a single variance-mean plot, etc). I think letting developers define their plotPCA makes contributions easier (I don't have to ask the owner of plot.PCA to incorporate something), even though it means we have a growing list of generics. Still you have a good point about splitting computation and plotting. In practice, we subset the rows so PCA is not laborious. On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote: Hi, I don't see how it needs more functions (as long as you can get developers to agree). Suppose that someone can define a reusable PCA class. This will contain a single plot generic function, defined once and reused by other classes. The existing plotPCA interface can also be implemented just once, in this class, as plotPCA - function(object, ...) plot(as.PCA(object), ...) This can be exposed to users of your class through namespaces. Then the only thing a developer needs to implement in his own class is the single as.PCA function. And he/she would have already been rquired to implement this as part of the old plotPCA function. So it can be extracted from that, and the developer doesn't have to reimplement the visualization code from the PCA class. Best, Kevin On 10/20/2014 5:15 PM, davide risso wrote: Hi Kevin, I see your points and I agree (especially for the specific case of plotPCA that involves some non trivial computations). On the other hand, having a wrapper function that starting from the raw data gives you a pretty picture (with virtually zero effort by the user) using a sensible choice of parameters that are more or less OK for RNA-seq data is useful for practitioners that just want to look for patterns in the data. I guess it would be the same to have a PCA method for each of the objects and then using the plot method on those new objects, but that would just create a lot more objects and functions than the current approach (like Mike was saying). Your as.pca or performPCA approach would be definitely better if all the different methods would create objects of the *same* PCA class, but since we are talking about different packages, I don't know how easy it would be to coordinate. But perhaps this is the way we should go. Best, davide On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote: Hi, It depends. The traditional R approach to these matters is that you (a) first perform some sort of an analysis and save the results as an object and then (b) show or plot what you got. It is part (b) that tends to be really generic, and (in my opinion) should have really generic names -- like show or plot or hist or image. With PCA in particular, you usually have to perform a bunch of computations in order to get the principal components from some part of the data. As I understand it now, these computations are performed along the way as part of the various plotPCA functions. The R way to do this would be something like pca - performPCA(mySpecialObject) # or as.PCA(mySpecialObject) plot(pca) # to get the scatter plot This apporach has the user-friendly advantage that you can tweak the plot (in terms of colors, symbols, ranges, titles, etc) without having to recompute the principal components