Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods
Thanks again.

*To Martin.* Got the point and I agree. I will do my best.

*To Vincent.* I see. At the moment ttBulk is an API for users, not yet for developers. But I can already imagine an API framework where you feed a new piece of custom functionality (say, a UMAP dimensionality-reduction function) to a wrapper that validates it and integrates its output with the original input, ensuring endomorphic properties: the output has the same properties as the input. To get there, the definition of a ttBulk tibble and its requirements/validation will have to be a little more established, based on community feedback. The transition will then be pretty easy. When that happens, I would be interested in having some feedback from you!

Best wishes.

*Stefano*

Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter and Eliza Hall Institute of Medical Research
+61 (0)466452544

On Sun, Feb 9, 2020 at 03:08 Martin Morgan <mtmorgan.b...@gmail.com> wrote:

> The first thing is that most contributed packages end up being accepted,
> so the discussion here should be considered as (strong) advice rather than
> a requirement. The advice is partly offered to maximize the success of
> contributed packages in the Bioconductor ecosystem, but at the end of the
> day the success of your package depends on the value it adds to the users
> who find it. Vince offered some pretty high enthusiasm, which is a good
> sign!
>
> I used ‘primarily’ mostly to encourage a more careful implementation of
> support for SE – it’s easy to say ‘yes, my package interoperates with SE’,
> but much more challenging to demonstrate through evaluated code that it
> actually does!
>
> Cynically, but with empirical experience and not as a reflection of your
> own commitment, I’ve learned that the promise of ‘future’ integration is
> seldom realized – package submission is often the last time that the
> community can directly influence package implementation and development.
> It would be interesting to develop review processes that continuously
> assessed package quality and utility.
>
> Martin
>
> *From:* stefano
> *Date:* Friday, February 7, 2020 at 6:39 PM
> *To:* Vincent Carey
> *Cc:* Martin Morgan, Michael Lawrence <lawrence.mich...@gene.com>,
> "bioc-devel@r-project.org" <bioc-devel@r-project.org>
> *Subject:* Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse
> S3 classes/methods
>
> Thanks guys for the discussion (I am learning a lot).
>
> *To Martin:*
>
> Thanks for the tips. I will start to implement those S4-style methods:
> https://github.com/stemangiola/ttBulk/issues/7
>
> I would *really* like to be part of the Bioconductor community with this
> package, if just this
>
> > "One would expect the vignette and examples to primarily emphasize the
> > use of the interoperable (SummarizedExperiment) version."
>
> could become this
>
> > One would expect the vignette and examples to emphasize the use of the
> > interoperable (SummarizedExperiment) version.
>
> I agree with the integration priority of Bioconductor, but this repository
> (and this philosophy) is more than its data structures. There should be
> space for more than one approach to doing things, provided that the
> principles are respected.
>
> If this is true, I could really spend energy on using methods as you
> suggested and implementing the SummarizedExperiment stream. With the tips
> of the community, the link will become stronger and stronger over time and
> versions.
>
> *To Vincent*
>
> Thanks a lot for the interest.
>
> *> One thing I feel is missing is an approach to the following question:
> [..] How do I make one that works the way ttBulk's operators work?*
>
> I'm afraid I don't really understand the question. Are you wondering about
> extension of the framework? Or creating a similar framework for other
> applications?
> Could you please reformulate, maybe giving a concrete example?
>
> *> Are there patterns there that are preserved across different
> operators?*
>
> A commonality is the use of code for integrating the newly calculated
> information into the original input (dplyr), validation functions, ..
>
> *> Can they be factored out to improve maintainability?*
>
> Almost surely yes. This is the first version; I hope to see enough
> interest, improve the API upon feedback, and hope for (intellectual and
> practical) contributions from more experts in software engineering.
>
> *> validObject*
>
> It seems a good method and, as far as I have tested, it works for S3
> objects as well. I will try to implement it; in fact, I have already added
> it as an issue on GitHub:
> https://github.com/stemangiola/ttBulk/issues/6
>
> At the moment I have a custom validation function.
>
> Best wishes.
>
> *Stefano*
>
> Stefano Mangiola | Postdoctoral fellow
> Papenfuss Laboratory
> The Walter and Eliza Hall Institute of Medical Research
> +61 (0)466452544
>
> Il gi
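[Editorial aside: the wrapper/validator idea Stefano describes at the top of this message could be sketched roughly as follows. This is a hypothetical helper, not part of ttBulk; `fun` and `validate` stand in for a user-supplied operation and the package's custom validation function.]

```r
## Sketch of the endomorphic-wrapper idea (hypothetical helper, not part
## of ttBulk). 'fun' is a user-supplied operation (e.g., a UMAP-style
## reduction); 'validate' is a validity check on the result.
make_endomorphic <- function(fun, validate) {
    function(x, ...) {
        out <- fun(x, ...)
        ## re-integrate bookkeeping attributes from the original input
        attr(out, "parameters") <- attr(x, "parameters")
        class(out) <- class(x)
        validate(out)  # should stop() if the ttBulk contract is broken
        out            # same properties as the input, by construction
    }
}
```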
Re: [Bioc-devel] how to trace 'Matrix' as package dependency for 'GenomicScores'
I find it quite interesting to identify formal strategies for removing dependencies, but it's also a little outside my domain of expertise. This code

    library(tools)
    library(dplyr)
    ## 'db' is the available.packages() matrix built earlier in this thread
    ## non-base packages that GenomicScores requires
    deps <- package_dependencies("GenomicScores", db, recursive=TRUE)[[1]]
    deps <- intersect(deps, rownames(db))
    ## only need the 'universe' of GenomicScores dependencies
    db1 <- db[c("GenomicScores", deps),]
    ## sub-graph of packages between each dependency and GenomicScores
    revdeps <- package_dependencies(deps, db1, recursive = TRUE,
                                    reverse = TRUE)
    tibble(
        package = names(revdeps),
        n_remove = lengths(revdeps),
    ) %>% arrange(n_remove)

produces a tibble

    # A tibble: 106 x 2
       package           n_remove
     1 BSgenome                 1
     2 AnnotationHub            1
     3 shinyjs                  1
     4 DT                       1
     5 shinycustomloader        1
     6 data.table               1
     7 shinythemes              1
     8 rtracklayer              2
     9 BiocFileCache            2
    10 BiocManager              2
    # … with 96 more rows

which shows me, via n_remove, that I can remove the dependency on AnnotationHub by removing the dependency on just one package (AnnotationHub itself!), but to remove BiocFileCache I'd also have to remove another package (AnnotationHub, I'd guess). So this provides some measure of the ease with which a package can be removed.

I'd like a 'benefit' column, too -- if I were to remove AnnotationHub, how many additional packages would I also be able to remove, because they are present only to satisfy the dependency on AnnotationHub? More generally, perhaps there is a dependency of AnnotationHub that is used only by AnnotationHub and BSgenome, so removing AnnotationHub as a dependency would make it easier to remove BSgenome, etc. I guess this is a graph optimization problem.

Probably also worth mentioning the itdepends package (https://github.com/r-lib/itdepends), which I think tries primarily to determine the relationship between package dependencies and lines of code -- complementary information.
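[Editorial aside: the 'benefit' column Martin asks for could be approximated along these lines. This is an untested sketch assuming the `db1` and `deps` objects from Martin's snippet; it recomputes the dependency closure after dropping each candidate from the package universe.]

```r
## Sketch (untested) of the 'benefit' column: for each dependency 'd',
## how many packages in the dependency universe stop being reachable
## from GenomicScores once 'd' is dropped? Assumes 'db1' and 'deps'
## from the snippet above.
library(tools)

benefit <- vapply(deps, function(d) {
    ## dependency closure recomputed with 'd' removed from the universe
    keep <- package_dependencies(
        "GenomicScores",
        db1[rownames(db1) != d, , drop = FALSE],
        recursive = TRUE
    )[[1]]
    ## packages needed now but no longer needed once 'd' is gone
    length(setdiff(deps, keep))
}, integer(1))
```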
Martin

On 2/6/20, 12:29 PM, "Robert Castelo" wrote:

true, i was just searching for the shortest path. we can search for all simple (i.e., without repeating vertices) paths, and there are up to five routes from "GenomicScores" to "Matrix":

    igraph::all_simple_paths(igraph::igraph.from.graphNEL(g),
                             from="GenomicScores", to="Matrix", mode="out")
    [[1]]
    + 7/117 vertices, named, from 04133ec:
    [1] GenomicScores        BSgenome             rtracklayer
    [4] GenomicAlignments    SummarizedExperiment DelayedArray
    [7] Matrix

    [[2]]
    + 6/117 vertices, named, from 04133ec:
    [1] GenomicScores        BSgenome             rtracklayer
    [4] GenomicAlignments    SummarizedExperiment Matrix

    [[3]]
    + 6/117 vertices, named, from 04133ec:
    [1] GenomicScores DT      crosstalk ggplot2 mgcv
    [6] Matrix

    [[4]]
    + 6/117 vertices, named, from 04133ec:
    [1] GenomicScores        rtracklayer          GenomicAlignments
    [4] SummarizedExperiment DelayedArray         Matrix

    [[5]]
    + 5/117 vertices, named, from 04133ec:
    [1] GenomicScores        rtracklayer          GenomicAlignments
    [4] SummarizedExperiment Matrix

this is interesting, because it means that if i wanted to get rid of the "Matrix" dependence i'd need to get rid not only of the "rtracklayer" dependence but also of "BSgenome" and "DT".

robert.

On 2/6/20 5:41 PM, Martin Morgan wrote:
> Excellent! I think there are other, independent, paths between your
> immediate dependents...
>
>     RBGL::sp.between(g, start="DT", finish="Matrix",
>                      detail=TRUE)[[1]]$path_detail
>     [1] "DT"        "crosstalk" "ggplot2"   "mgcv"      "Matrix"
>
> ??
>
> Martin
>
> On 2/6/20, 10:47 AM, "Robert Castelo" wrote:
>
> hi Martin,
>
> thanks for the hint!!
i wasn't aware of 'tools::package_dependencies()'; adding a bit of graph sorcery, i get the result i was looking for:

    > repos <- BiocManager::repositories()[c(1,5)]
    > repos
                                         BioCsoft
    "https://bioconductor.org/packages/3.11/bioc"
                                             CRAN
                     "https://cran.rstudio.com"
    > db <- available.packages(repos=repos)
    > deps <- tools::package_dependencies("GenomicScores", db,
                                          recursive=TRUE)[[1]]
    > deps <- tools::package_dependencies(c("GenomicScores", deps), db)
    > g <- graph::graphNEL(nodes=names(deps), edgeL=deps,
                           edgemode="directed")
    > RBGL::sp.between(g, start="GenomicScores", finish="Matrix",
                       detail=TRUE)[[1]]$path_detail
    [1] "
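[Editorial aside: the "which direct dependencies must go" conclusion reached later in this thread can also be read off programmatically. A sketch, assuming the graphNEL object `g` from the session above and the igraph package.]

```r
## Sketch: given the graphNEL 'g' built above, list the direct
## dependencies of GenomicScores that lie on some route to Matrix --
## each of these would have to be dropped to lose the Matrix dependence.
ig <- igraph::igraph.from.graphNEL(g)
paths <- igraph::all_simple_paths(ig, from = "GenomicScores",
                                  to = "Matrix", mode = "out")
## first hop of each route, deduplicated
unique(vapply(paths, function(p) igraph::as_ids(p)[2], character(1)))
## per the thread's output: rtracklayer, BSgenome and DT
```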
Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods
The first thing is that most contributed packages end up being accepted, so the discussion here should be considered as (strong) advice rather than a requirement. The advice is partly offered to maximize the success of contributed packages in the Bioconductor ecosystem, but at the end of the day the success of your package depends on the value it adds to the users who find it. Vince offered some pretty high enthusiasm, which is a good sign!

I used ‘primarily’ mostly to encourage a more careful implementation of support for SE – it’s easy to say ‘yes, my package interoperates with SE’, but much more challenging to demonstrate through evaluated code that it actually does!

Cynically, but with empirical experience and not as a reflection of your own commitment, I’ve learned that the promise of ‘future’ integration is seldom realized – package submission is often the last time that the community can directly influence package implementation and development. It would be interesting to develop review processes that continuously assessed package quality and utility.

Martin

From: stefano
Date: Friday, February 7, 2020 at 6:39 PM
To: Vincent Carey
Cc: Martin Morgan, Michael Lawrence, "bioc-devel@r-project.org"
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

Thanks guys for the discussion (I am learning a lot).

To Martin:

Thanks for the tips. I will start to implement those S4-style methods:
https://github.com/stemangiola/ttBulk/issues/7

I would really like to be part of the Bioconductor community with this package, if just this

> "One would expect the vignette and examples to primarily emphasize the use
> of the interoperable (SummarizedExperiment) version."

could become this

> One would expect the vignette and examples to emphasize the use of the
> interoperable (SummarizedExperiment) version.

I agree with the integration priority of Bioconductor, but this repository (and this philosophy) is more than its data structures.
There should be space for more than one approach to doing things, provided that the principles are respected.

If this is true, I could really spend energy on using methods as you suggested and implementing the SummarizedExperiment stream. With the tips of the community, the link will become stronger and stronger over time and versions.

To Vincent

Thanks a lot for the interest.

> One thing I feel is missing is an approach to the following question: [..]
> How do I make one that works the way ttBulk's operators work?

I'm afraid I don't really understand the question. Are you wondering about extension of the framework? Or creating a similar framework for other applications? Could you please reformulate, maybe giving a concrete example?

> Are there patterns there that are preserved across different operators?

A commonality is the use of code for integrating the newly calculated information into the original input (dplyr), validation functions, ..

> Can they be factored out to improve maintainability?

Almost surely yes. This is the first version; I hope to see enough interest, improve the API upon feedback, and hope for (intellectual and practical) contributions from more experts in software engineering.

> validObject

It seems a good method and, as far as I have tested, it works for S3 objects as well. I will try to implement it; in fact, I have already added it as an issue on GitHub:
https://github.com/stemangiola/ttBulk/issues/6

At the moment I have a custom validation function.

Best wishes.

Stefano

Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter and Eliza Hall Institute of Medical Research
+61 (0)466452544

On Sat, Feb 8, 2020 at 01:54 Vincent Carey <st...@channing.harvard.edu> wrote:

This is an interesting discussion and I hope it is ok to continue it a bit. I found the readme for the ttBulk repo extremely enticing, and I am sure many people will want to explore this way of working with genomic data.
I have only a few moments to explore it and did not read the vignette, but it looks to me as if it is mostly recapitulated in the README, which is an excellent overview.

One thing I feel is missing is an approach to the following question: I like the idea of a pipe-oriented operator for programming steps in genomic workflows. How do I make one that works the way ttBulk's operators work? Well, I can have a look at ttBulk:::reduce_dimensions.ttBulk ...

It's involved. Are there patterns there that are preserved across different operators? Can they be factored out to improve maintainability?

One other point before I run. It seems to me the operators "require" that certain fields be defined in their tibble operands.

    > names(attributes(counts))
    [1] "names"      "class"      "row.names"  "parameters"
    > attributes(counts)$names
    [1] "sample"             "transcript"         "Cell type"
    [4] "count"              "time"               "condition"
    [7] "batch"              "factor_of_interest"
    > validObjec
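[Editorial aside: the "required fields" contract Vincent points at could be made explicit with a small check. This is a hypothetical helper, not ttBulk's actual validation function; the default column names follow the attributes printed above.]

```r
## Hypothetical helper (not ttBulk's actual validator): check that the
## columns an operator relies on are present in the tibble operand.
## Default column names follow the attributes printed above.
check_required_columns <- function(x,
        required = c("sample", "transcript", "count")) {
    missing_cols <- setdiff(required, names(x))
    if (length(missing_cols) > 0)
        stop("missing required columns: ",
             paste(missing_cols, collapse = ", "), call. = FALSE)
    invisible(x)
}
```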
Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods
On Fri, Feb 7, 2020 at 6:39 PM stefano wrote:

> Thanks guys for the discussion (I am learning a lot).
>
> *To Martin:*
>
> Thanks for the tips. I will start to implement those S4-style methods:
> https://github.com/stemangiola/ttBulk/issues/7
>
> I would *really* like to be part of the Bioconductor community with this
> package, if just this
>
> > "One would expect the vignette and examples to primarily emphasize the
> > use of the interoperable (SummarizedExperiment) version."
>
> could become this
>
> > One would expect the vignette and examples to emphasize the use of the
> > interoperable (SummarizedExperiment) version.
>
> I agree with the integration priority of Bioconductor, but this repository
> (and this philosophy) is more than its data structures. There should be
> space for more than one approach to doing things, provided that the
> principles are respected.
>
> If this is true, I could really spend energy on using methods as you
> suggested and implementing the SummarizedExperiment stream. With the tips
> of the community, the link will become stronger and stronger over time and
> versions.
>
> *To Vincent*
>
> Thanks a lot for the interest.
>
> *> One thing I feel is missing is an approach to the following question:
> [..] How do I make one that works the way ttBulk's operators work?*
>
> I'm afraid I don't really understand the question. Are you wondering about
> extension of the framework? Or creating a similar framework for other
> applications?
>

We can take further discussion to the issues on the GitHub repo, but I will briefly respond here. Consider reduce_dimensions. You give a small number of method options here -- PCA, MDS, tSNE. The MDS option makes its way to stats::cmdscale via limma::plotMDS; the PCA option uses prcomp. For any number of reasons, users may want to select alternate dimension-reduction procedures or tune them in ways not passed up through your interface.
This might involve modifications to your code to introduce changes, or one could imagine a protocol for "dropping in" a new operator for ttBulk pipelines. My question is about understanding how this level of flexibility might be achieved. An example of an R package that pursues this is mlr3; see https://github.com/mlr-org/mlr3learners.template ... a link there is broken, but the full details of contributing new pipeline elements are at https://mlr3book.mlr-org.com/pipelines.html

> *> Are there patterns there that are preserved across different
> operators?*
>
> A commonality is the use of code for integrating the newly calculated
> information into the original input (dplyr), validation functions, ..
>
> *> Can they be factored out to improve maintainability?*
>
> Almost surely yes. This is the first version; I hope to see enough
> interest, improve the API upon feedback, and hope for (intellectual and
> practical) contributions from more experts in software engineering.
>
> *> validObject*
>
> It seems a good method and, as far as I have tested, it works for S3
> objects as well. I will try to implement it; in fact, I have already added
> it as an issue on GitHub:
> https://github.com/stemangiola/ttBulk/issues/6
>
> At the moment I have a custom validation function.
>
> Best wishes.
>
> *Stefano*
>
> Stefano Mangiola | Postdoctoral fellow
> Papenfuss Laboratory
> The Walter and Eliza Hall Institute of Medical Research
> +61 (0)466452544
>
> On Sat, Feb 8, 2020 at 01:54 Vincent Carey <st...@channing.harvard.edu>
> wrote:
>
>> This is an interesting discussion and I hope it is ok to continue it a
>> bit. I found the readme for the ttBulk repo extremely enticing and I am
>> sure many people will want to explore this way of working with genomic
>> data. I have only a few moments to explore it and did not read the
>> vignette, but it looks to me as if it is mostly recapitulated in the
>> README, which is an excellent overview.
>> One thing I feel is missing is an approach to the following question: I
>> like the idea of a pipe-oriented operator for programming steps in
>> genomic workflows. How do I make one that works the way ttBulk's
>> operators work? Well, I can have a look at
>> ttBulk:::reduce_dimensions.ttBulk ...
>>
>> It's involved. Are there patterns there that are preserved across
>> different operators? Can they be factored out to improve
>> maintainability?
>>
>> One other point before I run. It seems to me the operators "require"
>> that certain fields be defined in their tibble operands.
>>
>>     > names(attributes(counts))
>>     [1] "names"      "class"      "row.names"  "parameters"
>>     > attributes(counts)$names
>>     [1] "sample"             "transcript"         "Cell type"
>>     [4] "count"              "time"               "condition"
>>     [7] "batch"              "factor_of_interest"
>>     > validObject(counts)
>>     *Error in .classEnv(classDef) :*
>>     * trying to get slot "package" from an object of a basic class
>>     ("NULL") with
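[Editorial aside: the "drop-in operator" protocol Vincent asks about could take the shape of a small registry, in the spirit of the mlr3 learner templates he mentions. This is a hypothetical design sketch, not ttBulk's actual API; `register_reducer` and `reduce_dimensions2` are invented names.]

```r
## Hypothetical design sketch (not ttBulk's actual API): a registry of
## dimension-reduction backends that users can extend without touching
## the package's code.
.reducers <- new.env(parent = emptyenv())

register_reducer <- function(name, fun) {
    stopifnot(is.function(fun))
    assign(name, fun, envir = .reducers)
}

reduce_dimensions2 <- function(data, method, dims = 2, ...) {
    fun <- get(method, envir = .reducers, inherits = FALSE)
    fun(data, dims = dims, ...)  # each backend returns a coordinate matrix
}

## a built-in backend: PCA via prcomp, as the package currently uses
register_reducer("PCA", function(data, dims, ...) {
    stats::prcomp(data, ...)$x[, seq_len(dims), drop = FALSE]
})

## a user could later register, e.g., a UMAP backend the same way,
## without any change to the dispatching code above
```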