I find it quite interesting to identify formal strategies for removing
dependencies, though it is a little outside my domain of expertise. This code

library(tools)
library(dplyr)

## package database, built as earlier in this thread
db <- available.packages(repos = BiocManager::repositories())

## non-base packages the user requires for GenomicScores
deps <- package_dependencies("GenomicScores", db, recursive=TRUE)[[1]]
deps <- intersect(deps, rownames(db))

## only need the 'universe' of GenomicScores dependencies
db1 <- db[c("GenomicScores", deps),]

## sub-graph of packages between each dependency and GenomicScores
revdeps <- package_dependencies(deps, db1, recursive = TRUE, reverse = TRUE)

tibble(
    package = names(revdeps),
    n_remove = lengths(revdeps)
) %>%
    arrange(n_remove)

produces a tibble

# A tibble: 106 x 2
   package           n_remove
   <chr>                <int>
 1 BSgenome                 1
 2 AnnotationHub            1
 3 shinyjs                  1
 4 DT                       1
 5 shinycustomloader        1
 6 data.table               1
 7 shinythemes              1
 8 rtracklayer              2
 9 BiocFileCache            2
10 BiocManager              2
# … with 96 more rows

This shows me, via n_remove (the number of packages in the GenomicScores
universe, GenomicScores included, that depend directly or indirectly on each
package), that I can remove the dependency on AnnotationHub by removing the
dependency on just one package (AnnotationHub itself!). To remove
BiocFileCache, however, I'd also have to remove another package (AnnotationHub,
I'd guess). So n_remove provides some measure of how easily a package can be
removed.
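The following minimal base-R sketch makes n_remove concrete without downloading a package database. The graph is hypothetical. It is a hand-trimmed stand-in, loosely based on the paths reported later in this thread, and it has the same shape as the named list that `tools::package_dependencies(..., recursive = FALSE)` returns. `reachable()` and `reverse_graph()` are illustrative helpers, not functions from tools or any other package.

```r
## Hypothetical toy graph, loosely based on the paths reported in this
## thread: each element lists the direct (Depends/Imports) dependencies of
## the named package, mimicking tools::package_dependencies() output.
deps <- list(
    GenomicScores        = c("BSgenome", "rtracklayer", "DT", "AnnotationHub"),
    BSgenome             = "rtracklayer",
    rtracklayer          = "GenomicAlignments",
    GenomicAlignments    = "SummarizedExperiment",
    SummarizedExperiment = c("DelayedArray", "Matrix"),
    DelayedArray         = "Matrix",
    DT                   = "crosstalk",
    crosstalk            = "ggplot2",
    ggplot2              = "mgcv",
    mgcv                 = "Matrix",
    AnnotationHub        = "BiocFileCache",
    BiocFileCache        = character(0),
    Matrix               = character(0)
)

## Every package reachable from 'from' by following edges (breadth-first),
## excluding 'from' itself; names missing from 'adj' are treated as leaves
reachable <- function(adj, from) {
    seen <- character(0)
    frontier <- from
    while (length(frontier) > 0) {
        nxt <- unique(unlist(adj[frontier], use.names = FALSE))
        frontier <- setdiff(nxt, seen)
        seen <- union(seen, frontier)
    }
    setdiff(seen, from)
}

## Flip every edge, so that result[[p]] lists the packages importing p
reverse_graph <- function(adj) {
    from <- rep(names(adj), lengths(adj))
    to <- unlist(adj, use.names = FALSE)
    split(from, factor(to, levels = unique(c(names(adj), to))))
}

root <- "GenomicScores"
universe <- reachable(deps, root)

## Keep only the root and its dependencies (the analogue of 'db1' above)
sub <- deps[intersect(names(deps), c(root, universe))]
rdeps <- reverse_graph(sub)

## n_remove: packages (root included) that need each package, directly or not
n_remove <- vapply(universe, function(p) length(reachable(rdeps, p)),
                   integer(1))
sort(n_remove)
```

In this toy graph, AnnotationHub scores 1 because only GenomicScores imports it. BiocFileCache scores 2 because both AnnotationHub and GenomicScores need it. This is the same pattern as in the tibble above. For the real database, `tools::package_dependencies(..., reverse = TRUE)` already does this work.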

I'd like a 'benefit' column, too -- if I were to remove AnnotationHub, how many
additional packages would I also be able to remove, because they are present
only to satisfy the dependency on AnnotationHub? More generally, perhaps a
dependency of AnnotationHub is used only by AnnotationHub and BSgenome, so
removing AnnotationHub would make it easier to remove BSgenome as well, and so
on. I guess this is a graph optimization problem.
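One possible way to build such a 'benefit' column is to remove the edges from GenomicScores to a chosen set of direct dependencies and count the packages that are no longer reachable. The sketch below does this in base R on a hypothetical toy graph, again a named list of direct dependencies. The graph and the helpers `reachable()` and `freed_by()` are illustrative assumptions. Passing several packages at once covers the combined case.

```r
## Hypothetical toy graph (names: packages; values: direct dependencies)
deps <- list(
    GenomicScores        = c("BSgenome", "rtracklayer", "DT", "AnnotationHub"),
    BSgenome             = "rtracklayer",
    rtracklayer          = "GenomicAlignments",
    GenomicAlignments    = "SummarizedExperiment",
    SummarizedExperiment = c("DelayedArray", "Matrix"),
    DelayedArray         = "Matrix",
    DT                   = "crosstalk",
    crosstalk            = "ggplot2",
    ggplot2              = "mgcv",
    mgcv                 = "Matrix",
    AnnotationHub        = "BiocFileCache",
    BiocFileCache        = character(0),
    Matrix               = character(0)
)

## Every package reachable from 'from' (breadth-first), excluding 'from'
reachable <- function(adj, from) {
    seen <- character(0)
    frontier <- from
    while (length(frontier) > 0) {
        nxt <- unique(unlist(adj[frontier], use.names = FALSE))
        frontier <- setdiff(nxt, seen)
        seen <- union(seen, frontier)
    }
    setdiff(seen, from)
}

## Packages 'root' would no longer need if it stopped depending directly
## on every package in 'drop' ('adj' is modified only in this local copy)
freed_by <- function(adj, root, drop) {
    before <- reachable(adj, root)
    adj[[root]] <- setdiff(adj[[root]], drop)
    after <- reachable(adj, root)
    setdiff(before, after)
}

direct <- deps[["GenomicScores"]]

## benefit: packages freed in addition to the dropped package itself
benefit <- vapply(direct, function(d) {
    length(setdiff(freed_by(deps, "GenomicScores", d), d))
}, integer(1))
benefit

## a combination: drop two direct dependencies together
freed_by(deps, "GenomicScores", c("BSgenome", "rtracklayer"))
```

In this toy graph, dropping DT alone frees crosstalk, ggplot2 and mgcv, but not Matrix, which is still reached through rtracklayer. Dropping rtracklayer alone frees nothing, because BSgenome still pulls it in. Dropping BSgenome and rtracklayer together frees five packages. Scoring every subset of direct dependencies grows exponentially with their number, so in practice one might score single packages and small combinations first.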

It's probably also worth mentioning the itdepends package
(https://github.com/r-lib/itdepends), which I think primarily tries to relate
package dependencies to the lines of code that use them; that seems like
complementary information.

Martin

On 2/6/20, 12:29 PM, "Robert Castelo" <robert.cast...@upf.edu> wrote:

    True, I was just searching for the shortest path. We can instead search for
    all simple paths (i.e., paths that never repeat a vertex), and there are
    five routes from "GenomicScores" to "Matrix":
    
    igraph::all_simple_paths(igraph::igraph.from.graphNEL(g),
                             from="GenomicScores", to="Matrix", mode="out")
    [[1]]
    + 7/117 vertices, named, from 04133ec:
    [1] GenomicScores        BSgenome             rtracklayer
    [4] GenomicAlignments    SummarizedExperiment DelayedArray
    [7] Matrix
    
    [[2]]
    + 6/117 vertices, named, from 04133ec:
    [1] GenomicScores        BSgenome             rtracklayer
    [4] GenomicAlignments    SummarizedExperiment Matrix
    
    [[3]]
    + 6/117 vertices, named, from 04133ec:
    [1] GenomicScores DT            crosstalk     ggplot2       mgcv
    [6] Matrix
    
    [[4]]
    + 6/117 vertices, named, from 04133ec:
    [1] GenomicScores        rtracklayer          GenomicAlignments
    [4] SummarizedExperiment DelayedArray         Matrix
    
    [[5]]
    + 5/117 vertices, named, from 04133ec:
    [1] GenomicScores        rtracklayer          GenomicAlignments
    [4] SummarizedExperiment Matrix
    
    This is interesting, because it means that if I wanted to get rid of the
    "Matrix" dependency, I'd need to drop not only the "rtracklayer" dependency
    but also "BSgenome" and "DT".
    
    robert.
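The same enumeration can be sketched without igraph. The version below is a recursive depth-first search in base R that never revisits a vertex already on the current path. The toy graph is hypothetical and trimmed to roughly the packages in the paths above. `all_paths()` is an illustrative helper, not an igraph function. The second vertex of every path gives the direct dependencies that would have to be dropped to lose "Matrix".

```r
## Hypothetical toy graph, trimmed to roughly the packages in the paths above
deps <- list(
    GenomicScores        = c("BSgenome", "rtracklayer", "DT", "AnnotationHub"),
    BSgenome             = "rtracklayer",
    rtracklayer          = "GenomicAlignments",
    GenomicAlignments    = "SummarizedExperiment",
    SummarizedExperiment = c("DelayedArray", "Matrix"),
    DelayedArray         = "Matrix",
    DT                   = "crosstalk",
    crosstalk            = "ggplot2",
    ggplot2              = "mgcv",
    mgcv                 = "Matrix",
    AnnotationHub        = character(0),
    Matrix               = character(0)
)

## All simple paths (no vertex repeated) from 'from' to 'to', found by a
## recursive depth-first search; 'path' holds the vertices visited so far
all_paths <- function(adj, from, to, path = from) {
    if (from == to)
        return(list(path))
    out <- list()
    for (nbr in adj[[from]]) {
        if (nbr %in% path)
            next                      # would revisit a vertex
        out <- c(out, all_paths(adj, nbr, to, c(path, nbr)))
    }
    out
}

paths <- all_paths(deps, "GenomicScores", "Matrix")
length(paths)

## Direct dependencies of GenomicScores lying on at least one path to Matrix
must_drop <- unique(vapply(paths, function(p) p[2], character(1)))
must_drop

## Sanity check: without those edges Matrix is no longer reachable
pruned <- deps
pruned$GenomicScores <- setdiff(pruned$GenomicScores, must_drop)
length(all_paths(pruned, "GenomicScores", "Matrix")) == 0
```

On this toy graph, the search finds five paths. The direct dependencies on them are BSgenome, rtracklayer and DT, matching the conclusion above. The number of simple paths can grow exponentially in dense graphs, so this brute-force approach suits small sub-graphs like this one.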
    
    
    On 2/6/20 5:41 PM, Martin Morgan wrote:
    > Excellent! I think there are other, independent paths from your
    > immediate dependencies...
    > 
    > RBGL::sp.between(g, start="DT", finish="Matrix",
    >                  detail=TRUE)[[1]]$path_detail
    > [1] "DT"        "crosstalk" "ggplot2"   "mgcv"      "Matrix"
    > 
    > ??
    > 
    > Martin
    > 
    > On 2/6/20, 10:47 AM, "Robert Castelo" <robert.cast...@upf.edu> wrote:
    > 
    >      hi Martin,
    >      
    >      thanks for the hint!! I wasn't aware of 'tools::package_dependencies()';
    >      adding a bit of graph sorcery, I get the result I was looking for:
    >      
    >      repos <- BiocManager::repositories()[c("BioCsoft", "CRAN")]
    >      repos
    >                                            BioCsoft
    >      "https://bioconductor.org/packages/3.11/bioc"
    >                                                CRAN
    >                          "https://cran.rstudio.com"
    >      
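    >      db <- available.packages(repos=repos)
    >      
    >      ## recursive Depends/Imports/LinkingTo dependencies of GenomicScores
    >      deps <- tools::package_dependencies("GenomicScores", db,
    >                                          recursive=TRUE)[[1]]
    >      
    >      ## direct dependencies of GenomicScores and of each of those packages
    >      deps <- tools::package_dependencies(c("GenomicScores", deps), db)
    >      
    >      ## directed graph: an edge from each package to each direct dependency
    >      g <- graph::graphNEL(nodes=names(deps), edgeL=deps,
    >                           edgemode="directed")
    >      
    >      RBGL::sp.between(g, start="GenomicScores", finish="Matrix",
    >                       detail=TRUE)[[1]]$path_detail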
    >      [1] "GenomicScores"        "rtracklayer"          "GenomicAlignments"
    >      [4] "SummarizedExperiment" "Matrix"
    >      
    >      So it was the rtracklayer dependency that leads to Matrix, through
    >      GenomicAlignments and SummarizedExperiment.
    >      
    >      Maybe the BioC package 'pkgDepTools' should be deprecated, given that its
    >      functionality is part of 'tools' and it is neither as fast nor as
    >      correct as 'tools'.
    >      
    >      cheers,
    >      
    >      robert.
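If installing graph and RBGL is inconvenient, a plain breadth-first search can find an unweighted shortest path like the one `RBGL::sp.between()` returned above. It works directly on the named list that `tools::package_dependencies()` produces. In this minimal sketch, a hypothetical toy graph stands in for the real `deps`, and `shortest_path()` is an illustrative helper, not part of any package.

```r
## Hypothetical toy graph standing in for the 'deps' list built above
deps <- list(
    GenomicScores        = c("BSgenome", "rtracklayer", "DT", "AnnotationHub"),
    BSgenome             = "rtracklayer",
    rtracklayer          = "GenomicAlignments",
    GenomicAlignments    = "SummarizedExperiment",
    SummarizedExperiment = c("DelayedArray", "Matrix"),
    DelayedArray         = "Matrix",
    DT                   = "crosstalk",
    crosstalk            = "ggplot2",
    ggplot2              = "mgcv",
    mgcv                 = "Matrix",
    AnnotationHub        = "BiocFileCache",
    BiocFileCache        = character(0),
    Matrix               = character(0)
)

## Unweighted shortest path from 'from' to 'to' by breadth-first search;
## returns character(0) when 'to' is unreachable
shortest_path <- function(adj, from, to) {
    parent <- character(0)      # parent[["x"]]: vertex from which x was reached
    visited <- from
    queue <- from
    while (length(queue) > 0) {
        v <- queue[1]
        queue <- queue[-1]
        if (v == to)
            break
        for (w in adj[[v]]) {
            if (!w %in% visited) {
                visited <- c(visited, w)
                parent[w] <- v
                queue <- c(queue, w)
            }
        }
    }
    if (!to %in% visited)
        return(character(0))
    ## walk back from 'to' to 'from' along the recorded parents
    path <- to
    while (path[1] != from)
        path <- c(parent[[path[1]]], path)
    unname(path)
}

shortest_path(deps, "GenomicScores", "Matrix")
shortest_path(deps, "DT", "Matrix")
```

Each vertex is recorded the first time it is reached, so the search returns one path with the fewest edges. Ties are broken by the order of each package's dependency vector. The same function should also work on the real `deps` list built with `tools::package_dependencies()`, because packages outside `db` simply have no outgoing edges.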
    >      
    >      
    >      On 2/6/20 2:51 PM, Martin Morgan wrote:
    >      > The first thing is to get the correct repositories
    >      >
    >      >    repos = BiocManager::repositories()
    >      >
    >      > (maybe trim the experiment and annotation repos from this). I also
    >      > tried pkgDepTools::makeDepGraph(), but it took so long that I moved
    >      > on... it has an option 'keep.builtin' which might include Matrix.
    >      >
    >      > There is also BiocPkgTools::buildPkgDependencyDataFrame() & friends,
    >      > but this seems to build dependencies within a single repository...
    >      >
    >      > The building block for a solution is `tools::package_dependencies()`,
    >      > and I can confirm that "Matrix" _is_ a dependency:
    >      >
    >      >    db = available.packages(repos = BiocManager::repositories())
    >      >    ## recursive (forward) dependencies of GenomicScores
    >      >    deps <- tools::package_dependencies("GenomicScores", db,
    >      >                                        recursive = TRUE)
    >      >    "Matrix" %in% deps[[1]]
    >      >    ## [1] TRUE
    >      >
    >      > so I'll leave the clever recursive or graph-based algorithm up to
    >      > you, to report back to the mailing list?
    >      >
    >      > For what it's worth, I think the last time this came up Martin
    >      > Maechler pointed to a function in base R (probably the tools package)
    >      > that implements this, too...?
    >      >
    >      > Martin Morgan
    >      >
    >      > On 2/6/20, 6:40 AM, "Bioc-devel on behalf of Robert Castelo"
    >      > <bioc-devel-boun...@r-project.org on behalf of robert.cast...@upf.edu> wrote:
    >      >
    >      >      hi,
    >      >
    >      >      When I load the package 'GenomicScores' in a clean session, I see
    >      >      through 'sessionInfo()' that the package 'Matrix' is listed under
    >      >      "loaded via a namespace (and not attached)".
    >      >
    >      >      I'd like to know which dependency of 'GenomicScores' ends up
    >      >      requiring the package 'Matrix'.
    >      >
    >      >      I've tried using the package 'pkgDepTools' without success, because
    >      >      its dependency graph does not list any path from 'GenomicScores' to
    >      >      'Matrix'.
    >      >
    >      >      I've been manually browsing the Bioc website and, unless I've
    >      >      overlooked something, the only association with 'Matrix' I could
    >      >      find is that 'S4Vectors' and 'GenomicRanges', which are required by
    >      >      'GenomicScores', list 'Matrix' in their 'Suggests' field. My
    >      >      understanding, however, is that suggested packages are not
    >      >      required and should not be loaded.
    >      >
    >      >      So, is there any way to figure out which of the 'GenomicScores'
    >      >      dependencies leads to loading the package 'Matrix'?
    >      >
    >      >      Here are the Depends, Imports and Suggests fields from 'GenomicScores':
    >      >
    >      >      Depends: R (>= 3.5), S4Vectors (>= 0.7.21), GenomicRanges, methods,
    >      >               BiocGenerics (>= 0.13.8)
    >      >      Imports: utils, XML, Biobase, IRanges (>= 2.3.23), Biostrings,
    >      >               BSgenome, GenomeInfoDb, AnnotationHub, shiny, shinyjs,
    >      >               DT, shinycustomloader, rtracklayer, data.table, shinythemes
    >      >      Suggests: BiocStyle, knitr, rmarkdown, BSgenome.Hsapiens.UCSC.hg19,
    >      >               phastCons100way.UCSC.hg19, MafDb.1Kgenomes.phase1.hs37d5,
    >      >               SNPlocs.Hsapiens.dbSNP144.GRCh37, VariantAnnotation,
    >      >               TxDb.Hsapiens.UCSC.hg19.knownGene, gwascat, RColorBrewer
    >      >
    >      >      And here is the session information from a fresh R-devel session
    >      >      after loading the package 'GenomicScores':
    >      >
    >      >      R Under development (unstable) (2020-01-29 r77745)
    >      >      Platform: x86_64-pc-linux-gnu (64-bit)
    >      >      Running under: CentOS Linux 7 (Core)
    >      >
    >      >      Matrix products: default
    >      >      BLAS:   /opt/R/R-devel/lib64/R/lib/libRblas.so
    >      >      LAPACK: /opt/R/R-devel/lib64/R/lib/libRlapack.so
    >      >
    >      >      locale:
    >      >        [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
    >      >        [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
    >      >        [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
    >      >        [7] LC_PAPER=en_US.UTF8       LC_NAME=C
    >      >        [9] LC_ADDRESS=C              LC_TELEPHONE=C
    >      >      [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
    >      >
    >      >      attached base packages:
    >      >      [1] parallel  stats4    stats     graphics  grDevices utils     datasets
    >      >      [8] methods   base
    >      >
    >      >      other attached packages:
    >      >      [1] GenomicScores_1.11.4 GenomicRanges_1.39.2 GenomeInfoDb_1.23.10
    >      >      [4] IRanges_2.21.3       S4Vectors_0.25.12    BiocGenerics_0.33.0
    >      >      [7] colorout_1.2-2
    >      >
    >      >      loaded via a namespace (and not attached):
    >      >        [1] Rcpp_1.0.3                    lattice_0.20-38
    >      >        [3] shinycustomloader_0.9.0       Rsamtools_2.3.3
    >      >        [5] Biostrings_2.55.4             assertthat_0.2.1
    >      >        [7] digest_0.6.23                 mime_0.9
    >      >        [9] BiocFileCache_1.11.4          R6_2.4.1
    >      >      [11] RSQLite_2.2.0                 httr_1.4.1
    >      >      [13] pillar_1.4.3                  zlibbioc_1.33.1
    >      >      [15] rlang_0.4.4                   curl_4.3
    >      >      [17] data.table_1.12.8             blob_1.2.1
    >      >      [19] DT_0.12                       Matrix_1.2-18
    >      >      [21] shinythemes_1.1.2             shinyjs_1.1
    >      >      [23] BiocParallel_1.21.2           AnnotationHub_2.19.7
    >      >      [25] htmlwidgets_1.5.1             RCurl_1.98-1.1
    >      >      [27] bit_1.1-15.1                  shiny_1.4.0
    >      >      [29] DelayedArray_0.13.3           compiler_4.0.0
    >      >      [31] httpuv_1.5.2                  rtracklayer_1.47.0
    >      >      [33] pkgconfig_2.0.3               htmltools_0.4.0
    >      >      [35] tidyselect_1.0.0              SummarizedExperiment_1.17.1
    >      >      [37] tibble_2.1.3                  GenomeInfoDbData_1.2.2
    >      >      [39] interactiveDisplayBase_1.25.0 matrixStats_0.55.0
    >      >      [41] XML_3.99-0.3                  crayon_1.3.4
    >      >      [43] dplyr_0.8.4                   dbplyr_1.4.2
    >      >      [45] later_1.0.0                   GenomicAlignments_1.23.1
    >      >      [47] bitops_1.0-6                  rappdirs_0.3.1
    >      >      [49] grid_4.0.0                    xtable_1.8-4
    >      >      [51] DBI_1.1.0                     magrittr_1.5
    >      >      [53] XVector_0.27.0                promises_1.1.0
    >      >      [55] vctrs_0.2.2                   tools_4.0.0
    >      >      [57] bit64_0.9-7                   BSgenome_1.55.3
    >      >      [59] Biobase_2.47.2                glue_1.3.1
    >      >      [61] purrr_0.3.3                   BiocVersion_3.11.1
    >      >      [63] fastmap_1.0.1                 yaml_2.2.1
    >      >      [65] AnnotationDbi_1.49.1          BiocManager_1.30.10
    >      >      [67] memoise_1.1.0
    >      >
    >      >
    >      >
    >      >      thanks!!
    >      >
    >      >      robert.
    >      >
    >      >      _______________________________________________
    >      >      Bioc-devel@r-project.org mailing list
    >      >      https://stat.ethz.ch/mailman/listinfo/bioc-devel
    >      >
    >      >
    >      
    >      --
    >      Robert Castelo, PhD
    >      Associate Professor
    >      Dept. of Experimental and Health Sciences
    >      Universitat Pompeu Fabra (UPF)
    >      Barcelona Biomedical Research Park (PRBB)
    >      Dr Aiguader 88
    >      E-08003 Barcelona, Spain
    >      telf: +34.933.160.514
    >      fax: +34.933.160.550
    >      
    > 
    
    