Re: [Bioc-devel] BFG repo cleaner did not perfectly work

2023-01-20 Thread Nathan Sheffield via Bioc-devel
Hi Adam,

I think the recommended way to remove large, inadvertently committed files from 
a git repo is no longer BFG or filter-branch, but a new approach called 
`filter-repo`. You might try it. You can read about it here: 
https://github.com/newren/git-filter-repo

I've found it easier to use, more effective, and faster than BFG or git 
filter-branch. For example, I have this in my notes...

First, use this script to identify large files:

```
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
```
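
If only files above a certain size matter (for instance the 5M limit used with BFG elsewhere in this thread), a small filter can sit at the end of a similar pipeline. This is only a sketch: `filter_large_blobs` is a hypothetical helper name, not part of git, and it assumes the three-field `--batch-check` format shown in the usage comment (type, size, path).

```shell
# Keep only blobs above a size threshold from `git cat-file --batch-check` output.
# Input lines:  "<type> <size> <path>"
# Output lines: "<size> <path>"
filter_large_blobs() {
  threshold="${1:-5242880}"   # bytes; default is 5 MiB
  awk -v t="$threshold" '
    $1 == "blob" && $2 + 0 > t + 0 {
      size = $2
      sub(/^[^ ]+ [^ ]+ ?/, "")   # drop type and size, keep the path (may contain spaces)
      print size, $0
    }'
}

# Usage inside a repository (not run here):
# git rev-list --objects --all \
#   | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
#   | filter_large_blobs 5242880
```

The filter reads from standard input, so it can be tried on a few hand-written lines before pointing it at a real repository.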

Then I use this to remove the files from history. As of 2020, `filter-repo` has 
replaced `filter-branch` and `bfg` as the recommended way to change history, 
but it's a separate tool that you'll have to install (with *e.g.* `pip3 install 
git-filter-repo`).

```
git filter-repo --path-glob '*.RData' --invert-paths
```
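
If the pack file still looks large after a rewrite, the old objects may simply not have been garbage-collected yet; BFG in particular leaves them in place until the reflog is expired and the repository is pruned. As far as I know `filter-repo` normally does this cleanup itself, so the sketch below is mostly relevant after BFG or `filter-branch`. `prune_rewritten_history` is a hypothetical wrapper name; the git subcommands and flags are standard.

```shell
# Expire reflog entries and prune unreachable objects so blobs removed by a
# history rewrite actually leave .git/objects, then report object counts.
prune_rewritten_history() {
  repo="${1:-.}"   # path to the repository; defaults to the current directory
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --prune=now --quiet &&
    git -C "$repo" count-objects -v
}

# Usage (not run here): prune_rewritten_history /path/to/clone
```

The `size-pack` line in the output gives the packed size in KiB, which is a quick way to see whether the large blobs are really gone.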

Hope that helps.
-Nathan

On Mon, Jan 16, 2023, at 11:48 AM, Park, Adam Keebum wrote:
> Dear community,
> 
> This is a compact version of the issue I sent last week, to ask for 
> general advice.
> 
>   *   Running the recommended command below did not perfectly remove every 
> such file.
> 
> bfg --strip-blobs-bigger-than 5M repo.git
> 
>   *   The BiocChecker still picks up a pack file and emits a warning 
> (.git/objects/pack-xxx..xxx.pack).
> 
>   *   However, the reference is not detected by tools like git filter-branch 
> or bfg.
> 
> I would appreciate any advice on digging into this problem.
> 
> Sincerely,
> Adam.
> 
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 



Re: [Bioc-devel] as.list of a GRanges

2018-02-16 Thread Nathan Sheffield
For what it's worth, my package (LOLA) was one that used as.list on a 
GRanges or GRangesList, and those calls were broken by changes to devel. 
Since I was also pushing changes at the time, I assumed the devel build 
errors were due to my updates. I spent quite a bit of time trying to 
figure out what was wrong before I realized the breakage was caused not 
by my updates but by upstream changes in GRanges. Eventually I tracked 
the errors down to as.list (and ultimately found other errors, which we 
discussed earlier on this list).

My conclusion is that using the deployed bioc devel to test what a 
refactoring will break doesn't seem like the ideal way to go. I had 
assumed that other packages' changes wouldn't typically be pushed if 
they would break my package's build, so this devalued the dev builds 
and reduced my confidence in them: now when I see an error I may assume 
it's something else and wait a few days instead of diving right in to 
solve the problem.


I like the idea of temporarily restoring as.list with a deprecation 
message -- also, as a general development philosophy going forward in 
terms of testing on devel. This would have saved me a lot of time 
troubleshooting in this instance.


Just my 2 cents.

-Nathan


On 02/16/2018 02:57 AM, Bernat Gel wrote:

Hi Hervé and others,

Thanks for the responses.

I wouldn't call as.list() of a GRanges an "obscure behaviour" but more 
a "works as expected, even if not clearly documented" behaviour.


In any case I can change the code to as(gr, "GRangesList") as suggested.

Thanks again for the responses and discussion :)

Bernat


*Bernat Gel Moreno*
Bioinformatician

Hereditary Cancer Program
Program of Predictive and Personalized Medicine of Cancer (PMPPC)
Germans Trias i Pujol Research Institute (IGTP)

Campus Can Ruti
Carretera de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Barcelona, Spain

Tel: (+34) 93 554 3068
Fax: (+34) 93 497 8654
b...@igtp.cat
www.germanstrias.org

El 02/15/2018 a las 11:19 PM, Hervé Pagès escribió:

On 02/15/2018 01:57 PM, Michael Lawrence wrote:



On Thu, Feb 15, 2018 at 1:45 PM, Hervé Pagès wrote:


    On 02/15/2018 11:53 AM, Cook, Malcolm wrote:

    Hi,

    Can I ask, is this change under discussion in current release or
    so far in Bioconductor devel only (my assumption)?


    Bioconductor devel only.


   > On 02/15/2018 08:37 AM, Michael Lawrence wrote:
   > > So is as.list() no longer supported for GRanges objects?
   > > I have found it useful in places.
   >
   > Very few places. I found a dozen of them in the entire
   > software repo.

    However there are probably more in the wild...


    What as.list() was doing on a GRanges object was not documented. Relying
    on some kind of obscure undocumented feature is never a good idea.


There's just too much that is documented implicitly through 
inherited behaviors, or where we say things like "this data 
structure behaves as one would expect given base R". It's not fair 
to claim that those features are undocumented. Our documentation is 
not complete enough to use it as an excuse.


It's not fair to suggest that this is a widely used feature either.

I've identified all the places in the 1500 software packages where
this was used, and, as I said, there were very few places. BTW I
fixed most of them but my plan is to fix all of them. Some of the
code that is outside the Bioc package corpus might be affected but
it's fair to assume that this will be a very rare occurrence. This can
be mitigated by temporarily restoring as.list() on GRanges, with a
deprecation message, and waiting 1 more devel cycle to replace it with
the new behavior. I chose to disable it for now, on purpose, so I can
identify packages that break (the build report is a great tool for
that) and fix them.

I'm not using the fact that as.list() on a GRanges is not documented
as an excuse for anything. Only to help those with concerns to
relativize and relax.

H.




   > Now you should use as.list(as(gr, "GRangesList")) instead.
   > as.list() was behaving inconsistently on IRanges and GRanges objects,
   > which is blocking new developments. It will come back with a consistent
   > behavior. More generally speaking IRanges and GRanges will behave
   > consistently as far as their "list interpretation" is concerned.

    Can we please be assured to be reminded of this prominently in
    release notes?


    The changes will be announced and described on this list and in the
    NEWS files of the IRanges and GenomicRanges packages.

    H.


    Thanks!

    ~malcolm


    --     Hervé Pagès

    Program 

Re: [Bioc-devel] Issues with GenomicRanges updates

2018-02-12 Thread Nathan Sheffield

Hi Herve,

The updates have indeed solved those issues for that sample -- However, 
when you try to apply a more complicated function, I am getting the same 
error. Here's a reproducible example of the error again, this time using 
the latest GenomicRanges and IRanges packages:


library(GenomicRanges)
data("sample_input", package="LOLA")
as.list(userSets)
lapply(userSets, length)

# That works now


data("sample_universe", package="LOLA")

f = function(x, userUniverse) {
        fo = findOverlaps(x, userUniverse)
        y = userUniverse[unique(subjectHits(fo))]
        return(y)
 }

lapply(userSets, f, userUniverse)

Error in lapply_CompressedList(X, FUN, ...) :
  invalid output element of class "GRanges"


This works under earlier versions. Can you see if this fails in your 
setup? Or am I doing something strange in the function? Here's my 
sessionInfo showing I updated to the latest packages:


-Nathan

> sessionInfo()
R Under development (unstable) (2018-02-04 r74204)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.31.19 GenomeInfoDb_1.15.5   IRanges_2.13.25
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] zlibbioc_1.25.0    compiler_3.5.0 tools_3.5.0
[4] XVector_0.19.8 GenomeInfoDbData_1.1.0 RCurl_1.95-4.10
[7] bitops_1.0-6

On 02/10/2018 10:40 PM, Nathan Sheffield wrote:

Hi Herve,

Never mind, I see now that my packages are still a day old; I was looking 
at your sessionInfo paste and thought it was mine, whoops. I'll give 
it another try tomorrow with the new versions.


other attached packages:
[1] GenomicRanges_1.31.18 GenomeInfoDb_1.15.5 IRanges_2.13.24
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

-Nathan

On 02/10/2018 07:33 PM, Nathan Sheffield wrote:
According to my `sessionInfo` (see below), those are the versions I 
had been using:


other attached packages:
[1] GenomicRanges_1.31.19 GenomeInfoDb_1.15.5   IRanges_2.13.25
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

And I had pulled them from biocLite...what's going on?

-Nathan


On 02/10/2018 06:19 PM, Hervé Pagès wrote:

Hi Nathan,

I can't reproduce this with the latest versions of S4Vectors (0.17.31),
IRanges (2.13.25), and GenomicRanges (1.31.19). Note that these versions
will only become available via biocLite() tomorrow but you can get them
directly from git.bioconductor.org.

With these versions, as.list, lapply, and mclapply work on userSets:


  > library(GenomicRanges)
  > data("sample_input", package="LOLA")
  > as.list(userSets)
  $setA
  GRanges object with 3142 ranges and 0 metadata columns:
   seqnames   ranges strand
      <Rle> <IRanges>  <Rle>
   [1] chr1   [ 437151,  438164]  *
   [2] chr1   [ 875730,  878363]  *
   [3] chr1   [ 933387,  937410]  *
   [4] chr1   [ 967966,  970238]  *
   [5] chr1   [1016863, 1017439]  *
   ...  ...  ...    ...
    [3138] chrY [ 9364545,  9364859]  *
    [3139] chrY [ 9385471,  9385777]  *
    [3140] chrY [14532115, 14533600]  *
    [3141] chrY [23696580, 23696878]  *
    [3142] chrY [26959489, 26959716]  *
    ---
    seqinfo: 69 sequences from an unspecified genome; no seqlengths

  $setB
  GRanges object with 5831 ranges and 0 metadata columns:
   seqnames   ranges strand
      <Rle> <IRanges>  <Rle>
   [1] chr1 [ 28735,  29810]  *
   [2] chr1 [544738, 546649]  *
   [3] chr1 [713984, 714547]  *
   [4] chr1 [762416, 763445]  *
   [5] chr1 [805198, 805628]  *
   ...  ...  ...    ...
    [5827] chrY [20508190, 20508452]  *
    [5828] chrY [21154603, 21155040]  *
    [5829] chrY [21238448, 21240005]  *
    [5830] chrY [26979889, 26980116]  *
    [5831] chrY [28773315, 28773544]  *
    ---
    seqinfo: 69 sequences from an unspecified genome; no seqlengths

  > lapply(userSets, length)
  $setA
  [1] 3142

  $setB
  [1] 5831

  > mclapply(userSets, length)
  $setA
  [1] 3142

  $setB
  [1] 5831

Note that you should not need to call as.list() on a GRangesList object
before passing it to lapply() or mclapply().

Let me kn

Re: [Bioc-devel] Issues with GenomicRanges updates

2018-02-10 Thread Nathan Sheffield

Hi Herve,

Never mind, I see now that my packages are still a day old; I was looking at 
your sessionInfo paste and thought it was mine, whoops. I'll give it 
another try tomorrow with the new versions.


other attached packages:
[1] GenomicRanges_1.31.18 GenomeInfoDb_1.15.5 IRanges_2.13.24
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

-Nathan

On 02/10/2018 07:33 PM, Nathan Sheffield wrote:
According to my `sessionInfo` (see below), those are the versions I 
had been using:


other attached packages:
[1] GenomicRanges_1.31.19 GenomeInfoDb_1.15.5   IRanges_2.13.25
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

And I had pulled them from biocLite...what's going on?

-Nathan


On 02/10/2018 06:19 PM, Hervé Pagès wrote:

Hi Nathan,

I can't reproduce this with the latest versions of S4Vectors (0.17.31),
IRanges (2.13.25), and GenomicRanges (1.31.19). Note that these versions
will only become available via biocLite() tomorrow but you can get them
directly from git.bioconductor.org.

With these versions, as.list, lapply, and mclapply work on userSets:

  > library(GenomicRanges)
  > data("sample_input", package="LOLA")
  > as.list(userSets)
  $setA
  GRanges object with 3142 ranges and 0 metadata columns:
   seqnames   ranges strand
      <Rle> <IRanges>  <Rle>
   [1] chr1   [ 437151,  438164]  *
   [2] chr1   [ 875730,  878363]  *
   [3] chr1   [ 933387,  937410]  *
   [4] chr1   [ 967966,  970238]  *
   [5] chr1   [1016863, 1017439]  *
   ...  ...  ...    ...
    [3138] chrY [ 9364545,  9364859]  *
    [3139] chrY [ 9385471,  9385777]  *
    [3140] chrY [14532115, 14533600]  *
    [3141] chrY [23696580, 23696878]  *
    [3142] chrY [26959489, 26959716]  *
    ---
    seqinfo: 69 sequences from an unspecified genome; no seqlengths

  $setB
  GRanges object with 5831 ranges and 0 metadata columns:
   seqnames   ranges strand
      <Rle> <IRanges>  <Rle>
   [1] chr1 [ 28735,  29810]  *
   [2] chr1 [544738, 546649]  *
   [3] chr1 [713984, 714547]  *
   [4] chr1 [762416, 763445]  *
   [5] chr1 [805198, 805628]  *
   ...  ...  ...    ...
    [5827] chrY [20508190, 20508452]  *
    [5828] chrY [21154603, 21155040]  *
    [5829] chrY [21238448, 21240005]  *
    [5830] chrY [26979889, 26980116]  *
    [5831] chrY [28773315, 28773544]  *
    ---
    seqinfo: 69 sequences from an unspecified genome; no seqlengths

  > lapply(userSets, length)
  $setA
  [1] 3142

  $setB
  [1] 5831

  > mclapply(userSets, length)
  $setA
  [1] 3142

  $setB
  [1] 5831

Note that you should not need to call as.list() on a GRangesList object
before passing it to lapply() or mclapply().

Let me know if the problem persists after you update.

Best,
H.

> sessionInfo()
R Under development (unstable) (2017-12-11 r73889)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /home/hpages/R/R-3.5.r73889/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.5.r73889/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.31.19 GenomeInfoDb_1.15.5   IRanges_2.13.25
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] zlibbioc_1.25.0    compiler_3.5.0 tools_3.5.0
[4] XVector_0.19.8 GenomeInfoDbData_1.1.0 RCurl_1.95-4.10
[7] bitops_1.0-6


On 02/10/2018 02:48 PM, Nathan Sheffield wrote:
I'm having some issues getting my package LOLA to pass R CMD check 
using the updated dev versions of GenomicRanges et al.


It seems like any time I try to apply something across a 
"CompressedGRangesList" object, it's giving errors when I use 
mclapply from parallel. Here's a reproducible example:


data("sample_input", package="LOLA")
library(parallel)
mclapply(userSets, length)

(loads packages...)

Error in lapply_CompressedList(X, FUN, ...) :
   invalid output element of class "GRanges"


It works with regular lapply:

 > lapply(userSets, length)
$setA
[1] 3142

$setB
[1] 5831


This is running on the bioconductor docker devel_core2 container, 
and I've then gone and updated to the latest dev versions of these 
packages with `biocLite()`.


I earlier ran into issues using `as.list()` on these same 
CompressedGRangesList

Re: [Bioc-devel] Issues with GenomicRanges updates

2018-02-10 Thread Nathan Sheffield
According to my `sessionInfo` (see below), those are the versions I had 
been using:


other attached packages:
[1] GenomicRanges_1.31.19 GenomeInfoDb_1.15.5   IRanges_2.13.25
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

And I had pulled them from biocLite...what's going on?

-Nathan


On 02/10/2018 06:19 PM, Hervé Pagès wrote:

Hi Nathan,

I can't reproduce this with the latest versions of S4Vectors (0.17.31),
IRanges (2.13.25), and GenomicRanges (1.31.19). Note that these versions
will only become available via biocLite() tomorrow but you can get them
directly from git.bioconductor.org.

With these versions, as.list, lapply, and mclapply work on userSets:

  > library(GenomicRanges)
  > data("sample_input", package="LOLA")
  > as.list(userSets)
  $setA
  GRanges object with 3142 ranges and 0 metadata columns:
   seqnames   ranges strand
      <Rle> <IRanges>  <Rle>
   [1] chr1   [ 437151,  438164]  *
   [2] chr1   [ 875730,  878363]  *
   [3] chr1   [ 933387,  937410]  *
   [4] chr1   [ 967966,  970238]  *
   [5] chr1   [1016863, 1017439]  *
   ...  ...  ...    ...
    [3138] chrY [ 9364545,  9364859]  *
    [3139] chrY [ 9385471,  9385777]  *
    [3140] chrY [14532115, 14533600]  *
    [3141] chrY [23696580, 23696878]  *
    [3142] chrY [26959489, 26959716]  *
    ---
    seqinfo: 69 sequences from an unspecified genome; no seqlengths

  $setB
  GRanges object with 5831 ranges and 0 metadata columns:
   seqnames   ranges strand
      <Rle> <IRanges>  <Rle>
   [1] chr1 [ 28735,  29810]  *
   [2] chr1 [544738, 546649]  *
   [3] chr1 [713984, 714547]  *
   [4] chr1 [762416, 763445]  *
   [5] chr1 [805198, 805628]  *
   ...  ...  ...    ...
    [5827] chrY [20508190, 20508452]  *
    [5828] chrY [21154603, 21155040]  *
    [5829] chrY [21238448, 21240005]  *
    [5830] chrY [26979889, 26980116]  *
    [5831] chrY [28773315, 28773544]  *
    ---
    seqinfo: 69 sequences from an unspecified genome; no seqlengths

  > lapply(userSets, length)
  $setA
  [1] 3142

  $setB
  [1] 5831

  > mclapply(userSets, length)
  $setA
  [1] 3142

  $setB
  [1] 5831

Note that you should not need to call as.list() on a GRangesList object
before passing it to lapply() or mclapply().

Let me know if the problem persists after you update.

Best,
H.

> sessionInfo()
R Under development (unstable) (2017-12-11 r73889)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /home/hpages/R/R-3.5.r73889/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.5.r73889/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.31.19 GenomeInfoDb_1.15.5   IRanges_2.13.25
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] zlibbioc_1.25.0    compiler_3.5.0 tools_3.5.0
[4] XVector_0.19.8 GenomeInfoDbData_1.1.0 RCurl_1.95-4.10
[7] bitops_1.0-6


On 02/10/2018 02:48 PM, Nathan Sheffield wrote:
I'm having some issues getting my package LOLA to pass R CMD check 
using the updated dev versions of GenomicRanges et al.


It seems like any time I try to apply something across a 
"CompressedGRangesList" object, it's giving errors when I use 
mclapply from parallel. Here's a reproducible example:


data("sample_input", package="LOLA")
library(parallel)
mclapply(userSets, length)

(loads packages...)

Error in lapply_CompressedList(X, FUN, ...) :
   invalid output element of class "GRanges"


It works with regular lapply:

 > lapply(userSets, length)
$setA
[1] 3142

$setB
[1] 5831


This is running on the bioconductor docker devel_core2 container, and 
I've then gone and updated to the latest dev versions of these 
packages with `biocLite()`.


I earlier ran into issues using `as.list()` on these same 
CompressedGRangesList objects. It used to be that I had to call 
as.list when they were just GRangesList objects, but now that's 
failing, and so I've had to go take all calls to as.list out of my 
code. This has solved that issue (I guess the updates made the 
as.list calls unnecessary), but you can see it's still causing errors:


as.list(userSets)
Error in lapply_CompressedList(X, FUN, ...) :

[Bioc-devel] Issues with GenomicRanges updates

2018-02-10 Thread Nathan Sheffield
I'm having some issues getting my package LOLA to pass R CMD check using 
the updated dev versions of GenomicRanges et al.


It seems like any time I try to apply something across a 
"CompressedGRangesList" object, it's giving errors when I use mclapply 
from parallel. Here's a reproducible example:


data("sample_input", package="LOLA")
library(parallel)
mclapply(userSets, length)

(loads packages...)

Error in lapply_CompressedList(X, FUN, ...) :
  invalid output element of class "GRanges"


It works with regular lapply:

> lapply(userSets, length)
$setA
[1] 3142

$setB
[1] 5831


This is running on the bioconductor docker devel_core2 container, and 
I've then gone and updated to the latest dev versions of these packages 
with `biocLite()`.


I earlier ran into issues using `as.list()` on these same 
CompressedGRangesList objects. It used to be that I had to call as.list 
when they were just GRangesList objects, but now that's failing, and so 
I've had to go take all calls to as.list out of my code. This has solved 
that issue (I guess the updates made the as.list calls unnecessary), but 
you can see it's still causing errors:


as.list(userSets)
Error in lapply_CompressedList(X, FUN, ...) :
  invalid output element of class "GRanges"

-Nathan

```
> sessionInfo()
R Under development (unstable) (2018-02-04 r74204)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.31.18 GenomeInfoDb_1.15.5 IRanges_2.13.24
[4] S4Vectors_0.17.31 BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] zlibbioc_1.25.0    compiler_3.5.0 XVector_0.19.8
[4] tools_3.5.0    GenomeInfoDbData_1.1.0 RCurl_1.95-4.10
[7] bitops_1.0-6
```



Re: [Bioc-devel] EXTERNAL: Re: DOI for packages

2018-01-05 Thread Nathan Sheffield
Hi,

Thanks for the package DOIs. I have a question about citation formats 
from these DOIs.

When using the Bioconductor DOIs to programmatically pull a citation from 
DOI providers in bibtex format, the author field seems to be formatted 
incorrectly. I don't really know how the information is given to the 
provider, or how that is formatted and parsed, but there seems to be a 
hiccup somewhere. For example, if you take the AnnotationHub DOI:

10.18129/B9.bioc.AnnotationHub 
<https://doi.org/doi:10.18129/B9.bioc.AnnotationHub>

And paste this into the DOI citation formatter at crosscite 
(https://citation.crosscite.org/), with bibtex formatting style, the 
result is:

@article{Martin Morgan [Cre], Marc Carlson [Ctb], Dan Tenenbaum [Ctb], 
Sonali Arora [Ctb]_2017, title={AnnotationHub}, 
DOI={10.18129/b9.bioc.annotationhub}, publisher={Bioconductor}, 
author={Martin Morgan [Cre], Marc Carlson [Ctb], Dan Tenenbaum [Ctb], 
Sonali Arora [Ctb]}, year={2017}}

When using the jabref DOI puller, I get the same bibtex:

@Misc{[Cre]2017,
   author    = {Martin Morgan [Cre], Marc Carlson [Ctb], Dan Tenenbaum 
[Ctb], Sonali Arora [Ctb]},
   title = {AnnotationHub},
   year  = {2017},
   doi   = {10.18129/b9.bioc.annotationhub},
   pages = {-},
   publisher = {Bioconductor},
   timestamp = {2018-01-05},
}

Jabref doesn't correctly parse this bibtex because the author field is 
not formatted correctly in bibtex format. See this page for an 
explanation: http://www.tex.ac.uk/FAQ-manyauthor.html

This also leads to the really strange default bibtex keys. It suggests 
that the metadata sent to the provider may be incorrect: the author 
field is treated as a single string, so it is not parsed correctly into 
alternative citation formats. The [Cre]/[Ctb] flags would probably need 
to be passed in a separate field, and the authors should be passed as 
individuals rather than as one concatenated string.
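
As a stopgap on the consumer side, the author string can be cleaned up before it goes into a .bib file. This is only a sketch of a workaround, not a fix for the metadata itself: `fix_bibtex_authors` is a hypothetical helper name, and it assumes every comma outside the bracketed role tags separates two authors (so it would mangle "Last, First" names).

```shell
# Turn a role-annotated, comma-separated author string into BibTeX form:
# drop bracketed role tags such as [Cre] or [Ctb], then join names with " and ".
# Assumes every remaining comma separates two authors (no "Last, First" names).
fix_bibtex_authors() {
  sed -e 's/ *\[[^]]*]//g' -e 's/, */ and /g'
}

printf '%s\n' 'Martin Morgan [Cre], Marc Carlson [Ctb], Dan Tenenbaum [Ctb], Sonali Arora [Ctb]' \
  | fix_bibtex_authors
# prints: Martin Morgan and Marc Carlson and Dan Tenenbaum and Sonali Arora
```

BibTeX separates authors with the word "and", which is why the commas are replaced rather than kept.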

This could either be a problem with the way bioconductor is passing 
metadata along, or perhaps it's a problem with crosscite or something? 
I'm not sure. Any thoughts?

Nathan Sheffield, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
www.databio.org

On 10/19/2017 03:19 PM, Shepherd, Lori wrote:
> Many thanks to Sean Davis for working out how to generate these and providing 
> the infrastructure!
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> 
> From: Laurent Gatto 
> Sent: Thursday, October 19, 2017 3:10:52 PM
> To: bioc-devel@r-project.org
> Cc: Shepherd, Lori
> Subject: EXTERNAL: Re: [Bioc-devel] DOI for packages
>
>
> On 19 October 2017 13:22, Shepherd, Lori wrote:
>
>> Hello Bioconductor,
>>
>> We have added DOIs for packages on Bioconductor package landing
>> pages. The DOI will get generated automatically when a package is
>> accepted to Bioconductor. This is the recommended reference to use for
>> publication/citations/etc.  The DOI link should automatically redirect
>> to the current release version of a package (or devel if the package
>> is not yet in release).
> Thank you for this.
>
> Are there any plans to add the DOI to the DESCRIPTION file and
> automatically include it to the default citation() output?
>
> Laurent
>
>> Thank you,
>>
>>
>>
>> Lori Shepherd
>>
>> Bioconductor Core Team
>>
>> Roswell Park Cancer Institute
>>
>> Department of Biostatistics & Bioinformatics
>>
>> Elm & Carlton Streets
>>
>> Buffalo, New York 14263
>>
>>
>
> --
> Laurent Gatto | @lgatt0
> http://cpu.sysbiol.cam.ac.uk/
> http://lgatto.github.io/
>
>
> This email message may contain legally privileged and/or confidential 
> info

Re: [Bioc-devel] Git Transition Plan

2017-05-30 Thread Nathan Sheffield

Hi Nitesh and all,

Can you give any status update on the SVN-to-git transition? I haven't 
seen any news since this exchange in March. Is the announced public 
beta under way? Are new package submissions going straight 
into git at this point?


Or did I miss something?

Thanks,

Nathan


On 04/01/2017 09:14 AM, Martin Morgan wrote:

On 03/29/2017 03:55 PM, Henrik Bengtsson wrote:

Thanks.  I have a few thoughts and questions in order to plan ahead:

- Our plan is to make a 'clean' transition from SVN to git, 
approximately one month after the next Bioconductor release. 
Developers or users will not have access to the SVN system after the 
date of transition.


In order to preserve commit authorship, what's your plan for mapping
SVN username to Git 'user.email' and 'user.name'?  The 'user.email' is
what GitHub uses to associate commits and contributions to GitHub
accounts.


The svn administrators kept a comprehensive (but not complete) record 
of svn id, real user name, and contact email at the time the svn id 
was created. We use this to map between svn commits and git user name 
and email address.


The information is not entirely consistent, with some fields for some 
records 'missing' (e.g., my record doesn't contain my real name) and 
of course out of date (e.g., my email address).


Our intention is NOT to re-write history, but to map the information 
that we have to the git repositories. So my svn commits appear without 
a proper name, and with an outdated email address. Of course, new 
commits after the transition will contain whatever info git provides.




BTW, using obsolete email addresses may prevent people from being
associated with those email addresses on GitHub and other online
services that require authentication of authorship claims (which go
out via those email addresses).

When I SVN-to-Git exported my Bioconductor packages a few years ago, I
could handle it manually because there were not too many contributors in
the SVN logs and I reached out to each of them and asked what email
addresses they would prefer to have in the Git commits.  That approach
is obviously not feasible to automate for all Bioconductor packages.
Maybe this can be handled optionally by each package maintainer by
adding a .gitauthors file to the package root (or possibly via a
global Bioc one that everyone can commit to), e.g.
```
hb = Henrik Bengtsson 
j...@foo.com = John Doe 
```
and then Bioc can use this mapping when exporting to Git?
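
For illustration, the mapping proposed above resembles the authors file that `git svn --authors-file` accepts, where each line reads `svn_id = Full Name <email>`. The sketch below looks up one entry in such a file. `lookup_author` is a hypothetical helper name, and the names and addresses are placeholders rather than real contributors.

```shell
# lookup_author MAPFILE SVN_ID
# Print the "Name <email>" entry for SVN_ID; exit non-zero if it is missing.
lookup_author() {
  awk -F ' *= *' -v id="$2" '
    $1 == id { print $2; found = 1 }
    END { exit !found }
  ' "$1"
}

# Hypothetical mapping file; the names and addresses are placeholders.
authors_file=$(mktemp)
printf '%s\n' \
  'jdoe = Jane Doe <jdoe@example.org>' \
  'rroe = Richard Roe <rroe@example.org>' > "$authors_file"

lookup_author "$authors_file" jdoe
# prints: Jane Doe <jdoe@example.org>
```

A non-zero exit status for unknown ids makes it easy to list the SVN users that still need an entry before an export is attempted.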

Finally, will people like me who have already done the SVN-to-Git migration
be able to use that instead of the Bioc generated one? (I assume not,
but worth asking)


No, we'll make git snapshots of svn at the time of transition, and 
these git repositories will be the canonical version -- developers 
will need to sync with these as they see fit.


Martin



Thanks,

Henrik

On Wed, Mar 29, 2017 at 9:51 AM, Turaga, Nitesh
 wrote:

Dear Bioconductor Developers,

More news about the Git transition plan. We are coming close to our 
transition date and have made significant progress in getting our 
new server ready for the Bioconductor community.


1. Overall plan:

- Bioconductor hosts each package as a distinct repository at 
git.bioconductor.org. From Bioconductor's perspective, this is the 
canonical location. Nightly builds will be based on these repositories, 
release branches will be created in these repositories, etc. The naming 
convention for branches remains the same.


- Developers clone or otherwise sync their code base to these 
repositories. Each developer will be able to push to and pull from 
(e.g., during branching and version bumps at package release) their 
git.bioconductor.org repository. Version bumps and new branches (during 
Bioconductor release) will be handled by the core team.


- Developers are encouraged to host and develop their source code on 
Github or other git-based social platforms. This promotes community 
involvement, and empowers developers to adopt best practices related 
to issue tracking, continuous integration, bug fixes, pull requests 
from their community, etc.


- All Bioconductor infrastructure code will also be available on 
GitHub, through our organization page 
(https://github.com/Bioconductor). Community members are encouraged to 
send us pull requests for all our public repositories.


2. Timing:

- Our plan is to make a 'clean' transition from SVN to git, 
approximately one month after the next Bioconductor release. 
Developers or users will not have access to the SVN system after the 
date of transition.


3. Repositories:

- The git repositories will be derived from a 'snapshot' of the 
latest SVN repository at the time of the transition. After the date 
of transition, further commits to SVN will not be reflected in the 
new git repositories.


- Each repository will capture the full SVN commit history of the 
package. Releases will be included as branches in

Re: [Bioc-devel] Question about R functions

2017-03-30 Thread Nathan Sheffield

Hi Jing,

You should export FA, FB, and FC, but not FD. If you use roxygen2 
for documentation, put "#' @export" on the ones to export and leave FD 
untagged (it doesn't need documentation at all); it won't be exported by default.
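
As a minimal sketch of that layout (the function bodies are made up for illustration; only the placement of the roxygen2 tags matters):

```r
# Internal helper: it carries no "#' @export" tag, so roxygen2 does not
# add it to the generated NAMESPACE and users cannot call it directly.
FD <- function(x) {
  x^2
}

#' Square the input and add one
#'
#' @param x A numeric vector.
#' @return A numeric vector of the same length as `x`.
#' @export
FA <- function(x) {
  # Exported functions can call the internal helper freely.
  FD(x) + 1
}
```

FB and FC would follow the same pattern as FA. If you would still like developer-facing comments on FD without generating a help page, roxygen2's `@noRd` tag is one option.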


Hope that helps,

-Nathan


On 03/30/2017 04:02 PM, Jing Wang wrote:

Hi,

  


I have three functions (FA,FB,FC) in the package and all these functions need 
to call another function (FD). But I do not want other users to use the 
function FD and thus I do not want to create the document for FD in the R 
package.

  


Could you please give me some suggestion how to do that?

  


Thanks,

  

  

  



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




[Bioc-devel] IRanges findOverlaps and queryHits

2016-12-20 Thread Nathan Sheffield

Did the object returned by findOverlaps have its @queryHits slot removed recently?

I recently got this error running some of my code:

Error: no slot of name "queryHits" for this object of class 
"SortedByQueryHits"


My workflow is basically,

fo = findOverlaps(...)
fo@queryHits

Using queryHits(fo) works, and accessing the slot directly with fo@queryHits 
has always worked for me as well -- until now. I couldn't find 
anything in a changelog describing changes to IRanges slots. 
I thought someone here might be able to shed some light on this... 
does anyone know what happened to the ability to access it with @queryHits? 
Some relevant sessionInfo():


> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] RGenomeUtils_0.01  BSgenome.Hsapiens.UCSC.hg19.masked_1.3.99  BSgenome.Hsapiens.UCSC.hg19_1.4.0
 [4] BSgenome_1.42.0      rtracklayer_1.34.1  Biostrings_2.42.0
 [7] XVector_0.14.0       LOLA_1.4.0          GenomicRanges_1.26.1
[10] GenomeInfoDb_1.10.1  IRanges_2.8.1       S4Vectors_0.12.0
[13] BiocGenerics_0.20.0  ggplot2_2.1.0       simpleCache_0.0.1
[16] extrafont_0.17       data.table_1.9.6    devtools_1.12.0
[19] project.init_0.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7               plyr_1.8.4        bitops_1.0-6     tools_3.3.0
 [5] zlibbioc_1.20.0           testthat_1.0.2    digest_0.6.10    lattice_0.20-33
 [9] memoise_1.0.0             gtable_0.2.0      Matrix_1.2-6     yaml_2.1.13
[13] Rttf2pt1_1.3.4            withr_1.0.2       stringr_1.1.0    roxygen2_5.0.1
[17] grid_3.3.0                Biobase_2.34.0    R6_2.2.0         BiocParallel_1.8.1
[21] XML_3.98-1.4              reshape2_1.4.2    extrafontdb_1.0  magrittr_1.5
[25] GenomicAlignments_1.10.0  Rsamtools_1.26.1  scales_0.4.1     SummarizedExperiment_1.4.0
[29] colorspace_1.3-0          stringi_1.1.2     RCurl_1.95-4.8   munsell_0.4.3
[33] chron_2.3-47              crayon_1.3.2
>
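
To illustrate the difference between the two access styles, here is a minimal sketch using a toy S4 class. Everything in it is hypothetical (`ToyHitsV1`, `ToyHitsV2`, `toyQueryHits`, and the renamed slot `from`); it does not reproduce the actual IRanges or S4Vectors class definitions, it only shows how an accessor can keep working when a slot name changes while direct `@` access breaks.

```r
library(methods)

# Hypothetical "old" class: the query indices live in a slot named queryHits.
setClass("ToyHitsV1", representation(queryHits = "integer"))

# Hypothetical "new" class: the same data, but the slot has been renamed.
setClass("ToyHitsV2", representation(from = "integer"))

# The accessor generic is the public interface; each class supplies a method
# that knows where the data is stored.
setGeneric("toyQueryHits", function(x) standardGeneric("toyQueryHits"))
setMethod("toyQueryHits", "ToyHitsV1", function(x) x@queryHits)
setMethod("toyQueryHits", "ToyHitsV2", function(x) x@from)

old_obj <- new("ToyHitsV1", queryHits = c(1L, 1L, 3L))
new_obj <- new("ToyHitsV2", from = c(1L, 1L, 3L))

# Code written against the accessor works with both versions.
hits_old <- toyQueryHits(old_obj)
hits_new <- toyQueryHits(new_obj)

# Code that reaches into the slot fails once the slot is renamed;
# the error is captured here as a message string instead of stopping.
slot_error <- tryCatch(new_obj@queryHits,
                       error = function(e) conditionMessage(e))
print(slot_error)
```

The design point is simply that slots are an implementation detail of the class, whereas the accessor is the part a package author can keep stable across releases.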


-Nathan

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] GitHub and svn

2016-10-15 Thread Nathan Sheffield

Hi Kevin, Mani --

Isn't the git-svn bridge deprecated?

http://bioconductor.org/developers/how-to/git-svn/
And this message:

- Forwarded Message -
From: "Dan Tenenbaum"
To: "bioc-devel"
Sent: Monday, December 28, 2015 11:08:29 AM
Subject: [Bioc-devel] Git-Svn bridge will be removed permanently on January 
29, 2016

Attention Bioconductor developers,

The deprecated Git-SVN bridge will be removed permanently on Friday, January 
29th, 2016.

But anyway, Mani, I at least agree with you that the current system is 
"unnecessarily complicated" (git-svn, cherry-picking, credentials, 
mirrors and so forth) and I hope it will become much better after, as 
Kasper put it, "the (long) process of changing this" gets finished, 
which I hope means a system that is git-centered, since that seems to be 
the way the world has moved.


I, for one, have several ideas for new packages and maintenance 
of old ones that I continue to put off in the hope that one day soon 
updating an official Bioconductor repo will be as easy as 
simply pushing a git commit to my own repository on GitHub. Alas, 
that is not yet the case -- but I think when this gets "fixed" it 
will be a glorious future indeed.


-Nathan

On 10/15/2016 06:41 PM, Kevin RUE wrote:

Dear Mani,

That's actually where I think the git-svn bridge becomes useful.
If you take the time once to synchronise the devel branch of your package
from the Bioconductor-mirror to one of the branches on your GitHub repository
(for me the master branch), you could then:
1) commit changes to your own repository first (does not affect the BioC
code)
2) follow the BioC git-svn instructions that you mentioned to selectively
push the changes to either or both the devel and release branch(es) of
Bioconductor.

I personally don't think that the current system is "unnecessarily
complicated". That handful of commands is probably as simple as a system
can be, to handle version control selectively applied to multiple branches
(devel, 3.3, 3.2, master, ...) of multiple repositories (BioC, GitHub, ...)
using multiple version control software (git, svn).

The learning curve for version control can be quite steep beyond the basic
commands, but one has to bear in mind that RStudio is not version control
software in itself. The GUI only provides buttons for the most common
version control commands (pull/push/update). For more advanced commands,
you will have to open a shell anyway (the little wheel icon).
I must admit that I was a bit scared/frustrated at first with the system,
but after a few attempts to get it right, this little process to control
the branches to synchronise can almost become enjoyable when proudly
releasing new code :)

All the best,
Kevin


On Sat, Oct 15, 2016 at 10:21 PM, S Manimaran 
wrote:


Thanks, Gabe and Kasper, for the info.



Following up on Gabe's reply: That means, I need to set up two projects in
R-Studio with one for the release pointing to the release repository in SVN
and another for the development version pointing to the development
repository in SVN, right? Now, suppose I make a bug fix and commit to the
release repository and I want the same fix in the development repository as
well, how exactly do I go about this: Do I just manually copy those files
with the changes to the other development version project and commit it
there as well? (Personally, I also like to keep the original GitHub
repository in sync with the latest in BioConductor development, which would
mean I need to maintain three projects in R-Studio, right?) Or is there any
other way about this?



Thanks,

Mani




From: Gabe Becker [mailto:becker.g...@gene.com]
Sent: Saturday, October 15, 2016 4:35 PM
To: Kasper Daniel Hansen
Cc: S Manimaran; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] GitHub and svn


Mani,

Related to what Kasper said, one thing you can do is commit directly to
the canonical repo for your package (which again is not on github once the
package is accepted) from rstudio. It supports svn.

~G

On Oct 15, 2016 11:38 AM, "Kasper Daniel Hansen" <
kasperdanielhan...@gmail.com> wrote:
Not at the moment.  We are in the (long) process of changing this, but
there is no ETA for it.

The complication we currently have is that, as soon as a package is accepted in
Bioconductor, the "true" repository becomes Bioconductor SVN
and your GitHub repository is just a way for you to develop.  This is not
the case during package submission.

Best,
Kasper


On Sat, Oct 15, 2016 at 12:19 PM, S Manimaran <manimaran_1...@hotmail.com>
wrote:


Hi,

I never understood the github mirror setup and the instructions below look
unnecessarily complicated to me. I see that the current package submission
process with the automatic hook added to github is the easiest of all,
with every commit to github automatically triggering a build at
Bioconductor. Now, my question is: Can't this same procedure be car

Re: [Bioc-devel] Conflicting Imports in Namespace

2016-07-26 Thread Nathan Sheffield
You may also find this useful if you only need to use one of the two.

To import every symbol from a package but for a few exceptions, pass the 
`except` argument to `import`. The directive

import(foo, except=c(bar, baz))

imports every symbol from foo except bar and baz.

https://cran.r-project.org/doc/manuals/R-exts.html#Specifying-imports-and-exports
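
Applied to the two `segment` functions in this thread, a NAMESPACE could look roughly like the sketch below. It assumes the package imports all of ggdendro and only needs `segment` from DNAcopy; I don't know what AneuFinder's NAMESPACE actually contains, so treat the choice of which package gets the bare name as an illustration only.

```
# NAMESPACE (sketch; which package "wins" the bare name is an assumption)
import(ggdendro, except = c(segment))
importFrom(DNAcopy, segment)
```

With this arrangement `segment` is mentioned only once, so the "replacing previous import" warning should not be triggered, and the other function can still be reached in the package code as `ggdendro::segment(...)`, as Martin describes below.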




On 07/20/2016 02:48 AM, Martin Morgan wrote:
> On 07/20/2016 03:12 AM, Aaron Taudt wrote:
>> Hi,
>>
>> I have a package which imports the two functions ggdendro::segment and
>> DNAcopy::segment. This leads to a warning message when checking the 
>> package:
>>
>> Warning: replacing previous import ‘DNAcopy::segment’ by
>> ‘ggdendro::segment’ when loading ‘AneuFinder’
>>
>> How can I avoid this warning and properly import the two functions to pass
>> R CMD check? I use the two segment functions in different functions of my
>> package.
>
> Import: both packages in the DESCRIPTION. Don't mention (explicitly or 
> implicitly, e.g., by import()ing both packages) 'segment' twice in the 
> NAMESPACE. Resolve symbols as you do above in the code.
>
> Martin
>
>>
>> Aaron


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] New package: LOLA - Locus overlap analysis for enrichment of genomic ranges

2015-10-20 Thread Nathan Sheffield
We're pleased to announce the release of a new Bioconductor package, 
LOLA (Locus Overlap Analysis). LOLA is a package for genomic locus 
overlap enrichment: what GSEA does for gene sets, LOLA does for genomic 
regions. The regions can be defined however you like, for example from 
experiments like ChIP-seq, BS-seq, or DNase-seq. LOLA lets 
you test your genomic ranges of interest against a database of other 
genomic range sets to identify enrichment of overlap, tying external 
annotation to your regions of interest. A complete enrichment analysis 
against a database of thousands of region sets requires just 3 lines of 
R code and completes in minutes.


Along with LOLA, we provide the LOLA Core Database, which includes 
region sets from ENCODE, Roadmap Epigenomics, Cistrome, CODEX, UCSC, and 
other public databases. We intend to build, maintain, and curate this 
database over the coming years. You can also use LOLA for custom 
analysis by creating custom databases following instructions in the LOLA 
readme.


You can find more information, vignettes, and downloads at LOLA's 
website (http://databio.org/lola/), or follow our development at GitHub 
(https://github.com/sheffien/LOLA). We would be eager to hear any 
suggestions or feedback!


Nathan Sheffield and Christoph Bock

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Dependencies in Bioconductor dockers

2015-08-31 Thread Nathan Sheffield



On 08/31/2015 09:52 AM, Laurent Gatto wrote:

On 29 August 2015 01:19, Martin Morgan wrote:


On 08/28/2015 02:51 PM, Dan Tenenbaum wrote:


- Original Message -

From: "Laurent Gatto"
To: "Dan Tenenbaum"
Cc: "Kasper Daniel Hansen", "bioC-devel", "Laurent Gatto"
Sent: Friday, August 28, 2015 2:28:29 PM
Subject: Re: [Bioc-devel] Dependencies in Bioconductor dockers


On 28 August 2015 20:42, Dan Tenenbaum wrote:


- Original Message -

From: "Kasper Daniel Hansen" 
To: "Laurent Gatto" 
Cc: "bioC-devel" 
Sent: Wednesday, August 26, 2015 2:36:08 PM
Subject: Re: [Bioc-devel] Dependencies in Bioconductor dockers

This might be especially nice if we use the docker containers for R CMD
check.


In this case, you would be checking your own package, right, so the
docker image cannot know in advance what the Suggests dependencies of
your package are.

[More below].



On Wed, Aug 26, 2015 at 10:56 PM, Laurent Gatto 
wrote:


Dear all,

As far as I can see, the Suggests dependencies of a package are not
included in the docker containers. Would you consider adding these? It
would be nice to be able to run all examples and vignette code of the
packages available in a container.


Adding the Suggests dependencies of all packages installed on the
image is going to make the image much bigger. This request comes soon
after other requests to reduce the size of the images. We should
probably have a wider discussion and decide exactly what type of
docker images we want to have.

Use cases that have been mentioned are:

- an image for building/checking with travis (sounds similar to
  Kasper's request above).  For this one in particular, small size is
  important as Travis has to build its environment from scratch every
  time, and loading large images takes too long.
- an image that has the Suggests dependencies of all installed
  packages installed.

We might want to pick a different way to decide what packages are
installed on a given image.  Currently we install all packages with a
given biocView (Sequencing for example) and this leads to very large
images (sequencing = ~7.5GB).

Thank you for these clarifications, Dan.

If there is interest in having full/complete containers in addition to
requiring light ones, would it make sense to distribute both? Would that
be much overhead?


I think it definitely makes sense to distribute the light containers. (and even 
then, I want to see how small a 'light' container is--one that contains R, 
LaTeX, and every system dependency that we know about)
I am a little hesitant to make the existing bloated containers even bigger by 
adding all the Suggests dependencies. That's why I said we might want to 
revisit the way we decide what packages are on a given container. Right now we 
use biocViews (Microarray, Sequencing, Proteomics, FlowCytometry) but that 
results in huge containers containing many packages that people arguably don't 
use that much but just happen to have the correct biocView. Of course it does 
have the benefit of being a somewhat democratic method.


I don't really know what I'm talking about, but does it make sense to think of
the docker images provided by Bioconductor as building blocks for more
specialized containers? i.e., that it should not be 'hard' for a developer to
make an image that is appropriate for their particular needs?

It seems like there's value to some level of nimbleness provided by small
container size. I also wonder about LaTeX -- it seems like HTML vignettes are
way better, and since docker images are forward-looking, maybe the images should
be provisioned with the notion that they'll support HTML?

Maybe there could be a docker-factory script that would take the name of a base
image and the path to a package repository, and create a derived image with the
additional necessary dependencies?

That sounds like a great idea. It would still be nice if Bioconductor
kept the topic specific containers (flow, microarrays, proteomics,
sequencing).

Laurent




I can pitch in a viewpoint here... I'm doing basically exactly this. 
I've created several of my own Dockerfiles, which start from the 
base Bioconductor images and add the various combinations 
of packages that I need: one for production, one for development, etc.


I even wrote a few "R setup" scripts that take a list of packages 
and install them into a new container on top of the Bioconductor 
base images. That seems like almost exactly what you're describing, actually.
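
For illustration, the core of such a setup script could look roughly like the sketch below. This is not my actual script: the function names (`missing_packages`, `install_missing`) and the idea of passing the installer in as an argument are made up here, so the same code could be pointed at either a CRAN or a Bioconductor installer.

```r
# Return the packages from `pkgs` that are not in the `installed` set.
# By default the installed set is read from the current R library.
missing_packages <- function(pkgs,
                             installed = rownames(utils::installed.packages())) {
  setdiff(unique(pkgs), installed)
}

# Install whatever is missing, using a caller-supplied installer function.
# The default installs from CRAN; a Bioconductor installer can be passed
# instead (an assumption about how you would wire it up in a container).
install_missing <- function(pkgs,
                            installer = utils::install.packages,
                            installed = rownames(utils::installed.packages())) {
  todo <- missing_packages(pkgs, installed)
  if (length(todo) > 0) {
    installer(todo)
  }
  # Return the list of packages that were requested for installation.
  invisible(todo)
}
```

A derived Dockerfile would then only need to copy the script and a package list into the image and run it with Rscript on top of the base image. Passing the installer as an argument also makes the logic easy to dry-run with a dummy function before building anything.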


I don't think it's really reasonable to expect bioconductor to create 
docker images like this, for every possible use case; but providing a 
base image is very useful, and then people (like me) can use this to 
build our own containers, with whatever packages we require. We could 
even write a tutorial on how to do this...


I don't think it's particularly useful to make huge, even democratic 
containers with all packages of a certain type,