On 3/23/21 1:31 PM, Murphy, Alan E wrote:
Hi Hervé,

My apologies but I don't think I follow your approach. I have put your code below into a utils.R script in ewceData:

.my_internal_global_variables <- new.env(parent=emptyenv())

    .get_eh <- function() get("eh", envir=.my_internal_global_variables)
    .set_eh <- function(value) assign("eh", value,
envir=.my_internal_global_variables)

    get_ExperimentHub <- function()
    {
      eh <- try(.get_eh(), silent=TRUE)
      if (inherits(eh, "try-error")) {
        eh <- ExperimentHub::ExperimentHub()
        .set_eh(eh)
      }
      eh
    }

Then in EWCE, I have made a script for each dataset containing the function you outlined. Note I had to use ewceData:::get_ExperimentHub to access the function, not sure if this is wrong by me up to this point:

## Exported.
tt_alzh <- function()
{
   eh <- ewceData:::get_ExperimentHub()
   eh[["EH5373"]]
}

Lastly to call a dataset I simply use tt_alzh(). However, now when I run a test script I believe the eh statement is being repeatedly called still as I get multiple warnings and the runtime doesn't improve any.

'eh' is a variable so I'm not sure how can it be "repeatedly called".

Also not sure why you want to move tt_alzh() from ewceData to EWCE. Nothing wrong in having it in the former which is actually the natural place for it.

I made these changes to ewceData (see full patch below, note that I renamed my tt_alzh -> load_tt_alzh to avoid conflict with your tt_alzh) and things seem to work as expected for me:

  > library(ewceData)

  > system.time(tt_alzh <- tt_alzh())
  snapshotDate(): 2021-03-23
  see ?ewceData and browseVignettes('ewceData') for documentation
  loading from cache
     user  system elapsed
    2.441   0.028   4.720

  > system.time(tt_alzh <- load_tt_alzh())
  see ?ewceData and browseVignettes('ewceData') for documentation
  loading from cache
     user  system elapsed
    1.215   0.012   1.727

No warning.

H.

hpages@spectre:~/sandbox/ewceData$ git diff --cached
diff --git a/NAMESPACE b/NAMESPACE
index 0144dc9..8ad9b64 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -6,3 +6,5 @@ import(ExperimentHubData)
 import(utils)
 importFrom(AnnotationHub,query)
 importFrom(utils,read.csv)
+
+export(load_tt_alzh)
diff --git a/R/utils.R b/R/utils.R
new file mode 100644
index 0000000..c295aef
--- /dev/null
+++ b/R/utils.R
@@ -0,0 +1,15 @@
+.my_internal_global_variables <- new.env(parent=emptyenv())
+
+.get_eh <- function() get("eh", envir=.my_internal_global_variables)
+.set_eh <- function(value) assign("eh", value, envir=.my_internal_global_variables)
+
+get_ExperimentHub <- function()
+{
+    eh <- try(.get_eh(), silent=TRUE)
+    if (inherits(eh, "try-error")) {
+        eh <- ExperimentHub::ExperimentHub()
+        .set_eh(eh)
+    }
+    eh
+}
+
diff --git a/R/zzz.R b/R/zzz.R
index cf0d87e..b1159b0 100644
--- a/R/zzz.R
+++ b/R/zzz.R
@@ -34,4 +34,12 @@
            assign(xx, func, envir=ns)
            namespaceExport(ns, xx)
          })
-}
\ No newline at end of file
+}
+
+## Exported.
+load_tt_alzh <- function()
+{
+    eh <- get_ExperimentHub()
+    eh[["EH5373"]]
+}



Sorry again if I have misinterpreted your approach.

Kind regards,
Alan.
------------------------------------------------------------------------
*From:* Hervé Pagès <hpages.on.git...@gmail.com>
*Sent:* 23 March 2021 19:08
*To:* Murphy, Alan E <a.mur...@imperial.ac.uk>; Martin Morgan <mtmorgan.b...@gmail.com>; Kern, Lori <lori.sheph...@roswellpark.org>; bioc-devel@r-project.org <bioc-devel@r-project.org>
*Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
Doesn't really matter where you put them but
.my_internal_global_variables, .get_eh(), .set_eh(), and toto() don't
need to be defined inside the .onLoad() hook, and it's actually
cleaner/easier to not define them there. You can define there anywhere
in your ewceData/R/* files.

Here is a slightly improved version of the code:

    .my_internal_global_variables <- new.env(parent=emptyenv())

    .get_eh <- function() get("eh", envir=.my_internal_global_variables)
    .set_eh <- function(value) assign("eh", value,
envir=.my_internal_global_variables)

    get_ExperimentHub <- function()
    {
      eh <- try(.get_eh(), silent=TRUE)
      if (inherits(eh, "try-error")) {
        eh <- ExperimentHub::ExperimentHub()
        .set_eh(eh)
      }
      eh
    }

In my packages I like to put this kind of low-level stuff in R/utils.R.
None of this is exported.

Then anywhere in your package when you need an ExperimentHub instance, do:

    ## Exported.
    tt_alzh <- function()
    {
      eh <- get_ExperimentHub()
      eh[["EH5373"]]
    }

H.


On 3/23/21 10:53 AM, Murphy, Alan E wrote:
HeyHervé,

Thanks for this it is very helpful and I'm very sorry but I have one more question, where to put option 3? I thought making an onload r script for it:

.onLoad <- function(libname, pkgname) {
    .my_internal_global_variables <- new.env(parent=emptyenv())
    .get_eh <- function() get("eh", envir=.my_internal_global_variables)
    .set_eh <- function(value) assign("eh", value,
                                      envir=.my_internal_global_variables)
    toto <- function()
    {
      eh <- try(.get_eh(), silent=TRUE)
      if (inherits(eh, "try-error")) {
        eh <- ExperimentHub()
        .set_eh(eh)
      }
      eh
    }
    toto()
}

This seems to work in that the script runs (I can tell based on the output with devtools::check()) but I still get an error that eh doesn't exist in my test functions.

Kind regards,
Alan.
------------------------------------------------------------------------
*From:* Hervé Pagès <hpages.on.git...@gmail.com>
*Sent:* 23 March 2021 17:31
*To:* Murphy, Alan E <a.mur...@imperial.ac.uk>; Martin Morgan <mtmorgan.b...@gmail.com>; Kern, Lori <lori.sheph...@roswellpark.org>; bioc-devel@r-project.org <bioc-devel@r-project.org>
*Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
3 ways to do this, one that doesn't work, and two that work ;-)


1. Simple way that doesn't work:

     ## Just a place holder. Will be initialized at run-time the first
     ## time it's needed.
     .some_internal_global_variable <- NULL

     toto <- function()
     {
       if (is.null(.some_global_variable))
           .some_internal_global_variable <<- 55L
     }

     However, if you put this in your package, you'll get the following
     error the first time toto() is called:

       cannot change value of locked binding for
'.some_internal_global_variable'

2. Simple way that works: initialize the global variable in the
      .onLoad() hook. This works because the .onLoad() hook is executed
      right before the package namespace gets locked.

     ## Just a place holder. Will be initialized at load-time.
     .some_internal_global_variable <- NULL

     .onLoad <- function(libname, pkgname)
     {
       .some_internal_global_variable <<- 55L
     }

     However, I don't really like using this approach when initialization
     of the global variable requires access to the internet. It means that
     in case of connectivity issue your users won't be able to load the
     package and troubleshooting can become really tricky when you can't
     even load a package. So in that case I prefer the solution below.

3. Define the internal global variable in an environment:

     .my_internal_global_variables <- new.env(parent=emptyenv())

     .get_eh <- function() get("eh", envir=.my_internal_global_variables)
     .set_eh <- function(value) assign("eh", value,
envir=.my_internal_global_variables)

     toto <- function()
     {
       eh <- try(.get_eh(), silent=TRUE)
       if (inherits(eh, "try-error")) {
         eh <- ExperimentHub()
         .set_eh(eh)
       }
       eh
     }

Hope this helps,

H.




On 3/23/21 10:05 AM, Murphy, Alan E wrote:
Hey Hervé,

I get the idea now thanks for clarifying. Where do I place the call to ExperimentHub in the package?:

eh <- ExperimentHub()  # the only call to ExperimentHub() in
                            # the entire R session

The package contains calls to the datasets in internal functions, examples, tests and the vignette so eh it would need to be available to all. Sorry I don't have much experience using experiment datasets.

Kind regards,
Alan.

------------------------------------------------------------------------
*From:* Hervé Pagès <hpages.on.git...@gmail.com>
*Sent:* 23 March 2021 16:46
*To:* Murphy, Alan E <a.mur...@imperial.ac.uk>; Martin Morgan <mtmorgan.b...@gmail.com>; Kern, Lori <lori.sheph...@roswellpark.org>; bioc-devel@r-project.org <bioc-devel@r-project.org>
*Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check

*******************
This email originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders list https://spam.ic.ac.uk/SpamConsole/Senders.aspx
<https://spam.ic.ac.uk/SpamConsole/Senders.aspx>
<https://spam.ic.ac.uk/SpamConsole/Senders.aspx
<https://spam.ic.ac.uk/SpamConsole/Senders.aspx>>
<https://spam.ic.ac.uk/SpamConsole/Senders.aspx
<https://spam.ic.ac.uk/SpamConsole/Senders.aspx
<https://spam.ic.ac.uk/SpamConsole/Senders.aspx>>> to disable email
stamping for this address.
*******************
On 3/23/21 4:11 AM, Murphy, Alan E wrote:
Hi,

Thank you very much Martin and Hervé for your suggestions. I have reverted my zzz.R on load 
function to that advised by ExperimentHub and had used the ID look up (system.time(tt_alzh 
<- eh[["EH5373"]])) on internal functions and unit tests. However, the check  
is still taking ~18 minutes so I need to do a bit more work.
Even with
my new on load function, calling datasets by name still takes substantially longer, see below for the example Hervé gave on my new code:

a<-function(){
    eh <- query(ExperimentHub(), "ewceData")

The above line is not needed. Creating an ExperimentHub instance can be
quite expensive and with the current approach 'R CMD check' will do it
dozens of times. My suggestion was to create an ExperimentHub instance
once for all the first time you need it, and reuse it in all your data
access functions:

     eh <- ExperimentHub()  # the only call to ExperimentHub() in
                            # the entire R session

Also there's no need to query(). Just use the EHb ID directly on the
ExperimentHub instance to load your data:

     eh[["EH5373"]]

This should reduce 'R CMD check' by a few more minutes.

H.

--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com

--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com

--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com

--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to