Looking at the codetools package, I think "findGlobals" is basically exactly what we want here, right? As you say, there are necessarily limitations due to R being a dynamic language, but the goal is to catch common errors, not stop people from tricking the check.
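
As a minimal sketch of what such a check could look like (an illustration only, not BiocParallel code): the helpers `definingEnv`, `isPackageEnv` and `checkGlobals` below are hypothetical names, and the only external call is `codetools::findGlobals()`. The idea is to list the globals a function uses, then sort them into those defined in the user's workspace (which would need exporting to a worker) and those not defined anywhere.

```r
library(codetools)

# Hypothetical helper: walk up from `env` and return the first environment
# that defines `name`, or NULL if no enclosing environment does.
definingEnv <- function(name, env) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) return(env)
    env <- parent.env(env)
  }
  NULL
}

# Hypothetical helper: TRUE for namespaces, the base environment and
# attached packages, i.e. things a fresh R process can load by itself.
isPackageEnv <- function(env) {
  isNamespace(env) || identical(env, baseenv()) ||
    grepl("^package:", environmentName(env))
}

# Hypothetical helper: classify the globals used by `fun`.
checkGlobals <- function(fun) {
  export <- character(0)
  undefined <- character(0)
  for (name in findGlobals(fun, merge = TRUE)) {
    env <- definingEnv(name, environment(fun))
    if (is.null(env)) {
      undefined <- c(undefined, name)   # not defined anywhere
    } else if (!isPackageEnv(env)) {
      export <- c(export, name)         # lives in the user's workspace
    }
  }
  list(export = export, undefined = undefined)
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
checkGlobals(fib)$export           # "fib": would need exporting

useOffset <- function(x) x + myOffset
checkGlobals(useOffset)$undefined  # "myOffset": likely a mistake
```

This is static analysis, so it cannot see names reached through get(), eval() or similar tricks, which matches the stated goal of catching common errors rather than every possible one.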
I think I'll try to code something up soon.

-Ryan

On 11/3/13, 5:10 PM, Gabriel Becker wrote:
> Henrik,
>
> See https://github.com/duncantl/CodeDepends (as used by
> https://github.com/gmbecker/RCacheSuite). It will identify necessarily
> defined symbols (input variables) for code that is not doing certain
> tricks (e.g. get(), mixing data.frame columns and global variables in
> formulas, etc.).
>
> Tierney's codetools package also does things along these lines, but
> there are some situations where it has trouble. I can give more detail
> if desired.
>
> ~G
>
> On Sun, Nov 3, 2013 at 3:04 PM, Ryan <r...@thompsonclan.org> wrote:
>
>   Another potential easy step: if FUN is a function in the user's
>   workspace, we automatically export that function under the same
>   name in the children. This would make recursive functions just
>   work, but it might be a bit too magical.
>
>   On 11/3/13, 2:38 PM, Ryan wrote:
>
>     Here's an easy thing we can add to BiocParallel in the short
>     term. The following code defines a wrapper function
>     "withBPExtraErrorText" that simply appends an additional message
>     to the end of any error that looks like it is about a missing
>     variable. We could wrap every evaluation in a similar tryCatch to
>     at least provide a more informative error message when a
>     subprocess has a missing variable.
>
>     -Ryan
>
>     withBPExtraErrorText <- function(expr) {
>         tryCatch({
>             expr
>         }, simpleError = function(err) {
>             if (grepl("^object '(.*)' not found$", err$message,
>                       perl=TRUE)) {
>                 ## It is an error due to a variable not found.
>                 err$message <- paste0(
>                     err$message,
>                     ". Maybe you forgot to export this variable from ",
>                     "the main R session using \"bpexport\"?")
>             }
>             stop(err)
>         })
>     }
>
>     x <- 5
>
>     ## Succeeds
>     withBPExtraErrorText(x)
>
>     ## Fails with more informative error message
>     withBPExtraErrorText(y)
>
>     On Sun Nov 3 14:01:48 2013, Henrik Bengtsson wrote:
>
>       On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
>       <lawrence.mich...@gene.com> wrote:
>
>         An analog to clusterExport is a good idea. To make it even
>         easier, we could have a dynamic environment based on object
>         tables that would catch missing symbols and download them
>         from the parent thread. But maybe there's some benefit to
>         being explicit?
>
>       A first step to fully automate this would be to provide some
>       (opt in/out) mechanism for code inspection that warns about
>       non-defined objects (cf. 'R CMD check'). That is of course
>       major work, but it would certainly spare the community/users
>       1000's of hours in troubleshooting and the mailing lists from
>       "why doesn't my parallel code work" messages. Such protection
>       may be better suited for the 'parallel' package though.
>       Unfortunately, it's beyond my skills/time to pull such a thing
>       together.
>
>       /Henrik
>
>         Michael
>
>         On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson
>         <h...@biostat.ucsf.edu> wrote:
>
>           Hi,
>
>           in BiocParallel, is there a suggested (or planned) standard
>           for making *locally* assigned variables (e.g. functions)
>           available to the applied function when it runs in a
>           separate R process (which will be the most common use
>           case)? I understand that local variables should be avoided
>           and that it's preferred to put as much as possible in
>           packages, but that's not always possible or very
>           convenient.
>
>           EXAMPLE:
>
>           library('BiocParallel')
>           library('BatchJobs')
>
>           # Here I pick a recursive function to make the problem a
>           # bit harder, i.e. the function needs to call itself
>           # ("itself" = see below)
>           fib <- function(n=0) {
>             if (n < 0) stop("Invalid 'n': ", n)
>             if (n == 0 || n == 1) return(1)
>             fib(n-2) + fib(n-1)
>           }
>
>           # Executing in the current R session
>           cluster.functions <- makeClusterFunctionsInteractive()
>           bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>           register(bpParams)
>           values <- bplapply(0:9, FUN=fib)
>           ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>           ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>
>           # Executing in a separate R process, where fib() is not
>           # defined (not specific to BiocParallel)
>           cluster.functions <- makeClusterFunctionsLocal()
>           bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>           register(bpParams)
>           values <- bplapply(0:9, FUN=fib)
>           ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>           ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>           Error in LastError$store(results = results, is.error = !ok,
>           throw.error = TRUE) :
>             Errors occurred during execution. First error message:
>           Error in FUN(...): could not find function "fib"
>           [...]
>
>           # The following illustrates that the solution is not always
>           # straightforward (not specific to BiocParallel; must have
>           # been discussed previously)
>           values <- bplapply(0:9, FUN=function(n, fib) {
>             fib(n)
>           }, fib=fib)
>           Error in LastError$store(results = results, is.error = !ok,
>           throw.error = TRUE) :
>             Errors occurred during execution. First error message:
>           Error in fib(n): could not find function "fib"
>           [...]
>
>           # Workaround; make fib() aware of itself
>           # (this is something the user needs to do, and would be
>           # very hard for BiocParallel et al. to automate. BTW,
>           # should all recursive functions be implemented this way?)
>           fib <- function(n=0) {
>             if (n < 0) stop("Invalid 'n': ", n)
>             if (n == 0 || n == 1) return(1)
>             fib <- sys.function()  # Make function aware of itself
>             fib(n-2) + fib(n-1)
>           }
>           values <- bplapply(0:9, FUN=function(n, fib) {
>             fib(n)
>           }, fib=fib)
>
>           WISHLIST:
>           Considering the above recursive issue solved, a slightly
>           more explicit and standardized solution is then:
>
>           values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>             for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
>             fib(n)
>           }, BPGLOBALS=list(fib=fib))
>
>           Could the above be generalized into something as neat as:
>
>           bpExport("fib")
>           values <- bplapply(0:9, FUN=function(n) {
>             BiocParallel::bpImport("fib")
>             fib(n)
>           })
>
>           or ideally just (analogously to parallel::clusterExport()):
>
>           bpExport("fib")
>           values <- bplapply(0:9, FUN=fib)
>
>           /Henrik
>
>           _______________________________________________
>           Bioc-devel@r-project.org mailing list
>           https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis