Re: [Bioc-devel] What is good convention for package-local BiocParallel param?

2019-01-15 Thread Aaron Lun
> From the above example, if I had the BPPARAM argument, it’d also clearly add 
> a lot of code noise:
> 
> data %>%
> apply_methods(method_list1, BPPARAM = MulticoreParam(stop.on.error=FALSE)) %>%
> apply_methods(method_list2, BPPARAM = MulticoreParam(stop.on.error=FALSE)) %>%
> apply_methods(method_list3, BPPARAM = MulticoreParam(stop.on.error=FALSE))
> 
> The compromise is to have
> 
> my_bpparam = MulticoreParam(stop.on.error=FALSE)
> 
> data %>%
> apply_methods(method_list1, BPPARAM = my_bpparam) %>%
> apply_methods(method_list2, BPPARAM = my_bpparam) %>%
> apply_methods(method_list3, BPPARAM = my_bpparam)

This actually looks like the best option to me. Nice and explicit, amenable to 
static code analysis (manual or automated).

> But really at that point why not just have
> 
> set_cellbench_bpparam(MulticoreParam(stop.on.error=FALSE))
> 
> data %>%
> apply_methods(method_list1) %>%
> apply_methods(method_list2) %>%
> apply_methods(method_list3)
> 
> Which I guess is the point of this mail chain. Is there a reason why I 
> shouldn’t?

Can’t hurt to have fewer globals, I guess?

-A

>> =On 12 Jan 2019, at 5:43 pm, Aaron Lun 
>>  wrote:
>> 
>> My current set-up in a variety of packages is that every parallelizable 
>> function has a BPPARAM= argument. This makes it explicit about which steps 
>> are being parallelized. Requiring users to respecify BPPARAM= in each 
>> function isn’t as annoying as you’d think, because not many steps are 
>> actually heavy enough to warrant parallelization.
>> 
>> Ideally, I'd have BPPARAM=bpparam() by default, allowing me to both respond 
>> to the register()'d bpparam() as well as any custom argument that might be 
>> supplied by the user, e.g., if they don't want to change bpparam(). However, 
>> for various reasons (discussed in the other SerialParam() thread), the 
>> current default is BPPARAM=SerialParam().
>> 
>> To be honest, I've never thought it necessary to have a global 
>> package-specific parameter for parallelization as you've done (for scPipe, 
>> presumably). The current options - global across all packages with 
>> register(), or local to individual functions with BPPARAM= - seem to be 
>> satisfactory in the vast majority of cases. At least to me.
>> 
>> And at least for RNGs, if a function from another package is giving greatly 
>> different results upon parallelization (excepting some numerical error with 
>> changed order of summation), I'd say that's a bug of some sort. That should 
>> be fixed on their end, rather than requiring other packages and users to 
>> tiptoe around them.
>> 
>> -A
>> 
>>> On 10 Jan 2019, at 23:59, Shian Su  wrote:
>>> 
>>> Hello Developers,
>>> 
>>> I’m using BiocParallel for parallelism, and I understand that register() is 
>>> the recommended method for setting threads. But I am reluctant to ask 
>>> people to run code for my package which changes how other packages operate, 
>>> so I figured I’d use local bp params. Recent discussions of RNG has made me 
>>> worried there may be hidden state gotcha’s I’ve not considered. The current 
>>> implementation is
>>> 
>>> set_mypkg_threads <- function(n) {
>>> if (n == 1) {
>>> options(“mypkg.bpparam” = SerialParam())
>>> } else if (n > 1) {
>>> if (.Platform$OS.type == "windows") {
>>> options(“mypkg.bpparam" = SnowParam(nthreads))
>>> } else {
>>> options(“mypkg.bpparam" = MulticoreParam(nthreads))
>>> }
>>> }
>>> }
>>> 
>>> Then elsewhere in my package I make use of parallelism as follows
>>> 
>>> bplapply(
>>> BPPARAM = getOption(“mypkg.bpparam”, bpparam()),
>>> …
>>> )
>>> 
>>> Where getOption() either retrieves my set option or the default value given 
>>> by bpparam(). So the behaviour is that if users have not registered params 
>>> for my package specifically then it will take the BiocParallel default, but 
>>> otherwise it will use my package’s local bpparam.
>>> 
>>> Also I know that as currently implemented, I preclude cluster parallelism 
>>> on non-Windows machines. But it’s easy to fix. Just looking for feedback on 
>>> the approach.
>>> 
>>> Kind regards,
>>> Shian Su
>>> 
>>> ___
>>> 
>>> The information in this email is confidential and intended solely for the 
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of 
>>> the sender.
>>> 
>>> The Walter and Eliza Hall Institute acknowledges the Wurundjeri people of 
>>> the Kulin
>>> Nation as the traditional owners of the land where our campuses are located 
>>> and
>>> the continuing connection to country and community.
>>> ___
>>> 
>>> [[alternative HTML version deleted]]
>>> 
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> 
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 

Re: [Bioc-devel] What is good convention for package-local BiocParallel param?

2019-01-13 Thread Shian Su
This is for my work-in-progress package CellBench: 
https://github.com/Shians/CellBench, it’s a benchmarking framework for testing 
combinations of methods in a pipeline. Its intended use looks like

data %>%
apply_methods(method_list1) %>%
apply_methods(method_list2) %>%
apply_methods(method_list3)

etc...

Where multiple methods at each stop is applied in a combinatorial fashion 
resulting in results from computations from each combination of methods at each 
step.

At the moment I am working on error logging with stop.on.error = FALSE, and I 
really don’t want to impose this option on other packages using BiocParallel.

From the above example, if I had the BPPARAM argument, it’d also clearly add a 
lot of code noise:

data %>%
apply_methods(method_list1, BPPARAM = MulticoreParam(stop.on.error=FALSE)) %>%
apply_methods(method_list2, BPPARAM = MulticoreParam(stop.on.error=FALSE)) %>%
apply_methods(method_list3, BPPARAM = MulticoreParam(stop.on.error=FALSE))

The compromise is to have

my_bpparam = MulticoreParam(stop.on.error=FALSE)

data %>%
apply_methods(method_list1, BPPARAM = my_bpparam) %>%
apply_methods(method_list2, BPPARAM = my_bpparam) %>%
apply_methods(method_list3, BPPARAM = my_bpparam)

But really at that point why not just have

set_cellbench_bpparam(MulticoreParam(stop.on.error=FALSE))

data %>%
apply_methods(method_list1) %>%
apply_methods(method_list2) %>%
apply_methods(method_list3)

Which I guess is the point of this mail chain. Is there a reason why I 
shouldn’t?

Kind regards,
Shian

> On 12 Jan 2019, at 5:43 pm, Aaron Lun 
>  wrote:
>
> My current set-up in a variety of packages is that every parallelizable 
> function has a BPPARAM= argument. This makes it explicit about which steps 
> are being parallelized. Requiring users to respecify BPPARAM= in each 
> function isn’t as annoying as you’d think, because not many steps are 
> actually heavy enough to warrant parallelization.
>
> Ideally, I'd have BPPARAM=bpparam() by default, allowing me to both respond 
> to the register()'d bpparam() as well as any custom argument that might be 
> supplied by the user, e.g., if they don't want to change bpparam(). However, 
> for various reasons (discussed in the other SerialParam() thread), the 
> current default is BPPARAM=SerialParam().
>
> To be honest, I've never thought it necessary to have a global 
> package-specific parameter for parallelization as you've done (for scPipe, 
> presumably). The current options - global across all packages with 
> register(), or local to individual functions with BPPARAM= - seem to be 
> satisfactory in the vast majority of cases. At least to me.
>
> And at least for RNGs, if a function from another package is giving greatly 
> different results upon parallelization (excepting some numerical error with 
> changed order of summation), I'd say that's a bug of some sort. That should 
> be fixed on their end, rather than requiring other packages and users to 
> tiptoe around them.
>
> -A
>
>> On 10 Jan 2019, at 23:59, Shian Su  wrote:
>>
>> Hello Developers,
>>
>> I’m using BiocParallel for parallelism, and I understand that register() is 
>> the recommended method for setting threads. But I am reluctant to ask people 
>> to run code for my package which changes how other packages operate, so I 
>> figured I’d use local bp params. Recent discussions of RNG has made me 
>> worried there may be hidden state gotcha’s I’ve not considered. The current 
>> implementation is
>>
>> set_mypkg_threads <- function(n) {
>> if (n == 1) {
>> options(“mypkg.bpparam” = SerialParam())
>> } else if (n > 1) {
>> if (.Platform$OS.type == "windows") {
>> options(“mypkg.bpparam" = SnowParam(nthreads))
>> } else {
>> options(“mypkg.bpparam" = MulticoreParam(nthreads))
>> }
>> }
>> }
>>
>> Then elsewhere in my package I make use of parallelism as follows
>>
>> bplapply(
>> BPPARAM = getOption(“mypkg.bpparam”, bpparam()),
>> …
>> )
>>
>> Where getOption() either retrieves my set option or the default value given 
>> by bpparam(). So the behaviour is that if users have not registered params 
>> for my package specifically then it will take the BiocParallel default, but 
>> otherwise it will use my package’s local bpparam.
>>
>> Also I know that as currently implemented, I preclude cluster parallelism on 
>> non-Windows machines. But it’s easy to fix. Just looking for feedback on the 
>> approach.
>>
>> Kind regards,
>> Shian Su
>>
>> ___
>>
>> The information in this email is confidential and intended solely for the 
>> addressee.
>> You must not disclose, forward, print or use it without the permission of 
>> the sender.
>>
>> The Walter and Eliza Hall Institute acknowledges the Wurundjeri people of 
>> the Kulin
>> Nation as the traditional owners of the land where our campuses are located 
>> and
>> the continuing connection to country and community.
>> ___
>>
>> [[alternative HTML version 

Re: [Bioc-devel] What is good convention for package-local BiocParallel param?

2019-01-11 Thread Aaron Lun
My current set-up in a variety of packages is that every parallelizable 
function has a BPPARAM= argument. This makes it explicit about which steps are 
being parallelized. Requiring users to respecify BPPARAM= in each function 
isn’t as annoying as you’d think, because not many steps are actually heavy 
enough to warrant parallelization.

Ideally, I'd have BPPARAM=bpparam() by default, allowing me to both respond to 
the register()'d bpparam() as well as any custom argument that might be 
supplied by the user, e.g., if they don't want to change bpparam(). However, 
for various reasons (discussed in the other SerialParam() thread), the current 
default is BPPARAM=SerialParam().

To be honest, I've never thought it necessary to have a global package-specific 
parameter for parallelization as you've done (for scPipe, presumably). The 
current options - global across all packages with register(), or local to 
individual functions with BPPARAM= - seem to be satisfactory in the vast 
majority of cases. At least to me.

And at least for RNGs, if a function from another package is giving greatly 
different results upon parallelization (excepting some numerical error with 
changed order of summation), I'd say that's a bug of some sort. That should be 
fixed on their end, rather than requiring other packages and users to tiptoe 
around them.

-A

> On 10 Jan 2019, at 23:59, Shian Su  wrote:
> 
> Hello Developers,
> 
> I’m using BiocParallel for parallelism, and I understand that register() is 
> the recommended method for setting threads. But I am reluctant to ask people 
> to run code for my package which changes how other packages operate, so I 
> figured I’d use local bp params. Recent discussions of RNG has made me 
> worried there may be hidden state gotcha’s I’ve not considered. The current 
> implementation is
> 
> set_mypkg_threads <- function(n) {
> if (n == 1) {
> options(“mypkg.bpparam” = SerialParam())
> } else if (n > 1) {
> if (.Platform$OS.type == "windows") {
> options(“mypkg.bpparam" = SnowParam(nthreads))
> } else {
> options(“mypkg.bpparam" = MulticoreParam(nthreads))
> }
> }
> }
> 
> Then elsewhere in my package I make use of parallelism as follows
> 
> bplapply(
> BPPARAM = getOption(“mypkg.bpparam”, bpparam()),
> …
> )
> 
> Where getOption() either retrieves my set option or the default value given 
> by bpparam(). So the behaviour is that if users have not registered params 
> for my package specifically then it will take the BiocParallel default, but 
> otherwise it will use my package’s local bpparam.
> 
> Also I know that as currently implemented, I preclude cluster parallelism on 
> non-Windows machines. But it’s easy to fix. Just looking for feedback on the 
> approach.
> 
> Kind regards,
> Shian Su
> 
> ___
> 
> The information in this email is confidential and intended solely for the 
> addressee.
> You must not disclose, forward, print or use it without the permission of the 
> sender.
> 
> The Walter and Eliza Hall Institute acknowledges the Wurundjeri people of the 
> Kulin
> Nation as the traditional owners of the land where our campuses are located 
> and
> the continuing connection to country and community.
> ___
> 
>   [[alternative HTML version deleted]]
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel