Re: [R] Parallel assignments and goto

Thomas Mailund Tue, 27 Feb 2018 08:47:28 -0800

I did try assign. That was the slowest version from what my profiling could 
tell, as far as I recall, which really surprised me. I had expected it to be 
the fastest. The second slowest was using the [[ operator on environments. Or 
it might be the reverse for those two. They were both slower than the other 
versions I posted here.


Cheers

On 27 Feb 2018, 17.16 +0100, Bert Gunter <bgunter.4...@gmail.com>, wrote:
> No clue, but see ?assign perhaps if you have not done so already.
>
> -- Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> > On Tue, Feb 27, 2018 at 6:51 AM, Thomas Mailund <thomas.mail...@gmail.com> 
> > wrote:
> > > Interestingly, the <<- operator is also a lot faster than using a 
> > > namespace explicitly, and only slightly slower than using <- with local 
> > > variables, see below. But, surely, both must at some point insert values 
> > > in a given environment — either the local one, for <-, or an enclosing 
> > > one, for <<- — so I guess I am asking if there is a more low-level 
> > > assignment operation I can get my hands on without diving into C?
> > >
> > >
> > > factorial <- function(n, acc = 1) {
> > >     if (n == 1) acc
> > >     else factorial(n - 1, n * acc)
> > > }
> > >
> > > factorial_tr_manual <- function (n, acc = 1)
> > > {
> > >     repeat {
> > >         if (n <= 1)
> > >             return(acc)
> > >         else {
> > >             .tailr_n <- n - 1
> > >             .tailr_acc <- acc * n
> > >             n <- .tailr_n
> > >             acc <- .tailr_acc
> > >             next
> > >         }
> > >     }
> > > }
> > >
> > > factorial_tr_automatic_1 <- function(n, acc = 1) {
> > >     .tailr_n <- n
> > >     .tailr_acc <- acc
> > >     callCC(function(escape) {
> > >         repeat {
> > >             n <- .tailr_n
> > >             acc <- .tailr_acc
> > >             if (n <= 1) {
> > >                 escape(acc)
> > >             } else {
> > >                 .tailr_n <<- n - 1
> > >                 .tailr_acc <<- n * acc
> > >             }
> > >         }
> > >     })
> > > }
> > >
> > > factorial_tr_automatic_2 <- function(n, acc = 1) {
> > >     .tailr_env <- rlang::get_env()
> > >     callCC(function(escape) {
> > >         repeat {
> > >             if (n <= 1) {
> > >                 escape(acc)
> > >             } else {
> > >                 .tailr_env$.tailr_n <- n - 1
> > >                 .tailr_env$.tailr_acc <- n * acc
> > >                 .tailr_env$n <- .tailr_env$.tailr_n
> > >                 .tailr_env$acc <- .tailr_env$.tailr_acc
> > >             }
> > >         }
> > >     })
> > > }
> > >
> > > microbenchmark::microbenchmark(factorial(1000),
> > >                                factorial_tr_manual(1000),
> > >                                factorial_tr_automatic_1(1000),
> > >                                factorial_tr_automatic_2(1000))
> > > Unit: microseconds
> > >                            expr     min      lq      mean   median        
> > > uq      max neval
> > >                 factorial(1000) 884.137 942.060 1076.3949 977.6235 
> > > 1042.5035 2889.779   100
> > >       factorial_tr_manual(1000) 110.215 116.919  130.2337 118.7350  
> > > 122.7495  255.062   100
> > >  factorial_tr_automatic_1(1000) 179.897 183.437  212.8879 187.8250  
> > > 195.7670  979.352   100
> > >  factorial_tr_automatic_2(1000) 508.353 534.328  601.9643 560.7830  
> > > 587.8350 1424.260   100
> > >
> > > Cheers
> > >
> > > On 26 Feb 2018, 21.12 +0100, Thomas Mailund <thomas.mail...@gmail.com>, 
> > > wrote:
> > > > Following up on this attempt of implementing the tail-recursion 
> > > > optimisation — now that I’ve finally had the chance to look at it again 
> > > > — I find that non-local return implemented with callCC doesn’t actually 
> > > > incur much overhead once I do it more sensibly. I haven’t found a good 
> > > > way to handle parallel assignments that isn’t vastly slower than simply 
> > > > introducing extra variables, so I am going with that solution. However, 
> > > > I have now run into another problem involving those local variables — 
> > > > and assigning to local variables in general.
> > > >
> > > > Consider again the factorial function and three different ways of 
> > > > implementing it using the tail recursion optimisation:
> > > >
> > > > factorial <- function(n, acc = 1) {
> > > >     if (n == 1) acc
> > > >     else factorial(n - 1, n * acc)
> > > > }
> > > >
> > > > factorial_tr_manual <- function (n, acc = 1)
> > > > {
> > > >     repeat {
> > > >         if (n <= 1)
> > > >             return(acc)
> > > >         else {
> > > >             .tailr_n <- n - 1
> > > >             .tailr_acc <- acc * n
> > > >             n <- .tailr_n
> > > >             acc <- .tailr_acc
> > > >             next
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > factorial_tr_automatic_1 <- function(n, acc = 1) {
> > > >     callCC(function(escape) {
> > > >         repeat {
> > > >             if (n <= 1) {
> > > >                 escape(acc)
> > > >             } else {
> > > >                 .tailr_n <- n - 1
> > > >                 .tailr_acc <- n * acc
> > > >                 n <- .tailr_n
> > > >                 acc <- .tailr_acc
> > > >             }
> > > >         }
> > > >     })
> > > > }
> > > >
> > > > factorial_tr_automatic_2 <- function(n, acc = 1) {
> > > >     .tailr_env <- rlang::get_env()
> > > >     callCC(function(escape) {
> > > >         repeat {
> > > >             if (n <= 1) {
> > > >                 escape(acc)
> > > >             } else {
> > > >                 .tailr_env$.tailr_n <- n - 1
> > > >                 .tailr_env$.tailr_acc <- n * acc
> > > >                 .tailr_env$n <- .tailr_env$.tailr_n
> > > >                 .tailr_env$acc <- .tailr_env$.tailr_acc
> > > >             }
> > > >         }
> > > >     })
> > > > }
> > > >
> > > > The factorial_tr_manual function is how I would implement the function 
> > > > manually while factorial_tr_automatic_1 is what my package used to come 
> > > > up with. It handles non-local returns, because this is something I need 
> > > > in general. Finally, factorial_tr_automatic_2 accesses the local 
> > > > variables explicitly through the environment, which is what my package 
> > > > currently produces.
> > > >
> > > > The difference between supporting non-local returns and not is tiny, 
> > > > but explicitly accessing variables through their environment costs me 
> > > > about a factor of five — something that surprised me.
> > > >
> > > > > microbenchmark::microbenchmark(factorial(1000),
> > > > +                                factorial_tr_manual(1000),
> > > > +                                factorial_tr_automatic_1(1000),
> > > > +                                factorial_tr_automatic_2(1000))
> > > > Unit: microseconds
> > > >                            expr     min       lq     mean   median
> > > >                 factorial(1000) 756.357 810.4135 963.1040 856.3315
> > > >       factorial_tr_manual(1000) 104.838 119.7595 198.7347 129.0870
> > > >  factorial_tr_automatic_1(1000) 112.354 125.5145 211.6148 135.5255
> > > >  factorial_tr_automatic_2(1000) 461.015 544.7035 688.5988 565.3240
> > > >        uq      max neval
> > > >  945.3110 4149.099   100
> > > >  136.8200 4190.331   100
> > > >  152.9625 5944.312   100
> > > >  600.5235 7798.622   100
> > > >
> > > > The simple solution, of course, is to not do that, but then I can’t 
> > > > handle expressions inside calls to “with”. And I would really like to, 
> > > > because then I can combine tail recursion with pattern matching.
> > > >
> > > > I can define linked lists and a length function on them like this:
> > > >
> > > > library(pmatch)
> > > > llist := NIL | CONS(car, cdr : llist)
> > > >
> > > > llength <- function(llist, acc = 0) {
> > > >     cases(llist,
> > > >           NIL -> acc,
> > > >           CONS(car, cdr) -> llength(cdr, acc + 1))
> > > > }
> > > >
> > > > The tail-recursion I get out of transforming this function looks like 
> > > > this:
> > > >
> > > > llength_tr <- function (llist, acc = 0) {
> > > >     .tailr_env <- rlang::get_env()
> > > >     callCC(function(escape) {
> > > >         repeat {
> > > >             if (!rlang::is_null(..match_env <- test_pattern(llist,
> > > >                                                             NIL)))
> > > >                 with(..match_env, escape(acc))
> > > >
> > > >             else if (!rlang::is_null(..match_env <-
> > > >                                      test_pattern(llist, CONS(car, 
> > > > cdr))))
> > > >                 with(..match_env, {
> > > >                     .tailr_env$.tailr_llist <- cdr
> > > >                     .tailr_env$.tailr_acc <- acc + 1
> > > >                     .tailr_env$llist <- .tailr_env$.tailr_llist
> > > >                     .tailr_env$acc <- .tailr_env$.tailr_acc
> > > >                 })
> > > >         }
> > > >     })
> > > > }
> > > >
> > > > Maybe not the prettiest code, but you are not supposed to actually see 
> > > > it, of course.
> > > >
> > > > There is not much gain in speed
> > > >
> > > > Unit: milliseconds
> > > >                    expr      min       lq     mean   median       uq
> > > >     llength(test_llist) 70.74605 76.08734 87.78418 85.81193 94.66378
> > > >  llength_tr(test_llist) 45.16946 51.56856 59.09306 57.00101 63.07044
> > > >       max neval
> > > >  182.4894   100
> > > >  166.6990   100
> > > >
> > > > but you don’t run out of stack space
> > > >
> > > > > llength(make_llist(1000))
> > > > Error: evaluation nested too deeply: infinite recursion / 
> > > > options(expressions=)?
> > > > Error during wrapup: C stack usage  7990648 is too close to the limit
> > > > > llength_tr(make_llist(1000))
> > > > [1] 1000
> > > >
> > > > I should be able to make the function go faster if I had a faster way 
> > > > of handling the variable assignments, but inside “with”, I’m not sure 
> > > > how to do that…
> > > >
> > > > Any suggestions?
> > > >
> > > > Cheers
> > > >
> > > > On 11 Feb 2018, 16.48 +0100, Thomas Mailund <thomas.mail...@gmail.com>, 
> > > > wrote:
> > > > > Hi guys,
> > > > >
> > > > > I am working on some code for automatically translating recursive 
> > > > > functions into looping functions to implemented tail-recursion 
> > > > > optimisations. See https://github.com/mailund/tailr
> > > > >
> > > > > As a toy-example, consider the factorial function
> > > > >
> > > > > factorial <- function(n, acc = 1) {
> > > > > if (n <= 1) acc
> > > > > else factorial(n - 1, acc * n)
> > > > > }
> > > > >
> > > > > I can automatically translate this into the loop-version
> > > > >
> > > > > factorial_tr_1 <- function (n, acc = 1)
> > > > > {
> > > > > repeat {
> > > > > if (n <= 1)
> > > > > return(acc)
> > > > > else {
> > > > > .tailr_n <- n - 1
> > > > > .tailr_acc <- acc * acc
> > > > > n <- .tailr_n
> > > > > acc <- .tailr_acc
> > > > > next
> > > > > }
> > > > > }
> > > > > }
> > > > >
> > > > > which will run faster and not have problems with recursion depths. 
> > > > > However, I’m not entirely happy with this version for two reasons: I 
> > > > > am not happy with introducing the temporary variables and this 
> > > > > rewrite will not work if I try to over-scope an evaluation context.
> > > > >
> > > > > I have two related questions, one related to parallel assignments — 
> > > > > i.e. expressions to variables so the expression uses the old variable 
> > > > > values and not the new values until the assignments are all done — 
> > > > > and one related to restarting a loop from nested loops or from nested 
> > > > > expressions in `with` expressions or similar.
> > > > >
> > > > > I can implement parallel assignment using something like 
> > > > > rlang::env_bind:
> > > > >
> > > > > factorial_tr_2 <- function (n, acc = 1)
> > > > > {
> > > > > .tailr_env <- rlang::get_env()
> > > > > repeat {
> > > > > if (n <= 1)
> > > > > return(acc)
> > > > > else {
> > > > > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > > > > next
> > > > > }
> > > > > }
> > > > > }
> > > > >
> > > > > This reduces the number of additional variables I need to one, but is 
> > > > > a couple of orders of magnitude slower than the first version.
> > > > >
> > > > > > microbenchmark::microbenchmark(factorial(100),
> > > > > + factorial_tr_1(100),
> > > > > + factorial_tr_2(100))
> > > > > Unit: microseconds
> > > > > expr min lq mean median uq max neval
> > > > > factorial(100) 53.978 60.543 77.76203 71.0635 85.947 180.251 100
> > > > > factorial_tr_1(100) 9.022 9.903 11.52563 11.0430 11.984 28.464 100
> > > > > factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 
> > > > > 8177.635 100
> > > > >
> > > > >
> > > > > Is there another way to do parallel assignments that doesn’t cost 
> > > > > this much in running time?
> > > > >
> > > > > My other problem is the use of `next`. I would like to combine 
> > > > > tail-recursion optimisation with pattern matching as in 
> > > > > https://github.com/mailund/pmatch where I can, for example, define a 
> > > > > linked list like this:
> > > > >
> > > > > devtools::install_github("mailund/pmatch”)
> > > > > library(pmatch)
> > > > > llist := NIL | CONS(car, cdr : llist)
> > > > >
> > > > > and define a function for computing the length of a list like this:
> > > > >
> > > > > list_length <- function(lst, acc = 0) {
> > > > > force(acc)
> > > > > cases(lst,
> > > > > NIL -> acc,
> > > > > CONS(car, cdr) -> list_length(cdr, acc + 1))
> > > > > }
> > > > >
> > > > > The `cases` function creates an environment that binds variables in a 
> > > > > pattern-description that over-scopes the expression to the right of 
> > > > > `->`, so the recursive call in this example have access to the 
> > > > > variables `cdr` and `car`.
> > > > >
> > > > > I can transform a `cases` call to one that creates the environment 
> > > > > containing the bound variables and then evaluate this using `eval` or 
> > > > > `with`, but in either case, a call to `next` will not work in such a 
> > > > > context. The expression will be evaluated inside `bind` or `with`, 
> > > > > and not in the `list_lenght` function.
> > > > >
> > > > > A version that *will* work, is something like this
> > > > >
> > > > > factorial_tr_3 <- function (n, acc = 1)
> > > > > {
> > > > > .tailr_env <- rlang::get_env()
> > > > > .tailr_frame <- rlang::current_frame()
> > > > > repeat {
> > > > > if (n <= 1)
> > > > > rlang::return_from(.tailr_frame, acc)
> > > > > else {
> > > > > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > > > > rlang::return_to(.tailr_frame)
> > > > > }
> > > > > }
> > > > > }
> > > > >
> > > > > Here, again, for the factorial function since this is easier to 
> > > > > follow than the list-length function.
> > > > >
> > > > > This solution will also work if you return values from inside loops, 
> > > > > where `next` wouldn’t work either.
> > > > >
> > > > > Using `rlang::return_from` and `rlang::return_to` implements the 
> > > > > right semantics, but costs me another order of magnitude in running 
> > > > > time.
> > > > >
> > > > > microbenchmark::microbenchmark(factorial(100),
> > > > > factorial_tr_1(100),
> > > > > factorial_tr_2(100),
> > > > > factorial_tr_3(100))
> > > > > Unit: microseconds
> > > > > expr min lq mean median uq max neval
> > > > > factorial(100) 52.479 60.2640 93.43069 67.5130 83.925 2062.481 100
> > > > > factorial_tr_1(100) 8.875 9.6525 49.19595 10.6945 11.217 3818.823 100
> > > > > factorial_tr_2(100) 5296.350 5525.0745 5973.77664 5737.8730 6260.128 
> > > > > 8471.301 100
> > > > > factorial_tr_3(100) 77554.457 80757.0905 87307.28737 84004.0725 
> > > > > 89859.169 171039.228 100
> > > > >
> > > > > I can live with the “introducing extra variables” solution to 
> > > > > parallel assignment, and I could hack my way out of using `with` or 
> > > > > `bind` in rewriting `cases`, but restarting a `repeat` loop would 
> > > > > really make for a nicer solution. I know that `goto` is considered 
> > > > > harmful, but really, in this case, it is what I want.
> > > > >
> > > > > A `callCC` version also solves the problem
> > > > >
> > > > > factorial_tr_4 <- function(n, acc = 1) {
> > > > > function_body <- function(continuation) {
> > > > > if (n <= 1) {
> > > > > continuation(acc)
> > > > > } else {
> > > > > continuation(list("continue", n = n - 1, acc = acc * n))
> > > > > }
> > > > > }
> > > > > repeat {
> > > > > result <- callCC(function_body)
> > > > > if (is.list(result) && result[[1]] == "continue") {
> > > > > n <- result$n
> > > > > acc <- result$acc
> > > > > next
> > > > > } else {
> > > > > return(result)
> > > > > }
> > > > > }
> > > > > }
> > > > >
> > > > > But this requires that I know how to distinguish between a valid 
> > > > > return value and a tag for “next” and is still a lot slower than the 
> > > > > `next` solution
> > > > >
> > > > > microbenchmark::microbenchmark(factorial(100),
> > > > > factorial_tr_1(100),
> > > > > factorial_tr_2(100),
> > > > > factorial_tr_3(100),
> > > > > factorial_tr_4(100))
> > > > > Unit: microseconds
> > > > > expr min lq mean median uq max neval
> > > > > factorial(100) 54.109 61.8095 81.33167 81.8785 89.748 243.554 100
> > > > > factorial_tr_1(100) 9.025 9.9035 11.38607 11.1990 12.008 22.375 100
> > > > > factorial_tr_2(100) 5272.524 5798.3965 6302.40467 6077.7180 6492.959 
> > > > > 9967.237 100
> > > > > factorial_tr_3(100) 66186.080 72336.2810 76480.75172 73632.9665 
> > > > > 75405.054 203785.673 100
> > > > > factorial_tr_4(100) 270.978 302.7890 337.48763 313.9930 334.096 
> > > > > 1425.702 100
> > > > >
> > > > > I don’t necessarily need the tail-recursion optimisation to be faster 
> > > > > than the recursive version; just getting out of the problem of too 
> > > > > deep recursions is a benefit, but I would rather not pay with an 
> > > > > order of magnitude for it. I could, of course, try to handle cases 
> > > > > that works with `next` in one way, and other cases using `callCC`, 
> > > > > but I feel it should be possible with a version that handles all 
> > > > > cases the same way.
> > > > >
> > > > > Is there any way to achieve this?
> > > > >
> > > > > Cheers
> > > > > Thomas
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Parallel assignments and goto

Reply via email to