Re: [Rd] Multiple Assignment built into the R Interpreter?

Duncan Murdoch Sun, 12 Mar 2023 04:19:05 -0700

On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:

Thinking more about this, and seeing Kevins examples athttps://github.com/kevinushey/dotty<https://github.com/kevinushey/dotty>, I think this is the most R-likeway of doing it,with an additional benefit as it would allow to introduce the usefuldata.table semantics DT[, .(a = b, c, d)] to more general R. So I wouldpropose tointroduce a new primitive function . <- function(...) .Primitive(".") inR with an assignment method and the following features:

I think that proposal is very unlikely to be accepted. If it was aprimitive function, it could only be maintained by R Core. They arejustifiably very reluctant to take on extra work for themselves.

Kevin's package demonstrates that this can be done entirely in acontributed package, which means there's no need for R Core to beinvolved. I don't know if he has plans to turn his prototype into aCRAN package. If he doesn't, then it will be up to some otherinterested maintainer to step up and take on the task, or it will justfade away.

I haven't checked whether your proposals below represent changes fromthe current version of dotty, but if they do, the way to proceed is tofork that project, implement your changes, and offer to contribute themback to the main branch.


Duncan Murdoch


  * Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment
    e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All
    the functionality proposed by Kevin at
    https://github.com/kevinushey/dotty
    <https://github.com/kevinushey/dotty> is useful, unambiguous and
    feasible.
  * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars.
  * Mixing of positional and named assignment e.g .[mpg_new, carb_new =
    carb, cyl_new] <- mtcars. The inputs not assigned by name are simply
    the elements of RHS in the order they occur, regardless of whether
    they have been used previously e.g. .[mpg_new, cyl_new = cyl,
    log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could
    be any named vector type.
  * Conventional use of the function as lazy version of of list(), as in
    data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D).
    This would also be useful, allowing more parsimonious code, and
    avoid the need to assign names to all return values in a function
    return, e.g. if I already have matrices A, C, Q and R as internal
    objects in my function, I can simply end by return(.(A, C, Q, R))
    instead of return(list(A = A, C = C, Q = Q, R = R)) if I wanted the
    list to be named with the object names.

The implementation of this in R and C should be pretty straightforward.It would just require a modification to R CMD Check to recognize .[<- asassignment.


Best regards,

Sebastian
-
2.)

On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz<sebastian.kra...@graduateinstitute.ch<mailto:sebastian.kra...@graduateinstitute.ch>> wrote:


    Thanks Gabriel and Kevin for your inputs,

    regarding your points Gabriel, I think Python and Julia do allow
    multiple sub-assignment, but in-line with my earlier suggestion in
    response to Duncan to make multiple assignment an environment-level
    operation (like collapse::%=% currently works),  this would not be
    possible in R.

    Regarding the [a] <- coolest_function() syntax, yeah it would mean
    do multiple assignment and set a equal to the first element dropping
    all other elements. Multiple assignment should be positional loke in
    other languages, enabling flexible renaming of objects on the fly.
    So it should be irrelevant whether the function returns a named or
    unnamed list or vector.

    Thanks also Kevin for this contribution. I think it’s a remarkable
    effort, and I wouldn’t mind such semantics e.g. making it a function
    call to ‘.[‘ or any other one-letter function, as long as it’s coded
    in C and recognized by the interpreter as an assignment operation.

    Best regards,

    Sebastian





    On Sun 12. Mar 2023 at 01:00, Kevin Ushey <kevinus...@gmail.com
    <mailto:kevinus...@gmail.com>> wrote:

        FWIW, it's possible to get fairly close to your proposed semantics
        using the existing metaprogramming facilities in R. I put together a
        prototype package here to demonstrate:

        https://github.com/kevinushey/dotty
        <https://github.com/kevinushey/dotty>

        The package exports an object called `.`, with a special
        `[<-.dot` S3
        method which enables destructuring assignments. This means you can
        write code like:

             .[nr, nc] <- dim(mtcars)

        and that will define 'nr' and 'nc' as you expect.

        As for R CMD check warnings, you can suppress those through the
        use of
        globalVariables(), and that can also be automated within the
        package.
        The 'dotty' package includes a function 'dotify()' which automates
        looking for such usages in your package, and calling
        globalVariables()
        so that R CMD check doesn't warn. In theory, a similar technique
        would
        be applicable to other packages defining similar operators (zeallot,
        collapse).

        Obviously, globalVariables() is a very heavy hammer to swing for
        this
        issue, but you might consider the benefits worth the tradeoffs.

        Best,
        Kevin

        On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch
        <murdoch.dun...@gmail.com <mailto:murdoch.dun...@gmail.com>> wrote:
         >
         > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
         > > Thanks Duncan and Ivan for the careful thoughts. I'm not
        sure I can
         > > follow all aspects you raised, but to give my limited take
        on a few:
         > >

> >> your proposal violates a very basic property of thelanguage, i.e. that all statements are expressions and have a

        value.  > What's the value of 1 + (A, C = init_matrices()).
         > >
         > > I'm not sure I see the point here. I evaluated 1 + (d =
        dim(mtcars); nr
         > > = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,
         >
         >
         >    d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
         >
         > is not a statement, it is a sequence of 4 statements.
         >
         > Duncan Murdoch
         >
         >   as the
         > > above expression should. `%=%` assigns to
         > > environments, so 1 + (c("A", "C") %=% init_matrices()) returns
         > > numeric(0), with A and C having their values assigned.
         > >
         > >> suppose f() returns list(A = 1, B = 2) and I do  > B, A <-
        f() > Should assignment be by position or by name?
         > >
         > > In other languages this is by position. The feature is not
        meant to
         > > replace list2env(), and being able to rename objects in the
        assignment
         > > is a vital feature of codes
         > > using multi input and output functions e.g. in Matlab or Julia.
         > >
         > >> Honestly, given that this is simply syntactic sugar, I
        don't think I would support it.
         > >
         > > You can call it that, but it would be used by almost every
        R user almost
         > > every day. Simple things like nr, nc = dim(x); values,
        vectors =
         > > eigen(x) etc. where the creation of intermediate objects
         > > is cumbersome and redundant.
         > >
         > >> I see you've already mentioned it ("JavaScript-like"). I
        think it would  fulfil Sebastian's requirements too, as long as
        it is considered "true assignment" by the rest of the language.
         > >
         > > I don't have strong opinions about how the issue is phrased or
         > > implemented. Something like [t, n] = dim(x) might even be
        more clear.
         > > It's important though that assignment remains by position,
         > > so even if some output gets thrown away that should also be
        positional.
         > >
         > >>  A <- 0  > [A, B = A + 10] <- list(1, A = 2)
         > >
         > > I also fail to see the use of allowing this. something like
        this is an
         > > error.
         > >
         > >> A = 2
         > >> (B = A + 1) <- 1
         > > Error in (B = A + 1) <- 1 : could not find function "(<-"
         > >
         > > Regarding the practical implementation, I think
        `collapse::%=%` is a
         > > good starting point. It could be introduced in R as a
        separate function,
         > > or `=` could be modified to accommodate its capability. It
        should be
         > > clear that
         > > with more than one LHS variables the assignment is an
        environment level
         > > operation and the results can only be used in computations
        once assigned
         > > to the environment, e.g. as in 1 + (c("A", "C") %=%
        init_matrices()),
         > > A and C are not available for the addition in this
        statement. The
         > > interpretor then needs to be modified to read something
        like nr, nc =
         > > dim(x) or [nr, nc] = dim(x). as an environment-level
        multiple assignment
         > > operation with no
         > > immediate value. Appears very feasible to my limited
        understanding, but
         > > I guess there are other things to consider still.
        Definitely appreciate
         > > the responses so far though.
         > >
         > > Best regards,
         > >
         > > Sebastian
         > >
         > >
         > >
         > >
         > >
         > > On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch
        <murdoch.dun...@gmail.com <mailto:murdoch.dun...@gmail.com>
         > > <mailto:murdoch.dun...@gmail.com
        <mailto:murdoch.dun...@gmail.com>>> wrote:
         > >
         > >     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
         > >      > On Sat, 11 Mar 2023 11:11:06 -0500
         > >      > Duncan Murdoch <murdoch.dun...@gmail.com
        <mailto:murdoch.dun...@gmail.com>
         > >     <mailto:murdoch.dun...@gmail.com
        <mailto:murdoch.dun...@gmail.com>>> wrote:
         > >      >
         > >      >> That's clear, but your proposal violates a very
        basic property
         > >     of the
         > >      >> language, i.e. that all statements are expressions
        and have a value.
         > >      >
         > >      > How about reframing this feature request from
        multiple assignment
         > >      > (which does go contrary to "everything has only one
        value, even
         > >     if it's
         > >      > sometimes invisible(NULL)") to "structured binding"
        / "destructuring
         > >      > assignment" [*], which takes this single single
        value returned by the
         > >      > expression and subsets it subject to certain rules?
        It may be
         > >     easier to
         > >      > make a decision on the semantics for destructuring
        assignment (e.g.
         > >      > languages which have this feature typically allow
        throwing unneeded
         > >      > parts of the return value away), and it doesn't seem
        to break as much
         > >      > of the rest of the language if implemented.
         > >      >
         > >      > I see you've already mentioned it
        ("JavaScript-like"). I think it
         > >     would
         > >      > fulfil Sebastian's requirements too, as long as it
        is considered
         > >     "true
         > >      > assignment" by the rest of the language.
         > >      >
         > >      > The hard part is to propose the actual grammar of
        the new feature (in
         > >      > terms of src/main/gram.y, preferably without introducing
         > >     conflicts) and
         > >      > its semantics (including the corner cases, some of
        which you have
         > >      > already mentioned). I'm not sure I'm up to the task.
         > >      >
         > >
         > >     If I were doing it, here's what I'd propose:
         > >
         > >         '[' formlist ']' LEFT_ASSIGN expr
         > >         '[' formlist ']' EQ_ASSIGN expr
         > >         expr RIGHT_ASSIGN  '[' formlist ']'
         > >
         > >     where `formlist` has the syntax of the formals list for
        a function
         > >     definition.  This would have the following semantics:
         > >
         > >          {
         > >            *tmp* <- expr
         > >
         > >            # For arguments with no "default" expression,
         > >
         > >            argname1 <- *tmp*[[1]]
         > >            argname2 <- *tmp*[[2]]
         > >            ...
         > >
         > >            # For arguments with a default listed
         > >
         > >            argname3 <- with(*tmp*, default3)
         > >          }
         > >
         > >
         > >     The value of the whole thing would therefore be
        (invisibly) the
         > >     value of
         > >     the last item in the assignment.
         > >
         > >     Two examples:
         > >
         > >         [A, B, C] <- expr   # assign the first three
        elements of expr to A,
         > >     B, and C
         > >
         > >         [A, B, C = a + b] <- expr  # assign the first two
        elements of expr
         > >                                    # to A and B,
         > >                                    # assign with(expr, a +
        b) to C.
         > >
         > >     Unfortunately, I don't think this could be done entirely by
         > >     transforming
         > >     the expression (which is the way |> was done), and that
        makes it a lot
         > >     harder to write and to reason about.  E.g. what does
        this do?
         > >
         > >         A <- 0
         > >         [A, B = A + 10] <- list(1, A = 2)
         > >
         > >     According to the recipe above, I think it sets A to 1
        and B to 12, but
         > >     maybe a user would expect B to be 10 or 11.  And
        according to that
         > >     recipe this is an error:
         > >
         > >         [A, B = A + 10] <- c(1, A = 2)
         > >
         > >     which probably isn't what a user would expect, given
        that this is fine:
         > >
         > >         [A, B] <- c(1, 2)
         > >
         > >     Duncan Murdoch
         > >
         >
         > ______________________________________________
         > R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
         > https://stat.ethz.ch/mailman/listinfo/r-devel
        <https://stat.ethz.ch/mailman/listinfo/r-devel>


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Multiple Assignment built into the R Interpreter?

Reply via email to