Re: [Rd] Multiple Assignment built into the R Interpreter?

Duncan Murdoch Tue, 14 Mar 2023 15:58:12 -0700

On 13/03/2023 6:01 a.m., Duncan Murdoch wrote:

Yes, this is really a problem with the checks, not with the language.


A simpler approach than your alternativeAssignment function would be
simply to allow globalVariables() to be limited to a single function as
the note in its help page says.

I just took a look, and this would be quite easy to do. It wouldrequire changes to codetools and to utils, but probably just a few dozenlines.


Duncan Murdoch


This might be tedious to write by hand, but could be automated using
methods like "dotify" in dotty.

Duncan Murdoch


On 12/03/2023 10:36 p.m., Pavel Krivitsky wrote:

Dear All,

As a maintainer of large, complex packages, I can think of many places
in which deconstructing assignment would simplify the code, as well as
facilitate readability by breaking up larger functions into helpers, so
I would be very glad to see this incorporated somehow.

I think the crux of the matter is that while there is a number of ways
to implement deconstructing assignment within R, there is no mechanism
to tell R CMD check about it without also suppressing checks for every
other instance of that variable name. This is particularly problematic
because those variable names are likely to be used elsewhere in the
package.

Workarounds that have been suggested all defeat the conciseness and
clarity of the deconstructing assignment and introduce potential for
subtle bugs.

The check warnings are something that can only be addressed in
'codetools', with a finer API than what utils::globalVariables()
provides. Perhaps this would have a lower hurdle than modifying R
language itself?

  From skimming through the relevant 'codetools' code, one idea for such
an API would be a function, along the lines of

utils::alternativeAssignment(op, assigned)

that sets up a callback assigned = function(op, e) that given the
operator (as string) and the expression it's embedded in, returns a
list of three elements:
   * a character vector containing a list of variables assigned to that
     might not otherwise be detected
   * a character vector containing a list of variables referenced that
     might not otherwise be detected
   * expression e with potentially "offending" elements removed, which
     will then be processed by the rest of the checking code

Then, say, 'zeallot' could implement zeallot::zeallot_assign_detect(),
and a package developer using it could put

utils::alternativeAssignment("%<-%", zeallot::zeallot_assign_detect)

in their .onLoad() function. Similarly, users of 'dotty' could set up
callbacks for all standard assignment operators to inform the code
about the nonstandard assignment.

Best Regards,Pavel

On Sun, 2023-03-12 at 14:05 +0200, Sebastian Martin Krantz wrote:

Kevins package is very nice as a proof of concept, no doubt about
that, but
it is not at the level of performance or convenience that a native R
implementation would offer. I would probably not use it to translate
matlab
routines into R packages placed on CRAN, because it’s an additional
dependency, I have a performance burden in every iteration, and
utils::globalVariables() is everything but elegant. From that
perspective
it would be more convenient for me right now to stick with
collapse::%=%,
which is already written in C, and also call
utils::globalVariables().

But again my hope in starting this was that R Core might see that the
addition of multiple assignment would be a significant enhancement to
the
language, of the same order as the base pipe |> in my opinion.

I think the discussion so far has at least brought forth a way to
implement
this in a way that does not violate fundamental principles of the
language.
Which could form a basis for thinking about an actual addition to the
language.

Best regards,

Sebastian


On Sun 12. Mar 2023 at 13:18, Duncan Murdoch
<murdoch.dun...@gmail.com>
wrote:

On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:

Thinking more about this, and seeing Kevins examples at
https://github.com/kevinushey/dotty
<https://github.com/kevinushey/dotty>, I think this is the most
R-like
way of doing it,
with an additional benefit as it would allow to introduce the
useful
data.table semantics DT[, .(a = b, c, d)] to more general R. So I
would
propose to
introduce a new primitive function . <- function(...)
.Primitive(".") in
R with an assignment method and the following features:


I think that proposal is very unlikely to be accepted.  If it was a
primitive function, it could only be maintained by R Core.  They
are
justifiably very reluctant to take on extra work for themselves.

Kevin's package demonstrates that this can be done entirely in a
contributed package, which means there's no need for R Core to be
involved.  I don't know if he has plans to turn his prototype into
a
CRAN package.  If he doesn't, then it will be up to some other
interested maintainer to step up and take on the task, or it will
just
fade away.

I haven't checked whether your proposals below represent changes
from
the current version of dotty, but if they do, the way to proceed is
to
fork that project, implement your changes, and offer to contribute
them
back to the main branch.

Duncan Murdoch


    * Positional assignment e.g. .[nr, nc] <- dim(x), and named
assignment
      e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars.
All
      the functionality proposed by Kevin at
      https://github.com/kevinushey/dotty
      <https://github.com/kevinushey/dotty> is useful, unambiguous
and
      feasible.
    * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <-
mtcars.
    * Mixing of positional and named assignment e.g .[mpg_new,
carb_new =
      carb, cyl_new] <- mtcars. The inputs not assigned by name are
simply
      the elements of RHS in the order they occur, regardless of
whether
      they have been used previously e.g. .[mpg_new, cyl_new = cyl,
      log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here
could
      be any named vector type.
    * Conventional use of the function as lazy version of of
list(), as in
      data.table: .(A = B, C, D) is the same as list(A = B, C = C,
D = D).
      This would also be useful, allowing more parsimonious code,
and
      avoid the need to assign names to all return values in a
function
      return, e.g. if I already have matrices A, C, Q and R as
internal
      objects in my function, I can simply end by return(.(A, C, Q,
R))
      instead of return(list(A = A, C = C, Q = Q, R = R)) if I
wanted the
      list to be named with the object names.

The implementation of this in R and C should be pretty
straightforward.
It would just require a modification to R CMD Check to recognize
.[<- as
assignment.

Best regards,

Sebastian
-
2.)

On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz
<sebastian.kra...@graduateinstitute.ch
<mailto:sebastian.kra...@graduateinstitute.ch>> wrote:

      Thanks Gabriel and Kevin for your inputs,

      regarding your points Gabriel, I think Python and Julia do
allow
      multiple sub-assignment, but in-line with my earlier
suggestion in
      response to Duncan to make multiple assignment an
environment-level
      operation (like collapse::%=% currently works),  this would
not be
      possible in R.

      Regarding the [a] <- coolest_function() syntax, yeah it would
mean
      do multiple assignment and set a equal to the first element
dropping
      all other elements. Multiple assignment should be positional
loke in
      other languages, enabling flexible renaming of objects on the
fly.
      So it should be irrelevant whether the function returns a
named or
      unnamed list or vector.

      Thanks also Kevin for this contribution. I think it’s a
remarkable
      effort, and I wouldn’t mind such semantics e.g. making it a
function
      call to ‘.[‘ or any other one-letter function, as long as
it’s coded
      in C and recognized by the interpreter as an assignment
operation.

      Best regards,

      Sebastian





      On Sun 12. Mar 2023 at 01:00, Kevin Ushey
<kevinus...@gmail.com
      <mailto:kevinus...@gmail.com>> wrote:

          FWIW, it's possible to get fairly close to your proposed

semantics

          using the existing metaprogramming facilities in R. I put

together a

          prototype package here to demonstrate:

          https://github.com/kevinushey/dotty
          <https://github.com/kevinushey/dotty>

          The package exports an object called `.`, with a special
          `[<-.dot` S3
          method which enables destructuring assignments. This
means you

can

          write code like:

               .[nr, nc] <- dim(mtcars)

          and that will define 'nr' and 'nc' as you expect.

          As for R CMD check warnings, you can suppress those
through the
          use of
          globalVariables(), and that can also be automated within
the
          package.
          The 'dotty' package includes a function 'dotify()' which

automates

          looking for such usages in your package, and calling
          globalVariables()
          so that R CMD check doesn't warn. In theory, a similar
technique
          would
          be applicable to other packages defining similar
operators

(zeallot,

          collapse).

          Obviously, globalVariables() is a very heavy hammer to
swing for
          this
          issue, but you might consider the benefits worth the
tradeoffs.

          Best,
          Kevin

          On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch

<murdoch.dun...@gmail.com <mailto:murdoch.dun...@gmail.com>>

wrote:

           >
           > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz
wrote:
           > > Thanks Duncan and Ivan for the careful thoughts. I'm
not
          sure I can
           > > follow all aspects you raised, but to give my
limited take
          on a few:
           > >
           > >> your proposal violates a very basic property of the
          language, i.e. that all statements are expressions and
have a
          value.  > What's the value of 1 + (A, C =
init_matrices()).
           > >
           > > I'm not sure I see the point here. I evaluated 1 +
(d =
          dim(mtcars); nr
           > > = d[1]; nc = d[2]; rm(d)), which simply gives a
syntax

error,

           >
           >
           >    d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
           >
           > is not a statement, it is a sequence of 4 statements.
           >
           > Duncan Murdoch
           >
           >   as the
           > > above expression should. `%=%` assigns to
           > > environments, so 1 + (c("A", "C") %=%
init_matrices())

returns

           > > numeric(0), with A and C having their values
assigned.
           > >
           > >> suppose f() returns list(A = 1, B = 2) and I do  >
B, A <-
          f() > Should assignment be by position or by name?
           > >
           > > In other languages this is by position. The feature
is not
          meant to
           > > replace list2env(), and being able to rename objects
in the
          assignment
           > > is a vital feature of codes
           > > using multi input and output functions e.g. in
Matlab or

Julia.

           > >
           > >> Honestly, given that this is simply syntactic
sugar, I
          don't think I would support it.
           > >
           > > You can call it that, but it would be used by almost
every
          R user almost
           > > every day. Simple things like nr, nc = dim(x);
values,
          vectors =
           > > eigen(x) etc. where the creation of intermediate
objects
           > > is cumbersome and redundant.
           > >
           > >> I see you've already mentioned it ("JavaScript-
like"). I
          think it would  fulfil Sebastian's requirements too, as
long as
          it is considered "true assignment" by the rest of the
language.
           > >
           > > I don't have strong opinions about how the issue is
phrased

or

           > > implemented. Something like [t, n] = dim(x) might
even be
          more clear.
           > > It's important though that assignment remains by
position,
           > > so even if some output gets thrown away that should
also be
          positional.
           > >
           > >>  A <- 0  > [A, B = A + 10] <- list(1, A = 2)
           > >
           > > I also fail to see the use of allowing this.
something like
          this is an
           > > error.
           > >
           > >> A = 2
           > >> (B = A + 1) <- 1
           > > Error in (B = A + 1) <- 1 : could not find function
"(<-"
           > >
           > > Regarding the practical implementation, I think
          `collapse::%=%` is a
           > > good starting point. It could be introduced in R as
a
          separate function,
           > > or `=` could be modified to accommodate its
capability. It
          should be
           > > clear that
           > > with more than one LHS variables the assignment is
an
          environment level
           > > operation and the results can only be used in
computations
          once assigned
           > > to the environment, e.g. as in 1 + (c("A", "C") %=%
          init_matrices()),
           > > A and C are not available for the addition in this
          statement. The
           > > interpretor then needs to be modified to read
something
          like nr, nc =
           > > dim(x) or [nr, nc] = dim(x). as an environment-level
          multiple assignment
           > > operation with no
           > > immediate value. Appears very feasible to my limited
          understanding, but
           > > I guess there are other things to consider still.
          Definitely appreciate
           > > the responses so far though.
           > >
           > > Best regards,
           > >
           > > Sebastian
           > >
           > >
           > >
           > >
           > >
           > > On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch

<murdoch.dun...@gmail.com <mailto:murdoch.dun...@gmail.com>

           > > <mailto:murdoch.dun...@gmail.com
          <mailto:murdoch.dun...@gmail.com>>> wrote:
           > >
           > >     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
           > >      > On Sat, 11 Mar 2023 11:11:06 -0500
           > >      > Duncan Murdoch <murdoch.dun...@gmail.com
          <mailto:murdoch.dun...@gmail.com>
           > >     <mailto:murdoch.dun...@gmail.com
          <mailto:murdoch.dun...@gmail.com>>> wrote:
           > >      >
           > >      >> That's clear, but your proposal violates a
very
          basic property
           > >     of the
           > >      >> language, i.e. that all statements are
expressions
          and have a value.
           > >      >
           > >      > How about reframing this feature request from
          multiple assignment
           > >      > (which does go contrary to "everything has
only one
          value, even
           > >     if it's
           > >      > sometimes invisible(NULL)") to "structured
binding"
          / "destructuring
           > >      > assignment" [*], which takes this single
single
          value returned by the
           > >      > expression and subsets it subject to certain
rules?
          It may be
           > >     easier to
           > >      > make a decision on the semantics for
destructuring
          assignment (e.g.
           > >      > languages which have this feature typically
allow
          throwing unneeded
           > >      > parts of the return value away), and it
doesn't seem
          to break as much
           > >      > of the rest of the language if implemented.
           > >      >
           > >      > I see you've already mentioned it
          ("JavaScript-like"). I think it
           > >     would
           > >      > fulfil Sebastian's requirements too, as long
as it
          is considered
           > >     "true
           > >      > assignment" by the rest of the language.
           > >      >
           > >      > The hard part is to propose the actual
grammar of
          the new feature (in
           > >      > terms of src/main/gram.y, preferably without

introducing

           > >     conflicts) and
           > >      > its semantics (including the corner cases,
some of
          which you have
           > >      > already mentioned). I'm not sure I'm up to
the task.
           > >      >
           > >
           > >     If I were doing it, here's what I'd propose:
           > >
           > >         '[' formlist ']' LEFT_ASSIGN expr
           > >         '[' formlist ']' EQ_ASSIGN expr
           > >         expr RIGHT_ASSIGN  '[' formlist ']'
           > >
           > >     where `formlist` has the syntax of the formals
list for
          a function
           > >     definition.  This would have the following
semantics:
           > >
           > >          {
           > >            *tmp* <- expr
           > >
           > >            # For arguments with no "default"
expression,
           > >
           > >            argname1 <- *tmp*[[1]]
           > >            argname2 <- *tmp*[[2]]
           > >            ...
           > >
           > >            # For arguments with a default listed
           > >
           > >            argname3 <- with(*tmp*, default3)
           > >          }
           > >
           > >
           > >     The value of the whole thing would therefore be
          (invisibly) the
           > >     value of
           > >     the last item in the assignment.
           > >
           > >     Two examples:
           > >
           > >         [A, B, C] <- expr   # assign the first three
          elements of expr to A,
           > >     B, and C
           > >
           > >         [A, B, C = a + b] <- expr  # assign the
first two
          elements of expr
           > >                                    # to A and B,
           > >                                    # assign
with(expr, a +
          b) to C.
           > >
           > >     Unfortunately, I don't think this could be done

entirely by

           > >     transforming
           > >     the expression (which is the way |> was done),
and that
          makes it a lot
           > >     harder to write and to reason about.  E.g. what
does
          this do?
           > >
           > >         A <- 0
           > >         [A, B = A + 10] <- list(1, A = 2)
           > >
           > >     According to the recipe above, I think it sets A
to 1
          and B to 12, but
           > >     maybe a user would expect B to be 10 or 11.  And
          according to that
           > >     recipe this is an error:
           > >
           > >         [A, B = A + 10] <- c(1, A = 2)
           > >
           > >     which probably isn't what a user would expect,
given
          that this is fine:
           > >
           > >         [A, B] <- c(1, 2)
           > >
           > >     Duncan Murdoch
           > >
           >
           > ______________________________________________
           > R-devel@r-project.org <mailto:R-devel@r-project.org>
mailing

list

           > https://stat.ethz.ch/mailman/listinfo/r-devel
          <https://stat.ethz.ch/mailman/listinfo/r-devel>


          [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Multiple Assignment built into the R Interpreter?

Reply via email to