The behavior has been there much longer than that in R and it's been a
known issue with complex assignment for a long time (not the only
one). You're in a better position than I to know how Splus handles this.

The complex assignment expression

    df[<index>, ]$treated <- TRUE

is basically evaluated as

    tmp <- df[<index>, ]
    tmp$treated <- TRUE
    df[<index>, ] <- tmp

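So the <index> argument is evaluated twice. This is always a little
inefficient, and it is probably not what you want if there are side
effects in the index argument (for sample(), the side effect is advancing
the random number generator, so the two evaluations select different
rows). So the main take-away is: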

    Don't use index arguments with side effects in complex assignments.
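
To make the expansion concrete, here is a minimal sketch that replays the
three steps by hand, with two different index vectors standing in for the
two evaluations of sample(). The vectors `first` and `second` are
hypothetical values chosen for illustration (they happen to match the
garbled output in the original report); they are not taken from an actual
random number generator run.

```r
# Replay the three-step expansion by hand. `first` and `second` are
# hypothetical stand-ins for the two separate evaluations of <index>.
df <- data.frame(unit = 1:10)
df$treated <- FALSE

first  <- c(6, 9, 5)   # rows selected when tmp is extracted
second <- c(1, 4, 8)   # rows selected when tmp is written back

tmp <- df[first, ]     # copies of units 6, 9, 5
tmp$treated <- TRUE
df[second, ] <- tmp    # overwrites rows 1, 4, 8 with those copies

df$unit            # 6 2 3 9 5 6 7 5 9 10
which(df$treated)  # 1 4 8
```

Units 1, 4 and 8 are overwritten, while units 6, 9 and 5 now appear twice,
once treated and once untreated.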

It is in principle possible, when standard evaluation is in use, to
capture the value of <index> from the first evaluation and re-use it for
the second. But, for better or worse, assignment methods can and do
use non-standard evaluation for the index arguments, and it would be
very hard for authors of such methods to avoid this. So changing to
avoid multiple index evaluation would always have to come with an
asterisk.

There are other issues with complex assignment as implemented
currently that have higher priority but are also quite tricky to
address. Possibly this one can be addressed at the same time.

Best,

luke

On Fri, 19 Jun 2020, William Dunlap via R-help wrote:

It is a bug that has been present in R since at least R-2.14.0 (the oldest
that I have installed on my laptop).

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Jun 19, 2020 at 10:37 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:

Hello,


Thanks, I hadn't thought of that.

But, why? Is it evaluated once before assignment and a second time when
the assignment occurs?

Tracing both sample and `[<-` shows two calls to sample.


trace(sample)
trace(`[<-`)
df[sample(nrow(df), 3),]$treated <- TRUE
trace: sample(nrow(df), 3)
trace: `[<-`(`*tmp*`, sample(nrow(df), 3), , value = list(unit = c(7L,
6L, 8L), treated = c(TRUE, TRUE, TRUE)))
trace: sample(nrow(df), 3)
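
The same double evaluation can be observed without trace(), as a rough
sketch, by counting calls to the index expression. The helper
`counted_index` and the counter `n_calls` are hypothetical names introduced
only for this illustration; the helper returns fixed rows, so the data
frame stays correct even though the index expression runs more than once.

```r
# Count how often the index expression of a complex assignment is evaluated.
# `counted_index` and `n_calls` are hypothetical names for this sketch.
n_calls <- 0
counted_index <- function() {
  n_calls <<- n_calls + 1   # the side effect: bump the counter
  c(2, 5, 7)                # always the same rows
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[counted_index(), ]$treated <- TRUE

n_calls  # 2 on the R versions discussed in this thread
```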


Regards,

Rui Barradas


At 17:20 on 19/06/2020, William Dunlap wrote:
The first subscript argument is getting evaluated twice.
    > trace(sample)
    > set.seed(2020); df[i<-sample(10,3), ]$Treated <- TRUE
    trace: sample(10, 3)
    trace: sample(10, 3)
    > i
    [1]  1 10  4
    > set.seed(2020); sample(10,3)
    trace: sample(10, 3)
    [1] 7 6 8
    > sample(10,3)
    trace: sample(10, 3)
    [1]  1 10  4
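
A minimal sketch of the corresponding workaround: evaluate the index once,
store it, and use the stored value in the complex assignment. The checks at
the end are illustrative additions, not part of the original transcript.

```r
set.seed(2020)
df <- data.frame(unit = 1:10)
df$Treated <- FALSE

i <- sample(nrow(df), 3)   # sample() is evaluated exactly once
df[i, ]$Treated <- TRUE    # i has no side effects, so re-evaluating it is harmless

# Illustrative checks: no rows were shuffled, and exactly the sampled rows
# are marked as treated.
stopifnot(identical(df$unit, 1:10))
stopifnot(all(df$Treated[i]), sum(df$Treated) == 3)
```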

Bill Dunlap
TIBCO Software
wdunlap tibco.com <http://tibco.com>


On Fri, Jun 19, 2020 at 8:46 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:

    Hello,

    I don't know why this happens, but it looks like a bug. Where,
    though: in `[<-.data.frame` or in `[<-.default`?

    A solution is to subset and assign the vector:


    set.seed(2020)
    df2 <- data.frame(unit = 1:10)
    df2$treated <- FALSE

    df2$treated[sample(nrow(df2), 3)] <- TRUE
    df2
    #  unit treated
    #1     1   FALSE
    #2     2   FALSE
    #3     3   FALSE
    #4     4   FALSE
    #5     5   FALSE
    #6     6    TRUE
    #7     7    TRUE
    #8     8    TRUE
    #9     9   FALSE
    #10   10   FALSE


    Or


    set.seed(2020)
    df3 <- data.frame(unit = 1:10)
    df3$treated <- FALSE

    df3[sample(nrow(df3), 3), "treated"] <- TRUE
    df3
    # result as expected


    Hope this helps,

    Rui  Barradas



    At 13:49 on 19/06/2020, Sébastien Lahaie wrote:
   > I ran into some strange behavior in R when trying to assign a treatment to
   > rows in a data frame. I'm wondering whether any R experts can explain
   > what's going on.
   >
   > First, let's assign a treatment to 3 out of 10 rows as follows.
   >
   >> df <- data.frame(unit = 1:10)
   >> df$treated <- FALSE
   >> s <- sample(nrow(df), 3)
   >> df[s,]$treated <- TRUE
   >> df
   >     unit treated
   > 1     1   FALSE
   > 2     2    TRUE
   > 3     3   FALSE
   > 4     4   FALSE
   > 5     5    TRUE
   > 6     6   FALSE
   > 7     7    TRUE
   > 8     8   FALSE
   > 9     9   FALSE
   > 10   10   FALSE
   >
   > This is as expected. Now we'll just skip the intermediate step of saving
   > the sampled indices, and apply the treatment directly as follows.
   >
   >> df <- data.frame(unit = 1:10)
   >> df$treated <- FALSE
   >> df[sample(nrow(df), 3),]$treated <- TRUE
   >> df
   >     unit treated
   > 1     6    TRUE
   > 2     2   FALSE
   > 3     3   FALSE
   > 4     9    TRUE
   > 5     5   FALSE
   > 6     6   FALSE
   > 7     7   FALSE
   > 8     5    TRUE
   > 9     9   FALSE
   > 10   10   FALSE
   >
   > Now the data frame still has 10 rows with 3 assigned to the treatment. But
   > the units are garbled. Units 1 and 4 have disappeared, for instance, and
   > there are duplicates for 6 and 9, one assigned to treatment and the other
   > to control. Why would this happen?
   >
   > Thanks,
   > Sebastien
   >
   >       [[alternative HTML version deleted]]
   >
   > ______________________________________________
   > R-help@r-project.org <mailto:R-help@r-project.org> mailing list
    -- To UNSUBSCRIBE and more, see
   > https://stat.ethz.ch/mailman/listinfo/r-help
   > PLEASE do read the posting guide
    http://www.R-project.org/posting-guide.html
   > and provide commented, minimal, self-contained, reproducible code.

    --
    This e-mail has been checked for viruses by Avast antivirus
    software.
    https://www.avast.com/antivirus





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu