On 6/19/2020 5:49 AM, Sébastien Lahaie wrote:
I ran into some strange behavior in R when trying to assign a treatment to
rows in a data frame. I'm wondering whether any R experts can explain
what's going on.

First, let's assign a treatment to 3 out of 10 rows as follows.

df <- data.frame(unit = 1:10)
df$treated <- FALSE
s <- sample(nrow(df), 3)
df[s,]$treated <- TRUE
df
    unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE

This is as expected. Now we'll just skip the intermediate step of saving
the sampled indices, and apply the treatment directly as follows.

df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[sample(nrow(df), 3),]$treated <- TRUE
df
    unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE

Now the data frame still has 10 rows with 3 assigned to the treatment. But
the units are garbled. Units 1 and 4 have disappeared, for instance, and
there are duplicates for 6 and 9, one assigned to treatment and the other
to control. Why would this happen?

Thanks,
Sebastien

Sébastien,

You have received good explanations of what is going on with your code. I think you can get what you want with a simple modification of your treatment assignment statement: select the rows and the column in a single indexing call. At least it works for me.

df[sample(nrow(df),3), 'treated'] <- TRUE
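For anyone reading this later without the rest of the thread, here is a rough sketch of why the two forms differ. As far as I understand R's rules for nested replacement calls, df[i,]$treated <- TRUE evaluates the index expression i twice, once to extract the rows and once to write them back, so two different samples get drawn. The hand expansion below is only an approximation of what R does internally, and the index values (and the names df_bad, df_ok, rows_read, rows_write) are picked by me for illustration so that the result matches the garbled output quoted above.

```r
## Hand-expanded approximation of: df[sample(nrow(df), 3), ]$treated <- TRUE
df_bad <- data.frame(unit = 1:10)
df_bad$treated <- FALSE

rows_read  <- c(6, 9, 5)  # stands in for the first sample() draw (rows extracted)
rows_write <- c(1, 4, 8)  # stands in for the second sample() draw (rows overwritten)

tmp <- df_bad[rows_read, ]    # copy of rows 6, 9, 5
tmp$treated <- TRUE           # mark the copies as treated
df_bad[rows_write, ] <- tmp   # whole rows, unit included, land on rows 1, 4, 8

## Single-call form: sample() is evaluated once and only 'treated' is touched.
df_ok <- data.frame(unit = 1:10)
df_ok$treated <- FALSE
df_ok[sample(nrow(df_ok), 3), "treated"] <- TRUE
```

With these stand-in indices, df_bad shows units 6, 9 and 5 copied over rows 1, 4 and 8, while df_ok keeps units 1 to 10 intact with exactly three rows treated.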

Hope this is helpful,

Dan

--
Daniel Nordlund
Port Townsend, WA  USA
