On 6/19/2020 5:49 AM, Sébastien Lahaie wrote:
I ran into some strange behavior in R when trying to assign a treatment to
rows in a data frame. I'm wondering whether any R experts can explain
what's going on.
First, let's assign a treatment to 3 out of 10 rows as follows.
df <- data.frame(unit = 1:10)
df$treated <- FALSE
s <- sample(nrow(df), 3)
df[s,]$treated <- TRUE
df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE
This is as expected. Now we'll just skip the intermediate step of saving
the sampled indices, and apply the treatment directly as follows.
df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[sample(nrow(df), 3),]$treated <- TRUE
df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE
Now the data frame still has 10 rows with 3 assigned to the treatment. But
the units are garbled. Units 1 and 4 have disappeared, for instance, and
there are duplicates for 6 and 9, one assigned to treatment and the other
to control. Why would this happen?
Thanks,
Sebastien
Sébastien,
You have received good explanations of what is going on with your code.
I think you can get what you want with a simple modification of your
treatment assignment statement: select the column inside the same `[`
call, so the row index is used only once. At least it works for me.
df[sample(nrow(df),3), 'treated'] <- TRUE
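For anyone who wants to check this, here is a minimal sketch, not a definitive diagnosis. It assumes the usual expansion of nested replacement calls, where `df[i, ]$treated <- TRUE` extracts `df[i, ]` and then assigns back into `df[i, ]`, so the index expression is evaluated twice. The helper `pick()`, the counter `calls`, the copy `df2`, and the seed are illustrative names and values that are not part of the original post.

```r
set.seed(42)  # assumption: arbitrary seed, only to make the sketch reproducible

# Single-step form: sample() appears once, inside one `[<-` call.
df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[sample(nrow(df), 3), "treated"] <- TRUE

# The units should be untouched and exactly three rows treated.
stopifnot(identical(df$unit, 1:10))
stopifnot(sum(df$treated) == 3)

# Hypothetical helper that counts how often the index expression is evaluated.
calls <- 0
pick <- function(n, k) {
  calls <<- calls + 1
  sample(n, k)
}

# Two-step form from the original post: extract rows, set the column,
# then assign the modified rows back. pick() runs once for the extraction
# and once for the assignment, so the two row sets can differ.
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE
df2[pick(nrow(df2), 3), ]$treated <- TRUE

calls  # number of times the index expression was evaluated
```

If `calls` comes out greater than 1, the rows that were extracted and the rows that were overwritten came from different draws, which would account for the duplicated and missing units. Saving the indices in `s` first, as in the first example, avoids this as well.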
Hope this is helpful,
Dan
--
Daniel Nordlund
Port Townsend, WA USA