[R] '[' vs subset() behaviors when filtering a dataframe with NA values

Fabio D'Agostino Sat, 19 Feb 2022 19:40:21 -0800

Hi All,
I have two questions, since I do not understand the behavior of
'[' vs. the subset() function when filtering a dataframe that has NA
values.


I was filtering a dataframe named 'weight' according to values of the
column named 'weight_rec' ...
str(weight)
'data.frame': 17307 obs. of  6 variables:
 $ ICUSTAY_ID: num  229904 229904 229904 247844 247844 ...
 $ INTIME    : chr  "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2179-09-29 18:46:50 UTC" ...
 $ ITEMID    : num  224639 224639 226512 762 762 ...
 $ VALUENUM  : num  61 59.2 59.8 86 86 86 85.5 93 128 128 ...
 $ CHARTTIME : chr  "2127-08-14 08:00:00 UTC" "2127-08-13 08:00:00 UTC" "2127-08-11 21:01:00 UTC" "2179-10-02 19:39:00 UTC" ...
 $ weight_rec: num  51.3 27.3 -20.7 53.2 -18.8 ...

... using the following script:
weight[weight$weight_rec <= 24 & weight$weight_rec >= 0, ]   # I get an output of 1055 rows

while using:
subset(weight, weight$weight_rec <= 24 & weight$weight_rec >= 0)   # I get an output of 1040 rows

Analyzing the values in the column 'weight_rec', I found that
sum(is.na(weight$weight_rec))
[1] 15   #15 values are NA
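
The difference of 15 rows (1055 vs. 1040) matches the number of NA values. A minimal sketch with made-up data appears to reproduce the same pattern on a small scale. The data frame 'toy' and its values are hypothetical and are not taken from the real 'weight' data.

```r
# Hypothetical toy data, NOT the real 'weight' data frame
toy <- data.frame(id = 1:6,
                  weight_rec = c(10, NA, 30, 5, NA, -2))

# Comparisons involving NA give NA, so the condition has three states
cond <- toy$weight_rec <= 24 & toy$weight_rec >= 0
# cond is: TRUE NA FALSE TRUE NA FALSE

by_bracket <- toy[cond, ]        # each NA in the index yields a row
by_subset  <- subset(toy, cond)  # subset() drops rows where cond is NA

nrow(by_bracket)             # 4
nrow(by_subset)              # 2
sum(is.na(toy$weight_rec))   # 2, the difference between the two counts
```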

My two questions are:
1) Why are rows with NA values returned when using '['? I only filtered
on a numeric condition (i.e., >= 0 & <= 24), and subset() did what I
expected.
2) Why are the values in all columns of those 15 rows equal to NA, and
not only the values in the column named 'weight_rec'?
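
Both points can be illustrated on a small scale. In the sketch below, again with a hypothetical data frame 'toy' rather than the real data, an NA in a logical row index selects a row in which every column is NA. Wrapping the condition in which(), or excluding NA explicitly, is one possible way to make '[' return the same rows as subset().

```r
# Hypothetical toy data, NOT the real 'weight' data frame
toy <- data.frame(id = 1:4, weight_rec = c(10, NA, 30, 5))

cond <- toy$weight_rec <= 24 & toy$weight_rec >= 0
# cond is: TRUE NA FALSE TRUE

with_na <- toy[cond, ]
# The second row of 'with_na' comes from the NA index:
# both 'id' and 'weight_rec' are NA there, not only 'weight_rec'
all(is.na(with_na[2, ]))   # TRUE

# which() returns only the positions where cond is TRUE, skipping NA
via_which <- toy[which(cond), ]

# Equivalent explicit form: require cond to be non-NA and TRUE
via_notna <- toy[!is.na(cond) & cond, ]

via_which$id   # 1 4
via_notna$id   # 1 4
```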

Thanks in advance for clarifying this!
Fabio

