Hi All, I have two questions, because I do not understand the behavior of '[' versus the subset() function when filtering a data frame that contains NA values.
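To make the question concrete, here is a small made-up example that appears to reproduce the same behavior. The data frame 'toy' and its values are hypothetical and are not taken from my real data; only the column name 'weight_rec' matches. It is a sketch for illustration, not the actual script I ran.

```r
# Hypothetical toy data (NOT the real 'weight' data frame):
# five rows, one of which has NA in 'weight_rec'.
toy <- data.frame(id = 1:5,
                  weight_rec = c(10, NA, 30, -5, 20))

# The logical condition used for filtering.
# For the row where weight_rec is NA, the condition itself is NA.
cond <- toy$weight_rec <= 24 & toy$weight_rec >= 0

# Filtering with '[' : the NA row comes back, with NA in every column.
by_bracket <- toy[cond, ]

# Filtering with subset(): the NA row is dropped.
by_subset <- subset(toy, weight_rec <= 24 & weight_rec >= 0)

print(cond)
print(by_bracket)
print(by_subset)
```

On this toy data, '[' returns one more row than subset(), and that extra row is NA in both 'id' and 'weight_rec', which is the same pattern as in my real data described below.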
I was filtering a data frame named 'weight' according to the values of the column named 'weight_rec':

...
str(weight)
'data.frame':   17307 obs. of  6 variables:
 $ ICUSTAY_ID: num  229904 229904 229904 247844 247844 ...
 $ INTIME    : chr  "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2179-09-29 18:46:50 UTC" ...
 $ ITEMID    : num  224639 224639 226512 762 762 ...
 $ VALUENUM  : num  61 59.2 59.8 86 86 86 85.5 93 128 128 ...
 $ CHARTTIME : chr  "2127-08-14 08:00:00 UTC" "2127-08-13 08:00:00 UTC" "2127-08-11 21:01:00 UTC" "2179-10-02 19:39:00 UTC" ...
 $ weight_rec: num  51.3 27.3 -20.7 53.2 -18.8 ...
...

Using the following script:

    weight[weight$weight_rec <= 24 & weight$weight_rec >= 0, ]
    # I get an output of 1055 rows

while using:

    subset(weight, weight$weight_rec <= 24 & weight$weight_rec >= 0)
    # I get an output of 1040 rows

Analyzing the values in the column 'weight_rec', I found that:

    sum(is.na(weight$weight_rec))
    [1] 15
    # 15 values are NA

My two questions are:

1) Why are rows with NA values returned when using '['? I only filtered on a numeric condition (i.e., >= 0 & <= 24), and subset() did what I expected.

2) Why are the values in all columns of those 15 rows equal to NA, and not only the values in the column named 'weight_rec'?

Thanks in advance for clarifying this!

Fabio

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.