[R] conditional selection of dataframe rows
Dear helpeRs,

I have a dataframe (14947 x 27) containing measurements collected every 5 seconds at several different sampling locations. If one measurement at a given location is less than zero on a given day, I would like to delete all measurements from that location on that day. Here is a toy example:

toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

In this example, row 9 has a negative measurement for Chamber 5, so I would like to delete row 6, which is the same Chamber on the same day, but not row 3, which is the same chamber on a different day. In the full dataframe, there are, of course, many more days.

Is there a handy R way to do this?

Thank you for the assistance.

Toby

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] conditional selection of dataframe rows
On Aug 12, 2010, at 3:11 PM, Toby Gass wrote:

> If one measurement at a given location is less than zero on a given day, I would like to delete all measurements from that location on that day.
> Is there a handy R way to do this?

toy[ -which(toy$SLOPE < 0), ]

David Winsemius, MD
West Hartford, CT
Re: [R] conditional selection of dataframe rows
Try this:

subset(toy, !rowSums(mapply(is.element, toy[c('CH', 'DAY')], subset(toy, SLOPE 0, CH:DAY))) 1 | SLOPE 0)

On Thu, Aug 12, 2010 at 4:11 PM, Toby Gass wrote:

> Is there a handy R way to do this?

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] conditional selection of dataframe rows
On Aug 12, 2010, at 2:11 PM, Toby Gass wrote:

> If one measurement at a given location is less than zero on a given day, I would like to delete all measurements from that location on that day.
> Is there a handy R way to do this?

Not fully tested, but here is one possibility:

> toy
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
6  5   5   0.2
7  3   5   0.1
8  4   5   0.0
9  5   5  -0.1

> subset(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)) == 0)
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
7  3   5   0.1
8  4   5   0.0

See ?ave and ?subset

HTH,

Marc Schwartz
Re: [R] conditional selection of dataframe rows
On Aug 12, 2010, at 2:24 PM, Marc Schwartz wrote:

> Not fully tested, but here is one possibility:
>
> subset(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)) == 0)

This can actually be slightly shortened to:

> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
7  3   5   0.1
8  4   5   0.0

HTH,

Marc
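The shortened form works because `ave()` returns 0/1 numbers here and `!` turns them back into logicals. A minimal sketch of an equivalent that stays logical throughout, assuming the same toy data (the SLOPE values are written out explicitly so the block is self-contained); `neg_in_group` and `clean` are names introduced only for this illustration:

```r
# Toy data from the thread, with SLOPE written out as in the printed data frame
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0.0, -0.1))

# Hand ave() a logical vector: within each CH/DAY group, any() gives one
# TRUE/FALSE, which ave() repeats for every row of that group
neg_in_group <- ave(toy$SLOPE < 0, toy$CH, toy$DAY, FUN = any)

# Keep only the rows whose group contains no negative SLOPE
clean <- toy[!neg_in_group, ]
```

Because the flag is already logical, no comparison with 0 is needed before indexing.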
Re: [R] conditional selection of dataframe rows
Thank you all for the quick responses. So far as I've checked, Marc's solution works perfectly and is quite speedy. I'm still trying to figure out what it is doing. :)

Henrique's solution seems to need some columns somewhere. David's solution does not find all the other measurements, possibly with positive values, taken on the same day.

Thank you again for your efforts.

Toby
Re: [R] conditional selection of dataframe rows
On Aug 12, 2010, at 4:06 PM, Toby Gass wrote:

> David's solution does not find all the other measurements, possibly with positive values, taken on the same day.

I assumed you only wanted to look at what appeared to be a data column, SLOPE. If you want to look at all columns for negatives then try:

toy[ which( apply(toy, 1, function(x) all(x >= 0)) ), ]

# or

toy[ apply(toy, 1, function(x) all(x >= 0)), ]

This is how they differ w.r.t. their handling of NAs:

> toy[3, 2] <- NA
> toy[ apply(toy, 1, function(x) all(x >= 0)), ]
   CH DAY SLOPE
1   3   4   0.2
2   4   4   0.3
NA NA  NA    NA
4   3   4   0.5
5   4   4   0.6
6   5   5   0.2
7   3   5   0.1
8   4   5   0.0

> toy[ which(apply(toy, 1, function(x) all(x >= 0)) ), ]
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
4  3   4   0.5
5  4   4   0.6
6  5   5   0.2
7  3   5   0.1
8  4   5   0.0

David Winsemius, MD
West Hartford, CT
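A related NA question comes up for the grouped approach. The following is only a sketch under an assumption not made in the thread, namely that a SLOPE value (rather than DAY) is missing: without `na.rm`, a group with no negatives but at least one NA is flagged NA, so it can be dropped silently. The names `flag_na`, `flag_rm`, `kept_default` and `kept_na_rm` are hypothetical.

```r
# Toy data with one missing SLOPE (row 3); this NA is an assumption for illustration
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(0.2, 0.3, NA, 0.5, 0.6, 0.2, 0.1, 0.0, -0.1))

# Without na.rm, any(NA < 0) is NA, so row 3's group is flagged NA
flag_na <- with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

# With na.rm = TRUE, missing values are ignored when looking for negatives
flag_rm <- with(toy, ave(SLOPE, CH, DAY,
                         FUN = function(x) any(x < 0, na.rm = TRUE)))

# which() skips NA flags, so row 3 is removed in the first case
kept_default <- toy[which(flag_na == 0), ]
# Here row 3 is retained, still carrying its missing SLOPE
kept_na_rm <- toy[which(flag_rm == 0), ]
```

Whether a day with missing readings should be kept is a data-quality decision; the sketch only shows where that choice is made.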
Re: [R] conditional selection of dataframe rows
On Aug 12, 2010, at 3:06 PM, Toby Gass wrote:

> So far as I've checked, Marc's solution works perfectly and is quite speedy. I'm still trying to figure out what it is doing. :)

<snip>

Toby,

Working from the inside out:

The ave() function splits (sub-groups) the data by one or more factors, internally using split(), and then passes the desired column from each sub-group to the function given as FUN, using lapply(). By default, that function is mean().

The great thing about using ave() is that it will replicate the scalar sub-group result of the function, once for each row in the sub-group. In addition, the result vector will be in the order of the rows in the original data frame, rather than in the order of the sub-group rows.

So in this case, if any of the rows in the sub-group has a SLOPE with a negative value, all rows in the sub-group get a TRUE.

You can get an initial feel for the internal data organizing process by using:

> split(toy, list(toy$CH, toy$DAY))
$`3.4`
  CH DAY SLOPE
1  3   4   0.2
4  3   4   0.5

$`4.4`
  CH DAY SLOPE
2  4   4   0.3
5  4   4   0.6

$`5.4`
  CH DAY SLOPE
3  5   4   0.4

$`3.5`
  CH DAY SLOPE
7  3   5   0.1

$`4.5`
  CH DAY SLOPE
8  4   5     0

$`5.5`
  CH DAY SLOPE
6  5   5   0.2
9  5   5  -0.1

So the first step is:

> with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
[1] 0 0 0 0 0 1 0 0 1

Note that I use with() to define that SLOPE, CH and DAY are all to be evaluated (found) within the 'toy' data frame. That is easier than using:

> ave(toy$SLOPE, toy$CH, toy$DAY, FUN = function(x) any(x < 0))
[1] 0 0 0 0 0 1 0 0 1

This returns a vector of 0's and 1's (FALSE and TRUE coerced to numeric).

Note that the returned vector does not correspond to the sequence of rows in the result of split() above, but to the sequence of rows in the original 'toy' data frame. That is, rows 6 and 9 are 1 (TRUE):

> cbind(toy, flag = with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0))))
  CH DAY SLOPE flag
1  3   4   0.2    0
2  4   4   0.3    0
3  5   4   0.4    0
4  3   4   0.5    0
5  4   4   0.6    0
6  5   5   0.2    1
7  3   5   0.1    0
8  4   5   0.0    0
9  5   5  -0.1    1

The next step is to remove those rows. You could do that by using regular indexing, but by using subset(), I can replicate the behavior of having used with() above, since the arguments in subset() are evaluated within the data frame defined. Thus, I can eliminate the use of with() and have a shorter solution.

Then, by negating the result of ave() so that 0 (FALSE) becomes TRUE, retain only those rows where the ave() result was 0:

> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
7  3   5   0.1
8  4   5   0.0

I hope that clarifies the process.

Marc
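To make the split-then-replicate idea concrete, here is a rough sketch that rebuilds the flag vector by hand with split(), lapply() and unsplit() and compares it with ave(). It is an illustration of the mechanism described above, not the actual source of ave(); `g`, `pieces`, `flags`, `by_hand` and `via_ave` are hypothetical names, and the SLOPE values are typed out so the block runs on its own.

```r
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0.0, -0.1))

# One grouping factor with a level per CH/DAY combination ("3.4", "4.4", ...)
g <- interaction(toy$CH, toy$DAY)

# Split SLOPE into one vector per group
pieces <- split(toy$SLOPE, g)

# Per group: compute a single TRUE/FALSE and repeat it once per row of the group
flags <- lapply(pieces, function(x) rep(any(x < 0), length(x)))

# Put the per-group results back into the original row order
by_hand <- unsplit(flags, g)

# The same flags as produced by ave(), which returns them as 0/1
via_ave <- ave(toy$SLOPE, toy$CH, toy$DAY, FUN = function(x) any(x < 0))
```

The unsplit() step is what restores the original row order, which is why the flags line up with the rows of `toy` rather than with the order of the groups.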
Re: [R] conditional selection of dataframe rows
Hi,

I do want to look only at slope. If there is one negative slope measurement for a given day and a given chamber, I would like to remove all other slope measurements for that day and that chamber, even if they are positive. On one day, I will have 20 slope measurements for each chamber. If one is negative, I would like to delete the other 19 for that chamber on that day, even if they are positive. I have measurements for every day of the year, for 4 years and multiple chambers.

I know I could make some awful nested loop with a vector of day and chamber numbers for each occurrence of a negative slope and then run that against the whole data set, but I hope not to have to do that.

Here is the rationale, if that helps. These are unattended outdoor chambers that measure soil carbon efflux. When the numbers go negative during part of the day but otherwise look normal, it usually means a plant has sprouted in the chamber and is using the carbon dioxide. That means the measurements are all lower than they should be and I need to discard all measurements collected on that day, whether positive or negative.

It might have been a little clearer if I'd made the toy dataframe a bit larger.

Thanks again for the assistance.

Toby
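For illustration only, a larger made-up example in the spirit of the description above: 20 measurements per chamber per day, with a single negative value planted in one chamber-day. The data frame `big`, the column `REP`, and all values are invented for this sketch and are not Toby's data; the filtering line is the ave() approach already posted in this thread.

```r
# Hypothetical data: 3 chambers x 3 days x 20 measurements, all positive
big <- expand.grid(REP = 1:20, CH = 3:5, DAY = 4:6)
big$SLOPE <- 0.1 + big$REP / 100

# Plant one negative measurement in chamber 4 on day 5
big$SLOPE[big$CH == 4 & big$DAY == 5 & big$REP == 7] <- -0.05

# Drop every measurement from any chamber-day that has a negative SLOPE
cleaned <- subset(big, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

# Rows remaining per chamber and day
counts <- table(cleaned$CH, cleaned$DAY)
```

In this sketch all 20 rows for chamber 4 on day 5 are removed, including the 19 positive ones, without an explicit loop over days and chambers.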
Re: [R] conditional selection of dataframe rows
On Aug 12, 2010, at 5:20 PM, Toby Gass wrote:

> I do want to look only at slope. If there is one negative slope measurement for a given day and a given chamber, I would like to remove all other slope measurements for that day and that chamber, even if they are positive.
> It might have been a little clearer if I'd make the toy dataframe a bit larger.

I think the fault was all mine. Failure to read for meaning. Here's an alternate strategy, although I think Schwartz's might be cleaner:

> toy$ch.day.cat <- with(toy, paste(CH, DAY, sep = "."))
> negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))
> negs.idxs
  3.4   3.5   4.4   4.5   5.4   5.5
FALSE FALSE FALSE FALSE FALSE  TRUE

> toy[-which(negs.idxs), ]
  CH DAY SLOPE ch.day.cat
1  3   4   0.2        3.4
2  4   4   0.3        4.4
3  5   4   0.4        5.4
4  3   4   0.5        3.4
5  4   4   0.6        4.4
7  3   5   0.1        3.5
8  4   5   0.0        4.5
9  5   5  -0.1        5.5

--
David
Re: [R] conditional selection of dataframe rows
On Aug 12, 2010, at 6:15 PM, David Winsemius wrote:

> Here's an alternate strategy, although I think Schwartz's might be cleaner:
>
> toy[-which(negs.idxs), ]

I think I should give up today. I saw that the above code eliminates #6 and only after posting saw that #9 was left in:

> require(rms)   # for %nin% .. or use the %w/o% operator defined on the match help page
> toy[toy$ch.day.cat %nin% names(negs.idxs[negs.idxs]), ]
  CH DAY SLOPE ch.day.cat
1  3   4   0.2        3.4
2  4   4   0.3        4.4
3  5   4   0.4        5.4
4  3   4   0.5        3.4
5  4   4   0.6        4.4
7  3   5   0.1        3.5
8  4   5   0.0        4.5

Now I am really sure that the ave( , , any) strategy is superior.

--
David
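If loading an extra package just for `%nin%` is not wanted, the same selection can be sketched in base R by negating `%in%`. This is a hedged variant of the tapply() approach above, not a replacement for it; `bad.groups` and `result` are names introduced here, and the toy data is typed out so the block is self-contained.

```r
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0.0, -0.1))

# Label each row with its chamber/day combination, e.g. "5.5"
toy$ch.day.cat <- with(toy, paste(CH, DAY, sep = "."))

# One TRUE/FALSE per combination: does it contain a negative SLOPE?
negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))

# Names of the combinations to discard
bad.groups <- names(negs.idxs[negs.idxs])

# Match on the label, not on position, so every row of a bad group is dropped
result <- toy[!(toy$ch.day.cat %in% bad.groups), ]
```

Matching on the group label rather than on the position within `negs.idxs` is what removes both row 6 and row 9.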