[R] conditional selection of dataframe rows

2010-08-12 Thread Toby Gass
Dear helpeRs,

I have a dataframe (14947 x 27) containing measurements collected 
every 5 seconds at several different sampling locations.  If one 
measurement at a given location is less than zero on a given day, I 
would like to delete all measurements from that location on that day.

Here is a toy example:

toy <- data.frame(CH = rep(3:5,3), DAY = c(rep(4,5), rep(5,4)), 
SLOPE = c(seq(0.2,0.6, .1),seq(0.2, -0.1, -0.1)))

In this example, row 9 has a negative measurement for Chamber 5, so I 
would like to delete row 6, which is the same Chamber on the same 
day, but not row 3, which is the same chamber on a different day.  In 
the full dataframe, there are, of course, many more days.

Is there a handy R way to do this?

Thank you for the assistance.

Toby

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conditional selection of dataframe rows

2010-08-12 Thread David Winsemius


On Aug 12, 2010, at 3:11 PM, Toby Gass wrote:


[snip]

> Is there a handy R way to do this?


toy[ -which(toy$SLOPE < 0), ]



David Winsemius, MD
West Hartford, CT



Re: [R] conditional selection of dataframe rows

2010-08-12 Thread Henrique Dallazuanna
Try this:

subset(toy, !rowSums(mapply(is.element, toy[c('CH', 'DAY')], subset(toy,
SLOPE  0, CH:DAY)))  1 | SLOPE  0)


On Thu, Aug 12, 2010 at 4:11 PM, Toby Gass <tobyg...@warnercnr.colostate.edu> wrote:

[snip]




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] conditional selection of dataframe rows

2010-08-12 Thread Marc Schwartz
On Aug 12, 2010, at 2:11 PM, Toby Gass wrote:

[snip]



Not fully tested, but here is one possibility:

> toy
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
6  5   5   0.2
7  3   5   0.1
8  4   5   0.0
9  5   5  -0.1


> subset(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)) == 0)
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
7  3   5   0.1
8  4   5   0.0


See ?ave and ?subset


HTH,

Marc Schwartz



Re: [R] conditional selection of dataframe rows

2010-08-12 Thread Marc Schwartz
On Aug 12, 2010, at 2:24 PM, Marc Schwartz wrote:

[snip]


This can actually be slightly shortened to:

> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
7  3   5   0.1
8  4   5   0.0
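The shortened form works because `!` applied to a numeric vector returns TRUE for 0 and FALSE for any nonzero value, so `!ave(...)` and `ave(...) == 0` select the same rows when the flags are 0/1. A quick check (the vector `flag` is illustrative, not from the thread):

```r
# 0/1 flags of the kind ave(..., FUN = function(x) any(x < 0)) returns
flag <- c(0, 0, 1, 0, 1)

!flag                        # TRUE TRUE FALSE TRUE FALSE
flag == 0                    # the same logical vector for 0/1 input
identical(!flag, flag == 0)  # TRUE
```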


HTH,

Marc



Re: [R] conditional selection of dataframe rows

2010-08-12 Thread Toby Gass
Thank you all for the quick responses.  So far as I've checked, 
Marc's solution works perfectly and is quite speedy.  I'm still 
trying to figure out what it is doing. :)

Henrique's solution seems to need some columns somewhere.  David's 
solution does not find all the other measurements, possibly with 
positive values, taken on the same day.
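The difference Toby describes shows up in the row names of the two results. This sketch compares David's first one-liner with Marc's ave() version on the toy data. It also shows a known pitfall of negative which() indexing: when nothing matches, it returns zero rows. The names `only_neg`, `whole_group`, and `no_neg` are illustrative.

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# Drops only the rows that are themselves negative (row 9)
only_neg <- toy[-which(toy$SLOPE < 0), ]

# Drops every row of any CH/DAY group containing a negative (rows 6 and 9)
whole_group <- subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

rownames(only_neg)     # "1" "2" "3" "4" "5" "6" "7" "8"
rownames(whole_group)  # "1" "2" "3" "4" "5" "7" "8"

# Pitfall: with no negatives, which() gives integer(0), and -integer(0)
# selects nothing, so every row disappears
no_neg <- toy[toy$SLOPE >= 0, ]
nrow(no_neg[-which(no_neg$SLOPE < 0), ])  # 0
```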

Thank you again for your efforts.

Toby

On 12 Aug 2010 at 14:32, Marc Schwartz wrote:

[snip]


Re: [R] conditional selection of dataframe rows

2010-08-12 Thread David Winsemius


On Aug 12, 2010, at 4:06 PM, Toby Gass wrote:


> Thank you all for the quick responses.  So far as I've checked,
> Marc's solution works perfectly and is quite speedy.  I'm still
> trying to figure out what it is doing. :)
>
> Henrique's solution seems to need some columns somewhere.  David's
> solution does not find all the other measurements, possibly with
> positive values, taken on the same day.


I assumed you only wanted to look at what appeared to be a data  
column, SLOPE. If you want to look at all columns for negatives then  
try:


toy[ which( apply(toy, 1, function(x) all(x >= 0)) ), ]  # or
toy[ apply(toy, 1, function(x) all(x >= 0)) , ]

This is how they differ w.r.t. their handling of NAs:

> toy[3,2] <- NA
> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
   CH DAY SLOPE
1   3   4   0.2
2   4   4   0.3
NA NA  NA    NA
4   3   4   0.5
5   4   4   0.6
6   5   5   0.2
7   3   5   0.1
8   4   5   0.0
> toy[ which(apply(toy, 1, function(x) all(x >= 0)) ), ]
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
4  3   4   0.5
5  4   4   0.6
6  5   5   0.2
7  3   5   0.1
8  4   5   0.0
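The same contrast in a self-contained form, using David's modification of the toy data (DAY set to NA in row 3). The variable names are illustrative. The sketch also checks that subset(), like which(), treats an NA condition as FALSE.

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))
toy_na <- toy
toy_na[3, 2] <- NA  # DAY missing for row 3, as in the example above

# TRUE for most rows, NA for row 3, FALSE for row 9 (negative SLOPE)
ok <- apply(toy_na, 1, function(x) all(x >= 0))

by_logical <- toy_na[ok, ]       # NA index gives an all-NA row named "NA"
by_which <- toy_na[which(ok), ]  # which() drops the NA position
by_subset <- subset(toy_na, ok)  # subset() also drops NA conditions

rownames(by_logical)
rownames(by_which)
```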




David Winsemius, MD
West Hartford, CT



Re: [R] conditional selection of dataframe rows

2010-08-12 Thread Marc Schwartz

On Aug 12, 2010, at 3:06 PM, Toby Gass wrote:

> Thank you all for the quick responses.  So far as I've checked, 
> Marc's solution works perfectly and is quite speedy.  I'm still 
> trying to figure out what it is doing. :)
> 
> Henrique's solution seems to need some columns somewhere.  David's 
> solution does not find all the other measurements, possibly with 
> positive values, taken on the same day.
> 
> Thank you again for your efforts.
> 
> Toby

[snip]

Toby,

Working from the inside out:

The ave() function splits a vector (here SLOPE) into sub-groups defined by one or 
more grouping factors, internally using split(), and then passes each sub-group 
to the function given in FUN, by using lapply(). By default, that function is 
mean().
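To make the split()/lapply() description concrete, here is a minimal re-implementation sketch. `my_ave()` is a hypothetical helper, not part of base R. It assumes the approach described above: combine the grouping variables with interaction(), apply FUN to each group with lapply(), and write each group's result back into that group's positions with `split<-`. It then checks itself against ave() on the toy data.

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# Hypothetical helper mimicking ave(): one factor level per CH/DAY pair,
# FUN applied per group, scalar results recycled over each group's rows
my_ave <- function(x, ..., FUN = mean) {
  g <- interaction(...)
  split(x, g) <- lapply(split(x, g), FUN)
  x
}

flag_manual <- my_ave(toy$SLOPE, toy$CH, toy$DAY,
                      FUN = function(x) any(x < 0))
flag_ave <- ave(toy$SLOPE, toy$CH, toy$DAY,
                FUN = function(x) any(x < 0))

flag_manual                     # 0 0 0 0 0 1 0 0 1
identical(flag_manual, flag_ave)
```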

The great thing about using ave() is that it will replicate the scalar, 
sub-group based result of the function once for each row in the sub-group. In 
addition, the result vector is returned in the order of the rows in the 
original data frame, rather than in the order of the sub-group rows. So in this 
case, if any of the rows in a sub-group has a negative SLOPE, all rows in that 
sub-group get a TRUE.
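The replication and row-order points are easy to see with the default FUN = mean: every row receives the mean SLOPE of its own CH/DAY group, and the values come back in the original row order even though the groups are interleaved. `grp_mean` is a name chosen for this sketch.

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# FUN defaults to mean; rows 1 and 4 share a group, as do 2 and 5, 6 and 9
grp_mean <- ave(toy$SLOPE, toy$CH, toy$DAY)
cbind(toy, grp_mean)
```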


You can get an initial feel for the internal data organizing process by using:

> split(toy, list(toy$CH, toy$DAY))
$`3.4`
  CH DAY SLOPE
1  3   4   0.2
4  3   4   0.5

$`4.4`
  CH DAY SLOPE
2  4   4   0.3
5  4   4   0.6

$`5.4`
  CH DAY SLOPE
3  5   4   0.4

$`3.5`
  CH DAY SLOPE
7  3   5   0.1

$`4.5`
  CH DAY SLOPE
8  4   5     0

$`5.5`
  CH DAY SLOPE
6  5   5   0.2
9  5   5  -0.1



So the first step is:

> with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
[1] 0 0 0 0 0 1 0 0 1


Note that I use with() to define that SLOPE, CH and DAY are all to be evaluated 
(found) within the 'toy' data frame. That is easier than using:

> ave(toy$SLOPE, toy$CH, toy$DAY, FUN = function(x) any(x < 0))
[1] 0 0 0 0 0 1 0 0 1


This returns a vector of 0's and 1's (FALSE and TRUE coerced to a numeric). 
Note that the returned vector does not correspond to the sequence of rows in 
the result of split() above, but to the sequence of rows in the original 'toy' 
data frame. That is, rows 6 and 9 are 1 (TRUE):

> cbind(toy, flag = with(toy, ave(SLOPE, CH, DAY, 
+   FUN = function(x) any(x < 0))))
  CH DAY SLOPE flag
1  3   4   0.2    0
2  4   4   0.3    0
3  5   4   0.4    0
4  3   4   0.5    0
5  4   4   0.6    0
6  5   5   0.2    1
7  3   5   0.1    0
8  4   5   0.0    0
9  5   5  -0.1    1


The next step is to remove those rows. You could do that by using regular 
indexing, but by using subset(), I can replicate the behavior of having used 
with() above, since the arguments in subset() are evaluated within the data 
frame supplied. Thus, I can eliminate the use of with() and have a shorter 
solution. Then, negating the result of ave() turns 0 (FALSE) into TRUE, so only 
those rows where the ave() result was 0 are retained:

> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
7  3   5   0.1
8  4   5   0.0
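Marc notes that regular indexing would also work. Here is a sketch of that variant on the same toy data. It relies on `!` turning the 0/1 flags from ave() into a logical row index, and it compares the result with the subset() version. The names `flag`, `kept_idx`, and `kept_sub` are illustrative.

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# 0/1 flag per row: 1 if the row's CH/DAY group has any negative SLOPE
flag <- with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

# Regular indexing: !flag is TRUE where flag is 0
kept_idx <- toy[!flag, ]

# subset() version from the thread, for comparison
kept_sub <- subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

all.equal(kept_idx, kept_sub)
```

One practical difference: subset() silently drops rows whose condition is NA, whereas `[` with an NA in a logical index returns an all-NA row.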


I hope that clarifies the process.

Marc



Re: [R] conditional selection of dataframe rows

2010-08-12 Thread Toby Gass
Hi,

I do want to look only at slope.
If there is one negative slope measurement for a given day and a 
given chamber, I would like to remove all other slope measurements 
for that day and that chamber, even if they are positive.  

On one day, I will have 20 slope measurements for each chamber.  If 
one is negative, I would like to delete the other 19 for that chamber 
on that day, even if they are positive.  I have measurements for 
every day of the year, for 4 years and multiple chambers.  

I know I could make some awful nested loop with a vector of day and 
chamber numbers for each occurrence of a negative slope and then run 
that against the whole data set but I hope not to have to do that.
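Toby's idea of collecting the chamber/day pairs that contain a negative slope and checking the whole data set against them does not need a loop. The following is a vectorized sketch of that idea, not code from the thread. It assumes CH and DAY together identify a chamber-day, and the names `bad`, `key`, `bad_key`, and `cleaned` are illustrative.

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# Chamber/day pairs with at least one negative SLOPE
bad <- unique(toy[toy$SLOPE < 0, c("CH", "DAY")])

# One key per row and per bad pair; "_" cannot occur inside the numbers
key <- paste(toy$CH, toy$DAY, sep = "_")
bad_key <- paste(bad$CH, bad$DAY, sep = "_")

# Keep rows whose chamber/day is not among the bad pairs
cleaned <- toy[!(key %in% bad_key), ]
cleaned
```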

Here is the rationale, if that helps.  These are unattended outdoor 
chambers that measure soil carbon efflux.  When the numbers go 
negative during part of the day but otherwise look normal, it usually 
means a plant has sprouted in the chamber and is using the carbon 
dioxide.  That means the measurements are all lower than they should 
be and I need to discard all measurements collected on that day, 
whether positive or negative.

It might have been a little clearer if I'd made the toy dataframe a 
bit larger.  
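A larger data set makes the requirement easier to check. This sketch uses made-up data with 20 measurements per chamber per day, as Toby describes, and hypothetical columns REP (measurement number), CH, DAY, and SLOPE. Two chamber-days get one negative value each, so 40 of the 180 rows should disappear while the same chambers on other days are kept.

```r
# Hypothetical data: 20 measurements x 3 chambers x 3 days = 180 rows
big <- expand.grid(REP = 1:20, CH = 3:5, DAY = 1:3)
big$SLOPE <- 0.5

# One negative reading on chamber 4, day 2 and one on chamber 5, day 3
big$SLOPE[big$CH == 4 & big$DAY == 2 & big$REP == 7] <- -0.05
big$SLOPE[big$CH == 5 & big$DAY == 3 & big$REP == 20] <- -0.01

# Marc's approach: drop every row of a chamber-day that has a negative
kept <- subset(big, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

nrow(big)   # 180
nrow(kept)  # 140
# Counts per chamber (rows) and day (columns); the two bad cells are absent
table(kept$CH, kept$DAY)
```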

Thanks again for the assistance.

Toby



On 12 Aug 2010 at 16:39, David Winsemius wrote:

 
[snip]


Re: [R] conditional selection of dataframe rows

2010-08-12 Thread David Winsemius


On Aug 12, 2010, at 5:20 PM, Toby Gass wrote:


[snip]


I think the fault was all mine. Failure to read for meaning. Here's an  
alternate strategy, although I think Schwartz's might be cleaner:


> toy$ch.day.cat <- with(toy, paste(CH, DAY, sep="."))
> negs.idxs <- tapply(toy$SLOPE , toy$ch.day.cat, function (x) any(x < 0) )

> negs.idxs
  3.4   3.5   4.4   4.5   5.4   5.5
FALSE FALSE FALSE FALSE FALSE  TRUE
> toy[-which(negs.idxs), ]
  CH DAY SLOPE ch.day.cat
1  3   4   0.2        3.4
2  4   4   0.3        4.4
3  5   4   0.4        5.4
4  3   4   0.5        3.4
5  4   4   0.6        4.4
7  3   5   0.1        3.5
8  4   5   0.0        4.5
9  5   5  -0.1        5.5

--
David


David Winsemius, MD
West 

Re: [R] conditional selection of dataframe rows

2010-08-12 Thread David Winsemius


On Aug 12, 2010, at 6:15 PM, David Winsemius wrote:



[snip]


I think I should give up today. I saw that the above code eliminates  
#6 and only after posting saw that #9 was left in:


require(rms)   # for %nin% (defined in Hmisc, which rms loads) .. or use the %w/o% operator defined on the match help page


> toy[toy$ch.day.cat %nin% names(negs.idxs[negs.idxs]), ]
  CH DAY SLOPE ch.day.cat
1  3   4   0.2        3.4
2  4   4   0.3        4.4
3  5   4   0.4        5.4
4  3   4   0.5        3.4
5  4   4   0.6        4.4
7  3   5   0.1        3.5
8  4   5   0.0        4.5

Now I am really sure that the ave(  , , any)  strategy is superior.
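The first tapply() version dropped row 6 because which() on the group-level result returns a position among the six sorted groups, not a row number. The sketch below shows that. It also shows one base-R way (not from the thread) to spread the group flags back onto rows without loading a package: look up each row's key in the named tapply() result. `row_flag` is an illustrative name.

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))
toy$ch.day.cat <- with(toy, paste(CH, DAY, sep = "."))
negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))

# Position of the flagged group among the sorted groups ("5.5" is 6th),
# which is why toy[-which(negs.idxs), ] removed row 6 and kept row 9
which(negs.idxs)

# Index the named group flags by each row's key: one flag per row
row_flag <- as.vector(negs.idxs[toy$ch.day.cat])
toy[!row_flag, ]
```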



--
David

