[R] Counting consecutive events in R
Hi,

I have the following dataframe

structure(list(Type = c("QRS", "QRS", "QRS", "QRS", "QRS", "QRS",
"QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "RR", "RR", "RR", "PP",
"PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "QTc", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc",
"QTc", "QTc"), Time_Point_Start = c("2015-04-01 14:57:15.0.0312",
"2015-04-01 14:57:15.0.7839", "2015-04-01 14:57:16.0.5343",
"2015-04-01 14:57:17.0.2573", "2015-04-01 14:57:18.0.0234",
"2015-04-01 14:57:18.0.7722", "2015-04-01 14:57:19.0.5265",
"2015-04-01 14:57:24.0.0195", "2015-04-01 14:57:24.0.7839",
"2015-04-01 14:57:25.0.5343", "2015-04-01 14:57:26.0.2768",
"2015-04-01 14:57:27.0.0273", "2015-04-01 14:58:03.0.0702",
"2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
"2015-04-01 14:57:58.0.4134", "2015-04-01 14:57:59.0.1637",
"2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630",
"2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
"2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630",
"2015-04-01 14:58:04.0.4134", "2015-04-01 14:57:07.0.4212",
"2015-04-01 14:57:08.0.1715", "2015-04-01 14:57:08.0.9204",
"2015-04-01 14:57:09.0.6864", "2015-04-01 14:57:10.0.4368",
"2015-04-01 14:57:11.0.1871", "2015-04-01 14:57:11.0.9360",
"2015-04-01 14:57:12.0.6591", "2015-04-01 14:57:13.0.4251",
"2015-04-01 14:57:14.0.1754", "2015-04-01 14:57:14.0.9243",
"2015-04-01 14:57:15.0.6903", "2015-04-01 14:57:16.0.4407",
"2015-04-01 14:57:17.0.1676", "2015-04-01 14:57:17.0.9321"),
Time_Point_End = c("2015-04-01 14:57:15.0.0858",
"2015-04-01 14:57:15.0.8346", "2015-04-01 14:57:16.0.6006",
"2015-04-01 14:57:17.0.0351", "2015-04-01 14:57:18.0.1403",
"2015-04-01 14:57:18.0.8385", "2015-04-01 14:57:19.0.5889",
"2015-04-01 14:57:24.0.0858", "2015-04-01 14:57:24.0.8346",
"2015-04-01 14:57:25.0.5772", "2015-04-01 14:57:26.0.3939",
"2015-04-01 14:57:27.0.0936", "2015-04-01 14:58:03.0.8190",
"2015-04-01 14:58:04.0.5694", "2015-04-01 14:58:05.0.3197",
"2015-04-01 14:57:59.0.1637", "2015-04-01 14:57:59.0.9126",
"2015-04-01 14:58:00.0.6630", "2015-04-01 14:58:01.0.4134",
"2015-04-01 14:58:02.0.1637", "2015-04-01 14:58:02.0.9126",
"2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134",
"2015-04-01 14:58:05.0.1793", "2015-04-01 14:57:07.0.8775",
"2015-04-01 14:57:08.0.6435", "2015-04-01 14:57:09.0.3705",
"2015-04-01 14:57:10.0.1209", "2015-04-01 14:57:10.0.8697",
"2015-04-01 14:57:11.0.6201", "2015-04-01 14:57:12.0.3861",
"2015-04-01 14:57:13.0.1364", "2015-04-01 14:57:13.0.8853",
"2015-04-01 14:57:14.0.6513", "2015-04-01 14:57:15.0.4017",
"2015-04-01 14:57:16.0.1248", "2015-04-01 14:57:16.0.9165",
"2015-04-01 14:57:17.0.6162", "2015-04-01 14:57:18.0.3900"),
Value = c(0.0546, 0.0507, 0.0663, 0.0936, 0.117, 0.0663, 0.0624,
0.0663, 0.0507, 0.0429, 0.117, 0.0663, 0.7488, 0.7488, 0.7488,
0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488,
0.7644, 0.033103481, 0.034056449, 0.032367699, 0.031000613,
0.031405867, 0.031241866, 0.032367699, 0.034337907, 0.033125921,
0.034337907, 0.034337907, 0.031241866, 0.034337907, 0.032367699,
0.032930616), Score = c(0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L,
0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Type_Desc = c(NA, NA,
NA, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L,
1L, 1L, 1L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), Pat_id = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)), .Names = c("Type",
"Time_Point_Start", "Time_Point_End", "Value", "Score", "Type_Desc",
"Pat_id"), class = "data.frame", row.names = c(NA, -39L))

For each unique value in column 'Type', I want to check for 5 consecutive rows (if any) with 'Score' > 0. If there are five consecutive rows with Score > 0 and 'Type_Desc' = 0, then we print Type_low; else if 'Type_Desc' = 1, we print Type_high. The search should end once 5 consecutive rows have been found.

So, for this data frame we will have two statements, as follows:

1. PP_high (reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 1)
2. QTc_low (reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 0)

How can this problem be tackled in R?

Thanks,
Abhinaba
Re: [R] Counting consecutive events in R
Assuming I understand the problem correctly, you want to check for runs of at least length five where both Score and Type_Desc assume particular values. You don't care where they are or what other data are associated, you just want to know if at least one such run exists in your data frame.

Here's a function that does that:

checkruns <- function(testdata) {
  # 1 where Score > 0 and Type_Desc matches (and is not NA), 0 otherwise
  test1 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 1 & !is.na(testdata$Type_Desc), 1, 0)
  test0 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 0 & !is.na(testdata$Type_Desc), 1, 0)
  # run-length encode each indicator
  test1.rle <- rle(test1)
  test0.rle <- rle(test0)
  # report if any run of 1s has length 5 or more
  if(any(test1.rle$lengths >= 5 & test1.rle$values == 1)) cat("Type_high\n")
  if(any(test0.rle$lengths >= 5 & test0.rle$values == 1)) cat("Type_low\n")
  invisible()
}

Sarah

On Thu, May 14, 2015 at 8:16 AM, Abhinaba Roy <abhinabaro...@gmail.com> wrote:
> Hi, I have the following dataframe
> [...]
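A minimal sketch of how the same rle() check might be applied to each Type separately, so that the output carries the Type name (PP_high, QTc_low) rather than a fixed label. The helper names has_run and label_type are made up for illustration, the data frame is a cut-down copy of the posted data with only the columns the check needs, and the condition is assumed to be Score > 0.

```r
# Cut-down copy of the posted data: only the columns the check needs.
dat <- data.frame(
  Type      = rep(c("QRS", "RR", "PP", "QTc"), times = c(12, 3, 9, 15)),
  Score     = c(0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3, 0,
                0, 0, 0,
                0, 0, 2, 2, 2, 2, 2, 0, 0,
                rep(3, 15)),
  Type_Desc = c(NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA,
                NA, NA, NA,
                NA, NA, 1, 1, 1, 1, 1, NA, NA,
                rep(0, 15)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if the logical vector x has a run of >= n TRUEs.
has_run <- function(x, n = 5) {
  r <- rle(x)
  any(r$lengths >= n & r$values)
}

# Hypothetical helper: build the label(s) for the rows of a single Type.
label_type <- function(d) {
  hit  <- d$Score > 0 & !is.na(d$Type_Desc)    # NA in Type_Desc never counts
  high <- has_run(hit & d$Type_Desc == 1)
  low  <- has_run(hit & d$Type_Desc == 0)
  suffix <- c(if (high) "high", if (low) "low")
  if (length(suffix) == 0) return(character(0))
  paste0(d$Type[1], "_", suffix)
}

# split() groups the rows by Type; each group is checked on its own.
labels <- unlist(lapply(split(dat, dat$Type), label_type), use.names = FALSE)
print(labels)
```

On the posted data this should report PP_high and QTc_low. Note that split() returns the groups in sorted order of Type, not in order of appearance, and that a Type with both kinds of run would get both labels here rather than only the first one found.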
Re: [R] Counting consecutive events in R
I normally use rle() for these problems, see ?rle.

For instance,

k <- rbinom(999, 1, .5)
series <- function(run) {
  r <- rle(run)
  # return the result directly; ending on an assignment would return it invisibly
  which(r$lengths >= 5 & r$values == 1)
}
series(k)

returns the indices, within the rle() output, of the runs of 1s that have length 5 or longer.

Abhinaba Roy <abhinabaro...@gmail.com> [Thu, May 14, 2015 at 02:16:31PM CEST]:
> Hi, I have the following dataframe
> [...]
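The indices that which() gives in an rle()-based check refer to runs in the rle() output, not to positions in the original vector. A small sketch of one way to turn them into start and end positions with cumsum(); the vector x and the names starts, ends and runs are illustrative only.

```r
# Illustrative input: runs of 1s with lengths 5, 2 and 6.
x <- c(1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1)

r <- rle(x)

# The end position of each run is the cumulative sum of the run lengths;
# the start position follows from the end and the length.
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# Keep only the runs of 1s that are at least 5 long.
keep <- r$lengths >= 5 & r$values == 1

runs <- data.frame(start  = starts[keep],
                   end    = ends[keep],
                   length = r$lengths[keep])
print(runs)
```

Taking the first row of runs gives the earliest qualifying run, which matches the requirement that the search stop at the first run of five.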