Re: [R] Counting number of consecutive occurrences per rows

2013-05-06 Thread PIKAL Petr
Hi

I slightly modified Jim's code

The first part is a function that splits the data frame test into chunks
according to act, juln and day and gives each chunk of consecutive rows its
own number.

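fff <- function(x) {
  # code act as 1 (act == 0), 0 (act 1-199) or 2 (act == 200)
  fac <- factor((x[, "act"] == 0) * 1 + (x[, "act"] == 200) * 2, levels = c(1, 0, 2))
  int <- interaction(x[, "juln"], x[, "day"], fac)
  # the counter increases whenever juln, day or the act class changes
  res <- cumsum(c(1, abs(diff(as.numeric(int)))))
  res
}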

test$fac<-fff(test)

Second part evaluates length of each chunk

test$res <- ave(test$fac, test$fac, FUN=length)
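
As a small illustration of these two steps (a sketch on a made-up key vector, not on the real data): a counter that increases at every change of the key numbers the chunks, and ave() then returns the chunk length for every row. The names key, run_id and run_len exist only in this example.

```r
# Hypothetical key codes: 1 = act 0, 2 = act 1-199, 3 = act 200
key <- c(2, 2, 2, 3, 3, 3, 2, 2, 1, 1, 1, 1, 2, 3, 3, 2)

# Start a new chunk number whenever the key differs from the previous row
run_id <- cumsum(c(TRUE, diff(key) != 0))

# Length of the chunk each row belongs to
run_len <- ave(run_id, run_id, FUN = length)
print(run_len)
```

Testing diff(key) != 0 keeps the chunk numbers consecutive; with abs(diff(key)) the numbers can skip, but each chunk still gets its own number.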

Last part computes max (min, sum) of res in each distinct chunk.

fff2<- function(x) {
fac <-  factor((x[, "act"]==0)*1+(x[,"act"] == 200)*2, levels=c(1,0,2), 
labels=c("0", "1-199", "200"))
fac
}

aggregate(test$res, list(test$juln, test$day), max)
aggregate(test$res, list(test$juln, test$day, fff2(test)), max)

Is it what you want?

Petr

From: zuzana zajkova [mailto:zuzu...@gmail.com]
Sent: Friday, May 03, 2013 7:10 PM
To: PIKAL Petr; jholt...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Counting number of consecutive occurrences per rows

Hi,
I'm sorry it took me so long to respond; yesterday I finally had time to try
your suggestions. Thank you for them!
I tried both and they give the same results, but in both there are some things
I still need to solve. I would appreciate your help.
I include a slightly bigger data frame (test2, at the end of this email) with
more variation in the variables, so that I can better explain what else I
would like to calculate.

Jim's code:
I needed to make some changes in assigning the key. Yours worked fine for the
small "test" data, but when I tried it on my data frame, which has around
25000 rows, it didn't work properly.

test2$key[test2$act == 0] <- 1
test2$key[test2$act > 0 & test2$act < 200] <- 2
test2$key[test2$act == 200] <- 3
# this works ok
test2$resChange <- cumsum(c(1, abs(diff(test2$key))))
test2$res <- ave(test2$resChange, test2$resChange, FUN = length)
# I added new column by jul date
test2$resJ <- ave(test2$resChange, test2$resChange, test2$juln, FUN = length)
# this works fine as well, for dividing between day 0 and day 1
test2$resJD <- ave(test2$resChange, test2$resChange, test2$juln, test2$day,
                   FUN = length)
# resume by day (assumes test2 is already a data.table, as in Jim's code)
test2Resume_day <- test2[ , list(maxres = max(res)
   , minres = min(res)
   , sumres = length(unique(resChange)))
   , keyby = c('day', 'key')]
# change 'key'
test2Resume_day$key <- c('0', '1-199', '200')[test2Resume_day$key]
test2Resume_day
   day   key maxres minres sumres
1:   0     0      2      2      3
2:   0 1-199      3      1      9
3:   0   200      6      1      7
4:   1     0      1      1      1
5:   1 1-199     10      1      7
6:   1   200      6      1      6
# resume by juln
test2Resume_jul <- test2[ , list(maxres = max(res)
   , minres = min(res)
   , sumres = length(unique(resChange)))
   , keyby = c('juln', 'key')]  # by juln
# change 'key'
test2Resume_jul$key <- c('0', '1-199', '200')[test2Resume_jul$key]
test2Resume_jul
    juln   key maxres minres sumres
1: 15173     0      2      2      1
2: 15173 1-199      3      1      7
3: 15173   200      6      1      6
4: 15174     0      2      1      3
5: 15174 1-199     10      1      8
6: 15174   200      6      1      6
That is fine, but what I would like to get is a summary by juln and by the
variable day (0 and 1) as well.
Like this:
juln   day  key    maxres  minres  sumres
15173  0    0
15173  0    1-199
15173  0    200
15173  1    0
15173  1    1-199
15173  1    200
15174  0    0
15174  0    1-199
15174  0    200
15174  1    0
15174  1    1-199
15174  1    200
...
The other thing is that I would like "sumres" to be the sum of the run
lengths for each "key".
For example, if in the test2 data frame the res values for key 200 (juln
15173) are 1, 1, 2, 2, 1, 2, then sumres should be 9 (1+1+2+2+1+2), not 6
(which I suppose comes from the number of unique occurrences).
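
A possible base R sketch of both requests (a summary by juln, day and key, with sumres as the sum of the run lengths). This is only an illustration under assumptions: act and day are copied from the small test example, juln is a single made-up date, and the names d, lab, runs and res are hypothetical. The idea is to build one row per run first, so that each run is counted once in the sums.

```r
# Made-up data: act and day from the small 'test' example, juln hypothetical
d <- data.frame(
  juln = rep(15173, 16),
  day  = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  act  = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
)
d$key <- ifelse(d$act == 0, "0", ifelse(d$act == 200, "200", "1-199"))

# One label per row; a run ends whenever juln, day or key changes
lab <- paste(d$juln, d$day, d$key)
r <- rle(lab)
first <- cumsum(c(1, head(r$lengths, -1)))  # row where each run starts

# One row per run, carrying its grouping columns and its length
runs <- data.frame(d[first, c("juln", "day", "key")], len = r$lengths)

# Longest, shortest and total run length per juln, day and key
by_group <- list(juln = runs$juln, day = runs$day, key = runs$key)
res <- aggregate(runs$len, by_group, max)
names(res)[4] <- "maxres"
res$minres <- aggregate(runs$len, by_group, min)$x
res$sumres <- aggregate(runs$len, by_group, sum)$x
print(res)
```

Because every row belongs to exactly one run, the sumres values add up to the number of rows, which gives a quick check on the result.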

Petr's code:
This works fine too. The thing is that for the aggregation I would need the
intervals to be like this
[0, 1)
[1, 199]
(199, 200]
and I don't know if that is possible... I checked the help for cut, but I
found that the intervals can only be closed on the right or on the left...
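
One possible way around this, assuming act only takes whole numbers from 0 to 200: for such data the intervals [0, 1), [1, 199], (199, 200] contain the same values as (-Inf, 0], (0, 199], (199, 200], and those are all closed on the right, which is the default in cut(). The labels below are chosen only for illustration.

```r
# A few made-up act values covering every boundary
act <- c(0, 1, 34, 199, 200)

# Right-closed intervals (-Inf, 0], (0, 199], (199, 200]
grp <- cut(act, breaks = c(-Inf, 0, 199, 200),
           labels = c("0", "1-199", "200"))
print(data.frame(act, grp))
```

If act could hold non-integer values between 0 and 1, this shortcut would no longer match [0, 1) exactly.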

Thank you very much for your time and sharing your knowledge!
Zuzana

## here is the bigger test2 dataframe
> dput(test2)
structure(list(daten = structure(c(15173, 15173, 15173, 15173,
15173, 15173, 15173, 1

Re: [R] Counting number of consecutive occurrences per rows

2013-05-03 Thread zuzana zajkova
"win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win"), night = structure(c(1310962792, 1310963392, 1310963992,
1310964592, 1310965192, 1310965792, 1310966392, 1310966992, 1310967592,
1310968192, 1310968792, 1310969392, 1310969992, 1310970592, 1310971192,
1310971792, 1310972392, 1310972992, 1310973592, 1310974192, 1310974792,
1310975392, 1311107991, 1311108591, 1311109191, 1311109791, 1311110391,
1311110991, 1311111591, 1311112191, 1311112791, 1311113391, 1311113991,
1311114591, 1311115191, 1311115791, 1311116391, 1311116991, 1311117591,
1311118191, 1311118791, 1311119391, 1311119991, 1311034191, 1311034791,
1311035391, 1311035991, 1311036591, 1311037191, 1311037791, 1311038391,
1311038991, 1311039591, 1311040191, 1311040791, 1311041391, 1311041991,
1311042591, 1311043191, 1311043791), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), act = c(196, 200, 199, 200, 197, 198, 197,
200, 200, 197, 200, 200, 198, 200, 1, 1, 0, 0, 1, 2, 200, 200,
200, 200, 200, 200, 199, 61, 0, 194, 198, 198, 196, 193, 194,
193, 197, 198, 199, 200, 197, 199, 199, 200, 198, 200, 200, 198,
200, 34, 1, 1, 0, 0, 199, 200, 199, 7, 0, 0)), .Names = c("daten",
"juln", "fen", "night", "day", "act"), row.names = 9990:10049, class =
"data.frame")






Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread PIKAL Petr
Hi

rrr<-rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=T)))
test$res <- rep(rrr$lengths, rrr$lengths)

If you put it in function

fff<- function(x, limits=c(0,1,199,200)) {
rrr<-rle(as.numeric(cut(x, limits, include.lowest=T)))
res <- rep(rrr$lengths, rrr$lengths)
res
}

you can use split/lapply approach

test$res2<-unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))), fff))

Beware of correct ordering of days in output. Without correct leveling of 
factor 0 precedes 1.
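
One way to avoid the ordering problem, as a sketch only: ave() applies the function within each level of day and puts the results back in the original row order, so the order of the factor levels does not matter. The function name run_len is made up here; act and day are the values from the test data.

```r
# Run length for every element of x, with the same breaks as fff above
run_len <- function(x, limits = c(0, 1, 199, 200)) {
  r <- rle(as.numeric(cut(x, limits, include.lowest = TRUE)))
  rep(r$lengths, r$lengths)
}

act <- c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
day <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)

# Runs are counted separately within day == 1 and day == 0
res2 <- ave(act, day, FUN = run_len)
print(res2)
```

With several dates in the data, further grouping variables (for example the date column) would presumably have to be passed to ave() as well, otherwise rows from different dates with the same day value are treated as one sequence.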

And for the last part probably aggregate can be the way.

> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200),
+ include.lowest=T)), max)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 3
3   14655 (199,200] 3
> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200),
+ include.lowest=T)), min)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 1
3   14655 (199,200] 2

Regards
Petr

Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread jim holtman
try this:

> test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
+ 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
+ 14655, 14655, 14655), origin = structure(0, class = "Date")),
+ time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
+ 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
+ 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
+ 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
+ "GMT"),
+ act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
+ 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
+ ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
+ 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
+ 540L))
>
> # add key to separate data
> test$key <- ifelse(test$act == 0
+ , 1L  # 0
+ , ifelse(test$act == 200
+ , 3L  # 200
+ , 2L  # 1-199
+ )
+ )
> # mark changes in sequence
> test$resChange <- cumsum(c(1L, abs(diff(test$key))))
> test$res <- ave(test$resChange, test$resChange, FUN = length)
>
> test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)
>
> test
      jul                time act day key resChange res res2
510 14655 2010-02-15 18:25:54 130   1   2         1   3    3
512 14655 2010-02-15 18:35:54  23   1   2         1   3    3
514 14655 2010-02-15 18:45:54  45   1   2         1   3    3
516 14655 2010-02-15 18:55:54 200   1   3         2   3    3
518 14655 2010-02-15 19:05:54 200   1   3         2   3    3
520 14655 2010-02-15 19:15:54 200   1   3         2   3    3
522 14655 2010-02-15 19:25:54 199   1   2         3   2    2
524 14655 2010-02-15 19:35:54 150   1   2         3   2    2
526 14655 2010-02-15 19:45:54   0   1   1         4   4    2
528 14655 2010-02-15 19:55:54   0   1   1         4   4    2
530 14655 2010-02-15 20:05:54   0   0   1         4   4    2
532 14655 2010-02-15 20:15:54   0   0   1         4   4    2
534 14655 2010-02-15 20:25:54  34   0   2         5   1    1
536 14655 2010-02-15 20:35:54 200   0   3         6   2    2
538 14655 2010-02-15 20:45:54 200   0   3         6   2    2
540 14655 2010-02-15 20:55:54 145   0   2         7   1    1
>




Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread jim holtman
Forgot the last part of the question:

> test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
+ 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
+ 14655, 14655, 14655), origin = structure(0, class = "Date")),
+ time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
+ 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
+ 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
+ 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
+ "GMT"),
+ act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
+ 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
+ ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
+ 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
+ 540L))
>
> # add key to separate data
> test$key <- ifelse(test$act == 0
+ , 1L  # 0
+ , ifelse(test$act == 200
+ , 3L  # 200
+ , 2L  # 1-199
+ )
+ )
> # mark changes in sequence
> test$resChange <- cumsum(c(1L, abs(diff(test$key))))
> test$res <- ave(test$resChange, test$resChange, FUN = length)
>
> test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)
>
> require(data.table)  # use this for aggregation
> test <- data.table(test)
> testResume <- test[
+ , list(maxres = max(res)
+ , minres = min(res)
+ , sumres = length(unique(resChange))
+ )
+ , keyby = c('day', 'key')
+ ]
> # change 'key'
> testResume$key <- c('0', '1-199', '200')[testResume$key]
> testResume
   day   key maxres minres sumres
1:   0     0      4      4      1
2:   0 1-199      1      1      2
3:   0   200      2      2      1
4:   1     0      4      4      1
5:   1 1-199      3      2      2
6:   1   200      3      3      1
>




[R] Counting number of consecutive occurrences per rows

2013-04-29 Thread zuzana zajkova
Hi,

I would appreciate if somebody could help me with following calculation.
I have a dataframe, by 10 minutes time, for mostly one year data. This is
small example:

> dput(test)
structure(list(jul = structure(c(14655, 14655, 14655, 14655,
14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
14655, 14655, 14655), origin = structure(0, class = "Date")),
time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
"GMT"),
act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
540L))

Looks like this:

> test
      jul                time act day
510 14655 2010-02-15 18:25:54 130   1
512 14655 2010-02-15 18:35:54  23   1
514 14655 2010-02-15 18:45:54  45   1
516 14655 2010-02-15 18:55:54 200   1
518 14655 2010-02-15 19:05:54 200   1
520 14655 2010-02-15 19:15:54 200   1
522 14655 2010-02-15 19:25:54 199   1
524 14655 2010-02-15 19:35:54 150   1
526 14655 2010-02-15 19:45:54   0   1
528 14655 2010-02-15 19:55:54   0   1
530 14655 2010-02-15 20:05:54   0   0
532 14655 2010-02-15 20:15:54   0   0
534 14655 2010-02-15 20:25:54  34   0
536 14655 2010-02-15 20:35:54 200   0
538 14655 2010-02-15 20:45:54 200   0
540 14655 2010-02-15 20:55:54 145   0


What I would like to calculate is the number of consecutive occurrences of
values 200,  0 and together values from 1 til 199 (in fact the values that
differ from 200 and 0) in column "act".

I would like to get something like this (result$res)

> result
      jul                time act day res res2
510 14655 2010-02-15 18:25:54 130   1   3    3
512 14655 2010-02-15 18:35:54  23   1   3    3
514 14655 2010-02-15 18:45:54  45   1   3    3
516 14655 2010-02-15 18:55:54 200   1   3    3
518 14655 2010-02-15 19:05:54 200   1   3    3
520 14655 2010-02-15 19:15:54 200   1   3    3
522 14655 2010-02-15 19:25:54 199   1   2    2
524 14655 2010-02-15 19:35:54 150   1   2    2
526 14655 2010-02-15 19:45:54   0   1   4    2
528 14655 2010-02-15 19:55:54   0   1   4    2
530 14655 2010-02-15 20:05:54   0   0   4    2
532 14655 2010-02-15 20:15:54   0   0   4    2
534 14655 2010-02-15 20:25:54  34   0   1    1
536 14655 2010-02-15 20:35:54 200   0   2    2
538 14655 2010-02-15 20:45:54 200   0   2    2
540 14655 2010-02-15 20:55:54 145   0   1    1

And if possible, distinguish among day==1 and day==0 (see the "act" values
of 0 for example), results as in result$res2.

After it I would like to make a resume table per days (jul):
where maxres is max(result$res) for the "act" value
where minres is min(result$res) for the "act" value
where sumres is sum(result$res) for the "act" value (for example, if the
200 value ocurrs in different times per day(jul) consecutively 3, 5, 1, 6
and 7 times the sumres would be 3+5+1+6+7= 22)

something like this (this are made up numbers):

jul    act    maxres  minres  sumres
14655  0      4       1       25
14655  200    3       2       48
14655  1-199  3       1       71
14656  0      8       2       38
14656  200    15      3       60
14656  1-199  11      4       46
...
(theoretically the sum of sumres per day(jul) should be 144)
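
To make the sumres definition concrete, here is a minimal sketch on made-up act values (not the real data). It uses rle() to get the run lengths and tapply() to summarise them per act class; the names act, key, r, maxres, minres and sumres are only for this example.

```r
# Made-up activity values; the runs of 200 have lengths 1, 1, 2, 2, 1, 2
act <- c(200, 50, 200, 10, 200, 200, 7, 200, 200, 199, 200, 1, 200, 200, 0, 0)
key <- ifelse(act == 0, "0", ifelse(act == 200, "200", "1-199"))

# Lengths of consecutive runs of the same class
r <- rle(key)

# Longest, shortest and total run length per class
maxres <- tapply(r$lengths, r$values, max)
minres <- tapply(r$lengths, r$values, min)
sumres <- tapply(r$lengths, r$values, sum)
print(rbind(maxres, minres, sumres))
```

The sumres values add up to the number of rows, which corresponds to the check mentioned above: with one record every 10 minutes, a complete day should give 144.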


> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)


I hope my explanation is sufficient. I appreciate any hint.
Thank you,

Zuzana
