Re: [R] Peak Over Threshold values
-Original Message- From: Tonja Krueger [mailto:tonja.krue...@web.de] Sent: Monday, May 31, 2010 2:03 AM To: William Dunlap Cc: r-help@r-project.org Subject: RE: [R] Peak Over Threshold values Thanks a lot for your help. That's the time period I was looking for. I've got one more question: for further analyses I need the respective maximum values within these time periods (between the green and red lines). Preferably in combination with the date the maximum event happened. Looping over the intervals found by f() (or f0()) is simple to do and probably quick enough for you. E.g., to get the position of the maximum of each interval use something like positionOfMaxInEachInterval - function (x, intervals) { with(intervals, sapply(seq_along(start), function(i) start[i] + which.max(x[start[i]:stop[i]]) - 1)) } The maxima themselve would be gotten by intervals - f(x, startThreshold, stopThreshold, plot=TRUE) imax - postionOfMaxInEachInterval(x, intervals) # abline(v=imax) if you had plot=TRUE above maxs - x[imax] By the way, the looping form of f() can be made much faster than the f0 I showed earlier, in the case where there are many intervals (e.g., f(sin(1:1e6),.1,0)), by preallocating the output. You may prefer to edit that version to get the positions of the interval maxima as you compute the intervals. f0a - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) inRun - FALSE start - integer(length(x)) stop - integer(length(x)) nStart - 0 nStop - 0 for (i in seq_along(x)) { if (inRun) { if (x[i] stopThreshhold) { nStop - nStop + 1 stop[nStop] - i - 1L inRun - FALSE } } else { if (x[i] startThreshhold) { nStart - nStart + 1 start[nStart] - i inRun - TRUE } } } if (inRun) { nStop - nStop + 1 stop[nStop] - length(x) } if (nStop nStart) { stop - stop[-1] nStop - nStop - 1 } length(stop) - nStop length(start) - nStart if (plot) { plot(x, cex = 0.5) abline(h = c(startThreshhold, stopThreshhold)) abline(v = start, col = green) abline(v = stop, col = red) } data.frame(start = start, stop = stop) } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com Thank you in advance, Tonja -Ursprüngliche Nachricht- Von: William Dunlap wdun...@tibco.com Gesendet: 27.05.2010 22:13:21 An: Hutchinson,David [PYR] david.hutchin...@ec.gc.ca,Tonja Krueger tonja.krue...@web.de Betreff: RE: [R] Peak Over Threshold values -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap Sent: Thursday, May 27, 2010 12:24 PM To: Hutchinson,David [PYR]; Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values Here is another version, loopless, but still a bit clumsy (can the call to which be removed?): Note that avoiding loops is mostly for the aesthetic effect. On my aging laptop the following loopy and vector-growing version takes 3 seconds on a million-long vector while the non-loopy one takes 0.22 seconds (timings done with plot=FALSE). That is a nice ratio but not much of a difference. f0 - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) inRun - FALSE start - integer() stop - integer() for (i in seq_along(x)) { if (inRun) { if (x[i] stopThreshhold) { stop[length(stop) + 1] - i - 1 inRun - FALSE } } else { if (x[i] startThreshhold) { start[length(start) + 1] - i inRun - TRUE } } } if (inRun) stop[length(stop) + 1] - length(x) if (length(stop) length(start)) stop - stop[-1] if (plot) { plot(x, cex = 0.5) abline(h = c(startThreshhold, stopThreshhold)) abline(v = start, col = green) abline(v = stop, col = red) } data.frame(start = start, stop = stop) } f - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) isFirstInRun - function(x) c(TRUE, x[-1] != x[-length(x)]) isLastInRun - function(x) c(x[-1] != x[-length(x)], TRUE) isOverStart - x = startThreshhold isOverStop - x = stopThreshhold possibleStartPt - which(isFirstInRun(isOverStart) isOverStart) possibleStopPt - which(isLastInRun(isOverStop
Re: [R] Peak Over Threshold values
Thanks a lot for your help. That’s the time period I was looking for. I’ve got one more question: for further analyses I need the respective maximum values within these time periods (between the green and red lines). Preferably in combination with the date the maximum event happened. Thank you in advance, Tonja -Ursprüngliche Nachricht- Von: William Dunlap wdun...@tibco.com Gesendet: 27.05.2010 22:13:21 An: Hutchinson,David [PYR] david.hutchin...@ec.gc.ca,Tonja Krueger tonja.krue...@web.de Betreff: RE: [R] Peak Over Threshold values -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap Sent: Thursday, May 27, 2010 12:24 PM To: Hutchinson,David [PYR]; Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values Here is another version, loopless, but still a bit clumsy (can the call to which be removed?): Note that avoiding loops is mostly for the aesthetic effect. On my aging laptop the following loopy and vector-growing version takes 3 seconds on a million-long vector while the non-loopy one takes 0.22 seconds (timings done with plot=FALSE). That is a nice ratio but not much of a difference. f0 - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) inRun - FALSE start - integer() stop - integer() for (i in seq_along(x)) { if (inRun) { if (x[i] stopThreshhold) { stop[length(stop) + 1] - i - 1 inRun - FALSE } } else { if (x[i] startThreshhold) { start[length(start) + 1] - i inRun - TRUE } } } if (inRun) stop[length(stop) + 1] - length(x) if (length(stop) length(start)) stop - stop[-1] if (plot) { plot(x, cex = 0.5) abline(h = c(startThreshhold, stopThreshhold)) abline(v = start, col = green) abline(v = stop, col = red) } data.frame(start = start, stop = stop) } f - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) isFirstInRun - function(x) c(TRUE, x[-1] != x[-length(x)]) isLastInRun - function(x) c(x[-1] != x[-length(x)], TRUE) isOverStart - x = startThreshhold isOverStop - x = stopThreshhold possibleStartPt - which(isFirstInRun(isOverStart) isOverStart) possibleStopPt - which(isLastInRun(isOverStop) isOverStop) pts - c(possibleStartPt, possibleStopPt) names(pts) - rep(c(start, stop), c(length(possibleStartPt), length(possibleStopPt))) pts - pts[order(pts)] tmp - isFirstInRun(names(pts)) start - pts[tmp names(pts) == start] stop - pts[tmp names(pts) == stop] if (length(stop) length(start)) { # i.e., when first downcrossing happens before # first upcrossing stop - stop[-1] } if (plot) { plot(x, cex = 0.5) abline(h = c(startThreshhold, stopThreshhold)) abline(v = start, col = green) abline(v = stop, col = red) } data.frame(start = start, stop = stop) } # define the data in the original message and call f(). The isFirstInRun and isLastInRun functions do the first part of the calculation that rle does and avoid the cumsum(diff()) roundtrip that cumsum(rle()$lengths) involves and their names make it clearer what I'm trying to do. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Hutchinson,David [PYR] Sent: Thursday, May 27, 2010 10:41 AM To: Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values Perhaps not elegant, but does the job threshold - 5.45 meetThreshold - walevel = threshold d - rle(meetThreshold) startPos - c(1, 1 + cumsum(d$lengths[-length(d$lengths)])) abv - which(d$values == TRUE) p.o.t - data.frame() for (i in seq( along = abv ) ) { a - startPos[abv[i]] b - a + (d$lengths[abv[i]] - 1) e - which.max(walevel[a:b]) p.o.t - rbind( p.o.t, data.frame( pos = a + e - 1, val = walevel[a:b][e] ) ) } plot ( day, walevel, type = 'l' ) points( p.o.t$pos, p.o.t$val, col = 'red' ) abline(h = threshold, lty = 2, col = 'red') -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tonja Krueger Sent: Thursday, May 27, 2010 1:47 AM To: Vito Muggeo (UniPa); Clint Bowman Cc: r-help@r-project.org Subject: [R] Peak Over Threshold values I'm sorry, but that's not exactly
[R] Peak Over Threshold values
I’m sorry, but that‘s not exactly what I was looking for. I obviously didn’t explain properly: Within my dataframe (df) I would like to find POT values that are not linked. In my definition two maximum values are linked if walevel does not fall below a certain value (the lower threshold value). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) For example in my dataframe a threshold value = 5.45 and an lower threshold value = 5.3 should lead to two maximum values because between the 2^nd and 3^rd peak walevel does not fall below the lower threshold value. With “clusters (…)” from “evd package”, I could find out POT values but even though I set a lower threshold value for my example dataframe I would get three maximum values instead of two. library(evd) clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T) clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T) Changing r to 30 (for example) connects the 2^nd and 3^rd maximum events and gives out the right ‘end’ for the first extreme event but not for the second event. (Also I’d rather not use a time interval to prove that the events are linked) clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T) clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T) What went wrong??? Tonja -Ursprüngliche Nachricht- How about? hi.rle-rle(walevel5.79) lo.rle-rle(walevel5.36) plot(walevel) abline(h=5.8,col=2,lty=3) abline(h=5.35,col=3,lty=3) hi.lo.rle-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths))) abline(v=hi.lo.rle) You can use the $values from the rle to sort things out. Probably want to ignore the end point at length(walevel). -- Clint Bowman INTERNET: cl...@ecy.wa.gov Air Quality Modeler INTERNET: cl...@math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote: dear Tonja, By plotting your data plot(df) it seems to me that you are looking for a piecewise linear relationships. If this is the case, have a look to the package segmented. You have to specify or not the number and the starting values for the breakpoints library(segmented) olm-lm(walevel~day) #specify number and starting values for the breakpoints.. oseg-segmented(olm, seg.Z=~day, psi=c(20,50,80)) plot(oseg,add=TRUE,col=2) oseg$psi #or you can avoid to specify starting values for psi oseg-segmented(olm, seg.Z=~day, psi=NA,control=seg.control(stop.if.error=FALSE,K=30)) plot(oseg,add=TRUE,col=2) oseg$psi best, vito Tonja Krueger ha scritto: Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel equals 5.8 and it should end when walevel equals the lower threshold value (5.35). I tried “clusters (…)” from “evd package”, and varied r (see example) but it did not work for all events (again see example). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) library(evd) clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T) clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T) What have I done wrong? Tonja __ R-help@r-project.org mailing list [1]https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] Peak Over Threshold values
Perhaps not elegant, but does the job threshold - 5.45 meetThreshold - walevel = threshold d - rle(meetThreshold) startPos - c(1, 1 + cumsum(d$lengths[-length(d$lengths)])) abv - which(d$values == TRUE) p.o.t - data.frame() for (i in seq( along = abv ) ) { a - startPos[abv[i]] b - a + (d$lengths[abv[i]] - 1) e - which.max(walevel[a:b]) p.o.t - rbind( p.o.t, data.frame( pos = a + e - 1, val = walevel[a:b][e] ) ) } plot ( day, walevel, type = 'l' ) points( p.o.t$pos, p.o.t$val, col = 'red' ) abline(h = threshold, lty = 2, col = 'red') -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tonja Krueger Sent: Thursday, May 27, 2010 1:47 AM To: Vito Muggeo (UniPa); Clint Bowman Cc: r-help@r-project.org Subject: [R] Peak Over Threshold values I'm sorry, but that's not exactly what I was looking for. I obviously didn't explain properly: Within my dataframe (df) I would like to find POT values that are not linked. In my definition two maximum values are linked if walevel does not fall below a certain value (the lower threshold value). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) For example in my dataframe a threshold value = 5.45 and an lower threshold value = 5.3 should lead to two maximum values because between the 2^nd and 3^rd peak walevel does not fall below the lower threshold value. With clusters (...) from evd package, I could find out POT values but even though I set a lower threshold value for my example dataframe I would get three maximum values instead of two. library(evd) clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T) clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T) Changing r to 30 (for example) connects the 2^nd and 3^rd maximum events and gives out the right 'end' for the first extreme event but not for the second event. (Also I'd rather not use a time interval to prove that the events are linked) clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T) clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T) What went wrong??? Tonja -Ursprüngliche Nachricht- How about? hi.rle-rle(walevel5.79) lo.rle-rle(walevel5.36) plot(walevel) abline(h=5.8,col=2,lty=3) abline(h=5.35,col=3,lty=3) hi.lo.rle-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths))) abline(v=hi.lo.rle) You can use the $values from the rle to sort things out. Probably want to ignore the end point at length(walevel). -- Clint Bowman INTERNET: cl...@ecy.wa.gov Air Quality Modeler INTERNET: cl...@math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote: dear Tonja, By plotting your data plot(df) it seems to me that you are looking for a piecewise linear relationships. If this is the case, have a look to the package segmented. You have to specify or not the number and the starting values for the breakpoints library(segmented) olm-lm(walevel~day) #specify number and starting values for the breakpoints.. oseg-segmented(olm, seg.Z=~day, psi=c(20,50,80)) plot(oseg,add=TRUE,col=2) oseg$psi #or you can avoid to specify starting values for psi oseg-segmented(olm, seg.Z=~day, psi=NA,control=seg.control(stop.if.error=FALSE,K=30)) plot(oseg,add=TRUE,col=2) oseg$psi best, vito Tonja Krueger ha scritto: Dear List I hope you can help me: I've got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel equals 5.8 and it should end when walevel equals the lower threshold value (5.35). I tried clusters (...) from evd package, and varied r (see example) but it did not work for all events (again see example). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25
Re: [R] Peak Over Threshold values
Here is another version, loopless, but still a bit clumsy (can the call to which be removed?): f - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) isFirstInRun - function(x) c(TRUE, x[-1] != x[-length(x)]) isLastInRun - function(x) c(x[-1] != x[-length(x)], TRUE) isOverStart - x = startThreshhold isOverStop - x = stopThreshhold possibleStartPt - which(isFirstInRun(isOverStart) isOverStart) possibleStopPt - which(isLastInRun(isOverStop) isOverStop) pts - c(possibleStartPt, possibleStopPt) names(pts) - rep(c(start, stop), c(length(possibleStartPt), length(possibleStopPt))) pts - pts[order(pts)] tmp - isFirstInRun(names(pts)) start - pts[tmp names(pts) == start] stop - pts[tmp names(pts) == stop] if (length(stop) length(start)) { # i.e., when first downcrossing happens before # first upcrossing stop - stop[-1] } if (plot) { plot(x, cex = 0.5) abline(h = c(startThreshhold, stopThreshhold)) abline(v = start, col = green) abline(v = stop, col = red) } data.frame(start = start, stop = stop) } # define the data in the original message and call f(). The isFirstInRun and isLastInRun functions do the first part of the calculation that rle does and avoid the cumsum(diff()) roundtrip that cumsum(rle()$lengths) involves and their names make it clearer what I'm trying to do. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Hutchinson,David [PYR] Sent: Thursday, May 27, 2010 10:41 AM To: Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values Perhaps not elegant, but does the job threshold - 5.45 meetThreshold - walevel = threshold d - rle(meetThreshold) startPos - c(1, 1 + cumsum(d$lengths[-length(d$lengths)])) abv - which(d$values == TRUE) p.o.t - data.frame() for (i in seq( along = abv ) ) { a - startPos[abv[i]] b - a + (d$lengths[abv[i]] - 1) e - which.max(walevel[a:b]) p.o.t - rbind( p.o.t, data.frame( pos = a + e - 1, val = walevel[a:b][e] ) ) } plot ( day, walevel, type = 'l' ) points( p.o.t$pos, p.o.t$val, col = 'red' ) abline(h = threshold, lty = 2, col = 'red') -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tonja Krueger Sent: Thursday, May 27, 2010 1:47 AM To: Vito Muggeo (UniPa); Clint Bowman Cc: r-help@r-project.org Subject: [R] Peak Over Threshold values I'm sorry, but that's not exactly what I was looking for. I obviously didn't explain properly: Within my dataframe (df) I would like to find POT values that are not linked. In my definition two maximum values are linked if walevel does not fall below a certain value (the lower threshold value). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) For example in my dataframe a threshold value = 5.45 and an lower threshold value = 5.3 should lead to two maximum values because between the 2^nd and 3^rd peak walevel does not fall below the lower threshold value. With clusters (...) from evd package, I could find out POT values but even though I set a lower threshold value for my example dataframe I would get three maximum values instead of two. library(evd) clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T) clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T) Changing r to 30 (for example) connects the 2^nd and 3^rd maximum events and gives out the right 'end' for the first extreme event but not for the second event. (Also I'd rather not use a time interval to prove that the events are linked) clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T) clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T) What went wrong??? Tonja -Ursprüngliche Nachricht- How about? hi.rle-rle(walevel5.79) lo.rle-rle(walevel5.36) plot(walevel) abline(h
Re: [R] Peak Over Threshold values
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap Sent: Thursday, May 27, 2010 12:24 PM To: Hutchinson,David [PYR]; Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values Here is another version, loopless, but still a bit clumsy (can the call to which be removed?): Note that avoiding loops is mostly for the aesthetic effect. On my aging laptop the following loopy and vector-growing version takes 3 seconds on a million-long vector while the non-loopy one takes 0.22 seconds (timings done with plot=FALSE). That is a nice ratio but not much of a difference. f0 - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) inRun - FALSE start - integer() stop - integer() for (i in seq_along(x)) { if (inRun) { if (x[i] stopThreshhold) { stop[length(stop) + 1] - i - 1 inRun - FALSE } } else { if (x[i] startThreshhold) { start[length(start) + 1] - i inRun - TRUE } } } if (inRun) stop[length(stop) + 1] - length(x) if (length(stop) length(start)) stop - stop[-1] if (plot) { plot(x, cex = 0.5) abline(h = c(startThreshhold, stopThreshhold)) abline(v = start, col = green) abline(v = stop, col = red) } data.frame(start = start, stop = stop) } f - function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE) { stopifnot(startThreshhold stopThreshhold) isFirstInRun - function(x) c(TRUE, x[-1] != x[-length(x)]) isLastInRun - function(x) c(x[-1] != x[-length(x)], TRUE) isOverStart - x = startThreshhold isOverStop - x = stopThreshhold possibleStartPt - which(isFirstInRun(isOverStart) isOverStart) possibleStopPt - which(isLastInRun(isOverStop) isOverStop) pts - c(possibleStartPt, possibleStopPt) names(pts) - rep(c(start, stop), c(length(possibleStartPt), length(possibleStopPt))) pts - pts[order(pts)] tmp - isFirstInRun(names(pts)) start - pts[tmp names(pts) == start] stop - pts[tmp names(pts) == stop] if (length(stop) length(start)) { # i.e., when first downcrossing happens before # first upcrossing stop - stop[-1] } if (plot) { plot(x, cex = 0.5) abline(h = c(startThreshhold, stopThreshhold)) abline(v = start, col = green) abline(v = stop, col = red) } data.frame(start = start, stop = stop) } # define the data in the original message and call f(). The isFirstInRun and isLastInRun functions do the first part of the calculation that rle does and avoid the cumsum(diff()) roundtrip that cumsum(rle()$lengths) involves and their names make it clearer what I'm trying to do. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Hutchinson,David [PYR] Sent: Thursday, May 27, 2010 10:41 AM To: Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values Perhaps not elegant, but does the job threshold - 5.45 meetThreshold - walevel = threshold d - rle(meetThreshold) startPos - c(1, 1 + cumsum(d$lengths[-length(d$lengths)])) abv - which(d$values == TRUE) p.o.t - data.frame() for (i in seq( along = abv ) ) { a - startPos[abv[i]] b - a + (d$lengths[abv[i]] - 1) e - which.max(walevel[a:b]) p.o.t - rbind( p.o.t, data.frame( pos = a + e - 1, val = walevel[a:b][e] ) ) } plot ( day, walevel, type = 'l' ) points( p.o.t$pos, p.o.t$val, col = 'red' ) abline(h = threshold, lty = 2, col = 'red') -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tonja Krueger Sent: Thursday, May 27, 2010 1:47 AM To: Vito Muggeo (UniPa); Clint Bowman Cc: r-help@r-project.org Subject: [R] Peak Over Threshold values I'm sorry, but that's not exactly what I was looking for. I obviously didn't explain properly: Within my dataframe (df) I would like to find POT values that are not linked. In my definition two maximum values are linked if walevel does not fall below a certain value (the lower threshold value). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43
[R] Peak Over Threshold values
Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel equals 5.8 and it should end when walevel equals the lower threshold value (5.35). I tried “clusters (…)” from “evd package”, and varied r (see example) but it did not work for all events (again see example). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) library(evd) clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T) clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T) What have I done wrong? Tonja __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Peak Over Threshold values
dear Tonja, By plotting your data plot(df) it seems to me that you are looking for a piecewise linear relationships. If this is the case, have a look to the package segmented. You have to specify or not the number and the starting values for the breakpoints library(segmented) olm-lm(walevel~day) #specify number and starting values for the breakpoints.. oseg-segmented(olm, seg.Z=~day, psi=c(20,50,80)) plot(oseg,add=TRUE,col=2) oseg$psi #or you can avoid to specify starting values for psi oseg-segmented(olm, seg.Z=~day, psi=NA,control=seg.control(stop.if.error=FALSE,K=30)) plot(oseg,add=TRUE,col=2) oseg$psi best, vito Tonja Krueger ha scritto: Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel equals 5.8 and it should end when walevel equals the lower threshold value (5.35). I tried “clusters (…)” from “evd package”, and varied r (see example) but it did not work for all events (again see example). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) library(evd) clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T) clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T) What have I done wrong? Tonja __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Vito M.R. Muggeo Dip.to Sc Statist e Matem `Vianelli' Università di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 23895240 fax: 091 485726/485612 http://dssm.unipa.it/vmuggeo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Peak Over Threshold values
How about? hi.rle-rle(walevel5.79) lo.rle-rle(walevel5.36) plot(walevel) abline(h=5.8,col=2,lty=3) abline(h=5.35,col=3,lty=3) hi.lo.rle-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths))) abline(v=hi.lo.rle) You can use the $values from the rle to sort things out. Probably want to ignore the end point at length(walevel). -- Clint BowmanINTERNET: cl...@ecy.wa.gov Air Quality Modeler INTERNET: cl...@math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600FAX:(360) 407-7534 Olympia, WA 98504-7600 On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote: dear Tonja, By plotting your data plot(df) it seems to me that you are looking for a piecewise linear relationships. If this is the case, have a look to the package segmented. You have to specify or not the number and the starting values for the breakpoints library(segmented) olm-lm(walevel~day) #specify number and starting values for the breakpoints.. oseg-segmented(olm, seg.Z=~day, psi=c(20,50,80)) plot(oseg,add=TRUE,col=2) oseg$psi #or you can avoid to specify starting values for psi oseg-segmented(olm, seg.Z=~day, psi=NA,control=seg.control(stop.if.error=FALSE,K=30)) plot(oseg,add=TRUE,col=2) oseg$psi best, vito Tonja Krueger ha scritto: Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel equals 5.8 and it should end when walevel equals the lower threshold value (5.35). I tried “clusters (…)” from “evd package”, and varied r (see example) but it did not work for all events (again see example). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) library(evd) clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T) clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T) What have I done wrong? Tonja __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Peak Over Threshold values/ length of the events
Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel equals 5.8 and it should end when walevel equals the lower threshold value (5.35). I tried “clusters (…)” from “evd package”, and varied r (see example) but it did not work for all events (again see example). walevel - c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26) day - c(1:100) df - data.frame(day,walevel) library(evd) clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T) clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T) What have I done wrong? Tonja __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.