[R] ks.test() with 2 samples vs. 1 sample and a distribution function
Dear all, I have a question concerning the ks.test() function. I tried to calculate the example given on the German Wikipedia page.

xi <- c(9.41, 9.92, 11.55, 11.6, 11.73, 12, 12.06, 13.3)

I get the right results when I calculate:

ks.test(xi, pnorm, 11, 1)

Now the question: shouldn't I obtain the same or a very similar result if I compare the sample with a sample calculated from the distribution?

p <- c(0.125, 0.250, 0.375, 0.500, 0.625, 0.750, 0.875, 0.)
x <- qnorm(p, 11, 1)
ks.test(xi, x)

Why don't I? Thanks for helping me! Tonja

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
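One way to see the difference (a sketch, not from the original thread): the two-sample form compares two empirical CDFs, and with only eight quantile points the second empirical CDF is a coarse approximation of N(11, 1), so the statistic and p-value need not match the one-sample test. A large simulated reference sample comes much closer:

```r
set.seed(1)
xi <- c(9.41, 9.92, 11.55, 11.6, 11.73, 12, 12.06, 13.3)

# one-sample test against the fully specified N(11, 1) distribution
ks.test(xi, pnorm, 11, 1)

# two-sample test: the second "sample" only approximates N(11, 1),
# so the result is generally different from the one-sample version
ks.test(xi, rnorm(10000, mean = 11, sd = 1))
```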
[R] use gcheckbox in gWidgets to switch on/off gframe
Dear All, I'm trying to create a GUI using gWidgets. I would like to have a checkbox to "switch on/off" different frames within the GUI. Ideally, if one frame is switched on, all other frames would be switched off. Unfortunately I only came as far as this:

library(gWidgets)
Population <- c("A","B","C","D","E","F")
w = gwindow("")
g1 = ggroup(horizontal = F, cont = w)
g2 = ggroup(horizontal = T, cont = g1)
glabel("Population:", cont = g2)
Station = gcombobox(Population, editable = F, cont = g2, handler = NULL)
gseparator(horizontal = T, container = g1, expand = F)
gcheckbox("checked", container = g1, handler = function(h, ...) {
  enabled(frame1) <- cat(svalue(h$obj))
})
frame1 <- gframe("A:", cont = g1, horizontal = FALSE)
lyt1 <- glayout(cont = frame1)
widget_list <- list()
lyt1[1,1] <- "A1:"
lyt1[1,2, expand = TRUE] <- (widget_list$A1 <- gedit(" ", cont = lyt1, handler = NULL))
lyt1[2,1] <- "A2:"
lyt1[2,2, expand = TRUE] <- (widget_list$A2 <- gedit(" ", cont = lyt1, handler = NULL))
gcheckbox("checked", container = g1, handler = function(h, ...) {
  enabled(frame2) <- cat(svalue(h$obj))
})
frame2 <- gframe("B:", cont = g1, horizontal = FALSE)
lyt2 <- glayout(cont = frame2)
widget_list <- list()
lyt2[1,1] <- "B1:"
lyt2[1,2, expand = TRUE] <- (widget_list$B1 <- gedit(" ", cont = lyt2, handler = NULL))

When I type in: enabled(frame2) <- F; enabled(frame2) <- T it does what I would like it to do. But when I check the checkbox it will only work once. Thank you for any suggestions! Tonja
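A likely cause (an observation on the code above, not from the thread): cat() prints its argument and returns NULL, so `enabled(frame1) <- cat(...)` assigns NULL rather than a logical. Passing the checkbox value directly should toggle reliably; a minimal sketch of the handler:

```r
# sketch: use the logical value of the checkbox itself,
# instead of the NULL that cat() returns
gcheckbox("checked", checked = TRUE, container = g1,
          handler = function(h, ...) {
            enabled(frame1) <- svalue(h$obj)   # TRUE/FALSE from the checkbox
          })
```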
[R] do calculation within expression(), text in plot
Dear Helpers, I would like to add some text to my plot, containing a variable and the calculated value for the variable. As I would like to produce many plots, I hope this can be done automatically. Unfortunately I can't get R to do the calculation for cor(nou,dak) when I use expression():

text(0, 2.2, expression(paste(r[Pearson], "=", cor(nou,dak), sep="")), pos = 4, cex = 1.5)

Is there a command that stops expression() so the result is calculated and printed in the plot? Thank you in advance, Tonja
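A common way to solve this (a sketch, not from the thread) is bquote(), which builds a plotmath expression but evaluates anything wrapped in .() before plotting:

```r
nou <- rnorm(20); dak <- rnorm(20)   # stand-in data, for illustration only
plot(nou, dak)

# .(...) forces evaluation of cor(nou, dak); the rest stays plotmath
text(0, 0, bquote(r[Pearson] == .(round(cor(nou, dak), 3))),
     pos = 4, cex = 1.5)
```

substitute() with a list of values works similarly if the expression is built up programmatically.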
[R] Fit Gumbel Distribution using Method of Moments
Hi all! I'm sure this is a simple question but I can't find an answer. How can I fit the Gumbel distribution to my data using the method of moments in R? Thank you for helping me, Tonja
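For reference, the moment estimators for the Gumbel distribution follow from mean = mu + gamma*beta and variance = pi^2 * beta^2 / 6, where gamma ≈ 0.5772 is the Euler–Mascheroni constant. A minimal sketch (the sample x is made up; whether to use the sample or population standard deviation is a modelling choice):

```r
x <- c(312, 590, 248, 670, 365, 770, 465, 545, 315, 115)  # hypothetical data

euler_gamma <- 0.5772157
beta_hat <- sd(x) * sqrt(6) / pi              # scale, from the variance
mu_hat   <- mean(x) - euler_gamma * beta_hat  # location, from the mean

# evaluate the fitted Gumbel CDF at the sorted data
pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))
pgumbel(sort(x), mu_hat, beta_hat)
```

Packages such as evd provide ready-made pgumbel()/qgumbel() if preferred.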
Re: [R] own function: computing time
That's perfect, thanks a lot! Tonja

Sent: Wednesday, 10 October 2012, 21:37 From: William Dunlap wdun...@tibco.com To: tonja.krue...@web.de, r-help@r-project.org Subject: RE: [R] own function: computing time

Your original method would be the following function

f <- function (x, y) {
  xy <- cbind(x, y)
  outside <- function(z) { !any(x > z[1] & y > z[2]) }
  j <- apply(xy, 1, outside)
  which(j)
}

and the following one quickly computes the same thing as the above as long as there are no repeated points (if there are repeated points it chooses one of them).

f1 <- function (x, y) {
  o <- order(x, decreasing = TRUE)
  yo <- y[o]
  j <- logical(length(y))
  j[o] <- yo == cummax(yo)
  which(j)
}

Think of the problem as finding the ladder points (Feller's term) of a sequence of points, the places where the sequence reaches a new high point. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap Sent: Wednesday, October 10, 2012 9:52 AM To: tonja.krue...@web.de; r-help@r-project.org Subject: Re: [R] own function: computing time

No, the desired points are not a subset of the convex hull. E.g., x=c(0,1:5), y=c(0,1/(1:5)). Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-Original Message- From: William Dunlap Sent: Wednesday, October 10, 2012 9:46 AM To: 'tonja.krue...@web.de'; r-help@r-project.org Subject: RE: [R] own function: computing time

Are the points you are looking for (those data points with no other data points above or to the right of them) a subset of the convex hull of the data points? If so, chull(x,y) can quickly give you the points on the convex hull (typically a fairly small number) and you can look through them for the ones you want.
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of tonja.krue...@web.de Sent: Wednesday, October 10, 2012 3:16 AM To: r-help@r-project.org Subject: [R] own function: computing time

Hi all, I wrote a function that actually does what I want it to do, but it tends to be very slow for large amounts of data. On my computer it takes 5.37 seconds for 16000 data points and 21.95 seconds for 32000 data points. As my real data consists of 1800 data points it would take ages to use the function as it is now. Could someone help me to speed up the calculation? Thank you, Tonja

system.time({
  x <- runif(32000)
  y <- runif(32000)
  xy <- cbind(x, y)
  outer <- function(z) { !any(x > z[1] & y > z[2]) }
  j <- apply(xy, 1, outer)
  plot(x, y)
  points(x[j], y[j], col = "green")
})
[R] own function: computing time
Hi all, I wrote a function that actually does what I want it to do, but it tends to be very slow for large amounts of data. On my computer it takes 5.37 seconds for 16000 data points and 21.95 seconds for 32000 data points. As my real data consists of 1800 data points it would take ages to use the function as it is now. Could someone help me to speed up the calculation? Thank you, Tonja

system.time({
  x <- runif(32000)
  y <- runif(32000)
  xy <- cbind(x, y)
  outer <- function(z) { !any(x > z[1] & y > z[2]) }
  j <- apply(xy, 1, outer)
  plot(x, y)
  points(x[j], y[j], col = "green")
})
[R] Remove similar rows from matrix
Hi everybody, I have a matrix (mat) from which I want to remove all rows that differ from other rows in that matrix only by having one or two NA's instead of a number. I would preferably like to remove the rows with more NA's, so in the end the matrix would look like mat2. Has someone done something similar before? Thanks for helping, Tonja Here my example:

ex <- c(14, 56, 114, 132, 187, 279, 324, 328, 328, 338, 338, 338, 346, 346, 395, 398, 428, 428, 428, 452, 452, 452, NA, 466, 467, 525, 894, 923, 968, 980, 1030, 1117, 1156, NA, 1159, 1166, 1166, 1166, 1171, 1171, 1209, 1211, 1235, 1235, 1235, 1275, 1275, 1275, NA, 1291, 1292, 1378, 829, 851, 880, 893, 929, 1003, 1042, 1045, 1045, 1051, 1051, 1051, 1057, 1057, 1097, 1099, 1119, 1119, 1119, 1147, 1147, 1147, 1147, 1167, 1168, 1235, 494, 510, 533, 538, 567, 623, 657, 660, 660, 666, 666, 666, 671, 671, 699, 702, NA, 722, 722, NA, NA, 744, 744, 759, 760, 816, 276, 293, 312, 318, 338, NA, NA, 418, 418, 424, 424, NA, 429, 429, NA, NA, 468, 468, 468, 490, 490, 490, 490, 508, 509, 568, 674, 696, 726, 734, 774, 851, 893, 896, 896, 903, 903, 903, 908, 908, 944, 947, 966, 966, 966, NA, 998, 998, 998, 1014, 1015, 1091, 421, 446, 472, 490, 510, 582, 624, 627, 627, 633, 633, NA, 640, 640, 669, 671, 685, 685, 685, 716, 716, 716, 716, 736, 737, 798, NA, NA, NA, NA, NA, NA, 74, NA, NA, 82, NA, 82, 86, NA, 104, NA, 114, NA, 114, 119, 119, 119, 119, NA, NA, NA)
mat <- matrix(ex, ncol = 8)

ex2 <- c(14, 56, 114, 132, 187, 279, 324, 328, 338, 346, 395, 398, 428, 452, 466, 467, 525, 894, 923, 968, 980, 1030, 1117, 1156, 1159, 1166, 1171, 1209, 1211, 1235, 1275, 1291, 1292, 1378, 829, 851, 880, 893, 929, 1003, 1042, 1045, 1051, 1057, 1097, 1099, 1119, 1147, 1167, 1168, 1235, 494, 510, 533, 538, 567, 623, 657, 660, 666, 671, 699, 702, 722, 744, 759, 760, 816, 276, 293, 312, 318, 338, NA, NA, 418, 424, 429, NA, NA, 468, 490, 508, 509, 568, 674, 696, 726, 734, 774, 851, 893, 896, 903, 908, 944, 947, 966, 998, 1014, 1015, 1091, 421, 446, 472, 490, 510, 582, 624, 627, 633, 640, 669, 671, 685, 716, 736, 737, 798, NA, NA, NA, NA, NA, NA, 74, NA, 82, 86, 104, NA, 114, 119, NA, NA, NA)
mat2 <- matrix(ex2, ncol = 8)
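One possible interpretation of "similar" (an assumption, not from the thread): row i is redundant if some other row j agrees with it on every position where i is non-NA, and j has at least as many observed values (ties broken by row order). A sketch:

```r
# helpers: count of observed values, and "does s match r wherever r is observed?"
nn <- function(r) sum(!is.na(r))
covers <- function(s, r) {
  ok <- !is.na(r)
  all(!is.na(s[ok]) & s[ok] == r[ok])   # FALSE if s is NA where r is not
}

is_redundant <- function(i, m) {
  r <- m[i, ]
  for (j in seq_len(nrow(m))[-i]) {
    s <- m[j, ]
    if (covers(s, r) && (nn(s) > nn(r) || (nn(s) == nn(r) && j < i)))
      return(TRUE)                      # a "better" copy of row i exists
  }
  FALSE
}

keep <- !vapply(seq_len(nrow(mat)), is_redundant, logical(1), m = mat)
mat.reduced <- mat[keep, , drop = FALSE]
```

This is O(n^2) in the number of rows, which is fine for small matrices like the example.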
[R] scatterplot3d(); customise axes
Hi all! I'm using scatterplot3d() to show the distribution of data for different locations. As I would like to show distances between the locations and also label the locations, I was wondering whether there is a function similar to axis() for a 2D plot that works with scatterplot3d()?

2D:

a <- runif(50); a2 <- qnorm(a)
b <- runif(50); b2 <- qnorm(b)
c <- runif(50); c2 <- qnorm(c)
data <- rbind(cbind(rep(1,50), a, a2), cbind(rep(7,50), b, b2), cbind(rep(10,50), c, c2))
plot(data[,1], data[,2], xaxt = "n", xlab = "Location", ylab = "var 1")
axis(1, at = c(1, 7, 10), labels = c("Loc 1", "Loc 2", "Loc 3"))

3D:

library(scatterplot3d)
scatterplot3d(data[,1], data[,2], data[,3], box = T, col.axis = "black", angle = 45, grid = T, xlab = "Location", zlab = "var 2", ylab = "var 1")

Thank you for your suggestions, Tonja
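scatterplot3d() has no direct axis() equivalent, but two of its features can get close (a sketch based on the package's documented arguments; the tick counts may need tuning, since x.ticklabs must match the number of ticks drawn):

```r
library(scatterplot3d)

# custom x-axis tick labels via x.ticklabs; lab = c(x.ticks, y.ticks)
s3d <- scatterplot3d(data[,1], data[,2], data[,3],
                     lab = c(2, 4),                       # approximate tick counts
                     x.ticklabs = c("Loc 1", "Loc 2", "Loc 3"),
                     xlab = "Location", ylab = "var 1", zlab = "var 2")

# the returned object converts 3D coordinates to 2D, so text() can label points
p <- s3d$xyz.convert(7, 0, 0)
text(p$x, p$y, labels = "Loc 2", pos = 1)
```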
[R] local minima/ maxima
[R] fitting sine wave
Dear R-helpers, I have 7 data points that I want to fit a continuous curve to, that should look similar to a sine wave. My data points would mark the local minima and maxima respectively. This is what I've got so far. And I would keep adding terms, but sadly nls() then says that it has reached the maximum number of iterations…

m <- c(-0.2061826, 0.5888406, 0.2026079, 1.000, 0.2342754, 0.6865078, -0.1265754)
x <- c(1, 2, 3, 4, 5, 6, 7)
p <- nls(m ~ k1*x + k2*cos(x) + k3*sin(x) + k4*cos(2*x) + k5*sin(2*x) + k6*cos(3*x),
         start = list(k1 = 0, k2 = 0, k3 = 0.1, k4 = 0.1, k5 = 0, k6 = 0))
par <- c(pk1 = summary(p)$parameters[1,1], pk2 = summary(p)$parameters[2,1],
         pk3 = summary(p)$parameters[3,1], pk4 = summary(p)$parameters[4,1],
         pk5 = summary(p)$parameters[5,1], pk6 = summary(p)$parameters[6,1])
xx <- seq(1, 7, length.out = 500)
mm <- par[1]*xx + par[2]*cos(xx) + par[3]*sin(xx) + par[4]*cos(2*xx) + par[5]*sin(2*xx) + par[6]*cos(3*xx)
plot(x, m)
points(xx, mm, type = "l")

I was also thinking of using fft(), but when I use the inverse function I only get my 7 original points back, but no smooth sine function. Thank you for your suggestions. Tonja
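Since the model above is linear in the coefficients k1…k6, it can also be fitted exactly by linear least squares with lm(), which avoids nls() convergence problems entirely (a sketch using the data from the post; `0 +` drops the intercept to match the original model):

```r
m <- c(-0.2061826, 0.5888406, 0.2026079, 1.000, 0.2342754, 0.6865078, -0.1265754)
x <- 1:7

# each basis function (x, cos(kx), sin(kx)) becomes a regressor
fit <- lm(m ~ 0 + x + cos(x) + sin(x) + cos(2*x) + sin(2*x) + cos(3*x))

xx <- seq(1, 7, length.out = 500)
mm <- predict(fit, newdata = data.frame(x = xx))
plot(x, m)
lines(xx, mm)
```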
[R] Accelerating the calculation of the moving average
Dear List, I have a data frame with approximately 50 rows that looks like this:

Date time value …
19.07.1956 12:00:00 4.84
19.07.1956 13:00:00 4.85
19.07.1956 14:00:00 4.89
19.07.1956 15:00:00 4.94
19.07.1956 16:00:00 4.99
19.07.1956 17:00:00 5.01
19.07.1956 18:00:00 5.04
19.07.1956 19:00:00 5.04
19.07.1956 20:00:00 5.04
19.07.1956 21:00:00 5.02
19.07.1956 22:00:00 5.01
19.07.1956 23:00:00 5.00
20.07.1956 00:00:00 4.99
20.07.1956 01:00:00 4.99
20.07.1956 02:00:00 5.00
20.07.1956 03:00:00 5.03
20.07.1956 04:00:00 5.07
20.07.1956 05:00:00 5.10
20.07.1956 06:00:00 5.14
20.07.1956 07:00:00 5.14
20.07.1956 08:00:00 5.11
20.07.1956 09:00:00 5.08
20.07.1956 10:00:00 5.03
20.07.1956 11:00:00 4.98
20.07.1956 12:00:00 4.94
20.07.1956 13:00:00 4.93
…

I want to calculate the moving average of the right column. I tried:

dat$index <- 1:length(dat$Zeit)
qs <- 43800
erg <- c()
for (y in min(dat$index):max(dat$index)) {
  m <- mean(dat[(dat$index >= y) & (dat$index <= y + qs + 1), 3])
  erg <- c(erg, m)
}

It does work, but it takes ages. Is there a faster way to compute the moving average? Thank you, Tonja Krueger
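A much faster alternative (a sketch, not from the thread) is a vectorized rolling mean, e.g. via cumulative sums or stats::filter() with equal weights:

```r
set.seed(1)
v <- cumsum(rnorm(1e5))   # stand-in for the value column
k <- 438                  # window length; use the real window size here

# rolling mean via cumulative sums: O(n) instead of O(n * k)
cs <- c(0, cumsum(v))
roll <- (cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]) / k

# the same with stats::filter (pads the ends with NA)
roll2 <- stats::filter(v, rep(1 / k, k), sides = 2)
```

Note that the loop in the post also grows `erg` with c() on every iteration, which is itself quadratic; preallocating with `erg <- numeric(length(dat$index))` already helps a lot.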
[R] fit two-parameter lognormal distribution, l-moments
Dear R helpers, I would like to fit a two-parameter lognormal distribution to my data using L-moments. Is there a package that provides this feature? I used the "lmom" package to fit the three-parameter lognormal distribution to my data as shown beneath. I would like something similar for the two-parameter lognormal distribution.

library(lmom)
data <- c(6.8044, 6.4422, 6.0900, 6.3978, 6.2156, 5.8734, 6.3112, 6.1590, 6.2368, 6.1746, 6.0124, 6.2202, 5.7680, 5.8958, 6.5836, 5.9614, 5.9892, 6.2870)
y.data <- seq(5.6, 6.8, 0.01)
lmom <- samlmu(data)
lfit.lognorm <- pelln3(lmom)
lcdfln3 <- cdfln3(y.data, lfit.lognorm)

Thank you, Tonja
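One option that may apply here (based on the lmom package's documentation, so please verify against your version): pelln3() accepts a `bound` argument, and fixing the lower bound at zero reduces the three-parameter lognormal to the two-parameter one. Alternatively, fit a normal distribution to the logged data:

```r
library(lmom)
lmom <- samlmu(data)

# fix the lower bound at 0 -> two-parameter lognormal
lfit2 <- pelln3(lmom, bound = 0)
lcdf2 <- cdfln3(y.data, lfit2)

# equivalently: a two-parameter lognormal is a normal on the log scale
lfit.norm <- pelnor(samlmu(log(data)))
```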
[R] intercept point coordinates
Hi List, Can someone help me to calculate the coordinates of the red and green points? In this example I found their approximate location by trying, but as I have to analyse many similar curves, I'd rather calculate the exact location.

data <- c(0.008248005, 0.061242387, 0.099095516, 0.189943027, 0.227796157, 0.258078661, 0.280790538, 0.303502416, 0.386779301, 0.454914934, 0.545762445, 0.591186201, 0.682033712, 0.757739971, 0.825875604, 0.848587482, 0.803163726, 0.833446230, 0.878869985, 0.871299359, 0.878869985, 0.947005619, 1.0, 0.992429374, 0.954576245, 0.894011237, 0.765310597, 0.621468704, 0.492768064, 0.333784920, 0.258078661, 0.174801775, 0.099095516, 0.008248005)
plot(data, type = "l")
abline(h = 0.9)
points(21.35, .9, pch = 20, col = "red")
points(26, .9, pch = 20, col = "green")

Thank you, Tonja
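The crossing points can be computed exactly (for the piecewise-linear curve that plot(type="l") draws) by linear interpolation between the two samples that bracket the threshold. A sketch using the data from the post:

```r
y0 <- 0.9
# indices i where the curve crosses y0 between points i and i+1
i <- which(diff(data >= y0) != 0)

# linear interpolation within each bracketing segment
x.cross <- i + (y0 - data[i]) / (data[i + 1] - data[i])
x.cross   # approximately 21.3 (upcrossing) and 25.9 (downcrossing)

plot(data, type = "l"); abline(h = y0)
points(x.cross, rep(y0, length(x.cross)), pch = 20, col = c("red", "green"))
```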
[R] Contour line coordinates
Hi all, I used contour() to add contour lines to a plot. Now I’m wondering if there is a way to get an output of the calculated x- and y- coordinates of the contour lines? Tonja
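The coordinates that contour() computes are available from grDevices::contourLines(), which takes the same x/y/z inputs and returns the line segments as a list. A sketch with made-up data:

```r
x <- y <- seq(-1, 1, length.out = 41)
z <- outer(x, y, function(a, b) a^2 + b^2)   # hypothetical surface

cl <- contourLines(x, y, z, levels = c(0.25, 0.5))
str(cl[[1]])   # each list element has $level, $x and $y coordinate vectors
```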
[R] solve integrate(,..) varying limits of integration
Dear List, Is there a way to solve integrate(func.1, x, Inf)$value = 0.05 by varying the lower limit of integration (x in the example above)? So far I got:

r <- 0.730163
s <- -2
func.1 <- function(t) { 1/(2*pi*sqrt(1-r^2)) * exp(-1/(2*(1-r^2)) * (s^2 - 2*r*s*t + t^2)) }

I can change the lower limit manually, like:

integrate(func.1, -2.5, Inf)$value
[1] 0.05053265
integrate(func.1, -2.4, Inf)$value
[1] 0.04942731
integrate(func.1, -2.45, Inf)$value
[1] 0.05000923

but this is very time-consuming. So I was wondering if there is a better, preferably an automated way to solve the equation? Thanks, Tonja
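This is a one-dimensional root-finding problem, which uniroot() handles directly (a sketch using the function from the post):

```r
r <- 0.730163
s <- -2
func.1 <- function(t) 1/(2*pi*sqrt(1-r^2)) * exp(-1/(2*(1-r^2)) * (s^2 - 2*r*s*t + t^2))

# g(x) = (integral from x to Inf) - target; find where it is zero
g <- function(x) integrate(func.1, x, Inf)$value - 0.05
uniroot(g, interval = c(-10, 10), tol = 1e-9)$root   # close to -2.45
```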
[R] lmomRFA-package: regsimq()
Hi List! I’m using regsimq() from the "lmomRFA" package to calculate error bounds for diverse distributions. For example:

regsimq(gumfit$qfunc, nrec = lmom.data$n, f = lcdfgum, boundprob = c(0.025, 0.975))

Several times I got this error message:

Error in quantile.default(ou, probs = boundprob, type = 6) : missing values and NaN's not allowed if 'na.rm' is FALSE

So my question is, can I change 'na.rm' = FALSE into 'na.rm' = TRUE? And how can I do so? Thank you for your help, Tonja
[R] QQ-plot – Axes
I would like to change the position of the major tick marks in my qq-plot. Right now the ticks are set at 5.5, 6.0, 6.5 and 7.0. I would like them to be at 5.4, 5.6, 5.8, 6.0, 6.2, 6.4, 6.8 and 7.0. So I would have to remove some of the present ticks. So far I can only add ticks to the plot with:

axis(1, at = c(5.4, 5.6, 5.8, 6.0, 6.2, 6.4, 6.8, 7.0))

Is there a solution to my problem? Tonja
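Assuming the plot is drawn with qqnorm() with the data on the x-axis (an assumption; the post does not show the plotting call), the usual approach is to suppress the default axis and then draw only the desired ticks:

```r
set.seed(1)
x <- rnorm(100, mean = 6.2, sd = 0.4)      # stand-in data

qqnorm(x, datax = TRUE, xaxt = "n")        # data on x-axis, default ticks suppressed
axis(1, at = c(5.4, 5.6, 5.8, 6.0, 6.2, 6.4, 6.8, 7.0))
qqline(x, datax = TRUE)
```

The same idea works for any base-graphics plot: pass xaxt = "n" (or yaxt = "n") and add the axis yourself.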
[R] Confidence intervals (Weibull, LogNormal, Gumbel)
Is there a way to calculate confidence intervals for distributions other than the GEV distribution (like Weibull, lognormal, Gumbel)? I used the maximum likelihood method to estimate the parameters. (For the GEV distribution I used the extRemes package.)
[R] lmomRFA package: error bounds/confidence intervals
Dear List, I’m using the "lmomRFA" package to fit different distributions to my data sample. To calculate the error bounds I used: regsimq(…) and sitequantbounds(…). So my questions are: Are error bounds and confidence intervals the same thing? And: Does regsimq(…, boundprob = c(0.05, 0.95)) calculate the 90% or the 95% confidence interval? If error bounds and confidence intervals are not equal: Is there a way to calculate confidence intervals for my fitted distributions (GEV, Weibull, Gumbel…)? Thank you in advance, Tonja
Re: [R] Peak Over Threshold values
Thanks a lot for your help. That’s the time period I was looking for. I’ve got one more question: for further analyses I need the respective maximum values within these time periods (between the green and red lines). Preferably in combination with the date the maximum event happened. Thank you in advance, Tonja

-Original Message- From: William Dunlap wdun...@tibco.com Sent: 27.05.2010 22:13:21 To: Hutchinson,David [PYR] david.hutchin...@ec.gc.ca, Tonja Krueger tonja.krue...@web.de Subject: RE: [R] Peak Over Threshold values

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap Sent: Thursday, May 27, 2010 12:24 PM To: Hutchinson,David [PYR]; Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values

Here is another version, loopless, but still a bit clumsy (can the call to which() be removed?): Note that avoiding loops is mostly for the aesthetic effect. On my aging laptop the following loopy and vector-growing version takes 3 seconds on a million-long vector while the non-loopy one takes 0.22 seconds (timings done with plot=FALSE). That is a nice ratio but not much of a difference.
f0 <- function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE)
{
    stopifnot(startThreshhold > stopThreshhold)
    inRun <- FALSE
    start <- integer()
    stop <- integer()
    for (i in seq_along(x)) {
        if (inRun) {
            if (x[i] < stopThreshhold) {
                stop[length(stop) + 1] <- i - 1
                inRun <- FALSE
            }
        }
        else {
            if (x[i] >= startThreshhold) {
                start[length(start) + 1] <- i
                inRun <- TRUE
            }
        }
    }
    if (inRun)
        stop[length(stop) + 1] <- length(x)
    if (length(stop) > length(start))
        stop <- stop[-1]
    if (plot) {
        plot(x, cex = 0.5)
        abline(h = c(startThreshhold, stopThreshhold))
        abline(v = start, col = "green")
        abline(v = stop, col = "red")
    }
    data.frame(start = start, stop = stop)
}

f <- function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, plot = TRUE)
{
    stopifnot(startThreshhold > stopThreshhold)
    isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
    isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
    isOverStart <- x >= startThreshhold
    isOverStop <- x >= stopThreshhold
    possibleStartPt <- which(isFirstInRun(isOverStart) & isOverStart)
    possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
    pts <- c(possibleStartPt, possibleStopPt)
    names(pts) <- rep(c("start", "stop"), c(length(possibleStartPt), length(possibleStopPt)))
    pts <- pts[order(pts)]
    tmp <- isFirstInRun(names(pts))
    start <- pts[tmp & names(pts) == "start"]
    stop <- pts[tmp & names(pts) == "stop"]
    if (length(stop) > length(start)) {
        # i.e., when first downcrossing happens before first upcrossing
        stop <- stop[-1]
    }
    if (plot) {
        plot(x, cex = 0.5)
        abline(h = c(startThreshhold, stopThreshhold))
        abline(v = start, col = "green")
        abline(v = stop, col = "red")
    }
    data.frame(start = start, stop = stop)
}

# define the data in the original message and call f(). The isFirstInRun and isLastInRun functions do the first part of the calculation that rle does and avoid the cumsum(diff()) roundtrip that cumsum(rle()$lengths) involves and their names make it clearer what I'm trying to do.
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Hutchinson,David [PYR] Sent: Thursday, May 27, 2010 10:41 AM To: Tonja Krueger Cc: r-help@r-project.org Subject: Re: [R] Peak Over Threshold values

Perhaps not elegant, but does the job

threshold <- 5.45
meetThreshold <- walevel >= threshold
d <- rle(meetThreshold)
startPos <- c(1, 1 + cumsum(d$lengths[-length(d$lengths)]))
abv <- which(d$values == TRUE)
p.o.t <- data.frame()
for (i in seq(along = abv)) {
    a <- startPos[abv[i]]
    b <- a + (d$lengths[abv[i]] - 1)
    e <- which.max(walevel[a:b])
    p.o.t <- rbind(p.o.t, data.frame(pos = a + e - 1, val = walevel[a:b][e]))
}
plot(day, walevel, type = 'l')
points(p.o.t$pos, p.o.t$val, col = 'red')
abline(h = threshold, lty = 2, col = 'red')

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tonja Krueger Sent: Thursday, May 27, 2010 1:47 AM To: Vito Muggeo (UniPa); Clint Bowman Cc: r-help@r-project.org Subject: [R] Peak Over Threshold values

I'm sorry, but that's not exactly
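Tonja's follow-up question (the maximum value and its date within each start/stop period) can be sketched from the data frame of event indices that f() above returns (an assumption about the intended use; day and walevel as defined in the original posts):

```r
ev <- f(walevel, plot = FALSE)            # data.frame with start/stop indices

# for each event, locate the maximum and report its day and value
peaks <- do.call(rbind, lapply(seq_len(nrow(ev)), function(i) {
  idx <- ev$start[i]:ev$stop[i]           # samples belonging to event i
  j <- idx[which.max(walevel[idx])]       # index of the event maximum
  data.frame(day = day[j], max = walevel[j])
}))
peaks
```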
[R] Peak Over Threshold values
I’m sorry, but that’s not exactly what I was looking for. I obviously didn’t explain properly: Within my dataframe (df) I would like to find POT values that are not linked. In my definition two maximum values are linked if walevel does not fall below a certain value (the lower threshold value) between them.

walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
day <- c(1:100)
df <- data.frame(day, walevel)

For example, in my dataframe a threshold value = 5.45 and a lower threshold value = 5.3 should lead to two maximum values, because between the 2nd and 3rd peak walevel does not fall below the lower threshold value. With "clusters(…)" from the "evd" package, I could find POT values, but even though I set a lower threshold value, for my example dataframe I would get three maximum values instead of two.

library(evd)
clusters(df$walevel, u = 5.45, ulow = 5.3, r = 1, cmax = T)
clusters(df$walevel, u = 5.45, ulow = 5.3, r = 1, plot = T)

Changing r to 30 (for example) connects the 2nd and 3rd maximum events and gives the right 'end' for the first extreme event but not for the second event. (Also I’d rather not use a time interval to prove that the events are linked.)

clusters(df$walevel, u = 5.45, ulow = 5.3, r = 30, cmax = T)
clusters(df$walevel, u = 5.45, ulow = 5.3, r = 30, plot = T)

What went wrong??? Tonja

-Original Message- How about?
hi.rle <- rle(walevel > 5.79)
lo.rle <- rle(walevel < 5.36)
plot(walevel)
abline(h = 5.8, col = 2, lty = 3)
abline(h = 5.35, col = 3, lty = 3)
hi.lo.rle <- sort(c(cumsum(hi.rle$lengths), cumsum(lo.rle$lengths)))
abline(v = hi.lo.rle)

You can use the $values from the rle to sort things out. Probably want to ignore the end point at length(walevel). -- Clint Bowman INTERNET: cl...@ecy.wa.gov Air Quality Modeler INTERNET: cl...@math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600

On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote: dear Tonja, By plotting your data plot(df) it seems to me that you are looking for piecewise linear relationships. If this is the case, have a look at the package segmented. You may or may not specify the number of breakpoints and their starting values.

library(segmented)
olm <- lm(walevel ~ day)
# specify number and starting values for the breakpoints..
oseg <- segmented(olm, seg.Z = ~day, psi = c(20, 50, 80))
plot(oseg, add = TRUE, col = 2)
oseg$psi
# or you can avoid specifying starting values for psi
oseg <- segmented(olm, seg.Z = ~day, psi = NA, control = seg.control(stop.if.error = FALSE, K = 30))
plot(oseg, add = TRUE, col = 2)
oseg$psi

best, vito

Tonja Krueger wrote: Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel reaches 5.8 and it should end when walevel falls below the lower threshold value (5.35). I tried "clusters(…)" from the "evd" package, and varied r (see example) but it did not work for all events (again see example).
walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
day <- c(1:100)
df <- data.frame(day, walevel)
library(evd)
clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T)
clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T)

What have I done wrong? Tonja
[R] Peak Over Threshold values
Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel reaches 5.8 and it should end when walevel falls below the lower threshold value (5.35). I tried "clusters(…)" from the "evd" package, and varied r (see example) but it did not work for all events (again see example).

walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
day <- c(1:100)
df <- data.frame(day, walevel)
library(evd)
clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T)
clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T)

What have I done wrong? Tonja
[R] Peak Over Threshold values/ length of the events
Dear List I hope you can help me: I’ve got a dataframe (df) within which I am looking for Peak Over Threshold values as well as the length of the events. An event starts when walevel reaches 5.8 and it should end when walevel falls below the lower threshold value (5.35). I tried "clusters(…)" from the "evd" package, and varied r (see example) but it did not work for all events (again see example).

walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 5.86, 5.91, 5.91, 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 6.11, 6.14, 6.12, 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 5.72, 5.70, 5.65, 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 5.19, 5.19, 5.13, 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 5.22, 5.32, 5.29, 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 5.66, 5.68, 5.72, 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 5.62, 5.62, 5.61, 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 5.45, 5.47, 5.50, 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
day <- c(1:100)
df <- data.frame(day, walevel)
library(evd)
clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax = T, plot = T)
clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, cmax = T, plot = T)

What have I done wrong? Tonja