Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Rui, Your response nearly answered a similar question of mine except that I also have ecdfs of different lengths. Do you know how I can adjust x - seq(min(loga, logb), max(loga, logb), length.out=length(loga)) to account for this? It must be in length.out() but I'm unsure how to proceed. Any advice is much appreciated. -L Rui Barradas wrote Hello, Try the following. (i've changed the color of the first ecdf.) loga - log10(a+1) # do this logb - log10(b+1) # only once f.a - ecdf(loga) f.b - ecdf(logb) # (2) max distance D x - seq(min(loga, logb), max(loga, logb), length.out=length(loga)) x0 - x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )] y0 - f.a(x0) y1 - f.b(x0) plot(f.a, verticals=TRUE, do.points=FALSE, col=blue) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) ## alternatine, use standard R plot of ecdf #plot(f.a, col=blue) #lines(f.b, col=green) points(c(x0, x0), c(y0, y1), pch=16, col=red) segments(x0, y0, x0, y1, col=red, lty=dotted) ## alternative, down to x axis #segments(x0, 0, x0, y1, col=red, lty=dotted) Hope this helps, Rui Barradas maxbre wrote Hi all, given this example #start a-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430) length(a) b-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 3220,490,20790,290,740,5350,940,3910,0,640,850,260) length(b) out-ks.test(log10(a+1),log10(b+1)) # max distance D out$statistic f.a-ecdf(log10(a+1)) f.b-ecdf(log10(b+1)) plot(f.a, verticals=TRUE, do.points=FALSE, col=red) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) #inverse of ecdf a x.a-get(x, environment(f.a)) y.a-get(y, environment(f.a)) # inverse of ecdf b x.b-get(x, environment(f.b)) y.b-get(y, environment(f.b)) #end I want to plot the max distance between the two ecdf curves as in the above given chart Is that possible and how? Thanks for your help PS: this is an amended version of a previous thread (but no reply followed) that I’ve deleted from Nabble repository because I realised it was not enough clear (now I hope it’s a little better, sorry for that) -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hello, Try length.out = max(length(loga), length(logb)) Note also that all of the previous code and the line above assumes that we are interested in the max distance, whereas the KS statistic computes the supremum of the distance. If it's a two sample test then their values are almost surely the same but not if it's a one sample test. Hope this helps, Rui Barradas Em 05-10-2012 12:15, user1234 escreveu: Rui, Your response nearly answered a similar question of mine except that I also have ecdfs of different lengths. Do you know how I can adjust x - seq(min(loga, logb), max(loga, logb), length.out=length(loga)) to account for this? It must be in length.out() but I'm unsure how to proceed. Any advice is much appreciated. -L Rui Barradas wrote Hello, Try the following. (i've changed the color of the first ecdf.) loga - log10(a+1) # do this logb - log10(b+1) # only once f.a - ecdf(loga) f.b - ecdf(logb) # (2) max distance D x - seq(min(loga, logb), max(loga, logb), length.out=length(loga)) x0 - x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )] y0 - f.a(x0) y1 - f.b(x0) plot(f.a, verticals=TRUE, do.points=FALSE, col=blue) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) ## alternatine, use standard R plot of ecdf #plot(f.a, col=blue) #lines(f.b, col=green) points(c(x0, x0), c(y0, y1), pch=16, col=red) segments(x0, y0, x0, y1, col=red, lty=dotted) ## alternative, down to x axis #segments(x0, 0, x0, y1, col=red, lty=dotted) Hope this helps, Rui Barradas maxbre wrote Hi all, given this example #start a-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430) length(a) b-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 3220,490,20790,290,740,5350,940,3910,0,640,850,260) length(b) out-ks.test(log10(a+1),log10(b+1)) # max distance D out$statistic f.a-ecdf(log10(a+1)) f.b-ecdf(log10(b+1)) plot(f.a, verticals=TRUE, do.points=FALSE, col=red) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) #inverse of ecdf a x.a-get(x, environment(f.a)) y.a-get(y, environment(f.a)) # inverse of ecdf b x.b-get(x, environment(f.b)) y.b-get(y, environment(f.b)) #end I want to plot the max distance between the two ecdf curves as in the above given chart Is that possible and how? Thanks for your help PS: this is an amended version of a previous thread (but no reply followed) that I’ve deleted from Nabble repository because I realised it was not enough clear (now I hope it’s a little better, sorry for that) -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Another alternative is to put the data in a linear model structure (1 column for the response, another column for an indicator variable indicating group) and estimate all possible quantile regressions with rq() in quantreg package using a model with y ~ intercept + indicator (0,1) variable for group. The estimated quantiles for the intercept will be the quantiles of the ecdf for one group and the estimated quantiles for the indicator grouping variable will be the differences in quantiles (ecdf) between the two groups. There is useful built in graphing capability in quantreg with the plot.rqs() function. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: user1234 mehenderso...@gmail.com To: r-help@r-project.org Date: 10/05/2012 06:46 AM Subject: Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves Sent by: r-help-boun...@r-project.org Rui, Your response nearly answered a similar question of mine except that I also have ecdfs of different lengths. Do you know how I can adjust x - seq(min(loga, logb), max(loga, logb), length.out=length(loga)) to account for this? It must be in length.out() but I'm unsure how to proceed. Any advice is much appreciated. -L Rui Barradas wrote Hello, Try the following. (i've changed the color of the first ecdf.) loga - log10(a+1) # do this logb - log10(b+1) # only once f.a - ecdf(loga) f.b - ecdf(logb) # (2) max distance D x - seq(min(loga, logb), max(loga, logb), length.out=length(loga)) x0 - x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )] y0 - f.a(x0) y1 - f.b(x0) plot(f.a, verticals=TRUE, do.points=FALSE, col=blue) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) ## alternatine, use standard R plot of ecdf #plot(f.a, col=blue) #lines(f.b, col=green) points(c(x0, x0), c(y0, y1), pch=16, col=red) segments(x0, y0, x0, y1, col=red, lty=dotted) ## alternative, down to x axis #segments(x0, 0, x0, y1, col=red, lty=dotted) Hope this helps, Rui Barradas maxbre wrote Hi all, given this example #start a-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430) length(a) b-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 3220,490,20790,290,740,5350,940,3910,0,640,850,260) length(b) out-ks.test(log10(a+1),log10(b+1)) # max distance D out$statistic f.a-ecdf(log10(a+1)) f.b-ecdf(log10(b+1)) plot(f.a, verticals=TRUE, do.points=FALSE, col=red) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) #inverse of ecdf a x.a-get(x, environment(f.a)) y.a-get(y, environment(f.a)) # inverse of ecdf b x.b-get(x, environment(f.b)) y.b-get(y, environment(f.b)) #end I want to plot the max distance between the two ecdf curves as in the above given chart Is that possible and how? Thanks for your help PS: this is an amended version of a previous thread (but no reply followed) that I?ve deleted from Nabble repository because I realised it was not enough clear (now I hope it?s a little better, sorry for that) -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
thanks rui that's what I was looking for I have another related question: - why of the difference between the max distance D calculated with ks.test() and the max distance D “manually” calculated as in (2)? I guess it has something to do with the fact that KS is obtained with a maximisation that depends on the range of x values not necessarly coincident in the two different approaches ...any thought about this? maxbre -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631564.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hello, That's a very difficult question. See Marsaglia, Tsang, Wang (2003) http://www.jstatsoft.org/v08/i18/ Simard, L'Ecuyer (2011) http://www.jstatsoft.org/v39/i11 R's ks functions are a port of Marsaglia et al. to the .C interface. Rui Barradas maxbre wrote thanks rui that's what I was looking for I have another related question: - why of the difference between the max distance D calculated with ks.test() and the max distance D “manually” calculated as in (2)? I guess it has something to do with the fact that KS is obtained with a maximisation that depends on the range of x values not necessarly coincident in the two different approaches ...any thought about this? maxbre -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631571.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
thanks for the help: I'll have a look at the papers max Il 28/05/2012 12:31, Rui Barradas [via R] ha scritto: Hello, That's a very difficult question. See Marsaglia, Tsang, Wang (2003) http://www.jstatsoft.org/v08/i18/ Simard, L'Ecuyer (2011) http://www.jstatsoft.org/v39/i11 R's ks functions are a port of Marsaglia et al. to the .C interface. Rui Barradas maxbre wrote thanks rui that's what I was looking for I have another related question: - why of the difference between the max distance D calculated with ks.test() and the max distance D âmanuallyâ calculated as in (2)? I guess it has something to do with the fact that KS is obtained with a maximisation that depends on the range of x values not necessarly coincident in the two different approaches ...any thought about this? maxbre If you reply to this email, your message will be added to the discussion below: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631571.html To unsubscribe from Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves, click here http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4631437code=bWJyZXNzYW5AYXJwYS52ZW5ldG8uaXR8NDYzMTQzN3wyMjQwMjkzMTc=. NAML http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631573.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Just a final correction. I was wrong, stats::ks.test doesn't use only Marsaglia et al. It's even clearly written in the help page. Read the documentation before stating! Rui Barradas Em 28-05-2012 11:51, maxbre escreveu: thanks for the help: I'll have a look at the papers max Il 28/05/2012 12:31, Rui Barradas [via R] ha scritto: Hello, That's a very difficult question. See Marsaglia, Tsang, Wang (2003) http://www.jstatsoft.org/v08/i18/ Simard, L'Ecuyer (2011) http://www.jstatsoft.org/v39/i11 R's ks functions are a port of Marsaglia et al. to the .C interface. Rui Barradas maxbre wrote thanks rui that's what I was looking for I have another related question: - why of the difference between the max distance D calculated with ks.test() and the max distance D âEURoemanuallyâEUR? calculated as in (2)? I guess it has something to do with the fact that KS is obtained with a maximisation that depends on the range of x values not necessarly coincident in the two different approaches ...any thought about this? maxbre If you reply to this email, your message will be added to the discussion below: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631571.html To unsubscribe from Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves, click here http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4631437code=bWJyZXNzYW5AYXJwYS52ZW5ldG8uaXR8NDYzMTQzN3wyMjQwMjkzMTc=. NAML http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631573.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hello, Try the following. (i've changed the color of the first ecdf.) loga - log10(a+1) # do this logb - log10(b+1) # only once f.a - ecdf(loga) f.b - ecdf(logb) # (2) max distance D x - seq(min(loga, logb), max(loga, logb), length.out=length(loga)) x0 - x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )] y0 - f.a(x0) y1 - f.b(x0) plot(f.a, verticals=TRUE, do.points=FALSE, col=blue) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) ## alternatine, use standard R plot of ecdf #plot(f.a, col=blue) #lines(f.b, col=green) points(c(x0, x0), c(y0, y1), pch=16, col=red) segments(x0, y0, x0, y1, col=red, lty=dotted) ## alternative, down to x axis #segments(x0, 0, x0, y1, col=red, lty=dotted) Hope this helps, Rui Barradas maxbre wrote Hi all, given this example #start a-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430) length(a) b-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 3220,490,20790,290,740,5350,940,3910,0,640,850,260) length(b) out-ks.test(log10(a+1),log10(b+1)) # max distance D out$statistic f.a-ecdf(log10(a+1)) f.b-ecdf(log10(b+1)) plot(f.a, verticals=TRUE, do.points=FALSE, col=red) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) #inverse of ecdf a x.a-get(x, environment(f.a)) y.a-get(y, environment(f.a)) # inverse of ecdf b x.b-get(x, environment(f.b)) y.b-get(y, environment(f.b)) #end I want to plot the max distance between the two ecdf curves as in the above given chart Is that possible and how? Thanks for your help PS: this is an amended version of a previous thread (but no reply followed) that I’ve deleted from Nabble repository because I realised it was not enough clear (now I hope it’s a little better, sorry for that) -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631438.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hi all, given this example #start a-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430) length(a) b-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 3220,490,20790,290,740,5350,940,3910,0,640,850,260) length(b) out-ks.test(log10(a+1),log10(b+1)) # max distance D out$statistic f.a-ecdf(log10(a+1)) f.b-ecdf(log10(b+1)) plot(f.a, verticals=TRUE, do.points=FALSE, col=red) plot(f.b, verticals=TRUE, do.points=FALSE, col=green, add=TRUE) #inverse of ecdf a x.a-get(x, environment(f.a)) y.a-get(y, environment(f.a)) # inverse of ecdf b x.b-get(x, environment(f.b)) y.b-get(y, environment(f.b)) #end I want to plot the max distance between the two ecdf curves as in the above given chart Is that possible and how? Thanks for your help PS: this is an amended version of a previous thread (but no reply followed) that I’ve deleted from Nabble repository because I realised it was not enough clear (now I hope it’s a little better, sorry for that) -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.