Re: [R] Kolmogorov-Smirnov Test
On Aug 2, 2013, at 03:24, Roslina Zakaria wrote:

> Dear r-users,
>
> I am using the KS test to test the goodness of fit for my data and got the following output. However, I don't understand the warning messages. What does "horizontals" is not a graphical parameter mean?

It's "horizontals", but I don't think this is coming from ks.test, which isn't supposed to do anything with graphics (unless you modified it). Would you by any chance have a graphics device open for which you have been setting parameters? Also, I think there is a buglet in which R warnings are sometimes delayed, so it may have come from a previous command. I don't think it would happen twice, though.

-pd

> Thank you so much for any help given; it is very much appreciated.
>
>   ks.test(compare[,1], compare[,2])
>
>           Two-sample Kolmogorov-Smirnov test
>
>   data:  compare[, 1] and compare[, 2]
>   D = 0.0755, p-value = 2.238e-05
>   alternative hypothesis: two-sided
>
>   Warning messages:
>   1: "horizontals" is not a graphical parameter
>   2: "horizontals" is not a graphical parameter
>   3: "horizontals" is not a graphical parameter
>   4: "horizontals" is not a graphical parameter
>   5: "horizontals" is not a graphical parameter
>   6: "horizontals" is not a graphical parameter
>   7: In ks.test(compare[, 1], compare[, 2]) :
>      cannot compute correct p-values with ties

--
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501  Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov-Smirnov Test
Hi Peter,

Thank you so much for your explanation. I drew a histogram before that, so maybe those warning messages were meant for that.

From: peter dalgaard pda...@gmail.com
Cc: r-help@r-project.org
Sent: Friday, August 2, 2013 3:11 PM
Subject: Re: [R] Kolmogorov-Smirnov Test
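The histogram explanation is plausible: this kind of repeated warning is easy to reproduce by passing an argument that the underlying plotting calls don't recognize. The hist() call below is a hypothetical reconstruction, not the original poster's code:

```r
# hist() forwards unknown named arguments through '...' to several
# internal plotting calls, and each of them warns separately, which
# would explain seeing the same message several times in a row.
hist(rnorm(100), horizontals = TRUE)
# Warning messages of the form:
# '"horizontals" is not a graphical parameter'
```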
[R] Kolmogorov-Smirnov Test
Dear r-users,

I am using the KS test to test the goodness of fit for my data and got the following output. However, I don't understand the warning messages. What does "horizontals" is not a graphical parameter mean?

Thank you so much for any help given; it is very much appreciated.

  ks.test(compare[,1], compare[,2])

          Two-sample Kolmogorov-Smirnov test

  data:  compare[, 1] and compare[, 2]
  D = 0.0755, p-value = 2.238e-05
  alternative hypothesis: two-sided

  Warning messages:
  1: "horizontals" is not a graphical parameter
  2: "horizontals" is not a graphical parameter
  3: "horizontals" is not a graphical parameter
  4: "horizontals" is not a graphical parameter
  5: "horizontals" is not a graphical parameter
  6: "horizontals" is not a graphical parameter
  7: In ks.test(compare[, 1], compare[, 2]) :
     cannot compute correct p-values with ties
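The last warning, about ties, comes from the fact that the two-sample KS test assumes continuous distributions, so duplicated values make the p-value unreliable. A minimal sketch with made-up data; the jitter() workaround is a common suggestion, not something from this thread:

```r
set.seed(1)
x <- sample(1:10, 50, replace = TRUE)   # heavily tied data
y <- sample(1:10, 60, replace = TRUE)
ks.test(x, y)   # warns that p-values are not correct in the presence of ties
# A common workaround: break the ties with a negligible jitter
ks.test(jitter(x, amount = 1e-6), jitter(y, amount = 1e-6))
```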
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Rui,

Your response nearly answered a similar question of mine, except that I also have ecdfs of different lengths. Do you know how I can adjust

  x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))

to account for this? It must be in length.out, but I'm unsure how to proceed. Any advice is much appreciated.

-L
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hello,

Try

  length.out = max(length(loga), length(logb))

Note also that all of the previous code, including the line above, assumes that we are interested in the max distance, whereas the KS statistic is the supremum of the distance. For a two-sample test the two values are almost surely the same, but not necessarily for a one-sample test.

Hope this helps,

Rui Barradas

On 05-10-2012 12:15, user1234 wrote:
> Rui,
> Your response nearly answered a similar question of mine, except that I
> also have ecdfs of different lengths. Do you know how I can adjust
>   x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
> to account for this?
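Rather than tuning length.out, the two ecdfs can be evaluated at the pooled data points: step functions only change there, so the supremum of |F.a - F.b| is attained at one of them, and the "manual" D then matches ks.test exactly. A small sketch with made-up data of different lengths:

```r
# Evaluate both ecdfs at the pooled jump points instead of a seq() grid.
set.seed(1)
loga <- log10(rexp(25, rate = 0.01) + 1)   # made-up data, lengths 25 and 30
logb <- log10(rexp(30, rate = 0.02) + 1)
f.a <- ecdf(loga); f.b <- ecdf(logb)
x <- sort(c(loga, logb))                   # pooled jump points
D <- max(abs(f.a(x) - f.b(x)))
isTRUE(all.equal(D, unname(ks.test(loga, logb)$statistic)))  # TRUE
x0 <- x[which.max(abs(f.a(x) - f.b(x)))]   # location of the max distance
```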
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Another alternative is to put the data in a linear-model structure (one column for the response, another column for an indicator variable indicating group) and estimate all possible quantile regressions with rq() in the quantreg package, using a model of the form y ~ intercept + indicator (0,1) variable for group. The estimated quantiles for the intercept will be the quantiles of the ecdf for one group, and the estimated quantiles for the indicator grouping variable will be the differences in quantiles (ecdfs) between the two groups. There is useful built-in graphing capability in quantreg with the plot.rqs() function.

Brian

Brian S. Cade, PhD
U. S. Geological Survey, Fort Collins Science Center
2150 Centre Ave., Bldg. C, Fort Collins, CO 80526-8818
email: brian_c...@usgs.gov  tel: 970 226-9326

From: user1234 mehenderso...@gmail.com
To: r-help@r-project.org
Date: 10/05/2012 06:46 AM
Subject: Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

> Rui,
> Your response nearly answered a similar question of mine, except that I
> also have ecdfs of different lengths. Do you know how I can adjust
>   x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
> to account for this?
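Brian's suggestion can be sketched roughly as follows, assuming the quantreg package is installed; the tau grid is illustrative, and the a and b vectors are reused from maxbre's original post:

```r
library(quantreg)  # assumed to be installed (CRAN)
a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
       760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
       3220,490,20790,290,740,5350,940,3910,0,640,850,260)
y <- c(log10(a + 1), log10(b + 1))
g <- rep(0:1, c(length(a), length(b)))   # 0 = group a, 1 = group b
taus <- seq(0.05, 0.95, by = 0.05)       # illustrative grid of quantiles
fit <- rq(y ~ g, tau = taus)
coef(fit)   # "(Intercept)" row: quantiles of group a;
            # "g" row: quantile differences between the groups
plot(fit)   # plot.rqs: coefficients as a function of tau
```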
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Thanks Rui, that's what I was looking for.

I have another related question: why is there a difference between the max distance D calculated with ks.test() and the max distance D "manually" calculated as in (2)? I guess it has something to do with the fact that the KS statistic is obtained by a maximisation that depends on the range of x values, which is not necessarily coincident between the two approaches. Any thoughts about this?

maxbre
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hello,

That's a very difficult question. See

Marsaglia, Tsang, Wang (2003), http://www.jstatsoft.org/v08/i18/
Simard, L'Ecuyer (2011), http://www.jstatsoft.org/v39/i11

R's ks functions are a port of Marsaglia et al. to the .C interface.

Rui Barradas

maxbre wrote
> Thanks Rui, that's what I was looking for.
> I have another related question: why is there a difference between the
> max distance D calculated with ks.test() and the max distance D
> "manually" calculated as in (2)?
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Thanks for the help: I'll have a look at the papers.

max

On 28/05/2012 12:31, Rui Barradas [via R] wrote:
> Hello,
> That's a very difficult question. See
> Marsaglia, Tsang, Wang (2003), http://www.jstatsoft.org/v08/i18/
> Simard, L'Ecuyer (2011), http://www.jstatsoft.org/v39/i11
> R's ks functions are a port of Marsaglia et al. to the .C interface.
> Rui Barradas
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Just a final correction: I was wrong, stats::ks.test doesn't use only Marsaglia et al. It's even clearly written in the help page. Read the documentation before stating!

Rui Barradas

On 28-05-2012 11:51, maxbre wrote:
> Thanks for the help: I'll have a look at the papers.
> max
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hello,

Try the following. (I've changed the color of the first ecdf.)

loga <- log10(a+1)  # do this
logb <- log10(b+1)  # only once
f.a <- ecdf(loga)
f.b <- ecdf(logb)
# (2) max distance D
x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
x0 <- x[which(abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))))]
y0 <- f.a(x0)
y1 <- f.b(x0)
plot(f.a, verticals=TRUE, do.points=FALSE, col="blue")
plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
## alternative, use standard R plot of ecdf
#plot(f.a, col="blue")
#lines(f.b, col="green")
points(c(x0, x0), c(y0, y1), pch=16, col="red")
segments(x0, y0, x0, y1, col="red", lty="dotted")
## alternative, down to x axis
#segments(x0, 0, x0, y1, col="red", lty="dotted")

Hope this helps,

Rui Barradas

maxbre wrote
> Hi all, given this example [...] I want to plot the max distance between
> the two ecdf curves as in the above given chart. Is that possible, and how?
[R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Hi all,

given this example:

# start
a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
       760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
length(a)
b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
       3220,490,20790,290,740,5350,940,3910,0,640,850,260)
length(b)
out <- ks.test(log10(a+1), log10(b+1))
# max distance D
out$statistic
f.a <- ecdf(log10(a+1))
f.b <- ecdf(log10(b+1))
plot(f.a, verticals=TRUE, do.points=FALSE, col="red")
plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
# inverse of ecdf a
x.a <- get("x", environment(f.a))
y.a <- get("y", environment(f.a))
# inverse of ecdf b
x.b <- get("x", environment(f.b))
y.b <- get("y", environment(f.b))
# end

I want to plot the max distance between the two ecdf curves, as in the above given chart. Is that possible, and how?

Thanks for your help.

PS: this is an amended version of a previous thread (which got no reply) that I've deleted from the Nabble repository because I realised it was not clear enough (now I hope it's a little better, sorry for that).
Re: [R] Kolmogorov-smirnov test
I recently gave a presentation at the 50th Army Operational Research Symposium at Ft. Lee describing an implementation of Conover's exact calculation method for the KS test applied to discrete distributions. My implementation was done in Matlab script as opposed to R. Multiple Monte Carlo trials were most encouraging. Seeing a comparison of the methods of implementation would be interesting.
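For reference, Conover's exact method for discrete distributions also has an R implementation in the CRAN package dgof (Arnold and Emerson), whose ks.test accepts a discrete null given as a step function. The example below is an illustrative sketch, not the poster's Matlab code:

```r
library(dgof)  # assumed installed; masks stats::ks.test with a version
               # that supports discrete null distributions
set.seed(1)
x <- sample(1:6, 30, replace = TRUE)        # e.g. 30 rolls of a fair die
null.cdf <- stepfun(1:6, c(0, (1:6) / 6))   # CDF of the discrete uniform on 1..6
dgof::ks.test(x, null.cdf)                  # exact p-value via Conover's method
```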
[R] Kolmogorov-Smirnov-Test on binned data, I guess gumbel-distributed data
Hi R-Users,

I read some texts related to KS tests. Most of those authors state that KS tests are not suitable for binned data, but some of them refer to other authors who claim that KS tests are okay for binned data. I searched for sources and can't find examples confirming that it is okay to use KS tests for binned data; do you have any links to articles or tutorials?

Anyway, I am looking for a test which backs up my belief that my data is Gumbel-distributed. I estimated the Gumbel parameters mu and beta, and after having a look at the resulting plots, in my opinion they look quite good! You can find the plot, related data, and the R script here:

www.jochen-bauer.net/downloads/kstest/Rplots-1000.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-1000.txt
http://www.jochen-bauer.net/downloads/kstest/rcalc.R

The story about the data: I am wondering what test I should choose if the KS test is not appropriate. I get really high p-values when comparing the data-row-1 histogram heights with the Gumbel distribution function fitted to the bin midpoints. Most of the time, the KS test gives distances around 0.01 and p-values of 0.99 or 1. This sounds strange to me: too high. Otherwise my plots look good and, as you can see, in my first experiment I sampled 1000 values. In a second experiment I created only 50 random values for the Gumbel parameter estimation. I am trying to reduce permutations so that I can produce results faster, but I have to find out when the data fails to be Gumbel-distributed. The results surprised me: I expected my tests and plots to get worse, but I still got high p-values for the KS test and still a nice-looking plot.

www.jochen-bauer.net/downloads/kstest/Rplots-0050.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-0050.txt

Moreover, besides the shuffled data of my randomisation test there are real data values. I calculated the p-value that my real data point occurs under the estimated Gumbel distribution. Those p-values from the 1000-permutation experiment and the 50-permutation experiment are enormously correlated, around 0.98; both the Pearson and Spearman correlation coefficients tell me this. I guess that backs up the observation that neither my plots nor the KS tests get worse.

I hope I was able to state my current situation, and that you can give me some hints: literature, other tests, or support for my guess that my data is Gumbel-distributed. Thanks in advance.

Jochen
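One commonly suggested alternative for binned data is a chi-squared goodness-of-fit test against the fitted distribution's bin probabilities. A minimal sketch, with an assumed Gumbel CDF and illustrative parameter values (in practice mu and beta would come from your estimation step):

```r
# Chi-squared GOF on binned data against a Gumbel fit. Note that the
# degrees of freedom should be reduced by 2 for the estimated parameters,
# which chisq.test() does not do automatically; extreme bins also have
# tiny expected counts, so chisq.test() may warn about the approximation.
pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))
mu <- 10; beta <- 2                          # illustrative, not estimated here
set.seed(42)
x <- mu - beta * log(-log(runif(1000)))      # Gumbel(mu, beta) samples
breaks <- c(-Inf, seq(0, 30, by = 3), Inf)
obs <- table(cut(x, breaks = breaks))
p <- diff(pgumbel(breaks, mu, beta))         # expected bin probabilities
chisq.test(obs, p = p)
```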
Re: [R] Kolmogorov-Smirnov test
There are criteria to tell if differences are meaningless, but they come from the science and the researcher, not from statistical tests and algorithms. Consider the question: is one second of difference important? Answering that needs a bunch of context. One second can be a large period of time in nuclear physics or the 100 yard dash, but a small amount of time in geology or a marathon. Or consider the distribution whose density is equal to 1 when 0 < x < 0.99 or 99.99 < x < 100 and 0 otherwise: is this distribution meaningfully different from the uniform between 0 and 1? In some cases yes, in others probably not (and some distribution tests would have an easier or harder time finding this difference). As for the differences in output between the programs: when the sample sizes are the same, the KS statistic is pretty straightforward; when they differ, there needs to be some type of interpolation of one or both datasets to get the comparison points. The differences you are seeing are probably due to differences in how that interpolation is being done. If the differences are small and do not change the decision, then I would not worry about them. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of rommel Sent: Saturday, September 24, 2011 2:30 AM To: r-help@r-project.org Subject: Re: [R] Kolmogorov-Smirnov test
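Greg's point about comparison points can be made concrete. The sketch below (Python, standard library only; the toy samples are invented, not the fish-larvae data) computes the two-sample KS statistic by evaluating both empirical CDFs at every pooled data value, which handles unequal sample sizes without any binning:

```python
# Two-sample KS statistic "by hand": evaluate both empirical CDFs at
# every pooled data value and take the largest vertical gap.

def ecdf(sample):
    """Return a function giving the fraction of `sample` values <= x."""
    s = sorted(sample)
    n = len(s)
    def f(x):
        # count of values <= x via a linear scan (fine for small demos)
        return sum(1 for v in s if v <= x) / n
    return f

def ks_two_sample(a, b):
    """D = sup |F_a(x) - F_b(x)|, checked at the pooled sample points."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))

a = [1, 2, 2, 3, 5]        # n = 5
b = [2, 3, 3, 4, 4, 6]     # m = 6  (unequal sizes are fine)
print(round(ks_two_sample(a, b), 3))  # → 0.433
```

Because both step functions only jump at observed data values, checking the pooled points is enough; differences between packages usually come from the p-value computation, not from D itself.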
Re: [R] Kolmogorov-Smirnov test
One additional point: you may want to look at the vis.test function in the TeachingDemos package for one option of comparing that focuses more on meaningful, or at least visible, differences. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Greg Snow Sent: Monday, September 26, 2011 11:45 AM To: rommel; r-help@r-project.org Subject: Re: [R] Kolmogorov-Smirnov test
Re: [R] Kolmogorov-Smirnov test
Dear Dr. Snow, thank you for your reply.
1. Are you doing the 2 sample KS test? Comparing if 2 samples come from the same distribution? - Yes, I am doing the 2-sample KS test.
2. With 3,000 points you will still likely have power to find meaningless differences, what exactly are you trying to accomplish by doing the comparison? - I am comparing the swimming parameters of fish larvae, such as move duration and move length. The comparison is between treatments. Sample sizes, for example, in one comparison pair: Control (2700 data pts) vs Medium (3012 pts), Dmax = 0.07, p-level < 0.001. - Are there criteria to know if the differences are meaningless or not?
3. I am really only familiar with the KS test done in R (which did not make your list, yet you are asking on an R mailing list). Differences could be due to errors, different assumptions, different algorithms, sunspots, or any number of other things. Are the differences meaningful? R lets you see exactly what it is doing so you can check errors/assumptions/algorithms; I don't know about the ones you show. - Sorry, I forgot to list R. I thought wessa.net was using R already, but I also made the software comparisons using R. The results were: with equal data points, the results are the same in both Dmax and p-value; with unequal data points, the conclusions from the results were the same, in that the significant difference between samples holds across the different softwares. Only the Dmax and p-values differ a bit (please see attached file for the comparisons).
4. You will need to ask someone who knows the programs you reference to determine what input they are expecting. R expects the raw data. - Thanks! I expected this also.
Thank you. -Rommel
- Original Message - From: Greg Snow-2 [via R] <ml-node+s789695n3838250...@n4.nabble.com> Date: Saturday, 24 September 2011, 12:52 am Subject: Re: Kolmogorov-Smirnov test To: rommel <rman...@ifm-geomar.de>
Re: [R] Kolmogorov-Smirnov test
Dear Dr. Snow, I would like to ask for help on my three questions regarding the Kolmogorov-Smirnov test.
1. 'With a sample size over 10,000 you will have power to detect differences that are not practically meaningful.' - Is a sample size of 3000 for each sample okay for using the Kolmogorov-Smirnov test?
2. I am checking whether my KS procedure is correct. I have compared results of KS tests using the following 3 softwares: 1. Statistica 2. http://www.wessa.net/rwasp_Reddy-Moores%20K-S%20Test.wasp 3. http://www.physics.csbsju.edu/stats/KS-test.html I have observed that the three softwares produced the same results only if the sample sizes are equal. However, when the samples are not equal, I did not get similar results, particularly from the wessa.net calculator. Is it allowed to do a KS test to compare samples with unequal sizes?
3. Is it allowed to use the raw data values in doing a KS test? Or should I use the frequencies obtained from the frequency distribution table of the raw data from each sample? I think that when I use the frequencies, the KS test will construct new cumulative fractions from the frequencies, which I think is not right. Hope you can assist me. Thanks! -rommel -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3836910.html Sent from the R help mailing list archive at Nabble.com.
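On question 2: most programs agree on D itself but differ in how they turn D into a p-value (exact calculation, asymptotic series, or an asymptotic series with a small-sample correction), which easily explains small discrepancies between calculators. A sketch (Python, standard library) of the plain asymptotic Kolmogorov distribution, evaluated at the sample sizes and Dmax quoted in the thread; using the uncorrected asymptotic series here is my own simplifying assumption, not a description of what any particular calculator does:

```python
import math

def ks_asymptotic_pvalue(d, n, m):
    """Asymptotic two-sample KS p-value via the Kolmogorov series
    Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * lam^2),
    with lam = sqrt(n*m/(n+m)) * D."""
    n_eff = n * m / (n + m)              # effective sample size
    lam = math.sqrt(n_eff) * d
    s = sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
            for j in range(1, 101))      # series converges quickly
    return max(0.0, min(1.0, 2 * s))     # clamp numerical noise into [0, 1]

# Numbers from the thread: Control n = 2700 vs Medium m = 3012, Dmax = 0.07
p = ks_asymptotic_pvalue(0.07, 2700, 3012)
print(p < 0.001)  # consistent with the reported "p-level < 0.001"
```

With these sample sizes the p-value is tiny regardless of which finite-sample correction a package applies, which is why the conclusion agreed across all the softwares even though the digits differed.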
Re: [R] Kolmogorov-Smirnov test
Are you doing the 2 sample KS test? Comparing if 2 samples come from the same distribution? With 3,000 points you will still likely have power to find meaningless differences; what exactly are you trying to accomplish by doing the comparison? I am really only familiar with the KS test done in R (which did not make your list, yet you are asking on an R mailing list). Differences could be due to errors, different assumptions, different algorithms, sunspots, or any number of other things. Are the differences meaningful? R lets you see exactly what it is doing so you can check errors/assumptions/algorithms; I don't know about the ones you show. You will need to ask someone who knows the programs you reference to determine what input they are expecting. R expects the raw data. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of rommel Sent: Friday, September 23, 2011 7:51 AM To: r-help@r-project.org Subject: Re: [R] Kolmogorov-Smirnov test
Re: [R] Kolmogorov-Smirnov test
Hi, many thanks for the helpful answer. Best, Marcin M. -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3488364.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Kolmogorov-Smirnov test
The general idea of the KS test (and others) can be applied to discrete data, but the implementation in R assumes continuous data (it does not have the adjustments needed to deal with ties). The chi-square and other tests suffer from the same problems in your case. In all cases the null hypothesis is that the data come from the stated distribution (Poisson in your case); failing to reject the null hypothesis does not prove that the data come from that distribution, it only shows that we cannot disprove it. With large sample sizes, your data could come from a true distribution that for all practical purposes is equivalent to the Poisson, but due to slight rounding or other errors has probabilities slightly different for some values (a difference that no one would reasonably care about), and yet these tests can show a significant difference. Usually it is better to just show that your data and the theoretical distribution are close enough to each other rather than depending on a formal test. The plots and diagnostics in the vcd package are a good choice here; you could also use the KS test statistic (ignoring the p-value and warnings) as another measure, but plot the empirical and theoretical distributions to see what the value means and how close they are. Another option is the vis.test function in TeachingDemos: it lets you plot data from the theoretical distribution and the actual data, then see if you can visually tell the difference. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of m.marcinmichal Sent: Thursday, April 28, 2011 3:54 PM To: r-help@r-project.org Subject: Re: [R] Kolmogorov-Smirnov test
Re: [R] Kolmogorov-Smirnov test
A couple of things to consider: The Kolmogorov-Smirnov test is designed for distributions on a continuous variable, not discrete ones like the Poisson. That is why you are getting some of your warnings. With a sample size over 10,000 you will have power to detect differences that are not practically meaningful. You might as well use SnowsPenultimateNormalityTest (at least read the help page). What are you trying to accomplish? We may be able to give you a better approach. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of m.marcinmichal Sent: Wednesday, April 27, 2011 3:23 PM To: r-help@r-project.org Subject: [R] Kolmogorov-Smirnov test
Re: [R] Kolmogorov-Smirnov test
Hi, thanks for the response. You wrote: "The Kolmogorov-Smirnov test is designed for distributions on a continuous variable, not discrete ones like the Poisson. That is why you are getting some of your warnings." I read in "Fitting distributions with R" by Vito Ricci, page 19, that "... the Kolmogorov-Smirnov test is used to decide if a sample comes from a population with a specific distribution. It can be applied both for discrete (count) data and continuous binned data (even if some authors do not agree on this point) and for continuous variables", but on page 16 I read that "... while the Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distributions", so I was a little confused, but tried this test on my discrete data. Generally, as a first step, I try to fit my data to a discrete or continuous distribution (task: find a distribution for the empirical data). Question: can I approximate my discrete data by a continuous distribution? I know that the Poisson distribution can sometimes be approximated by the normal distribution, but what happens if I use another distribution like the log-normal or gamma? I also ran three chi-square tests, but they return three different results. Suppose that we have the same data, i.e. vectorSentence. Tests:

1. One
param <- fitdistr(vectorSentence, "poisson")
chisq.test(table(vectorSentence), p = dpois(1:9, lambda = param[[1]][1]), rescale.p = TRUE)
X-squared = 272.8958, df = 8, p-value < 2.2e-16

2. Two
library(vcd)
gf <- goodfit(vectorSentence, type = "poisson", method = "MinChisq")
summary(gf)
X^2 df P(> X^2) Pearson 404.3607 8 2.186332e-82

3. Three
fdistc <- fitdist(vectorSentence, "pois")
g <- gofstat(fdistc, print.test = TRUE)
Chi-squared statistic: 535.344 Degree of freedom of the Chi-squared distribution: 8 Chi-squared p-value: 1.824112e-110

Question: which result is correct? I know that I can reject the null hypothesis, i.e. the data do not come from a Poisson distribution. But which result is correct? On another front I am trying to solve a second problem: 1. Suppose that we have reference data (dr) from some process (pr), saved in vectorSentence. 2. Suppose that we have two other samples d1, d2 from two other processes p1, p2. 3. We know that all the data are discrete. Tasks: One: check if the data d1, d2 are equal to the reference data (dr) - this is not a problem; I use a cdf, histogram, other measures, etc., and the chi-square test. But can I use Kolmogorov-Smirnov to test the cumulative distribution function hypothesis, i.e. F(d1) = F(dr), for my data? Two: find the distribution of dr, discrete or if possible continuous. Best, Marcin M. -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3482349.html Sent from the R help mailing list archive at Nabble.com.
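The three chi-square routines above differ mainly in how they estimate lambda and how they bin and rescale the truncated Poisson probabilities; the statistic itself is just the Pearson sum. A sketch in Python (standard library only; the helper names are mine) that reproduces the first result, chisq.test with rescale.p = TRUE, from the counts quoted in the thread:

```python
import math

def pois_pmf(k, lam):
    """Poisson probability mass function P(X = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def chisq_stat(observed, probs):
    """Pearson X^2 = sum (O - E)^2 / E, with probs rescaled to sum to 1
    (the analogue of chisq.test(..., rescale.p = TRUE))."""
    n = sum(observed)
    total = sum(probs)
    return sum((o - n * p / total) ** 2 / (n * p / total)
               for o, p in zip(observed, probs))

# Counts for the values 1..9 quoted in the thread (n = 11999)
obs = [512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315]
lam = sum(k * o for k, o in zip(range(1, 10), obs)) / sum(obs)  # Poisson MLE
x2 = chisq_stat(obs, [pois_pmf(k, lam) for k in range(1, 10)])
print(round(x2, 4))  # ≈ 272.8958, the X-squared from chisq.test above
```

So the "correct" answer is not one of the three numbers: each is a valid Pearson statistic for a slightly different null (plug-in MLE vs. minimum-chi-square lambda, different handling of the tail bins), and all three reject the Poisson decisively.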
Re: [R] Kolmogorov-Smirnov test
This test SnowsPenultimateNormalityTest() is great :) Best -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3482401.html Sent from the R help mailing list archive at Nabble.com.
[R] Kolmogorov-Smirnov test
Hi, I have a problem with a Kolmogorov-Smirnov test fit. I am trying to fit a distribution to my data. I currently run two tests, marked # First Kolmogorov-Smirnov test fit and # Second Kolmogorov-Smirnov test fit below. These two tests return different results and I don't know which is proper. Which result is proper? The first test returns a lower D = 0.0234 and a lower p-value = 0.00304. The low D indicates that the distribution functions (empirical and theoretical) coincide, but the low p-value indicates that I can reject the hypothesis H0. On the other hand, this p-value is much higher than the p-value from the second test (2.2e-16). Which result, i.e. which test, is the proper one?

matr <- rbind(c(1, 2))
layout(matr)
# length(vectorSentence) = 11999
vectorSentence <- c()
vectorLength <- length(vectorSentence)
# assume that we have a table(vectorSentence)
#   1    2    3    4    5    6   7   8   9
# 512 1878 2400 2572 1875 1206 721 520 315
# Poisson parameter
param <- fitdistr(vectorSentence, "poisson")
# Expected density
density.exp <- dpois(1:9, lambda = param[[1]][1])
# Expected frequencies
frequ.exp <- dpois(1:9, lambda = param[[1]][1]) * vectorLength
# Construct numeric vector of data values (y = vFrequ for the Kolmogorov-Smirnov tests)
vFrequ <- c()
for (i in 1:length(frequ.exp)) {
  vFrequ <- append(vFrequ, rep(i, times = frequ.exp[i]))
}
# Check the transformation: plot(density.exp, ylim=c(0,0.20)) == plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
plot(table(vectorSentence) / vectorLength)
plot(density.exp, ylim = c(0, 0.20))
par(new = TRUE)
plot(table(vFrequ) / vectorLength, ylim = c(0, 0.20))
# First Kolmogorov-Smirnov test fit
ks.test(vectorSentence, vFrequ)
# Second Kolmogorov-Smirnov test fit
ks.test(vectorSentence, dpois, lambda = param[[1]][1])

The first Kolmogorov-Smirnov test fit returns: Two-sample Kolmogorov-Smirnov test data: vectorSentence and vFrequ D = 0.0234, p-value = 0.00304 alternative hypothesis: two-sided Warning message: In ks.test(vectorSentence, vFrequ) : cannot compute correct p-values with ties

The second Kolmogorov-Smirnov test fit returns: One-sample Kolmogorov-Smirnov test data: vectorSentence D = 0.9832, p-value < 2.2e-16 alternative hypothesis: two-sided Warning message: In ks.test(vectorSentence, dpois, lambda = param[[1]][1]) : cannot compute correct p-values with ties

Best, Marcin M. -- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3479506.html Sent from the R help mailing list archive at Nabble.com.
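One likely reason the second call behaves so badly, beyond the discreteness warnings: the second argument of ks.test names a cumulative distribution function, so the natural call would use "ppois", not dpois. Passing dpois compares the empirical CDF against probability mass values, which are all small, so D is forced toward 1 — consistent with the reported D = 0.9832. A sketch of the effect in Python (standard library only; the toy counts are invented, not the original vectorSentence):

```python
import math

def pois_pmf(k, lam):
    """Poisson probability mass function P(X = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pois_cdf(k, lam):
    """Poisson cumulative distribution function P(X <= k)."""
    return sum(pois_pmf(i, lam) for i in range(k + 1))

def ks_one_sample(sample, f):
    """Continuous-formula D = sup |ECDF(x) - f(x)| over the sample points."""
    s = sorted(sample)
    n = len(s)
    return max(max(abs((i + 1) / n - f(x)), abs(i / n - f(x)))
               for i, x in enumerate(s))

# Invented Poisson-like counts (n = 26), NOT the original data
sample = [1] * 5 + [2] * 8 + [3] * 7 + [4] * 4 + [5] * 2
lam = sum(sample) / len(sample)        # Poisson MLE is the sample mean

d_pmf = ks_one_sample(sample, lambda k: pois_pmf(k, lam))  # like passing dpois
d_cdf = ks_one_sample(sample, lambda k: pois_cdf(k, lam))  # like passing ppois
print(d_pmf > 0.9, d_cdf < 0.5)  # → True True: pmf comparison drives D toward 1
```

The ECDF reaches 1 at the largest observation while the pmf stays well below 1 everywhere, so the "pmf" D is close to 1 by construction and says nothing about fit.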
Re: [R] Kolmogorov-smirnov test
It's designed for continuous distributions. See the first sentence here: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
K-S is conservative on discrete distributions.

On Sat, Feb 19, 2011 at 1:52 PM, tsippel tsip...@gmail.com wrote:
Is the Kolmogorov-Smirnov test valid on both continuous and discrete data? I don't think so, and the example below helped me understand why. A suggestion on testing the discrete data would be appreciated. Thanks,
[example code quoted in full in the original post, below in this thread]
Re: [R] Kolmogorov-smirnov test
Taylor Arnold and I have developed a package ks.test (available on R-Forge in beta version) that modifies stats::ks.test to handle discrete null distributions for one-sample tests. We also have a draft of a paper we could provide (email us). The package uses the methodology of Conover (1972) and Gleser (1985) to provide exact p-values. It also corrects an algorithmic problem with stats::ks.test in the calculation of the test statistic. This is not a bug, per se, because it was never intended to be used this way. We will submit this new function for inclusion in package stats once we're done testing. So, for example:

# With the default ks.test (ouch):
stats::ks.test(c(0,1), ecdf(c(0,1)))

One-sample Kolmogorov-Smirnov test
data: c(0, 1)
D = 0.5, p-value = 0.5
alternative hypothesis: two-sided

# With our new function (what you would want in this toy example):
ks.test::ks.test(c(0,1), ecdf(c(0,1)))

One-sample Kolmogorov-Smirnov test
data: c(0, 1)
D = 0, p-value = 1
alternative hypothesis: two-sided

Original Message:
Date: Mon, 28 Feb 2011 21:31:26 +1100
From: Glen Barnett glnbr...@gmail.com
To: tsippel tsip...@gmail.com
Subject: Re: [R] Kolmogorov-smirnov test
[Glen's reply and the quoted question appear above in this thread]

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics, Yale University
http://www.stat.yale.edu/~jay
[R] Kolmogorov-smirnov test
Is the Kolmogorov-Smirnov test valid on both continuous and discrete data? I don't think so, and the example below helped me understand why. A suggestion on testing the discrete data would be appreciated. Thanks,

a <- rnorm(1000, 10, 1)    # normal distribution a
b <- rnorm(1000, 12, 1.5)  # normal distribution b
c <- rnorm(1000, 8, 1)     # normal distribution c
d <- rnorm(1000, 12, 2.5)  # normal distribution d

par(mfrow = c(2,2), las = 1)
ahist <- hist(a, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of a
bhist <- hist(b, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of b
chist <- hist(c, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of c
dhist <- hist(d, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of d

# Kolmogorov-Smirnov on the continuous data
ks.test(c(a, b), c(c, d), alternative = "two.sided")

# Kolmogorov-Smirnov on the binned (discrete) data
ks.test(c(ahist$density, bhist$density), c(chist$density, dhist$density),
        alternative = "two.sided")
Re: [R] Kolmogorov-smirnov test
The KS test was designed for continuous variables. The vcd package has tools for exploring categorical variables and distributions.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: tsippel, Friday, February 18, 2011
Subject: [R] Kolmogorov-smirnov test
[original post quoted in full above in this thread]
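[Editorial note, not part of the original post: for the two-sample discrete question asked here, one standard alternative is a chi-squared test of homogeneity on the cross-tabulated counts. A hedged sketch with invented samples, sizes, and seed:]

```r
set.seed(7)
s1 <- rpois(300, lambda = 4)   # two made-up discrete samples
s2 <- rpois(300, lambda = 5)

# Cross-tabulate sample membership against observed values, then test
# whether the two samples could share one distribution (homogeneity)
tab <- table(sample = rep(c("s1", "s2"), each = 300),
             value  = factor(c(s1, s2)))
suppressWarnings(chisq.test(tab))
```

Cells with small expected counts may trigger a warning; collapsing sparse tail categories before testing is a common remedy.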
Re: [R] Kolmogorov Smirnov Test
Thanks for the feedback. My goal is to run a simple test to show that the data cannot be rejected as either normally or uniformly distributed (depending on the variable), which is what a previous K-S test run using SPSS had shown. The actual distribution I compare to my sample matters only in that it would be rejected were my data multi-modal. This way I can suggest the data are from the same population. I later run PCA and cluster analyses to confirm this, but I want an easy statistic to start with for the individual variables. I didn't think I was comparing my data against itself, but rather against a normal distribution with the same mean and standard deviation. Using the mean seems necessary, so is it incorrect to have the same standard deviation too? I need to go back and read on the K-S test to see what the appropriate constraints are before bothering anyone for more help. Sorry, I thought I had it. Thanks again, kbrownk

On Nov 11, 12:40 am, Greg Snow greg.s...@imail.org wrote:
[Greg's reply and the original post are quoted in full elsewhere in this thread]
Re: [R] Kolmogorov Smirnov Test
On 11-Nov-10 04:22:55, Kerry wrote: I'm using ks.test(mydata, dnorm) on my data.

I think your problem may lie here! If you look at the documentation for ks.test, available with the command help(ks.test), or simply ?ks.test, you will read the following near the beginning:

Usage:
ks.test(x, y, ...)

Arguments:
x: a numeric vector of data values.
y: either a numeric vector of data values, or a character string naming a cumulative distribution function, or an actual cumulative distribution function such as 'pnorm'.

Note *cumulative* and *'pnorm'*. You say that you used 'dnorm'. dnorm is R's name for the *density* function of the Normal distribution, while the name for the *cumulative distribution* function is pnorm. So try the K-S test instead with

ks.test(mydata, pnorm, ...)

where (as also stated in ?ks.test) the ... is to be replaced by a list of values for the parameters of the named cumulative distribution. For example (since the parameters for pnorm are its mean and SD):

ks.test(mydata, pnorm, mean(mydata), sd(mydata))

A toy example (comparing the two usages):

## First, using pnorm as above:
Y <- rnorm(200)
ks.test(Y, pnorm, mean(Y), sd(Y))
# One-sample Kolmogorov-Smirnov test
# data: Y
# D = 0.0251, p-value = 0.9996
# alternative hypothesis: two-sided
## Note the nice P-value

## Next, using dnorm as you wrote:
ks.test(Y, dnorm, mean(Y), sd(Y))
# One-sample Kolmogorov-Smirnov test
# data: Y
# D = 0.9965, p-value < 2.2e-16
# alternative hypothesis: two-sided
## (Note the similarity to the p-values you report!)

For the details of 'dnorm', 'pnorm' and the like, see the help at ?dnorm or ?pnorm (both lead to the same page). Granted, for a newcomer to R the documentation (which often relies heavily on cross-referencing, and sometimes the cross-references can be difficult to identify) can be difficult to get to grips with. So look on this (which is one of the easier cases) as an initiation into getting to grips with R. Hoping this helps, Ted.
[Kerry's original post, quoted here in full, appears elsewhere in this thread]

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 11-Nov-10 Time: 09:46:52
Re: [R] Kolmogorov Smirnov Test
Consider the following simulations (also fixing the pnorm instead of dnorm that Ted pointed out and I missed):

out1 <- replicate(10000, {
  x <- rnorm(1000, 100, 3)
  ks.test(x, pnorm, mean = 100, sd = 3)$p.value
})
out2 <- replicate(10000, {
  x <- rnorm(1000, 100, 3)
  ks.test(x, pnorm, mean = mean(x), sd = sd(x))$p.value
})
par(mfrow = c(2, 1))
hist(out1)
hist(out2)
mean(out1 <= 0.05)
mean(out2 <= 0.05)

In both cases the null hypothesis is true (or at least a meaningful approximation to true), so the p-values should follow a uniform distribution. In the case of out1, where the mean and sd are specified as part of the null, the p-values are reasonably uniform and the rejection rate is close to alpha (it should asymptotically approach alpha as the number of simulations increases). However, looking at out2, where the parameters are set not by outside knowledge or tests but rather from the observed data, the p-values are clearly not uniform and the rejection rate is far from alpha.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: Kerry, Thursday, November 11, 2010
Subject: Re: [R] Kolmogorov Smirnov Test
[Kerry's reply and the original post are quoted in full elsewhere in this thread]
Re: [R] Kolmogorov Smirnov Test
Thanks Ted and Greg. I had actually tried pnorm and, after having problems, thought maybe I was misunderstanding dnorm as a variable in ks.test due to over- (more likely under-) thinking it. I'm assuming now that ks.test will consider my data in cumulative form (makes sense now that I think about it, but I didn't want to assume any steps that the R version of the K-S test takes). I plan to explore the ideas and run the simulations you sent in full over the weekend. Thanks again! Kerry

On Nov 11, 12:05 pm, Greg Snow greg.s...@imail.org wrote:
[Greg's simulation message and the earlier exchange are quoted in full elsewhere in this thread]
[R] Kolmogorov Smirnov Test
I'm using ks.test(mydata, dnorm) on my data. I know some of my different variable samples (mydata1, mydata2, etc.) must be normally distributed, but the p-value is always 2.0e-16 (the 2.0 can change but not the exponent). I want to test mydata against a normal distribution. What could I be doing wrong? I tried instead using rnorm to create a normal distribution: y = rnorm(68, mean=mydata, sd=mydata), where 68 is the sample size from mydata. Then I ran the K-S test: ks.test(mydata, y). Should this work? One issue I had was that some of my data has a minimum value of 0, but rnorm run as I have it above will potentially create negative numbers. Also, some of my variables will likely be better tested against non-normal distributions (uniform etc.), but I figure I should learn how to even use ks.test first. I used to use SPSS but am really trying to jump into R instead, and I find the help to assume too heavy a statistical background. I'm guessing I have a long road before I get this, so any bits of information that may help me get a bit further will be appreciated! Thanks, kbrownk
Re: [R] Kolmogorov Smirnov Test
The way you are running the test, the null hypothesis is that the data come from a normal distribution with mean = 0 and standard deviation = 1. If your minimum data value is 0, then it seems very unlikely that the mean is 0. So the test is being strongly influenced by the mean and standard deviation, not just the shape of the distribution. Note that the KS test was not designed to test against a distribution with parameters estimated from the same data (you can do the test, but it makes the p-value inaccurate). You can do a little better by simulating the process and comparing the KS statistic to the simulations rather than looking at the computed p-value. However, you should ask yourself why you are doing the normality tests in the first place. The common reasons that people do this don't match with what the tests actually test (see the fortunes on normality).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: Kerry, Wednesday, November 10, 2010
Subject: [R] Kolmogorov Smirnov Test
[original post quoted in full above in this thread]
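[Editorial note, not part of the original post: the "simulating the process" idea Greg describes can be sketched as a parametric bootstrap of the KS statistic. The data, sample size, and replication count below are invented for illustration:]

```r
set.seed(1)
mydata <- rnorm(68, mean = 10, sd = 2)   # stand-in for the poster's data

# Observed KS statistic with mean and sd estimated from the data
d.obs <- ks.test(mydata, pnorm, mean(mydata), sd(mydata))$statistic

# Null distribution of the statistic when the parameters are re-estimated
# from each simulated sample, mimicking what was done to the real data
d.sim <- replicate(2000, {
  x <- rnorm(length(mydata), mean(mydata), sd(mydata))
  ks.test(x, pnorm, mean(x), sd(x))$statistic
})

mean(d.sim >= d.obs)   # simulation-based p-value
```

This sidesteps the inaccurate tabulated p-value by calibrating the statistic against simulations that repeat the estimation step.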
Re: [R] Kolmogorov-Smirnov test, which one to use?
It is not clear what question you are trying to answer. Perhaps if you can give us an explanation of your overall goal, then we can be more helpful.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: Roslina Zakaria, Wednesday, August 04, 2010
Subject: [R] Kolmogorov-Smirnov test, which one to use?
[original post quoted in full below in this thread]
[R] Kolmogorov-Smirnov test, which one to use?
Hi, I have two sets of data: observed data and generated data. The generated data are obtained from a model whose parameters are estimated from the observed data. So I'm not sure which to use, either the one-sample test

ks.test(x + 2, pgamma, 3, 2)  # two-sided, exact

or the two-sample test

ks.test(x, x2, alternative = "less")

If I use the one-sample test I need to specify the model, which I don't have in my case. Actually I used the two-sample test, and when I compare with what I got from using a chi-squared test the results are too different.

Data:
      obs_data  pre_gam
 [1,]       93  25.6770
 [2,]      115 127.9095
 [3,]      125 151.6845
 [4,]      120 146.9295
 [5,]      106 107.9385
 [6,]      101 107.4630
 [7,]       75  86.5410
 [8,]       58  55.6335
 [9,]       46  43.7460
[10,]       38  32.8095
[11,]       31  16.1670
[12,]       17  18.5445
[13,]       10   9.0345
[14,]       16  20.9220

Results:

chisq.test(obs_data, p = pre_gam, rescale.p = TRUE)

Chi-squared test for given probabilities
data: obs_data
X-squared = 205.4477, df = 13, p-value < 2.2e-16

ks.test(obs_data, pre_gam)

Two-sample Kolmogorov-Smirnov test
data: obs_data and pre_gam
D = 0.2143, p-value = 0.9205
alternative hypothesis: two-sided

Am I doing the right thing? Thank you so much for your help.
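[Editorial note, not part of the original post: the disagreement above can be reproduced from the posted counts. ks.test(obs_data, pre_gam) compares the 14 count values themselves as if they were two raw samples, which is not a test of the fitted gamma model; with binned counts, the chi-squared test is the appropriate one, and a KS-style test would need the raw, unbinned data.]

```r
obs_data <- c(93, 115, 125, 120, 106, 101, 75, 58, 46, 38, 31, 17, 10, 16)
pre_gam  <- c(25.6770, 127.9095, 151.6845, 146.9295, 107.9385, 107.4630,
              86.5410, 55.6335, 43.7460, 32.8095, 16.1670, 18.5445,
              9.0345, 20.9220)

# Chi-squared test of the observed counts against the fitted frequencies
# (rescale.p turns the expected frequencies into probabilities)
chisq.test(obs_data, p = pre_gam, rescale.p = TRUE)
```

The chi-squared test rejects decisively because it compares counts bin by bin, while the two-sample KS call only asks whether the two sets of 14 numbers look like draws from one distribution.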
[R] Kolmogorov smirnov test
Hi r-users, I would like to use the Kolmogorov-Smirnov test, but in my observed data (xobs) there are ties, and I got the warning message below. My question is: can I do something about it?

ks.test(xobs, xsyn)

Two-sample Kolmogorov-Smirnov test
data: xobs and xsyn
D = 0.0502, p-value = 0.924
alternative hypothesis: two-sided
Warning message:
In ks.test(xobs, xsyn) : cannot compute correct p-values with ties

Thank you for all your help.
Re: [R] Kolmogorov smirnov test
Hi Roslina, I believe that you can ignore the warning. Alternatively, you may add a very small random noise to the tied values, i.e. something like

dups <- which(duplicated(xobs))
xobs[dups] <- xobs[dups] + 1.0e-6 * sd(xobs) * rnorm(length(dups))

Regards, Moshe.

--- On Tue, 13/10/09, Roslina Zakaria zrosl...@yahoo.com wrote:

From: Roslina Zakaria zrosl...@yahoo.com
Subject: [R] Kolmogorov smirnov test
To: r-help@r-project.org
Received: Tuesday, 13 October, 2009, 9:58 AM

> Hi r-users, I would like to use the Kolmogorov-Smirnov test, but in my
> observed data (xobs) there are ties, and I got the warning message
> "cannot compute correct p-values with ties". Can I do something about it?
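[Editor's note: Moshe's jitter suggestion can be sketched end to end as follows; xobs and xsyn are simulated stand-ins, since the poster's data are not available:]

```r
set.seed(42)
xobs <- round(rgamma(200, shape = 3, rate = 2), 1)  # rounding creates ties
xsyn <- rgamma(200, shape = 3, rate = 2)

ks.test(xobs, xsyn)  # warns: cannot compute correct p-values with ties

# Break ties with noise far smaller than the data's scale
dups <- which(duplicated(xobs))
xobs_jit <- xobs
xobs_jit[dups] <- xobs_jit[dups] + 1.0e-6 * sd(xobs) * rnorm(length(dups))

ks.test(xobs_jit, xsyn)  # ties broken, so no warning
```

The noise is scaled by 1.0e-6 * sd(xobs) so it is negligible relative to the data and barely moves the test statistic, while still making the values distinct.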
[R] Kolmogorov-Smirnov test
I got a distribution function and an empirical distribution function. How do I perform the Kolmogorov-Smirnov test in R? Let's call the empirical distribution function Fn on [0,1] and the distribution function F on [0,1].

ks.test( )

thanks for the help
Re: [R] Kolmogorov-Smirnov test
help.search("kolmogorov")
?ks.test

andydol...@gmail.com

2009/4/29 mathallan mathanm...@gmail.com:
> I got a distribution function and an empirical distribution function. How
> do I perform the Kolmogorov-Smirnov test in R? Let's call the empirical
> distribution function Fn on [0,1] and the distribution function F on [0,1].
>
> ks.test( )
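[Editor's note: a hedged sketch of what ?ks.test leads to. With a fully specified CDF F, you pass the sample (whose ECDF is Fn) together with F; x and punif below are illustrative stand-ins for the poster's Fn and F on [0,1]:]

```r
set.seed(1)
x <- runif(100)            # sample; ecdf(x) plays the role of Fn
ks.test(x, "punif", 0, 1)  # one-sample KS test of Fn against F = punif

# If only an ecdf object is at hand, the underlying sorted sample can be
# recovered with knots() (assuming no ties) and tested the same way:
Fn <- ecdf(x)
ks.test(knots(Fn), "punif")
```

ks.test builds the empirical distribution function internally, so you supply the raw sample and the theoretical CDF, not Fn itself.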
Re: [R] Kolmogorov-Smirnov test
This is the third homework question you have asked the list to do for you. How many more should we expect? The posting guide is pretty clear on this:

"Basic statistics and classroom homework: R-help is not intended for these."

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of mathallan
Sent: Wednesday, April 29, 2009 10:52 AM
To: r-help@r-project.org
Subject: [R] Kolmogorov-Smirnov test

> I got a distribution function and an empirical distribution function. How
> do I perform the Kolmogorov-Smirnov test in R?
[R] Kolmogorov–Smirnov Test for Left Censored Data
Can someone recommend a package in R that will perform a two-sample Kolmogorov–Smirnov test on left-censored data? The package surv2sample appears to offer such a test for right-censored data, and I guess I could use that package if I flip my data, but I figured I would first ask whether there is a package specific to left-censored data.

Tom
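[Editor's note: a hedged sketch of the "flip" idea Tom mentions, using hypothetical data. An observation left-censored at c is, after negation, right-censored at -c, which puts it in the form right-censoring tools expect:]

```r
library(survival)

# Hypothetical measurements; TRUE means the value is a detection limit
# (the true value is known only to be <= that limit, i.e. left-censored)
values    <- c(0.5, 1.2, 0.3, 2.1, 0.8)
left_cens <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

# Direct representation of left censoring:
s_left <- Surv(values, event = !left_cens, type = "left")

# The flip: negate the values so left censoring becomes right censoring,
# which packages built for right-censored data can then consume
s_flipped <- Surv(-values, event = !left_cens)
```

Whether surv2sample accepts the flipped object in this form is an assumption to verify against that package's documentation.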