Re: [R] Kolmogorov-Smirnov Test

2013-08-02 Thread peter dalgaard

On Aug 2, 2013, at 03:24 , Roslina Zakaria wrote:

 Dear r-users,
  
 I am using the KS test to check the goodness of fit for my data and got the 
 following output.  However, I don't understand the warning messages.  
 What does "horizontals" is not a graphical parameter mean?
  

It's "horizontal", but I don't think this is coming from ks.test, which isn't 
supposed to do anything with graphics (unless you modified it). Would you by 
any chance have a graphics device open for which you have been setting 
parameters?

Also, I think there is a buglet whereby R warnings are sometimes delayed, so 
it may have come from a previous command. I don't think it would happen twice, 
though.
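For what it's worth, the warning is easy to reproduce by passing a stray argument to a high-level plotting function: hist(), for example, forwards unknown arguments through '...' to several internal calls, each of which tries them as graphical parameters, which would also explain why the same warning is repeated. A sketch, not necessarily what happened in the original session:

```r
# A made-up 'horizontals' argument is forwarded to the internal plotting
# calls, each of which warns that it is not a graphical parameter.
hist(rnorm(100), horizontals = TRUE)
```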

-pd 


 Thank you so much for any help given and it is very much appreciated.
  
  
 ks.test(compare[,1], compare[,2])
 Two-sample Kolmogorov-Smirnov test
 data:  compare[, 1] and compare[, 2] 
 D = 0.0755, p-value = 2.238e-05
 alternative hypothesis: two-sided 
 Warning messages:
 1: horizontals is not a graphical parameter 
 2: horizontals is not a graphical parameter 
 3: horizontals is not a graphical parameter 
 4: horizontals is not a graphical parameter 
 5: horizontals is not a graphical parameter 
 6: horizontals is not a graphical parameter 
 7: In ks.test(compare[, 1], compare[, 2]) :
   cannot compute correct p-values with ties
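The last warning, "cannot compute correct p-values with ties", is a separate issue: the KS test assumes continuous distributions, so exact p-values are unreliable when the samples contain repeated values. One common workaround, sketched below with made-up data (whether it is appropriate depends on where the ties come from), is to break the ties with a tiny jitter:

```r
set.seed(1)
x <- round(rnorm(200), 1)   # rounding creates ties
y <- round(rnorm(200), 1)
ks.test(x, y)               # warns about ties
ks.test(jitter(x, amount = 1e-6), jitter(y, amount = 1e-6))  # ties broken
```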
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Kolmogorov-Smirnov Test

2013-08-02 Thread Roslina Zakaria
Hi Peter,
 
Thank you so much for your explanation.  I drew a histogram before that, so 
maybe those warning messages refer to that.
 




[R] Kolmogorov-Smirnov Test

2013-08-01 Thread Roslina Zakaria
Dear r-users,
 
I am using the KS test to check the goodness of fit for my data and got the 
following output.  However, I don't understand the warning messages.  
What does "horizontals" is not a graphical parameter mean?
 
Thank you so much for any help given and it is very much appreciated.
 
 
 ks.test(compare[,1], compare[,2])
    Two-sample Kolmogorov-Smirnov test
data:  compare[, 1] and compare[, 2] 
D = 0.0755, p-value = 2.238e-05
alternative hypothesis: two-sided 
Warning messages:
1: horizontals is not a graphical parameter 
2: horizontals is not a graphical parameter 
3: horizontals is not a graphical parameter 
4: horizontals is not a graphical parameter 
5: horizontals is not a graphical parameter 
6: horizontals is not a graphical parameter 
7: In ks.test(compare[, 1], compare[, 2]) :
  cannot compute correct p-values with ties



Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-10-05 Thread user1234
Rui, 

Your response nearly answered a similar question of mine, except that I also
have ecdfs of different lengths.

Do you know how I can adjust
  x <- seq(min(loga, logb), max(loga, logb), length.out = length(loga))
to account for this?  It must be the length.out argument, but I'm unsure how to
proceed.

Any advice is much appreciated.

-L


Rui Barradas wrote
 Hello,
 
 Try the following.
 (I've changed the color of the first ecdf.)
 
 
 loga <- log10(a + 1) # do this
 logb <- log10(b + 1) # only once
 
 f.a <- ecdf(loga)
 f.b <- ecdf(logb)
 # (2) max distance D
 
 x <- seq(min(loga, logb), max(loga, logb), length.out = length(loga))
 x0 <- x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )]
 y0 <- f.a(x0)
 y1 <- f.b(x0)
 
 plot(f.a, verticals = TRUE, do.points = FALSE, col = "blue")
 plot(f.b, verticals = TRUE, do.points = FALSE, col = "green", add = TRUE)
 ## alternative: use standard R plot of ecdf
 #plot(f.a, col = "blue")
 #lines(f.b, col = "green")
 
 points(c(x0, x0), c(y0, y1), pch = 16, col = "red")
 segments(x0, y0, x0, y1, col = "red", lty = "dotted")
 ## alternative: down to x axis
 #segments(x0, 0, x0, y1, col = "red", lty = "dotted")
 
 
 Hope this helps,
 
 Rui Barradas
 maxbre wrote
 Hi all, 
 
 given this example 
 
 #start 
 
 a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
        760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
 length(a)
 
 b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
        3220,490,20790,290,740,5350,940,3910,0,640,850,260)
 length(b)
 
 out <- ks.test(log10(a+1), log10(b+1))
 
 # max distance D
 out$statistic
 
 f.a <- ecdf(log10(a+1))
 f.b <- ecdf(log10(b+1))
 
 plot(f.a, verticals = TRUE, do.points = FALSE, col = "red")
 plot(f.b, verticals = TRUE, do.points = FALSE, col = "green", add = TRUE)
 
 # knots (x, y) of ecdf a
 x.a <- get("x", environment(f.a))
 y.a <- get("y", environment(f.a))
 
 # knots (x, y) of ecdf b
 x.b <- get("x", environment(f.b))
 y.b <- get("y", environment(f.b))
 
 
 #end
 
 I want to plot the max distance between the two ecdf curves, as in the
 chart given above.
 
 Is that possible, and how?
 
 
 Thanks for your help
 
 PS: this is an amended version of a previous thread (with no replies)
 that I've deleted from the Nabble repository because I realised it
 was not clear enough (I hope it's a little better now; sorry for that)





--
View this message in context: 
http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-10-05 Thread Rui Barradas

Hello,

Try length.out = max(length(loga), length(logb))

Note also that all of the previous code, and the line above, assume that 
we are interested in the max distance over a grid, whereas the KS statistic 
is the supremum of the distance. If it's a two-sample test then the two 
values are almost surely the same, but not if it's a one-sample test.
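One way to sidestep the choice of length.out entirely, sketched here with made-up samples of different lengths, is to evaluate both ecdfs at the pooled data points; the maximum over those points is the exact two-sample statistic:

```r
loga <- log10(c(1, 5, 9, 13, 22, 40, 100) + 1)  # toy samples of
logb <- log10(c(2, 3, 7, 50) + 1)               # different lengths
f.a <- ecdf(loga)
f.b <- ecdf(logb)

w <- sort(c(loga, logb))          # pooled evaluation points
D <- max(abs(f.a(w) - f.b(w)))    # exact two-sample D
all.equal(D, unname(ks.test(loga, logb)$statistic))
```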


Hope this helps,

Rui Barradas







Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-10-05 Thread Brian S Cade
Another alternative is to put the data in a linear model structure (one 
column for the response, another column for an indicator variable 
identifying group) and estimate all possible quantile regressions with rq() 
in the quantreg package, using a model with y ~ intercept + indicator (0,1) 
variable for group.   The estimated quantiles for the intercept will be 
the quantiles of the ecdf for one group, and the estimated quantiles for 
the indicator grouping variable will be the differences in quantiles 
(ecdf) between the two groups.   There is useful built-in graphing 
capability in quantreg with the plot.rqs() function.

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  brian_c...@usgs.gov
tel:  970 226-9326
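A minimal sketch of that idea with made-up data (it assumes the quantreg package is installed; the variable names are for illustration only):

```r
library(quantreg)

set.seed(1)
y <- c(rnorm(50, mean = 0), rnorm(50, mean = 1))  # two toy groups
g <- rep(c(0, 1), each = 50)                      # 0/1 group indicator

taus <- seq(0.05, 0.95, by = 0.05)
fit  <- rq(y ~ g, tau = taus)  # intercept: quantiles of group 0;
                               # g coefficient: quantile differences
plot(fit)                      # plot.rqs(): coefficients across quantiles
```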





Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-05-28 Thread maxbre
thanks rui

that's what I was looking for

I have another related question: 
- why is there a difference between the max distance D calculated with ks.test()
and the max distance D calculated “manually” as in (2)?

I guess it has something to do with the fact that the KS statistic is obtained
by a maximisation over a range of x values that is not necessarily the same
in the two different approaches

...any thoughts about this?
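For illustration, the discrepancy can be reproduced with toy data: a regular seq() grid can miss the point just after a jump where the gap between the two ecdfs is widest, while evaluating at the pooled data points recovers ks.test()'s D exactly (a sketch; the values are made up):

```r
a <- c(1, 5, 9, 13, 22, 40, 100)
b <- c(2, 3, 7, 50)
f.a <- ecdf(a)
f.b <- ecdf(b)

grid <- seq(min(a, b), max(a, b), length.out = length(a))
D.grid  <- max(abs(f.a(grid) - f.b(grid)))   # can underestimate D
w <- sort(c(a, b))                           # pooled data points
D.exact <- max(abs(f.a(w) - f.b(w)))         # equals ks.test(a, b)$statistic
```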

maxbre




Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-05-28 Thread Rui Barradas
Hello,

That's a very difficult question. See

Marsaglia, Tsang, Wang (2003)
http://www.jstatsoft.org/v08/i18/

Simard, L'Ecuyer (2011)
http://www.jstatsoft.org/v39/i11

R's ks functions are a port of Marsaglia et al. to the .C interface.

Rui Barradas





Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-05-28 Thread maxbre
thanks for the help: I'll have a look at the papers
max






Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-05-28 Thread Rui Barradas
Just a final correction.

I was wrong: stats::ks.test doesn't use only Marsaglia et al.
It's even clearly written in the help page.
I should read the documentation before making statements!

Rui Barradas



Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-05-26 Thread Rui Barradas
Hello,

Try the following.
(I've changed the color of the first ecdf.)


loga <- log10(a + 1) # do this
logb <- log10(b + 1) # only once

f.a <- ecdf(loga)
f.b <- ecdf(logb)
# (2) max distance D

x <- seq(min(loga, logb), max(loga, logb), length.out = length(loga))
x0 <- x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )]
y0 <- f.a(x0)
y1 <- f.b(x0)

plot(f.a, verticals = TRUE, do.points = FALSE, col = "blue")
plot(f.b, verticals = TRUE, do.points = FALSE, col = "green", add = TRUE)
## alternative: use standard R plot of ecdf
#plot(f.a, col = "blue")
#lines(f.b, col = "green")

points(c(x0, x0), c(y0, y1), pch = 16, col = "red")
segments(x0, y0, x0, y1, col = "red", lty = "dotted")
## alternative: down to x axis
#segments(x0, 0, x0, y1, col = "red", lty = "dotted")


Hope this helps,

Rui Barradas





[R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-05-26 Thread maxbre
Hi all, 

given this example 

#start 

a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
       760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
length(a)

b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
       3220,490,20790,290,740,5350,940,3910,0,640,850,260)
length(b)

out <- ks.test(log10(a+1), log10(b+1))

# max distance D
out$statistic

f.a <- ecdf(log10(a+1))
f.b <- ecdf(log10(b+1))

plot(f.a, verticals = TRUE, do.points = FALSE, col = "red")
plot(f.b, verticals = TRUE, do.points = FALSE, col = "green", add = TRUE)

# knots (x, y) of ecdf a
x.a <- get("x", environment(f.a))
y.a <- get("y", environment(f.a))

# knots (x, y) of ecdf b
x.b <- get("x", environment(f.b))
y.b <- get("y", environment(f.b))


#end

I want to plot the max distance between the two ecdf curves, as in the chart
given above.

Is that possible, and how?


Thanks for your help

PS: this is an amended version of a previous thread (with no replies)
that I've deleted from the Nabble repository because I realised it was not
clear enough (I hope it's a little better now; sorry for that)





Re: [R] Kolmogorov-smirnov test

2011-11-13 Thread karlheinz037
I recently gave a presentation at the 50th Army Operational Research
Symposium at Ft Lee describing an implementation of Conover's exact
calculation method for the KS test applied to discrete distributions. My
implementation was done in Matlab script as opposed to R. Multiple
Monte-Carlo trials were most encouraging.  A comparison of the two
implementations would be interesting.




[R] Kolmogorov-Smirnov-Test on binned data, I guess gumbel-distributed data

2011-11-02 Thread Jochen1980
Hi R-Users,

I read some texts related to KS-tests. Most of those authors stated, that
KS-Tests are not suitable for binned data, but some of them refer to 'other'
authors who are claiming that KS-Tests are okay for binned data. 

I searched for sources and can't find examples which approve that it is okay
to use KS-Tests for binned data - do you have any links to articles or
tutorials? 

Anyway, I look for a test  which backens me up that my data is
gumbel-distributed. I estimated the gumbel-parameters mue and beta and after
having a look on  resulting plots, in my opinion: that looks quite good! 

You can the plot, related data, and the rscript here:
www.jochen-bauer.net/downloads/kstest/Rplots-1000.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-1000.txt
http://www.jochen-bauer.net/downloads/kstest/rcalc.R 

The story about the data:
I am wondering what test I should choose if KS-Test is not appropriate? I
get real high p-Values for data-row-1-histogram-heights and
fitted-gumbel-distribution-function-to-bin-midth-vals. Most of the time,
KS-test results in distances of 0.01 and p-Values of 0.99 or 1. This sounds
strange to me, too high. Otherwise my plots are looking good and as you can
see, in my first experiment I sampled 1000 values. In a second experiment I
created only 50 random-values for the gumbel-parameter-estimation. I try to
reduce permutations, so I will be able to create results faster, but I have
to find out, when data fails for gumbel-distribution. The results surprised
me, I expected that my tests and plots get worse, but I got still high
p-values for the KS-Test and still a nice looking plot.

www.jochen-bauer.net/downloads/kstest/Rplots-0050.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-0050.txt

Moreover, besides the shuffled data of my randomisation test there are
real data values. I calculated the p-value of each real data point occurring
under the estimated Gumbel distribution. These p-values correlate strongly
between the 1000-permutation experiment and the 50-permutation experiment
... around 0.98, according to both the Pearson and Spearman correlation
coefficients. I guess that backs up the observation that neither my plots
nor the KS tests are getting worse.

I hope I have been able to state my current situation clearly and that you can
give me some hints - some literature or other tests - or back up my guess
that my data is Gumbel-distributed.

Thanks in advance.

Jochen
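
A minimal sketch in R of one way to run such a check, assuming simulated data
rather than the data linked above; `pgumbel` is a hand-rolled helper, not
something from the original script:

```r
# Sketch with simulated data (not the linked dataset): fit Gumbel parameters
# by the method of moments, then test the fit with ks.test.
set.seed(42)
x <- -log(-log(runif(1000)))                 # illustrative standard-Gumbel sample

beta.hat <- sd(x) * sqrt(6) / pi             # moment estimate of the scale
mu.hat   <- mean(x) - 0.5772157 * beta.hat   # location (Euler-Mascheroni constant)

pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))  # Gumbel CDF
ks.test(x, pgumbel, mu = mu.hat, beta = beta.hat)
# Note: the p-value is biased upward because mu and beta were estimated from
# the same sample (the Lilliefors effect) -- one plausible reason for seeing
# p-values of 0.99 or 1 even when the fit is only approximate.
```

A parametric bootstrap of the KS statistic would give a more honest p-value
than the stock test here.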



--
View this message in context: 
http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-Test-on-binned-data-I-guess-gumbel-distributed-data-tp3983781p3983781.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov-Smirnov test

2011-09-26 Thread Greg Snow
There are criteria to tell if differences are meaningless, but they come from 
the science and the researcher, not from statistical tests and algorithms.  
Consider the question: "Is one second of difference important?" To answer that 
needs a bunch of context.  One second can be a large period of time in nuclear 
physics or the 100-yard dash, but a small amount of time in geology or a 
marathon.  Consider the distribution function that is equal to 1 when 0 < x < 
0.99 or 99.99 < x < 100 and 0 otherwise; is this distribution meaningfully 
different from the uniform between 0 and 1?  In some cases yes, in others 
probably not (and some distribution tests would have an easier or harder time 
finding this difference).

As for the differences in output between the programs: when the sample sizes 
are the same, the KS statistic is pretty straightforward; when they differ, 
there needs to be some type of interpolation of one or both datasets to get the 
comparison points.  The differences you are seeing are probably due to 
differences in how that interpolation is being done.  If the differences are 
small and do not change the decision, then I would not worry about them.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
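
To make the interpolation point concrete, here is a minimal sketch (with
simulated data; the sample sizes merely mirror the Control-vs-Medium pair
from the thread) of what the two-sample statistic is: the largest vertical
gap between the two empirical CDFs over the pooled observations.

```r
# Two-sample KS statistic computed by hand from the two empirical CDFs.
set.seed(1)
x <- rnorm(2700)                 # illustrative stand-in for the Control sample
y <- rnorm(3012, mean = 0.1)     # illustrative stand-in for the Medium sample
pooled <- sort(c(x, y))
D <- max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))   # largest gap between ECDFs
D   # agrees with ks.test(x, y)$statistic when there are no ties
```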



Re: [R] Kolmogorov-Smirnov test

2011-09-26 Thread Greg Snow
One additional point, you may want to look at the vis.test function in the 
TeachingDemos package for one option of comparing that focuses more on 
meaningful or at least visible differences.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



Re: [R] Kolmogorov-Smirnov test

2011-09-24 Thread rommel
Dear Dr. Snow, 

Thank you for your reply.

1. Are you doing the 2 sample KS test? Comparing if 2 samples come from the
same distribution? - Yes, I am doing the 2-sample KS test.

2. With 3,000 points you will still likely have power to find meaningless
differences, what exactly are you trying to accomplish by doing the comparison?
- I am comparing the swimming parameters of fish larvae, such as move duration
and move length.
- The comparison is between treatments.
- Sample sizes, for example in one comparison pair: Control (2700 data
pts) vs Medium (3012 pts);
  Dmax = 0.07, p-level < 0.001
- Are there criteria to know if the differences are meaningless or not?

3. I am really only familiar with the KS test done in R (which did not make
your list, yet you are asking on an R mailing list). Differences could be due
to errors, different assumptions, different algorithms, sunspots, or any number
of other things. Are the differences meaningful? R lets you see exactly what it
is doing so you can check errors/assumptions/algorithms, I don't know about the
ones you show.
- Sorry, I forgot to list R. I thought wessa.net was using R already, but I
also made the software comparisons using R. The results were:
    with equal data points: results are the same in both Dmax and p-value;
    with unequal data points: conclusions from the results were the same, in
that the significant difference between samples holds across the different
software. Only the Dmax and p-values differ a bit.
(Please see the attached file for the comparisons.)

4. You will need to ask someone who knows the programs you reference to
determine what input they are expecting. R expects the raw data.
- Thanks! I expected this also.

Thank you.

-Rommel

Re: [R] Kolmogorov-Smirnov test

2011-09-23 Thread rommel
Dear Dr. Snow,

I would like to ask for help on my three questions regarding Kolmogorov
Smirnov test.

1. 
'With a sample size over 10,000 you will have power to detect differences
that are not practically meaningful. '
-Is sample size of 3000 for each sample okay for using Kolmogorov
Smirnov test?

2. 
I am checking whether my KS procedure is correct. 
I have compared the results of KS tests using the following three programs:
1. Statistica
2. http://www.wessa.net/rwasp_Reddy-Moores%20K-S%20Test.wasp
3. http://www.physics.csbsju.edu/stats/KS-test.html


I have observed that the three programs produce the same results only if
the sample sizes are equal. 
However, when the sample sizes are not equal, I did not get similar results,
particularly from the wessa.net calculator.
Is it valid to do a KS test comparing samples of unequal sizes?

3. 
Is it valid to use the raw data values in a KS test? Or should I use
the frequencies obtained from a frequency distribution table of the raw data
from each sample?
I think that if I use the frequencies, the KS test will construct new
cumulative fractions from them, which I think is not right. 

Hope you can assist me. Thanks!

-rommel
  




Re: [R] Kolmogorov-Smirnov test

2011-09-23 Thread Greg Snow
Are you doing the 2 sample KS test? Comparing if 2 samples come from the same 
distribution?

With 3,000 points you will still likely have power to find meaningless 
differences, what exactly are you trying to accomplish by doing the comparison?

I am really only familiar with the KS test done in R (which did not make your 
list, yet you are asking on an R mailing list).  Differences could be due to 
errors, different assumptions, different algorithms, sunspots, or any number of 
other things.  Are the differences meaningful?  R lets you see exactly what it 
is doing so you can check errors/assumptions/algorithms, I don't know about the 
ones you show.

You will need to ask someone who knows the programs you reference to determine 
what input they are expecting.  R expects the raw data. 




Re: [R] Kolmogorov-Smirnov test

2011-05-01 Thread m.marcinmichal
Hi,
many thanks for the helpful answer. 

Best

Marcin M.


Re: [R] Kolmogorov-Smirnov test

2011-04-29 Thread Greg Snow
The general idea of the KS test (and others) can be applied to discrete data, 
but the implementation in R assumes continuous data (does not have the needed 
adjustments to deal with ties).  The chi-square and other tests suffer from the 
same problems in your case.  In all cases the null hypothesis is that the data 
comes from the stated distribution (Poisson in your case); failing to reject 
the null hypothesis does not prove that the data comes from that distribution, 
it only shows that we cannot disprove that it comes from that distribution.  
With large sample sizes, your data could come from a true distribution that for 
all practical purposes is equivalent to the Poisson, but due to slight rounding 
or other errors has probabilities slightly different for some values (a 
difference that no one would reasonably care about), and these tests can still 
show a significant difference.

Usually it is better to just show that your data and the theoretical 
distribution are close enough to each other rather than depending on a formal 
test.  The plots and diagnostics in the vcd package are a good choice here, you 
could also use the KS test statistic (ignoring the p-value and warnings) as 
another measure, but plot the empirical and theoretical distributions to see 
what the value means and how close they are.

Another option is the vis.test function in TeachingDemos, it lets you plot data 
from the theoretical distribution and the actual data, then see if you can 
visually tell the difference.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
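
A minimal sketch of the "show they are close" idea, assuming simulated counts
in place of the real data (the vcd plots mentioned above are the richer
option):

```r
# Put the observed relative frequencies next to the fitted Poisson pmf,
# rather than leaning on a formal test's p-value.
set.seed(1)
counts <- rpois(11999, lambda = 3.5)   # illustrative stand-in for the count data
lambda.hat <- mean(counts)             # maximum-likelihood estimate of the mean
obs <- table(counts) / length(counts)  # observed relative frequencies
vals <- as.integer(names(obs))
round(cbind(value = vals,
            observed = as.numeric(obs),
            expected = dpois(vals, lambda.hat)), 3)
```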



Re: [R] Kolmogorov-Smirnov test

2011-04-28 Thread Greg Snow
A couple of things to consider:

The Kolmogorov-Smirnov test is designed for distributions on a continuous 
variable, not a discrete one like the Poisson.  That is why you are getting 
some of your warnings.

With a sample size over 10,000 you will have power to detect differences that 
are not practically meaningful.  You might as well use 
SnowsPenultimateNormalityTest (at least read the help page).

What are you trying to accomplish?  We may be able to give you a better 
approach.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
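
A small illustration of the ties point, with simulated Poisson counts standing
in for the real data: discrete data makes duplicate values unavoidable, which
is exactly what the warning is about.

```r
# ks.test's p-value formula assumes a continuous (tie-free) sample;
# discrete data guarantees ties.
set.seed(1)
x <- rpois(100, 3)
any(duplicated(x))    # TRUE: only a handful of distinct values among 100 draws
suppressWarnings(ks.test(x, "ppois", lambda = mean(x)))
```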


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of m.marcinmichal
 Sent: Wednesday, April 27, 2011 3:23 PM
 To: r-help@r-project.org
 Subject: [R] Kolmogorov-Smirnov test
 
 Hi,
 I have a problem with a Kolmogorov-Smirnov goodness-of-fit test. I am trying
 to fit a distribution to my data. I actually created two tests:
 - # First Kolmogorov-Smirnov test fit
 - # Second Kolmogorov-Smirnov test fit
 see below. These two tests return different results and I don't know which
 is correct. Which result is correct? The first test returns a lower D =
 0.0234 and a low p-value = 0.00304. The lower D indicates that the
 distribution functions (empirical and theoretical) coincide, but the low
 p-value indicates that I can reject the hypothesis H0. On the other hand,
 this p-value is much higher than the p-value from the second test (2.2e-16).
 Which result/test is correct?
 matr = rbind(c(1,2))
 layout(matr)
 
 # length(vectorSentence) = 11999
 vectorSentence <- c()
 vectorLength <- length(vectorSentence)
 
 # assume that we have a table(vectorSentence):
 #    1    2    3    4    5    6    7    8    9
 #  512 1878 2400 2572 1875 1206  721  520  315
 
 # Poisson parameter (fitdistr is in the MASS package)
 param <- fitdistr(vectorSentence, "poisson")
 
 # Expected density
 density.exp <- dpois(1:9, lambda=param[[1]][1])
 
 # Expected frequencies
 frequ.exp <- dpois(1:9, lambda=param[[1]][1])*vectorLength
 
 # Construct numeric vector of data values (y = vFrequ for the
 # Kolmogorov-Smirnov test)
 vFrequ <- c()
 for(i in 1:length(frequ.exp)) {
   vFrequ <- append(vFrequ, rep(i, times=frequ.exp[i]))
 }
 
 # Check the transformation: plot(density.exp, ylim=c(0,0.20)) should match
 # plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
 plot(table(vectorSentence)/vectorLength)
 plot(density.exp, ylim=c(0,0.20))
 par(new=TRUE)
 plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
 
 # First Kolmogorov-Smirnov Tests fit
 ks.test(vectorSentence, vFrequ)
 
 # Second Kolmogorov-Smirnov Tests fit
 ks.test(vectorSentence, dpois, lambda=param[[1]][1])
 
 # First Kolmogorov-Smirnov Tests fit return data
 
 Two-sample Kolmogorov-Smirnov test
 
 data:  vectorSentence and vFrequ
 D = 0.0234, p-value = 0.00304
 alternative hypothesis: two-sided
 
 Warning message:
 In ks.test(vectorSentence, vFrequ) :
   cannot compute correct p-values with ties
 
 
 # Second Kolmogorov-Smirnov Tests fit return data
 
 One-sample Kolmogorov-Smirnov test
 
 data:  vectorSentence
 D = 0.9832, p-value < 2.2e-16
 alternative hypothesis: two-sided
 
 Warning message:
 In ks.test(vectorSentence, dpois, lambda = param[[1]][1]) :
   cannot compute correct p-values with ties
 
 
 
 Best
 
 Marcin M.
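
One detail worth flagging in the quoted code, using illustrative data rather
than the original vectorSentence: in a one-sample ks.test the second argument
must be a cumulative distribution function, so passing the density dpois
instead of the CDF ppois drives the statistic toward 1, much like the
D = 0.9832 above.

```r
# ks.test compares the sample's ECDF against the supplied function, so a
# density in place of a CDF makes the "fit" look catastrophically bad.
set.seed(1)
x <- rpois(500, 3.5)                     # illustrative count data
D.cdf  <- suppressWarnings(ks.test(x, "ppois", lambda = mean(x)))$statistic
D.dens <- suppressWarnings(ks.test(x, "dpois", lambda = mean(x)))$statistic
c(with.cdf = D.cdf, with.density = D.dens)   # the density version is near 1
```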
 



Re: [R] Kolmogorov-Smirnov test

2011-04-28 Thread m.marcinmichal
Hi, 
thanks for response.

 The Kolmogorov-Smirnov test is designed for distributions on continuous
 variable, not discrete like the  poisson.  That is why you are getting
 some of your warnings. 

I read in "Fitting distributions with R" by Vito Ricci, page 19, that: "...
the Kolmogorov-Smirnov test is used to decide if a sample comes from a
population with a specific distribution. It can be applied both for discrete
(count) data and continuous binned data (even if some authors do not agree on
this point) and for continuous variables", but on page 16 I read that
"... while the Kolmogorov-Smirnov and Anderson-Darling tests are restricted
to continuous distributions", and I was a little confused, but tried this test
on my discrete data. 

Generally, as a first step, I try to fit my data to a discrete or continuous 
distribution (task: find a distribution for the empirical data). Question: can I
approximate my discrete data by a continuous distribution? I know that
sometimes we can approximate the Poisson distribution by the normal
distribution. But what happens if I use another distribution, like the
log-normal or gamma?

I have done three further tests - chi-square tests. But these tests return
three different results. Suppose that we have the same data, i.e.
vectorSentence.
Tests:
1. One
param <- fitdistr(vectorSentence, "poisson")
chisq.test(table(vectorSentence), p = dpois(1:9, lambda=param[[1]][1]),
rescale.p = TRUE)

X-squared = 272.8958, df = 8, p-value < 2.2e-16

2. Two
library(vcd)
gf <- goodfit(vectorSentence, type="poisson", method="MinChisq")
summary(gf)

             X^2 df     P(> X^2)
Pearson 404.3607  8 2.186332e-82

3. Three
fdistc <- fitdist(vectorSentence, "pois")
g <- gofstat(fdistc, print.test = TRUE) 

Chi-squared statistic:  535.344 
Degree of freedom of the Chi-squared distribution:  8 
Chi-squared p-value:  1.824112e-110 

Question: which of these results is correct?

I know that I can reject the null hypothesis, i.e. conclude that the data do
not come from a Poisson distribution. But which result is correct?
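One reason packaged chi-square tests disagree is that they differ in how lambda is estimated (maximum likelihood vs. minimum chi-squared), how cells are binned, and how many degrees of freedom they count: when a parameter is estimated from the data, df should be k - 1 - 1, not k - 1. A hand-rolled sketch makes the moving parts explicit (the data here are simulated stand-ins, since vectorSentence is not available):

```r
# Pearson chi-squared GOF statistic by hand against a fitted Poisson,
# restricted to the support 1..9 as in the post. Simulated stand-in data.
set.seed(1)
vectorSentence <- rpois(11999, 3.5)
vectorSentence <- vectorSentence[vectorSentence >= 1 & vectorSentence <= 9]

obs    <- table(factor(vectorSentence, levels = 1:9))  # observed counts
lambda <- mean(vectorSentence)                         # crude ML-style estimate
p      <- dpois(1:9, lambda)
p      <- p / sum(p)                                   # renormalize over 1..9
expd   <- sum(obs) * p                                 # expected counts

X2 <- sum((obs - expd)^2 / expd)
# k = 9 cells, 1 constraint from the total, 1 estimated parameter -> df = 7;
# a routine that uses df = 8 (no adjustment for estimation) reports a
# different p-value for the same statistic.
pval <- pchisq(X2, df = 7, lower.tail = FALSE)
```

So the packaged tests can all be "correct" for the specific statistic, binning, and df they use; the choice of estimation method and degrees of freedom has to match.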

On another front, I am trying to solve a second problem:
1. Suppose we have reference data (dr) from some process (pr), saved in
vectorSentence.
2. Suppose we have two other samples d1, d2 from two other processes p1, p2.
3. We know that all the data are discrete.

Task:
One: check whether the data d1, d2 are equal to the reference data (dr) - this
is not a problem; I use the cdf, histograms, other measures, etc., and the
chi-square test. But can I use Kolmogorov-Smirnov to test the cumulative
distribution function hypothesis, i.e. F(d1) = F(dr), for my data?
Two: find the distribution of dr, discrete or, if possible, continuous.

Best

Marcin M.


--
View this message in context: 
http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3482349.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov-Smirnov test

2011-04-28 Thread m.marcinmichal
This test SnowsPenultimateNormalityTest() is great :)
Best




[R] Kolmogorov-Smirnov test

2011-04-27 Thread m.marcinmichal
Hi,
I have a problem with a Kolmogorov-Smirnov goodness-of-fit test. I am trying to
fit a distribution to my data. I actually ran two tests:
- # First Kolmogorov-Smirnov test fit
- # Second Kolmogorov-Smirnov test fit
(see below). The two tests return different results and I don't know which is
correct. The first test returns a lower D = 0.0234 and a low p-value = 0.00304.
The low D indicates that the distribution functions (empirical and theoretical)
coincide, but the low p-value indicates that I can reject the hypothesis H0. On
the other hand, this p-value is much higher than the p-value from the second
test (< 2.2e-16). Which result, which test, is more correct?

matr = rbind(c(1,2))
layout(matr) 

# length vectorSentence = 11999
vectorSentence <- c()
vectorLength <- length(vectorSentence)

# assume that we have a table(vectorSentence)
#    1    2    3    4    5    6    7    8    9 
#  512 1878 2400 2572 1875 1206  721  520  315 

# Poisson parameter
param <- fitdistr(vectorSentence, "poisson")

# Expected density
density.exp <- dpois(1:9, lambda = param[[1]][1])

# Expected frequ.
frequ.exp <- dpois(1:9, lambda = param[[1]][1]) * vectorLength

# Construct numeric vector of data values (y = vFrequ for Kolmogorov-Smirnov
# Tests) 
vFrequ <- c()
for (i in 1:length(frequ.exp)) {
  vFrequ <- append(vFrequ, rep(i, times = frequ.exp[i]))
}

# Check transformation: plot(density.exp, ylim=c(0,0.20)) ==
# plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
plot(table(vectorSentence)/vectorLength)
plot(density.exp, ylim=c(0,0.20))
par(new=TRUE)
plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))

# First Kolmogorov-Smirnov Tests fit
ks.test(vectorSentence, vFrequ)

# Second Kolmogorov-Smirnov Tests fit
ks.test(vectorSentence, dpois, lambda = param[[1]][1])

# First Kolmogorov-Smirnov Tests fit return data

Two-sample Kolmogorov-Smirnov test

data:  vectorSentence and vFrequ 
D = 0.0234, p-value = 0.00304
alternative hypothesis: two-sided 

Warning message:
In ks.test(vectorSentence, vFrequ) :
  cannot compute correct p-values with ties


# Second Kolmogorov-Smirnov Tests fit return data

One-sample Kolmogorov-Smirnov test

data:  vectorSentence 
D = 0.9832, p-value < 2.2e-16
alternative hypothesis: two-sided 

Warning message:
In ks.test(vectorSentence, dpois, lambda = param[[1]][1]) :
  cannot compute correct p-values with ties
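Part of the discrepancy in the second test comes from passing dpois (a probability mass function) where ks.test expects a cumulative distribution function; for a Poisson null the CDF name is ppois. A minimal sketch of the difference on simulated Poisson data (ties warnings remain either way, because the data are discrete):

```r
# dpois vs ppois as the hypothesized distribution in ks.test: passing the PMF
# makes D huge because PMF values never climb toward 1 like a CDF does.
# Simulated stand-in data, lambda = 3 chosen arbitrarily.
set.seed(1)
x <- rpois(1000, lambda = 3)
d.pmf <- suppressWarnings(ks.test(x, dpois, lambda = 3)$statistic)
d.cdf <- suppressWarnings(ks.test(x, ppois, lambda = 3)$statistic)
c(d.pmf, d.cdf)  # d.pmf is far larger than d.cdf
```

Even with ppois, the p-value is still unreliable here because ks.test assumes a continuous null; the point of the sketch is only the PMF/CDF confusion.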



Best

Marcin M.




Re: [R] Kolmogorov-smirnov test

2011-02-28 Thread Glen Barnett
It's designed for continuous distributions. See the first sentence here:

http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

K-S is conservative on discrete distributions

On Sat, Feb 19, 2011 at 1:52 PM, tsippel tsip...@gmail.com wrote:
 Is the kolmogorov-smirnov test valid on both continuous and discrete data?
  I don't think so, and the example below helped me understand why.

 A suggestion on testing the discrete data would be appreciated.

 Thanks,

 a <- rnorm(1000, 10, 1); a # normal distribution a
 b <- rnorm(1000, 12, 1.5); b # normal distribution b
 c <- rnorm(1000, 8, 1); c # normal distribution c
 d <- rnorm(1000, 12, 2.5); d # normal distribution d

 par(mfrow=c(2,2), las=1)
 ahist <- hist(a, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of a
 bhist <- hist(b, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of b
 chist <- hist(c, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of c
 dhist <- hist(d, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of d

 ks.test(c(a,b), c(c,d), alternative="two.sided") # kolmogorov-smirnov on
 continuous data
 ks.test(c(ahist$density, bhist$density), c(chist$density, dhist$density),
 alternative="two.sided") # kolmogorov-smirnov on discrete data






Re: [R] Kolmogorov-smirnov test

2011-02-28 Thread Jay Emerson
Taylor Arnold and I have developed a package ks.test (available on R-Forge in
a beta version) that modifies stats::ks.test to handle discrete null
distributions for one-sample tests. We also have a draft of a paper we could
provide (email us). The package uses the methodology of Conover (1972) and
Gleser (1985) to provide exact p-values. It also corrects an algorithmic
problem with stats::ks.test in the calculation of the test statistic. This is
not a bug, per se, because the function was never intended to be used this
way. We will submit this new function for inclusion in package stats once
we're done testing.

So, for example:
# With the default ks.test (ouch):
> stats::ks.test(c(0,1), ecdf(c(0,1)))

One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0.5, p-value = 0.5
alternative hypothesis: two-sided

# With our new function (what you would want in this toy example):
> ks.test::ks.test(c(0,1), ecdf(c(0,1)))

One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0, p-value = 1
alternative hypothesis: two-sided
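The corrected answer can be sanity-checked by hand: for a discrete null, both the empirical CDF and the hypothesized CDF are step functions, so the supremum distance can be read off at their jump points. A sketch using the same toy data as above:

```r
# KS distance computed directly for the toy example x = c(0, 1) against the
# discrete CDF ecdf(c(0, 1)); both step functions jump only at 0 and 1,
# so evaluating at those points suffices in this example.
x   <- c(0, 1)
F0  <- ecdf(c(0, 1))   # hypothesized (discrete) CDF
Fn  <- ecdf(x)         # empirical CDF of the sample
pts <- sort(unique(x)) # jump points of both functions here
D   <- max(abs(Fn(pts) - F0(pts)))
D  # 0, in agreement with the corrected ks.test output
```

The default stats::ks.test gets D = 0.5 because it compares F0 at each data point against the empirical CDF's left limit, which is only appropriate for a continuous null.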




-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



[R] Kolmogorov-smirnov test

2011-02-18 Thread tsippel
Is the kolmogorov-smirnov test valid on both continuous and discrete data?
 I don't think so, and the example below helped me understand why.

A suggestion on testing the discrete data would be appreciated.

Thanks,

a <- rnorm(1000, 10, 1); a # normal distribution a
b <- rnorm(1000, 12, 1.5); b # normal distribution b
c <- rnorm(1000, 8, 1); c # normal distribution c
d <- rnorm(1000, 12, 2.5); d # normal distribution d

par(mfrow=c(2,2), las=1)
ahist <- hist(a, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of a
bhist <- hist(b, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of b
chist <- hist(c, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of c
dhist <- hist(d, breaks=1:25, prob=T, ylim=c(0,0.4)); box() # histogram of d

ks.test(c(a,b), c(c,d), alternative="two.sided") # kolmogorov-smirnov on
continuous data
ks.test(c(ahist$density, bhist$density), c(chist$density, dhist$density),
alternative="two.sided") # kolmogorov-smirnov on discrete data




Re: [R] Kolmogorov-smirnov test

2011-02-18 Thread Greg Snow
The KS test was designed for continuous variables.  The vcd package has tools 
for exploring categorical variables and distributions.
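As one concrete option along these lines, a two-sample chi-squared test on the value-by-sample contingency table handles discrete data directly. A sketch with simulated Poisson counts (the original poster's data are not available, so this is illustrative only):

```r
# Two-sample chi-squared test for discrete counts: tabulate each sample over
# a common set of levels and test homogeneity of the resulting 2 x k table.
set.seed(1)
x <- rpois(500, lambda = 4)
y <- rpois(500, lambda = 4)            # same distribution, so H0 is true here
lev <- 0:max(x, y)                     # support observed in either sample
tab <- rbind(table(factor(x, levels = lev)),
             table(factor(y, levels = lev)))
res <- suppressWarnings(chisq.test(tab))  # warning likely: sparse tail cells
res$p.value
```

In practice one would pool sparse tail categories (or use simulate.p.value = TRUE) so the chi-squared approximation holds.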

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




Re: [R] Kolmogorov Smirnov Test

2010-11-11 Thread Kerry
Thanks for the feedback. My goal is to run a simple test to show that
the data cannot be rejected as either normally or uniformly
distributed (depending on the variable), which is what a previous K-S
test run using SPSS had shown. The actual distribution I compare to my
sample only matters in that it would be rejected were my data multi-
modal. This way I can suggest the data are from the same population. I
later run PCA and cluster analyses to confirm this, but I want an easy
statistic to start with for the individual variables.

I didn't think I was comparing my data against itself, but rather
against a normal distribution with the same mean and standard deviation.
Using the mean seems necessary, so is it incorrect to have the same
standard deviation too? I need to go back and read up on the K-S test to
see what the appropriate constraints are before bothering anyone for
more help. Sorry, I thought I had it.

Thanks again,
kbrownk

On Nov 11, 12:40 am, Greg Snow greg.s...@imail.org wrote:
 The way you are running the test the null hypothesis is that the data comes 
 from a normal distribution with mean=0 and standard deviation = 1.  If your 
 minimum data value is 0, then it seems very unlikely that the mean is 0.  So 
 the test is being strongly influenced by the mean and standard deviation not 
 just the shape of the distribution.

 Note that the KS test was not designed to test against a distribution with 
 parameters estimated from the same data (you can do the test, but it makes 
 the p-value inaccurate).  You can do a little better by simulating the 
 process and comparing the KS statistic to the simulations rather than looking 
 at the computed p-value.

 However you should ask yourself why you are doing the normality tests in the 
 first place.  The common reasons that people do this don't match with what 
 the tests actually test (see the fortunes on normality).

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111






Re: [R] Kolmogorov Smirnov Test

2010-11-11 Thread Ted Harding
On 11-Nov-10 04:22:55, Kerry wrote:
 I'm using ks.test (mydata, dnorm) on my data.

I think your problem may lie here! If you look at the documentation
for ks.test, available with the command:
  help(ks.test)
or simply:
  ?ks.test 
you will read the following near the beginning:

Usage: ks.test(x, y, ...)
Arguments:
   x: a numeric vector of data values.
   y: either a numeric vector of data values, or a character string
  naming a cumulative distribution function or an actual
  cumulative distribution function such as 'pnorm'.

Note *cumulative* and *'pnorm'*. You say that you used 'dnorm'.
dnorm is R's name for the *density* function of the Normal
distribution, while the name for the *cumulative distribution*
function is pnorm. So try the K-S test instead with

  ks.test(mydata, pnorm, ... )

where (as also stated in '?ks.test') the ... is to be replaced
by a list of values for the parameters of the named cumulative
distribution. For example (since the parameters for pnorm are
its mean and SD):

   ks.test(mydata, pnorm, mean(mydata), sd(mydata) )

A toy example (comparing the two usages):

## First, using pnorm as above:
  Y <- rnorm(200)
  ks.test(Y, pnorm, mean(Y), sd(Y))
  # One-sample Kolmogorov-Smirnov test
  # data:  Y 
  # D = 0.0251, p-value = 0.9996
  # alternative hypothesis: two-sided 
## Note the nice P-value

## Next, using dnorm as you wrote:
 ks.test(Y, dnorm, mean(Y), sd(Y))
  # One-sample Kolmogorov-Smirnov test
  # data:  Y 
  # D = 0.9965, p-value < 2.2e-16
  # alternative hypothesis: two-sided 
## (Note the similarity to the p-values you report)!

For the details of 'dnorm', 'pnorm' and the like, see the help at:

   ?dnorm
or
   ?pnorm

(both lead to the same page). Granted, for a newcomer to R the
documentation (which often relies heavily on cross-referencing,
and sometimes the cross-references can be difficult to identify)
can be difficult to get to grips with. So look on this (which is
one of the easier cases) as an initiation into getting to grips
with R.

Hoping this helps,
Ted.

 I know some of my
 different variable samples (mydata1, mydata2, etc) must be normally
 distributed but the p value is always < 2.0^-16 (the 2.0 can change
 but not the exponent).
 
 I want to test mydata against a normal distribution. What could I be
 doing wrong?
 
 I tried instead using rnorm to create a normal distribution: y = rnorm
 (68,mean=mydata, sd=mydata), where N= the sample size from mydata.
 Then I ran the k-s: ks.test (mydata,y). Should this work?
 
 One issue I had was that some of my data has a minimum value of 0, but
 rnorm ran as I have it above will potentially create negative numbers.
 
 Also some of my variables will likely be better tested against non-
 normal distributions (uniform etc.), but if I figure I should learn
 how to even use ks.test first.
 
 I used to use SPSS but am really trying to jump into R instead, but I
 find the help to assume too heavy of statistical knowledge.
 
 I'm guessing I have a long road before I get this, so any bits of
 information that may help me get a bit further will be appreciated!
 
 Thanks,
 kbrownk
 


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 11-Nov-10   Time: 09:46:52
-- XFMail --



Re: [R] Kolmogorov Smirnov Test

2010-11-11 Thread Greg Snow
Consider the following simulations (also fixing the pnorm instead of dnorm that 
Ted pointed out and I missed):

out1 <- replicate(1, { 
x <- rnorm(1000, 100, 3);
ks.test( x, pnorm, mean=100, sd=3 )$p.value
} )

out2 <- replicate(1, {
x <- rnorm(1000, 100, 3);
ks.test( x, pnorm, mean=mean(x), sd=sd(x) )$p.value
} )

par(mfrow=c(2,1))
hist(out1)
hist(out2)

mean(out1 <= 0.05 )
mean(out2 <= 0.05 )


In both cases the null hypothesis is true (or at least a meaningful 
approximation to true) so the p-values should follow a uniform distribution.  
In the case of out1 where the mean and sd are specified as part of the null the 
p-values are reasonably uniform and the rejection rate is close to alpha 
(should asymptotically approach alpha as the number of simulations increases).  
However looking at out2, where the parameters are set not by outside knowledge 
or tests, but rather from the observed data, the p-values are clearly not 
uniform and the rejection rate is far from alpha.


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



Re: [R] Kolmogorov Smirnov Test

2010-11-11 Thread Kerry
Thanks Ted and Greg. I had actually tried pnorm and, after having
problems, thought maybe I was misunderstanding dnorm as an argument to
ks.test due to over- (more likely under-) thinking it. I'm assuming now
that ks.test will consider my data in cumulative form (makes sense now
that I think about it, but I didn't want to assume any steps that the
R version of the K-S test takes). I plan to explore the ideas and run the
simulations you sent in full over the weekend.

On Nov 11, 12:05 pm, Greg Snow greg.s...@imail.org wrote:
 Consider the following simulations (also fixing the pnorm instead of dnorm 
 that Ted pointed out and I missed):

 out1 <- replicate(1, {
         x <- rnorm(1000, 100, 3);
         ks.test( x, pnorm, mean=100, sd=3 )$p.value
         } )

 out2 <- replicate(1, {
         x <- rnorm(1000, 100, 3);
         ks.test( x, pnorm, mean=mean(x), sd=sd(x) )$p.value
         } )

 par(mfrow=c(2,1))
 hist(out1)
 hist(out2)

 mean(out1 <= 0.05 )
 mean(out2 <= 0.05 )

 In both cases the null hypothesis is true (or at least a meaningful 
 approximation to true) so the p-values should follow a uniform distribution.  
 In the case of out1 where the mean and sd are specified as part of the null 
 the p-values are reasonably uniform and the rejection rate is close to alpha 
 (should asymptotically approach alpha as the number of simulations 
 increases).  However looking at out2, where the parameters are set not by 
 outside knowledge or tests, but rather from the observed data, the p-values 
 are clearly not uniform and the rejection rate is far from alpha.

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111




[R] Kolmogorov Smirnov Test

2010-11-10 Thread Kerry
I'm using ks.test (mydata, dnorm) on my data. I know some of my
different variable samples (mydata1, mydata2, etc) must be normally
distributed but the p value is always < 2.0^-16 (the 2.0 can change
but not the exponent).

I want to test mydata against a normal distribution. What could I be
doing wrong?

I tried instead using rnorm to create a normal distribution: y = rnorm
(68,mean=mydata, sd=mydata), where N= the sample size from mydata.
Then I ran the k-s: ks.test (mydata,y). Should this work?

One issue I had was that some of my data has a minimum value of 0, but
rnorm ran as I have it above will potentially create negative numbers.

Also some of my variables will likely be better tested against
non-normal distributions (uniform, etc.), but I figure I should learn
how to use ks.test first.

I used to use SPSS but am really trying to jump into R instead, but I
find the help to assume too heavy of statistical knowledge.

I'm guessing I have a long road before I get this, so any bits of
information that may help me get a bit further will be appreciated!

Thanks,
kbrownk



Re: [R] Kolmogorov Smirnov Test

2010-11-10 Thread Greg Snow
The way you are running the test the null hypothesis is that the data comes 
from a normal distribution with mean=0 and standard deviation = 1.  If your 
minimum data value is 0, then it seems very unlikely that the mean is 0.  So 
the test is being strongly influenced by the mean and standard deviation not 
just the shape of the distribution.

Note that the KS test was not designed to test against a distribution with 
parameters estimated from the same data (you can do the test, but it makes the 
p-value inaccurate).  You can do a little better by simulating the process and 
comparing the KS statistic to the simulations rather than looking at the 
computed p-value.

However you should ask yourself why you are doing the normality tests in the 
first place.  The common reasons that people do this don't match with what the 
tests actually test (see the fortunes on normality).
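The simulation idea suggested above can be made concrete along these lines (a Lilliefors-style parametric bootstrap; mydata here is a simulated stand-in, and the sample size and replication count are arbitrary):

```r
# Calibrate the KS statistic when mean and sd are estimated from the same
# data: simulate from the fitted normal, re-estimate each replicate, and
# compare the observed statistic with the simulated null distribution.
set.seed(1)
mydata <- rexp(68, rate = 1)   # non-normal stand-in sample of size 68
n <- length(mydata)
obs <- suppressWarnings(
  ks.test(mydata, "pnorm", mean(mydata), sd(mydata))$statistic)

sim <- replicate(2000, {
  x <- rnorm(n, mean(mydata), sd(mydata))        # draw under the fitted null
  ks.test(x, "pnorm", mean(x), sd(x))$statistic  # re-estimate each time
})
p.sim <- mean(sim >= obs)  # simulation-based p-value for the KS statistic
```

Re-estimating the parameters inside each replicate is the key step; it is what the built-in p-value formula fails to account for.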

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Kerry
 Sent: Wednesday, November 10, 2010 9:23 PM
 To: r-help@r-project.org
 Subject: [R] Kolmogorov Smirnov Test
 
 I'm using ks.test (mydata, dnorm) on my data. I know some of my
 different variable samples (mydata1, mydata2, etc) must be normally
 distributed but the p value is always < 2.0^-16 (the 2.0 can change
 but not the exponent).
 
 I want to test mydata against a normal distribution. What could I be
 doing wrong?
 
 I tried instead using rnorm to create a normal distribution: y = rnorm
 (68, mean=mydata, sd=mydata), where N=68 is the sample size from mydata.
 Then I ran the K-S test: ks.test(mydata, y). Should this work?
 
 One issue I had was that some of my data has a minimum value of 0, but
 rnorm run as I have it above will potentially create negative numbers.
 
 Also, some of my variables will likely be better tested against non-
 normal distributions (uniform etc.), but I figure I should learn
 how to use ks.test first.
 
 I used to use SPSS but am really trying to jump into R instead; however,
 I find the help files assume too much statistical knowledge.
 
 I'm guessing I have a long road before I get this, so any bits of
 information that may help me get a bit further will be appreciated!
 
 Thanks,
 kbrownk
 



Re: [R] Kolmogorov-Smirnov test, which one to use?

2010-08-05 Thread Greg Snow
It is not clear what question you are trying to answer.  Perhaps if you can 
give us an explanation of your overall goal then we can be more helpful.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Roslina Zakaria
 Sent: Wednesday, August 04, 2010 8:34 PM
 To: r-help@r-project.org
 Subject: [R] Kolmogorov-Smirnov test, which one to use?
 
 Hi,
 
 I have two sets of data, an observed data set and a generated data set.
 The generated data are obtained from a model whose parameters are estimated
 from the observed data.
 
 So I'm not sure which to use: either the one-sample test
 ks.test(x + 2, "pgamma", 3, 2)   # two-sided, exact
 
 or
 
 two-sample test
 ks.test(x, x2, alternative = "l")
 
 If I use the one-sample test I need to specify the model, which I don't
 have in my case.
 
 Actually I used the two-sample test, and when I compare it with what I got
 from the Chi-square test the results are very different.
 
 Data:
 
   obs_data  pre_gam
  [1,]   93  25.6770
  [2,]  115 127.9095
  [3,]  125 151.6845
  [4,]  120 146.9295
  [5,]  106 107.9385
  [6,]  101 107.4630
  [7,]   75  86.5410
  [8,]   58  55.6335
  [9,]   46  43.7460
 [10,]   38  32.8095
 [11,]   31  16.1670
 [12,]   17  18.5445
 [13,]   10   9.0345
 [14,]   16  20.9220
 
 Results:
 > chisq.test(obs_data, p = pre_gam, rescale.p = TRUE)
     Chi-squared test for given probabilities
 data:  obs_data
 X-squared = 205.4477, df = 13, p-value < 2.2e-16
 
 > ks.test(obs_data, pre_gam)
     Two-sample Kolmogorov-Smirnov test
 data:  obs_data and pre_gam
 D = 0.2143, p-value = 0.9205
 alternative hypothesis: two-sided
 
 
 Am I doing the right thing?
 
 Thank you so much for your help.
 
 



[R] Kolmogorov-Smirnov test, which one to use?

2010-08-04 Thread Roslina Zakaria
Hi,

I have two sets of data, an observed data set and a generated data set.
The generated data are obtained from a model whose parameters are estimated 
from the observed data.

So I'm not sure which to use: either the one-sample test
ks.test(x + 2, "pgamma", 3, 2)   # two-sided, exact

or

two-sample test 
ks.test(x, x2, alternative = "l")

If I use the one-sample test I need to specify the model, which I don't have
in my case.

Actually I used the two-sample test, and when I compare it with what I got
from the Chi-square test the results are very different.

Data:

  obs_data  pre_gam
 [1,]   93  25.6770
 [2,]  115 127.9095
 [3,]  125 151.6845
 [4,]  120 146.9295
 [5,]  106 107.9385
 [6,]  101 107.4630
 [7,]   75  86.5410
 [8,]   58  55.6335
 [9,]   46  43.7460
[10,]   38  32.8095
[11,]   31  16.1670
[12,]   17  18.5445
[13,]   10   9.0345
[14,]   16  20.9220

Results:
> chisq.test(obs_data, p = pre_gam, rescale.p = TRUE)
    Chi-squared test for given probabilities
data:  obs_data 
X-squared = 205.4477, df = 13, p-value < 2.2e-16

> ks.test(obs_data, pre_gam)
    Two-sample Kolmogorov-Smirnov test
data:  obs_data and pre_gam 
D = 0.2143, p-value = 0.9205
alternative hypothesis: two-sided


Am I doing the right thing?

Thank you so much for your help.


  



[R] Kolmogorov smirnov test

2009-10-12 Thread Roslina Zakaria

Hi r-users,
 
I would like to use the Kolmogorov-Smirnov test, but in my observed data (xobs)
there are ties.  I got the warning message below.  My question is: can I do
something about it?
 
ks.test(xobs, xsyn)
 
    Two-sample Kolmogorov-Smirnov test
data:  xobs and xsyn 
D = 0.0502, p-value = 0.924
alternative hypothesis: two-sided 

Warning message:
In ks.test(xobs, xsyn) : cannot compute correct p-values with ties
 
Thank you for all your help.



  



Re: [R] Kolmogorov smirnov test

2009-10-12 Thread Moshe Olshansky
Hi Roslina,

I believe that you can ignore the warning.
Alternatively, you may add a very small random noise to the tied values, i.e. 
something like
xobs[which(duplicated(xobs))] <- xobs[which(duplicated(xobs))] +
  1.0e-6 * sd(xobs) * rnorm(length(which(duplicated(xobs))))

Regards,

Moshe.
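
[Editor's note: base R's jitter() can be used to the same effect. The data
below are made up for illustration and are not part of Moshe's reply.]

## Rounding a continuous sample creates ties; jitter() adds a little
## uniform noise to every value, which breaks the ties.
set.seed(1)
xobs <- round(rgamma(200, shape = 2, rate = 1), 1)  # rounded -> tied values
xsyn <- rgamma(200, shape = 2, rate = 1)

ks.test(jitter(xobs), xsyn)   # runs without the ties warning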

--- On Tue, 13/10/09, Roslina Zakaria zrosl...@yahoo.com wrote:

 From: Roslina Zakaria zrosl...@yahoo.com
 Subject: [R] Kolmogorov smirnov test
 To: r-help@r-project.org
 Received: Tuesday, 13 October, 2009, 9:58 AM
 
 Hi r-users,
  
 I would like to use the Kolmogorov-Smirnov test, but in my
 observed data (xobs) there are ties.  I got the warning
 message below.  My question is: can I do something about it?
  
 ks.test(xobs, xsyn)
  
     Two-sample Kolmogorov-Smirnov test
 data:  xobs and xsyn 
 D = 0.0502, p-value = 0.924
 alternative hypothesis: two-sided 
 
 Warning message:
 In ks.test(xobs, xsyn) : cannot compute correct p-values
 with ties
  
 Thank you for all your help.
 
 
 
       




[R] Kolmogorov-Smirnov test

2009-04-29 Thread mathallan

I have a distribution function and an empirical distribution function. How do I
perform a Kolmogorov-Smirnov test in R?

Let's call the empirical distribution function Fn on [0,1]
and the distribution function F on [0,1].

ks.test(  )

thanks for the help
-- 
View this message in context: 
http://www.nabble.com/Kolmogorov-Smirnov-test-tp23296096p23296096.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Kolmogorov-Smirnov test

2009-04-29 Thread Andrew Dolman
help.search(kolmogorov)

?ks.test
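
[Editor's note: mathallan did not say which F is involved; taking F to be the
uniform cdf on [0,1] purely for illustration, the one-sample call would look
like this.]

## One-sample KS test: compare the sample underlying Fn with a known cdf F.
set.seed(1)
x <- runif(50)               # the data from which Fn would be built
ks.test(x, "punif", 0, 1)    # tests Fn against F = punif on [0,1]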



andydol...@gmail.com


2009/4/29 mathallan mathanm...@gmail.com


 I have a distribution function and an empirical distribution function. How do
 I perform a Kolmogorov-Smirnov test in R?

 Let's call the empirical distribution function Fn on [0,1]
 and the distribution function F on [0,1].

 ks.test(  )

 thanks for the help
 --
 View this message in context:
 http://www.nabble.com/Kolmogorov-Smirnov-test-tp23296096p23296096.html
 Sent from the R help mailing list archive at Nabble.com.






Re: [R] Kolmogorov-Smirnov test

2009-04-29 Thread Richardson, Patrick
This is the third homework question you have asked the list to do for you. How 
many more should we expect?

The posting guide is pretty clear on this: "Basic statistics and classroom 
homework: R-help is not intended for these."


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of mathallan
Sent: Wednesday, April 29, 2009 10:52 AM
To: r-help@r-project.org
Subject: [R] Kolmogorov-Smirnov test


I have a distribution function and an empirical distribution function. How do I
perform a Kolmogorov-Smirnov test in R?

Let's call the empirical distribution function Fn on [0,1]
and the distribution function F on [0,1].

ks.test(  )

thanks for the help
-- 
View this message in context: 
http://www.nabble.com/Kolmogorov-Smirnov-test-tp23296096p23296096.html
Sent from the R help mailing list archive at Nabble.com.




[R] Kolmogorov–Smirnov Test for Left Censored Data

2008-11-20 Thread Tom La Bone

Can someone recommend a package in R that will perform a two-sample
Kolmogorov–Smirnov test on left-censored data? The package surv2sample
appears to offer such a test for right-censored data, and I guess that I can
use this package if I flip my data, but I figured I would first ask if there
was a package specific to left-censored data.

Tom


-- 
View this message in context: 
http://www.nabble.com/Kolmogorov%E2%80%93Smirnov-Test-for-Left-Censored-Data-tp20602916p20602916.html
Sent from the R help mailing list archive at Nabble.com.
