Hello,
I don't have the algorithms at hand but the KS statistic calculation is
more complicated than your max/abs difference.
Anyway, why not use ks.test? it's not that difficult:
set.seed(1234)
#reference distribution
d_1 <- sort(rpois(1000, 500))
p_1 <- d_1/sum(d_1)
m_1 <- data.frame(d_1, p_1)
#data frame to store the values of the simulation
d_stat <- data.frame(1:1000, NA, NA)
names(d_stat) <- c("sample_size", "ks_distance", "p_value")
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
d_2 <- m_2$d_1
ht <- ks.test(d_1, d_2)
#kolmogorov-smirnov distance
d_stat[i, 2] <- ht$statistic
d_stat[i, 3] <- ht$p.value
}
hist(d_stat[, 2])
hist(d_stat[, 3])
Note that d_2 is not sorted, but the results are equal in the sense of
function identical(), meaning they are *exactly* the same. Why shouldn't
they?
Hope this helps,
Rui Barradas
Às 17:06 de 05/09/19, Boo G. escreveu:
Hello,
I am trying to perform a Kolmogorov–Smirnov test to assess the difference
between a distribution and samples drawn proportionally to size of different
sizes. I managed to compute the Kolmogorov–Smirnov distance but I am lost with
the p-value. I have looked into the ks.test function unsuccessfully. Can anyone
help me with computing p-values for a two-tailed test?
Below a simplified version of my code.
Thanks in advance.
Gianluca
library(spatstat)
#reference distribution
d_1 <- sort(rpois(1000, 500))
p_1 <- d_1/sum(d_1)
m_1 <- data.frame(d_1, p_1)
#data frame to store the values of the siumation
d_stat <- data.frame(1:1000, NA, NA)
names(d_stat) <- c("sample_size", "ks_distance", "p_value")
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
m_2 <-m_2[order(m_2$d_1),]
d_2 <- m_2$d_1
p_2 <- m_2$p_1
#weighted ecdf for the reference distribution and the sample
f_d_1 <- ewcdf(d_1, normalise=F)
f_d_2 <- ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
#kolmogorov-smirnov distance
d_stat[i,2] <- max(abs(f_d_1(d_2) - f_d_2(d_2)))
}
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.