[R] Kolmogorov-Smirnov test on binned data, presumably Gumbel-distributed data
Hi R-users,

I have read several texts on KS tests. Most of those authors state that KS tests are not suitable for binned data, but some of them refer to 'other' authors who claim that KS tests are fine for binned data. I searched for sources and cannot find any example confirming that it is acceptable to use a KS test on binned data. Do you have links to articles or tutorials?

In any case, I am looking for a test that supports my claim that the data is Gumbel-distributed. I estimated the Gumbel parameters mue and beta, and after looking at the resulting plots, in my opinion the fit looks quite good. You can find the plot, the related data, and the R script here:
www.jochen-bauer.net/downloads/kstest/Rplots-1000.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-1000.txt
http://www.jochen-bauer.net/downloads/kstest/rcalc.R

The story about the data: I am wondering which test I should choose if the KS test is not appropriate. I get really high p-values when comparing the histogram heights of data row 1 with the fitted Gumbel distribution function evaluated at the bin midpoints. Most of the time the KS test yields distances of 0.01 and p-values of 0.99 or 1. That seems strange to me, too high. On the other hand, my plots look good. As you can see, in my first experiment I sampled 1000 values; in a second experiment I created only 50 random values for the Gumbel parameter estimation. I am trying to reduce the number of permutations so that I can produce results faster, but I need to find out at which point the data stops passing as Gumbel-distributed. The results surprised me: I expected my tests and plots to get worse, but I still got high p-values from the KS test and still a nice-looking plot.
www.jochen-bauer.net/downloads/kstest/Rplots-0050.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-0050.txt

Moreover, besides the shuffled data of my randomisation test, there are real data values. I calculated the p-value of my real data point occurring under the estimated Gumbel distribution.
The p-values from the 1000-permutation experiment and the 50-permutation experiment correlate enormously, around 0.98; both the Pearson and Spearman correlation coefficients tell me so. I suppose that backs up the observation that neither my plots nor the KS tests are getting worse. I hope I have described my current situation clearly and that you can give me some hints: pointers to literature, other tests, or confirmation of my guess that my data is Gumbel-distributed. Thanks in advance. Jochen
-- View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-Test-on-binned-data-I-guess-gumbel-distributed-data-tp3983781p3983781.html
Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
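A common suggestion for questions like this is to run the one-sample KS test on the raw, unbinned values against the fitted CDF, rather than on histogram heights. Below is a minimal sketch (my own illustration, not code from the thread; the names pgumbel, mue.hat, and beta.hat are made up here, and the data is synthetic):

```r
## Sketch: one-sample KS test of raw values against a fitted Gumbel CDF.
## mue/beta follow the thread's notation; method-of-moments estimates.
set.seed(42)
x <- 0.092 - 0.0216 * log(-log(runif(1000)))   # synthetic Gumbel(mue, beta) draws

beta.hat <- sd(x) * sqrt(6) / pi               # sd = beta * pi / sqrt(6)
mue.hat  <- mean(x) - 0.5772157 * beta.hat     # mean = mue + gamma * beta

pgumbel <- function(q, mue, beta) exp(-exp(-(q - mue) / beta))

# Caveat: mue.hat and beta.hat were estimated from x itself, so the reported
# p-value is biased upward; a parametric bootstrap would give an honest one.
ks.test(x, pgumbel, mue = mue.hat, beta = beta.hat)
```

This avoids binning entirely; the remaining issue is the estimated-parameter bias noted in the comment.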
Re: [R] Problems with ks.test()
Hi, I used a KS function from another library (kstwo() from Numerical Recipes, a mathematics book) to test this myself, and the same thing happens there. I cannot understand why this observation occurs; I hope someone can 'enlighten' me.
-- View this message in context: http://r.789695.n4.nabble.com/Problems-with-ks-test-tp3703469p3706072.html
[R] Problems with ks.test()
Hi, I have two data point vectors and want to run ks.test() on them. If you plot both vectors you will see that they fit pretty well. Here is a picture: http://www.jochen-bauer.net/downloads/kstest-r-help-list-plot.png
As you can see, there is a histogram, and the Gumbel density function is plotted over it. I took the bin midpoints and bin heights as vector 1, and computed the distribution values at all bin midpoints as vector 2. I pass these two vectors to ks.test(). Are those the right vectors if I want to decide afterwards whether my experimental data is Gumbel-distributed?

Surprisingly, the p-value changes tremendously when I keep more digits from my theoretical formula. If I round to 0 digits, p is 1; if I round to 4 digits, p drops to 0. How can this happen? I thought more digits would bring more accurate results!

Case 0 digits:

[1] 0 0 0 0 0 24 74 98 133 147 134 120 89 69 46 31 16 7
[19] 7 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[91] 0 0 0 0 0 0 0 0 0 0

[1] 0 0 0 0 1 10 49 113 160 168 147 113 81 55 37 24 15 10
[19] 6 4 2 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
[37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[91] 0 0 0 0 0 0 0 0 0 0

[1] Results
[1] Analysis of the input data
[1] Mean: 0.104537195
[1] Std. dev.: 0.0277657985898433
[1] Parameter calculation for the data assuming a Gumbel distribution
[1] Mue: 0.0920411082987717
[1] Beta: 0.0216489043196013
[1] KS test - 1000 values, 100 bins, x: bin midpoints, y1, y2 = histogram heights
[1] KST D: 0.04
[1] KST P: 1

Case 4 digits:

[1] 0 0 0 0 0 24 74 98 133 147 134 120 89 69 46 31 16 7
[19] 7 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[91] 0 0 0 0 0 0 0 0 0 0

[1] 0.000 0.000 0.000 0.006 0.622 10.094 49.271 112.776 160.174
[10] 168.419 146.527 113.137 81.026 55.344 36.690 23.870 15.347 9.793
[19] 6.220 3.939 2.490 1.572 0.992 0.625 0.394 0.248 0.157
[28] 0.099 0.062 0.039 0.025 0.016 0.010 0.006 0.004 0.002
[37] 0.002 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000
[46] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[55] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[64] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[73] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[82] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[91] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[100] 0.000

[1] Results
[1] Analysis of the input data
[1] Mean: 0.104537195
[1] Std. dev.: 0.0277657985898433
[1] Parameter calculation for the data assuming a Gumbel distribution
[1] Mue: 0.0920411082987717
[1] Beta: 0.0216489043196013
[1] KS test - 1000 values, 100 bins, x: bin midpoints, y1, y2 = histogram heights
[1] KST D: 0.2
[1] KST P: 0.0366

Thanks in advance for some help. Jochen
-- View this message in context: http://r.789695.n4.nabble.com/Problems-with-ks-test-tp3703469p3703469.html
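One way to see why the digits matter so much, sketched below under my own assumptions (only the obs counts come from the post; the noise model is hypothetical): ks.test(y1, y2) treats the two vectors as samples, so with 0-digit rounding most entries are identical, the two empirical CDFs of the heights nearly coincide, and p stays high; more digits break those ties and separate the CDFs, independently of how well the data fits the Gumbel distribution.

```r
## Hypothetical demonstration: two-sample ks.test() on bin heights reacts
## to ties, not to goodness of fit.
set.seed(1)
obs <- c(rep(0, 5), 24, 74, 98, 133, 147, 134, 120, 89, 69, 46, 31, 16, 7,
         7, 3, 2, rep(0, 79))                # bin counts from the post
noise <- rnorm(100, 0, 0.3)                  # hypothetical model discrepancy
exp0 <- round(obs + noise, 0)                # "0 digits": mostly ties with obs
exp4 <- round(obs + noise, 4)                # "4 digits": ties broken
p0 <- suppressWarnings(ks.test(obs, exp0)$p.value)   # stays high
p4 <- suppressWarnings(ks.test(obs, exp4)$p.value)   # drops sharply
```

The rounding alone drives the p-value change, which matches the behaviour described above.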
[R] Use ks.test() or an alternative in C/C++ application
Hi, I am looking for a way to run a KS test in my C/C++ application. Unfortunately I cannot find a library or function in C or C++ that does the job. For my other numerical work the GNU Scientific Library was recommended to me. What should I do now? I have read that there are ways to call R from C++ code. How does that work, and is it a good option if the test will be called very often and the program has to run on a computer cluster? Thanks in advance!
-- View this message in context: http://r.789695.n4.nabble.com/Use-ks-test-or-an-alternative-in-C-C-application-tp3688640p3688640.html
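One low-effort route, sketched here as a suggestion rather than something from the thread: wrap ks.test() in a small Rscript and call it from C/C++ through a pipe. For a cluster with very many calls, the per-call interpreter startup dominates, and embedding R in the C++ process (for example with the RInside package) is usually preferred. The file name kspvalue.R and the helper below are made up for illustration:

```r
## Hypothetical helper (kspvalue.R): a C/C++ program could invoke it as
##   Rscript kspvalue.R a.txt b.txt
## and read the printed two-sample KS p-value from stdout.
ks_pvalue_files <- function(file_x, file_y) {
  x <- scan(file_x, quiet = TRUE)   # one whitespace-separated sample per file
  y <- scan(file_y, quiet = TRUE)
  ks.test(x, y)$p.value
}

# When run as a script:
# args <- commandArgs(trailingOnly = TRUE)
# cat(ks_pvalue_files(args[1], args[2]), "\n")
```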
Re: [R] ecdf() to nls() - how to transform data?
OK, I think I have a good idea for solving my problem and need someone to second me on it, or to tell me what a fool I am :-). My problem: I went for curve fitting with nls() and used knots() on the ecdf() result to collect the data for the nls() base data frame. I ran into trouble because the x and y columns of my data.frame() were not of equal length: I computed 1000 values and there were always some ties, and tied values are returned only once by knots(). After thinking about it, I am now using 100 percentile values for the data frame. nls() works fine, and the 100 percentiles are always there, so no exceptions occur. Is it wise to use 100 cumulative percentiles as the curve-fitting base data instead of the ecdf() data? Thanks in advance.
-- View this message in context: http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-tp3671754p3676932.html
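The percentile idea can be sketched like this (my own minimal version, with a synthetic sample standing in for the real data): quantile() always returns exactly as many points as probabilities requested, so the data frame for nls() has a fixed, predictable shape even when the sample contains ties.

```r
## Sketch: 100 quantiles as a fixed-length (x, y) basis for nls().
set.seed(7)
smp   <- rgamma(1000, 2)                 # synthetic stand-in for the 1000 values
probs <- (1:100) / 100
df    <- data.frame(x = quantile(smp, probs), y = probs)
fit   <- nls(y ~ exp(-exp(-(x - mue) / beta)),   # Gumbel CDF, as in the thread
             data = df, start = list(mue = median(smp), beta = sd(smp)))
coef(fit)
```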
Re: [R] ecdf() to nls() - how to transform data?
Thanks David and Peter! I tried to improve my R script to get closer to my goal. I think I have to use nls(), because later I want to work with the Levenberg-Marquardt algorithm, and if I understood correctly the LM algorithm uses least squares as well, whereas fitdistr() uses maximum likelihood. Anyway, I am wondering how to create the appropriate data.frame for nls(). In my opinion all the data is there: I have the absolute numbers in my dataset from the internet, I got the histogram/density function right, as you can see in graph number 3 after running the following example, and I got the cumulative density right as well. So how do I find x and y for the data frame in line 116? Is it necessary, or easier, to get this example working with absolute numbers? The two easy nls() tutorials worked out as well, so it can't be that hard, I guess, but I have been sitting here for two days and have no idea what to do now. Thanks in advance for a little more input, guys.

# ### R script: example fit ###
cat( "*\n" )
cat( "Script curvefit - from histogram to function\n" )
cat( "\n" )
# Preparation
setwd( "/home/joba/jobascripts/rproject/" )
rm( list=ls( all=TRUE ) )
# Organise X11, x rows, y graphs
par( mfrow=c( 2, 2 ) )

### Tutorial 1
# Build data.frame()
x <- seq( 0.0, 0.30, length=10 )   # seq( from, to, by )
y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 )   # values of the target function
df <- data.frame( x=x, y=y )
#print( df )
plot( df, main="Tutorial 1" )
# Formula: y = x^a, look for a
res <- nls( y ~ I(x^exponent), data = df, start = list( exponent = 1 ), trace = T )
print( round( summary(res)$coefficients[1], 3 ) )
lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )
# Formula: y = x^a + b, look for a and b
res2 <- nls( y ~ I( x^exponent + gamma ), data = df, start = list( exponent=1, gamma=0 ), trace = TRUE )
print( round( summary(res2)$coefficients[1], 3 ) )
lines( x, predict( res2, list( x = x ) ), lty = 1, col = "red" )

### Tutorial 2
len <- 24
x <- runif( len )                  # 24 random values between 0 and 1
y <- x^3 + rnorm( len, 0, 0.06 )   # compute y-values and add some noise
ds <- data.frame( x = x, y = y )   # build data.frame
#str( ds )
# Compute prediction
s <- seq( 0, 1, length = 100 )     # compute real data, to get y-values
m <- nls( y ~ I(x^power), data = ds, start = list( power = 1 ), trace = T )
#print( class(m) )
print( summary(m) )
# Plot fitted curve and known curve
power <- round( summary(m)$coefficients[1], 3 )      # get power from summary
power.se <- round( summary(m)$coefficients[2], 3 )   # get standard error from summary
plot( y ~ x, main = "Tutorial 2", sub="Blue: fit; green: known" )
lines( s, s^3, lty = 2, col = "green" )                          # known real data
lines( s, predict( m, list( x = s ) ), lty = 1, col = "blue" )   # predicted values
text( 0, 0.5, paste( "y = x^(", power, " +/- ", power.se, ")", sep="" ), pos=4 )

### Real data
#
# --- Create points ---
# Real data and density
hgd <- read.csv( file="http://www.jochen-bauer.net/downloads/curve-fitting-tutorial/data01.txt", sep=",", header=FALSE )
#print( hgd$V1 )
print( summary( hgd$V1 ) )
denspoints <- density( hgd$V1 )
# Define variables for later use
msa <- "MSA-Name"
c1 <- "col1"
c2 <- "col2"
histtitle <- paste( msa, "Columns:", c1, ",", c2 )
#
# --- Plot ---
# Organise X11, x rows, y graphs
#par( mfrow=c( 1, 2 ) )
# Histogram
histo <- hist( hgd$V1, freq=FALSE, main=histtitle, xlab="U-Value", ylab="Density" )
# Density line
lines( denspoints, col="GREEN", lwd=1 )
# Plot cumulatives
plot( ecdf( hgd$V1 ), col="GREEN", lwd=1, main="", xlab="U-Value", ylab="Probability" )   # real data
#
# --- Fit ---
# Matching Gumbel distribution, as a tip from an expert
# Gumbel distribution function, cumulative:
# F(x) = exp( - exp( - ( x - mue ) / beta ) )
# Thanks to Peter Dalgaard :-)
# Starting guesses: mue = median of data, beta = sdev of data
mueStart <- median( hgd$V1 )
betaStart <- sd( hgd$V1 )
print( paste( "mueStart: ", mueStart ) )
print( paste( "betaStart: ", betaStart ) )
# Build data.frame()
# x = min of uvals to max of uvals
# y = probabilities, which add to 1; normalized histograms could help?
### XXX
x <- seq( 0.0, 0.30, length=10 )   # seq( from, to, by )
y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 )   # values of the target function
df <- data.frame( x=x, y=y )
print( df )
# Formula: Gumbel CDF, look for mue and beta
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ) ), data = df, start = list( mue=mueStart, beta=betaStart ) )
Re: [R] ecdf() to nls() - how to transform data?
Hi David, my first attempt to work through your code was successful; my predicted line is pretty close to the ecdf function. But I have no idea why you inverted the Gumbel function, or what the advantage of that strategy is. I interpret your (1:100)/100 trick like this: you build a sequence of 100 tics, and in addition 100 is identical to the number of created points. In my real-world situation ecdf() will produce ties (identical values), so I do not know in advance how many values knots() will return, and that leads to a non-symmetric data.frame() error.

R code:

### Tutorial 3: gamma-distribution points fitted to a Gumbel
# --- Create points ---
# http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-td3671754.html
print( "Tutorial 3" )
p <- rgamma( 100, 2 )
print( p )
ep <- ecdf( p )
x <- knots( ep )
hist( p, main="Tutorial 3" )
ypre <- seq( from=1, to=100, by=1 )
y <- ypre / 100
df <- data.frame( x=x, y=y )
plot( df )
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ) ), start=list( mue=2, beta=2 ), data=df, trace=TRUE )
print( summary(res) )
#print( str(res) )
lines( x, predict( res, list( x=x ) ), lty = 1, col = "blue" )   # predicted values

Console output:

[1] Tutorial 3
[1] 4.75748951 5.36642043 2.03025702 1.80216440 1.40745277 0.91393251
[7] 1.30575505 2.07451301 1.18500815 0.24972503 1.36604865 2.36786796
[13] 1.37386798 1.72390843 1.93700139 0.78722468 2.92828385 1.51165415
[19] 4.17000960 2.98356424 1.20195996 1.85899533 0.94863757 1.50438053
[25] 0.50854274 3.32760075 2.21334316 0.58463817 2.26985676 2.88033389
[31] 2.00718903 1.19043470 1.62945752 0.32010478 0.54837755 1.32965131
[37] 3.32024573 0.53825226 3.30077557 0.46426513 1.00681720 0.59276030
[43] 1.31431609 2.27822419 2.56404713 0.52459218 1.05228996 2.23799673
[49] 1.14438962 2.41311612 1.40254244 1.20379073 0.44195457 0.95880408
[55] 1.45254027 4.49645001 2.18826796 1.07161515 1.62617544 3.15506003
[61] 2.15611491 2.43261017 1.40293461 2.79977886 3.44503201 3.25282551
[67] 0.91570531 0.70243512 5.67971774 3.55532192 1.71515046 2.97718949
[73] 0.47145131 0.32879089 0.60735231 0.92388638 4.29277857 1.73681839
[79] 0.09387383 2.81281295 1.84419058 2.63070353 1.52298124 2.89761263
[85] 1.05251987 1.24258703 3.09130229 2.66738574 3.17035060 2.15287327
[91] 2.63042751 1.69736779 3.21126033 1.56920999 2.68985477 0.82952068
[97] 3.62230865 1.65286592 2.76798783 2.08091935

Formula: y ~ exp(-exp(-(x - mue)/beta))
Parameters:
      Estimate Std. Error t value Pr(>|t|)
mue   1.382744   0.005926   233.3   <2e-16 ***
beta  0.984836   0.008816   111.7   <2e-16 ***

-- View this message in context: http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-tp3671754p3674227.html
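A tie-safe alternative to the fixed (1:100)/100 sequence, sketched as my own suggestion rather than something from the thread: evaluate the ecdf at its own knots, which pairs each unique value with its cumulative probability and always yields equal-length vectors.

```r
## Sketch: an ecdf evaluated at its own knots survives ties.
set.seed(1)
p  <- round(rgamma(500, 2), 1)   # rounding to 1 digit forces ties
ep <- ecdf(p)
x  <- knots(ep)                  # unique sorted values, fewer than 500 here
y  <- ep(x)                      # cumulative probability at each knot
length(x) == length(y)           # TRUE regardless of how many ties occur
```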
[R] ecdf() to nls() - how to transform data?
Hi, I am using the ecdf() function and want to use its data points as the data argument for nls(). nls() expects a 'list' or 'environment' as the data type, but knots(ecdf(mydata)) gives me 'numeric'. What should I do now? Thanks in advance - Jochen

Here is the code:

#
# --- Fit ---
# Gumbel distribution function, cumulative, http://en.wikipedia.org/wiki/Gumbel_distribution
# F(x) = exp( - exp( - ( x - mue ) / beta ) )
# Formula for the fitting function
# formula: y ~ exp( - exp( - ( x - mue ) / beta ) )
# data: ecdf( hgd$V1 )
# start: list( mue=0.1, beta=0.025 )
gpfunction <- ecdf( hgd$V1 )
gplistrange <- seq( 0.0, 1, by=0.001 )
gplist <- gpfunction( gplistrange )
print( gplist )
print( class( gplist ) )
print( "---" )
#res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ) ), df, list( mue=0.1, beta=0.025 ), trace=TRUE )
#print( summary( res ) )

-- View this message in context: http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-tp3671754p3671754.html
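A minimal sketch of the fix the rest of the thread converges on (the sample here is synthetic, standing in for hgd$V1): nls()'s data argument wants a list or data.frame, so pair the knots with their cumulative probabilities in a data.frame instead of passing the bare numeric vector.

```r
## Sketch: wrap ecdf knots and probabilities in a data.frame for nls().
set.seed(3)
v   <- rgamma(200, 2)                      # synthetic stand-in for hgd$V1
ef  <- ecdf(v)
df  <- data.frame(x = knots(ef), y = ef(knots(ef)))
class(df)                                  # "data.frame", acceptable to nls()
fit <- nls(y ~ exp(-exp(-(x - mue) / beta)),
           data = df, start = list(mue = median(v), beta = sd(v)))
```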
Re: [R] Add a density line to a cumulative histogram - second try
Thanks, I found the function ecdf(), which does the job:

plot( ecdf( nvtpoints ), col="BLUE", lwd=1, add=TRUE )

-- View this message in context: http://r.789695.n4.nabble.com/Add-a-density-line-to-a-cumulative-histogram-second-try-tp3669310.html
[R] Add a density line to a cumulative histogram - second try
Hi list, this is my second try at a first post on this list (I tried to post via email and nothing appeared in my inbox, so now I am trying the Nabble web interface); I hope you will only have to read one post in your inbox! OK, my question: I was able to plot a histogram and add the density() line to that plot, and I was able to plot a cumulative form of this histogram. However, I was not able to add a density line to the cumulative histogram. You can see a picture of the histograms here: http://www.jochen-bauer.net/downloads/histo-cumulative-density.png

Source:

# Histogram
histo <- hist( hgd$V1, freq=FALSE )
# Draw the density estimator function
denspoints <- density( hgd$V1 )
lines( denspoints, col="GREEN", lwd=2 )
# Cumulative distribution of relative frequencies
relcum <- cumsum( histo$counts ) / sum( histo$counts )
barplot( relcum, names.arg=round( histo$mids, 2 ), col="green", ylab="Probability" )
# Cumulative density function?

Thanks in advance - Jochen
-- View this message in context: http://r.789695.n4.nabble.com/Add-a-density-line-to-a-cumulative-histogram-second-try-tp3666969p3666969.html
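One way to get the missing cumulative density line, sketched with a synthetic sample in place of hgd$V1: numerically integrate the density() grid and draw the running sum over the cumulative histogram.

```r
## Sketch: integrated kernel density over a cumulative histogram.
set.seed(2)
v      <- rnorm(300)                        # synthetic stand-in for hgd$V1
h      <- hist(v, plot = FALSE)
relcum <- cumsum(h$counts) / sum(h$counts)  # cumulative relative frequencies
plot(h$mids, relcum, type = "s", xlab = "U-Value", ylab = "Probability")
d  <- density(v)
dx <- diff(d$x)[1]                          # density() uses an equispaced grid
lines(d$x, cumsum(d$y) * dx, col = "green", lwd = 2)  # approximate CDF
```

The running sum cumsum(d$y) * dx approximates the CDF of the kernel density estimate, rising from about 0 to about 1 across the plot.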