Dear R-help readers, I am writing to you in order to ask you a few questions about distribution fitting in R.
I am trying to find out whether the set of event interarrival times that I am currently analyzing is distributed with a Gamma or General Pareto distribution. The event arrival granularity is in minutes and interarrival times are in seconds, so the values I have are 0, 60, 120, 180, and so on... Here is what I did. Since several values of the interarrival_times array are zero, to avoid numerical errors when calculating their logarithm, I set them to 1E-10: > # fix numerical values > for (i in 1:length(interarrival_times)) { + if (interarrival_times[i] == 0) + interarrival_times[i] = 0.0000000001 + } Then I defined the negative log likelihood for the Gamma distribution: > nll_gamma_log <- function(lambda_log, alfa_log) { + n <- length(interarrival_times) + x <- interarrival_times + -n*exp(alfa_log)*lambda_log + n*log(gamma(exp(alfa_log))) - (exp(alfa_log)-1)*sum(log(x)) + exp(lambda_log)*sum(x) + } and passed it to the mle function: > est_gamma <- mle(minuslog = nll_gamma_log, start = list(lambda_log = 0, alfa_log=0)) There were 50 or more warnings (use warnings() to see the first 50) of course I got many "value out of range" warnings: > warnings() Warning messages: 1: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn' [...] 7: In gamma(exp(log_alfa)) ... : NaNs produced [...] 50: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn' but that was expected, right? The real problems come out when I try to fit a General Pareto distribution. Using the same method, I calculated the negative log likelihood function for the General Pareto distribution: > # negative log-likelihood function for general pareto distribution > nll_gpd <- function(xi, log_sigma, mu) { # shape, scale, location + n <- length(interarrival_times) + x <- interarrival_times + cat("xi:",xi,"; log_sigma:",log_sigma,"; mu:",mu,"\n") + n*log_sigma + (xi+1)*sum(log(1+xi*(x-mu)/exp(log_sigma)))/xi + } and passed it to the mle function. However, this time I get some errors: > est_gpd <- mle(minuslog = nll_gpd, start = list(xi = 1, log_sigma = 1, mu = 1)) xi: 1 ; log_sigma: 1 ; mu: 1 xi: 1.001 ; log_sigma: 1 ; mu: 1 xi: 0.999 ; log_sigma: 1 ; mu: 1 xi: 1 ; log_sigma: 1.001 ; mu: 1 xi: 1 ; log_sigma: 0.999 ; mu: 1 xi: 1 ; log_sigma: 1 ; mu: 1.001 xi: 1 ; log_sigma: 1 ; mu: 0.999 xi: 52007.84 ; log_sigma: 13414.95 ; mu: 4241.565 xi: 10402.37 ; log_sigma: 2683.789 ; mu: 849.113 xi: 18.08721 ; log_sigma: 3.862682 ; mu: 2.627865 xi: 18.08721 ; log_sigma: 3.860682 ; mu: 2.627865 Error in optim(start, f, method = method, hessian = TRUE, ...) : non-finite finite-difference value [2] In addition: There were 50 or more warnings (use warnings() to see the first 50) I also get many warnings: > warnings() Warning messages: 1: In base::log(x, base) ... : NaNs produced 2: In base::log(x, base) ... : NaNs produced [...] 50: In base::log(x, base) ... : NaNs produced I really don't know how to fix this problem. Do you have any suggestions? Thank you very much in advance. Best regards, Mauro Tortonesi -- Mauro Tortonesi, Ph.D. Research Assistant Distributed Systems Research Group Engineering Department University of Ferrara [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.