Hi there

Thank you for everyone's help with all my previous questions.

By way of intro, I am a master's student in actuarial science at the University of Cape Town, and I am doing a project in R on some healthcare cost data.

Just for clarity, before I embark on further research, may I please ask the following.

I want to model health insurance claims data with Tweedie compound Poisson models for over 2 million beneficiaries. I'd also like to work in a double GLM framework so that the dispersion parameter captures as much variance as possible. In addition, I'd like these results to feed into a stochastic model application, which will form part of a Dynamic Financial Analysis model of a health insurer.

My question is: in light of the above broad overview, how large must data sets be before R runs into performance problems? In other words, what "scale" can R handle?

Thanks ever so much once again.

Kind regards
Stratos

On Tue, Oct 12, 2010 at 11:31 AM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> On Tue, Oct 12, 2010 at 12:51 AM, Stratos Laskarides <stratl...@gmail.com> wrote:
>
>> Dear Madam/Sir
>>
>> This may be quite a long shot...
>>
>> By way of intro, I am a masters student in actuarial science at the University of Cape Town, and I am doing a project in R on some healthcare cost data. During my coding in R I encountered an error message, which I then googled, but I am still unable to resolve the issue.
>>
>> I would like to please ask if and how it is possible to resolve the problem raised by the error message "Error: NA/NaN/Inf in foreign function call (arg 1) In addition: Warning message: step size truncated due to divergence" in R?
>
> That error message can arise if division by zero occurs somewhere in the computation. Try using ftable() or some related function that will print out your complete table (4-way?) and check whether you have zero frequency in one or more cells.
> If there are zero frequencies, that does not necessarily explain the problem, but it's a reasonable initial hypothesis. Merging some categories to get enough frequencies per cell may be useful if you do have zero frequencies, and then try the fit again to see if you get more sensible results.
>
> When the error is thrown, it can be useful to do
>     traceback()
> as it recalls the sequence of function calls that led up to the error, but it helps to have enough R experience to make heads or tails of the output :)
>
>> As for some background on my specific data and research problem at hand, I am fitting a gamma regression model to 13 000 lines of insurance claims data, which will be regressed against categorical variables such as Age Band, Gender, and Region.
>
> The more variables you have in the model, the greater the number of cell combinations. A 15 x 2 x 5 combination of your three variables, for example, would generate 150 combinations of the three variables, and it's entirely possible for a few of those combinations to have small or zero frequencies. In addition, adding a new variable to the model would at least double the number of cells, spreading/thinning out the data even more.
>
>> Perhaps my problem arises because the data set is too large and the iteratively reweighted least squares algorithm therefore cannot converge, in which case I perhaps need another GLM type. Or maybe the categorical explanatory variables can take on too many values (e.g. there are 15 Age Bands, 5 Regions).
>
> If your response is continuous and positive valued with a right skewed distribution, then a Gamma model would appear to be sensible.
>
> The data set is not too large; successful GLMs have been fit with much larger data sets. Your second hypothesis sounds more plausible, though.
>
> HTH,
> Dennis
>
>> Any insights you could provide would be much appreciated.
>>
>> Thank you ever so much.
>>
>> Kind regards
>> Stratos Laskarides
>> South Africa
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
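
The cell-count check suggested in Dennis's reply might look roughly like the sketch below. It is only an illustration on simulated data: the data frame `claims`, its column names, the simulated cost distribution, and the log link are all assumptions made here, not details taken from the original data or model.

```r
# Simulated stand-in for the claims data; every name and value is hypothetical.
set.seed(1)
n <- 13000
claims <- data.frame(
  age_band = factor(sample(1:15, n, replace = TRUE), levels = 1:15),
  gender   = factor(sample(c("F", "M"), n, replace = TRUE), levels = c("F", "M")),
  region   = factor(sample(1:5, n, replace = TRUE), levels = 1:5)
)
# Positive, right-skewed response, the situation in which a Gamma model is sensible.
claims$cost <- rgamma(n, shape = 2, rate = 2 / 1000)

# Cross-tabulate the three factors: 15 x 2 x 5 = 150 cells.
cells <- xtabs(~ age_band + gender + region, data = claims)
print(ftable(cells))          # flat printout of the complete three-way table

n_cells <- length(cells)      # total number of factor combinations
n_empty <- sum(cells == 0)    # combinations with no observations at all
n_small <- sum(cells < 5)     # combinations with very few observations (threshold is arbitrary)

# Gamma GLM on the three factors; the log link is a choice made for this sketch.
fit <- glm(cost ~ age_band + gender + region,
           family = Gamma(link = "log"), data = claims)
```

If `n_empty` or `n_small` is well above zero on the real data, merging categories before refitting is the remedy proposed in the reply. If the fit still stops with an error, calling `traceback()` straight afterwards shows the sequence of calls that led up to it.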