[R] Limitations and scale of R, and performance issues if and when limit reached

Stratos Laskarides Thu, 21 Oct 2010 14:41:36 -0700

Hi there

Thank you, everyone, for your help with all my previous questions.

By way of intro, I am a master's student in actuarial science at the
University of Cape Town, and I am doing a project in R on some healthcare
cost data. Just for clarity, before I embark on further research, may I
please ask the following?

I want to model health insurance claims data for over 2 million
beneficiaries with Tweedie compound Poisson models. I'd also like to work in
a double GLM framework, so that the dispersion parameter is itself modelled
and captures as much of the variance as possible. In addition, I'd like
these results to feed into a stochastic model application, which will form
part of a Dynamic Financial Analysis model of a health insurer.

My question is: in light of the above broad overview, how large can data
sets get before R runs into performance problems? In other words, what
"scale" can R handle?
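
One rough way to answer the scale question empirically is to time a model like yours on simulated data of increasing size and watch how run time and memory grow. The sketch below is an assumption-laden harness, not a benchmark result: it uses base R only, made-up column names (AgeBand, Gender, Region, cost), and a Gamma GLM with a log link as a stand-in for the Tweedie and double GLM fits, which would need extra packages.

```r
# Hypothetical simulated data: column names and factor levels are
# illustrative assumptions, not the actual claims data.
set.seed(1)

simulate_claims <- function(n) {
  data.frame(
    AgeBand = factor(sample(1:15, n, replace = TRUE)),
    Gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
    Region  = factor(sample(1:5, n, replace = TRUE)),
    cost    = rgamma(n, shape = 2, rate = 0.01)
  )
}

# Fit a Gamma GLM (log link, a stand-in model) to n simulated rows and
# report elapsed time plus the memory held by the data and model matrix.
time_fit <- function(n) {
  claims <- simulate_claims(n)
  elapsed <- system.time(
    fit <- glm(cost ~ AgeBand + Gender + Region,
               family = Gamma(link = "log"), data = claims)
  )[["elapsed"]]
  c(rows            = n,
    seconds         = elapsed,
    data_MB         = as.numeric(object.size(claims)) / 2^20,
    model_matrix_MB = as.numeric(object.size(model.matrix(fit))) / 2^20)
}

# Increase n step by step (e.g. towards 2e6) and watch how the numbers scale.
results <- t(sapply(c(1e4, 1e5), time_fit))
print(results)
```

For a sense of scale: with these three factors the model matrix has 1 + 14 + 1 + 4 = 20 columns, so 2 million rows need about 2,000,000 x 20 x 8 bytes = 320 MB for that matrix alone, and the fitted glm object also stores a QR decomposition and several length-n vectors. Available RAM is therefore likely to be one of the first constraints worth checking.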

Thanks ever so much once again.

Kind regards
Stratos

On Tue, Oct 12, 2010 at 11:31 AM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> On Tue, Oct 12, 2010 at 12:51 AM, Stratos Laskarides <stratl...@gmail.com> wrote:
>
>>  Dear Madam/Sir
>>
>> This may be quite a long shot...
>>
>> By way of intro, I am a master's student in actuarial science at the
>> University of Cape Town, and I am doing a project in R on some healthcare
>> cost data. During my coding in R I encountered an error message, which I
>> then googled, but I am still unable to resolve the issue.
>>
>> I would like to ask whether, and if so how, it is possible to resolve the
>> problem raised by the following error message in R:
>>
>>   Error: NA/NaN/Inf in foreign function call (arg 1)
>>   In addition: Warning message:
>>   step size truncated due to divergence
>>
>
> That error message can arise if division by zero occurs somewhere in the
> computation. Try using ftable() or some related function that will print
> out your complete table (4-way?) and check whether you have zero
> frequencies in one or more cells. If there are zero frequencies, that does
> not necessarily explain the problem, but it's a reasonable initial
> hypothesis. If you do have zero frequencies, merging some categories to
> get enough observations per cell may be useful; then try the fit again to
> see whether you get more sensible results.
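
A minimal sketch of the zero-cell check suggested above, using xtabs() and ftable() on a hypothetical toy data frame (the level labels and counts are made up for illustration). ftable() flattens a table of any dimension, so the same call works on a 3- or 4-way table such as xtabs(~ AgeBand + Gender + Region, data = claims).

```r
# Hypothetical toy data: level names and counts are illustrative only.
claims <- data.frame(
  AgeBand = factor(c("0-19", "0-19", "20-39", "20-39", "20-39",
                     "40-59", "60+", "60+"),
                   levels = c("0-19", "20-39", "40-59", "60+")),
  Gender  = factor(c("F", "M", "F", "M", "M", "F", "F", "M"))
)

# Cross-tabulate and print the full table flat; empty cells show as 0.
tab <- xtabs(~ AgeBand + Gender, data = claims)
print(ftable(tab))

# List the combinations that have no observations at all.
empty <- as.data.frame(tab)
empty <- empty[empty$Freq == 0, ]
print(empty)

# Merge the sparse "40-59" band with "60+": assigning the same label to
# two levels combines them into a single level.
levels(claims$AgeBand) <- c("0-19", "20-39", "40+", "40+")
tab2 <- xtabs(~ AgeBand + Gender, data = claims)
print(ftable(tab2))
```

Which categories to merge is a modelling decision; adjacent age bands are a natural choice because they are ordered.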
>
> When the error is thrown, it can be useful to run
>
> traceback()
>
> as it recalls the sequence of function calls that led up to the error, but
> it helps to have enough R experience to make heads or tails of the
> output :)
>
>>
>> As for some background on my specific data and research problem at hand, I
>> am fitting a gamma regression model to 13 000 lines of insurance claims
>> data, which will be regressed against categorical variables such as Age
>> Band, Gender, and Region.
>>
>
> The more variables you have in the model, the greater the number of cell
> combinations. A 15 x 2 x 5 cross-classification of your three variables,
> for example, would generate 150 combinations, and it's entirely possible
> for a few of those combinations to have small or zero frequencies. In
> addition, adding a new variable to the model would at least double the
> number of cells, spreading/thinning out the data even more.
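
The cell arithmetic above, worked through for the 13 000-row data set. The averages assume rows are spread evenly across cells, which real claims data will not be, so some cells will be much thinner than the average suggests.

```r
# Cells in a full cross-classification = product of the level counts.
levels_per_factor <- c(AgeBand = 15, Gender = 2, Region = 5)
n_cells <- prod(levels_per_factor)
n_rows  <- 13000

n_cells            # 150
n_rows / n_cells   # about 86.7 rows per cell on average

# Crossing in one more factor with k levels multiplies the cell count by k,
# so even a binary factor (k = 2) doubles the cells and halves the average.
n_cells_plus <- n_cells * 2
n_rows / n_cells_plus  # about 43.3
```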
>
>>
>> Perhaps my problem arises because the data set is too large and the
>> iteratively reweighted least squares algorithm therefore cannot converge,
>> in which case I perhaps need another GLM type. Or maybe the categorical
>> explanatory variables can take on too many values (e.g. there are 15 Age
>> Bands and 5 Regions).
>>
>
> If your response is continuous and positive-valued with a right-skewed
> distribution, then a Gamma model would appear to be sensible.
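
A minimal sketch of a Gamma GLM on simulated, right-skewed costs; the data, coefficients and column names are hypothetical. One substitution to note: R's Gamma() family defaults to the inverse link, and this sketch uses link = "log" instead. The log link keeps every fitted mean positive, which can avoid the invalid intermediate fits that trigger step-halving warnings, but that is a suggestion to try, not something proposed in this thread.

```r
# Simulated, hypothetical data standing in for the claims file.
set.seed(2010)
n <- 13000
claims <- data.frame(
  AgeBand = factor(sample(1:15, n, replace = TRUE)),
  Gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  Region  = factor(sample(1:5, n, replace = TRUE))
)

# Positive, right-skewed costs: Gamma with mean mu and shape 1.5.
mu <- exp(6 + 0.05 * as.integer(claims$AgeBand) + 0.2 * (claims$Gender == "F"))
claims$cost <- rgamma(n, shape = 1.5, rate = 1.5 / mu)

# Log link instead of the default inverse link (an assumed remedy).
fit <- glm(cost ~ AgeBand + Gender + Region,
           family = Gamma(link = "log"), data = claims)

fit$converged            # did IRLS converge?
summary(fit)$dispersion  # estimated dispersion, roughly 1 / shape here
```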
>
> The data set is not too large; successful GLMs have been fit to much
> larger data sets. Your second hypothesis sounds more plausible, though.
>
> HTH,
> Dennis
>
>>
>> Any insights you could provide would be much appreciated.
>>
>> Thank you ever so much.
>>
>> Kind regards
>> Stratos Laskarides
>> South Africa
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
