Dear Christian,

You're apparently using the glm.nb() function in the MASS package.

Your function is peculiar in several respects. For example, you specify the model formula as a character string and then convert it into a formula, but you could just pass the formula to the function -- the conversion seems unnecessary. Similarly, you compute the summary for the model twice rather than just saving it in a local variable in your function. And the form of the function output is a bit strange, but I suppose you have reasons for that.
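
Here, for what it's worth, is an untested sketch of what I have in mind: pass a formula directly rather than a character string, and compute the summary just once, saving it in a local variable (I've kept your list return):

        my_function <- function(formula, data) {
            # fit the model once
            model <- MASS::glm.nb(formula, data = data)
            # compute the summary once and reuse it
            smry <- summary(model)
            result <- as.data.frame(cbind(smry$coefficients, confint(model)))
            string_result <- capture.output(smry)
            list(result, string_result)
        }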

The primary reason that your function is slow, however, is that the confidence intervals computed by confint() profile the likelihood, which requires refitting the model a number of times. If you're willing to use possibly less accurate Wald-based rather than likelihood-based confidence intervals, computed, e.g., by the Confint() function in the car package, then you could speed up the computation considerably.

Using a model fit by example(glm.nb),

        library(MASS)
        example(glm.nb)
        microbenchmark::microbenchmark(
          Wald = car::Confint(quine.nb1, vcov.=vcov(quine.nb1),
                              estimate=FALSE),
          LR = confint(quine.nb1)
        )

which produces

Unit: microseconds
 expr       min       lq       mean    median       uq        max neval
 Wald   136.366   161.13   222.0872   184.541   283.72    386.466   100
   LR 87223.031 88757.09 95162.8733 95761.568 97672.23 182734.048   100
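
(For reference, the Wald intervals are just estimate +/- z*SE, so, if you prefer, you could also compute them directly from coef() and vcov(), along these lines:

        b <- coef(quine.nb1)
        se <- sqrt(diag(vcov(quine.nb1)))
        z <- qnorm(0.975)   # for 95% intervals
        cbind(lower = b - z*se, upper = b + z*se)

which is why the Wald computation is so fast.)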


I hope this helps,
 John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
--
On 2024-06-21 10:38 a.m., c.bu...@posteo.jp wrote:

Hello,

I am not a regular R user but come from Python. I use R for several
special tasks.

Doing a regression analysis costs some compute time, but I wonder
when this time-consuming algorithm is executed and whether it is done
twice in my special case.

It seems that calling "glm()" or similar does not execute the
time-consuming part of the regression code; it seems that part is done
when calling "summary(model)".
Am I right so far?

If this is correct, I would say that in my case the regression is done
twice with identical formula and data, which of course is inefficient.
See this code:

my_function <- function(formula_string, data) {
    formula <- as.formula(formula_string)
    model <- glm.nb(formula, data = data)

    result <- cbind(summary(model)$coefficients, confint(model))
    result <- as.data.frame(result)

    string_result <- capture.output(summary(model))

    return(list(result, string_result))
}

I call summary() once to get the "$coefficients" and a second time
when capturing its output as a string.

If this really results in computing the regression twice, I ask myself
whether there is an R way to make this more efficient?

Best regards,
Christian Buhtz

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to