Re: [R] weights vs. offset (negative binomial regression)

peter dalgaard Thu, 02 Nov 2023 00:28:53 -0700

I think it is more clear-cut than that, at least if the Poisson situation is 
anything to go by.

There, you can do either of these and get equivalent results:

> fit.lung <- glm(cases ~ age + city,  offset=log(pop), 
+                 family=poisson, data=lungcancer)
> fit.lung2 <- glm(cases/pop ~ age + city,  weights=pop, 
+                 family=poisson, data=lungcancer)
There were 12 warnings (use warnings() to see them)

(Except for the warnings about non-integer responses, which have annoyed some 
epidemiologists trying to work with non-log link functions.)

The point is that you need to convert to rates on the LHS and then compensate 
for the fact that this has a smaller variance when the population is larger. 
Counts on the LHS combined with weights wouldn't be right. So I would expect 
that the weighted version of the OP's code should model Catch/Effort, although 
I'm not quite sure how glm.nb reacts to non-integer responses.
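
A minimal sketch of the Poisson equivalence on simulated data, in case it helps. The data frame `d`, the seed, and the "true" coefficients are assumptions made up for illustration (this is not the lungcancer data above). It also fits the counts-plus-weights version, to show how that one differs.

```r
## Simulated data (hypothetical): counts proportional to population size
set.seed(1)
n     <- 200
pop   <- round(runif(n, 500, 5000))
x     <- rnorm(n)
cases <- rpois(n, pop * exp(-5 + 0.3 * x))
d     <- data.frame(cases, pop, x)

## (a) counts on the LHS, log(pop) as offset
fit_offset <- glm(cases ~ x + offset(log(pop)), family = poisson, data = d)

## (b) rates on the LHS, pop as weights
##     (suppressWarnings() hides the non-integer response warnings)
fit_rate <- suppressWarnings(
  glm(cases / pop ~ x, weights = pop, family = poisson, data = d)
)

## (c) counts on the LHS combined with weights: not the same model
fit_wrong <- glm(cases ~ x, weights = pop, family = poisson, data = d)

## helper: standard errors from the estimated covariance matrix
se <- function(fit) sqrt(diag(vcov(fit)))

## (a) and (b) agree up to convergence tolerance
all.equal(coef(fit_offset), coef(fit_rate), tolerance = 1e-5)
all.equal(se(fit_offset),   se(fit_rate),   tolerance = 1e-4)

## (c) treats every count as if it had been observed pop times over,
## so its standard errors come out far too small
cbind(offset = se(fit_offset), rate = se(fit_rate), wrong = se(fit_wrong))
```

The score equations for (a) and (b) are identical, which is why the estimates match; (c) inflates the information by roughly a factor of pop, which would explain coefficients suddenly looking "significant".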

-pd


> On 31 Oct 2023, at 17:59 , Ben Bolker <bbol...@gmail.com> wrote:
> 
>  [Please keep r-help in the cc: list]
> 
>  I don't quite know how to interpret the difference between specifying effort 
> as an offset vs. as weights; I would have to spend more time thinking about 
> it/working through it than I have available at the moment.
> 
>   I don't know that specifying effort as weights is *wrong*, but I don't know 
> that it's right or what it is doing: if I were the reviewer of a paper (for 
> example) I would require you to explain what the difference is and convince 
> me that it was appropriate. (Furthermore, "I want to do it this way because 
> it gives me significant effects" is automatically suspicious.)
> 
>  This would be a good question for CrossValidated 
> (https://stats.stackexchange.com); you could try posting it there (I would be 
> interested in the answer!)
> 
>  cheers
>    Ben Bolker
> 
> 
> On 2023-10-30 8:19 p.m., 유준택 wrote:
>> Dear Mr. Bolker,
>> Thank you for the fast response.
>> I also know that a Poisson (or negative binomial) regression in glm is 
>> generally modelled using an offset variable.
>> In this case, using a weights term instead of the offset gave me 
>> significant coefficients for the covariates.
>> I understand that the weights argument for exponential family distributions 
>> in glm affects the variance of the response variable.
>> I was just wondering whether my first model is completely wrong, and whether 
>> the use of an offset variable is valid when the response variable is not 
>> proportional to the offset variable, as in my dataset.
>> Sincerely,
>> Joon-Taek
>> On Sun, 29 Oct 2023 at 3:25 AM, Ben Bolker <bbol...@gmail.com 
>> <mailto:bbol...@gmail.com>> wrote:
>>        Using an offset of log(Effort) as in your second model is the more
>>    standard way to approach this problem; it corresponds to assuming that
>>    catch is strictly proportional to effort. Adding log(Effort) as a
>>    covariate (as illustrated below) tests whether a power-law model (catch
>>    propto (Effort)^(b+1), b!=0) is a better description of the data.  (In
>>    this case it is not, although the confidence intervals on b are very
>>    wide, indicating that we have very little information -- this is not
>>    surprising since the proportional range of effort is very small
>>    (246-258) in this data set.)
>>        In general you should *not* check overdispersion of the raw data
>>    (i.e., the *marginal distribution* of the data); you should check
>>    overdispersion of a fitted (e.g. Poisson) model, as below.
>>        cheers
>>         Ben Bolker
>>    edata <- data.frame(Catch, Effort, xx1, xx2, xx3)
>>    ## graphical exploration
>>    library(ggplot2); theme_set(theme_bw())
>>    library(tidyr)
>>    edata_long <- edata |> pivot_longer(names_to="var", cols =-c("Catch",
>>    "Effort"))
>>    ggplot(edata_long, aes(value, Catch)) +
>>          geom_point(alpha = 0.2, aes(size = Effort)) +
>>          facet_wrap(~var, scales="free_x") +
>>          geom_smooth(method = "glm", method.args = list(family =
>>    "quasipoisson"))
>>    #
>>    library(MASS)
>>    g1 <- glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata)
>>    g2 <- update(g1, . ~ . + log(Effort))
>>    g0 <- glm(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata,
>>                family = poisson)
>>    performance::check_overdispersion(g0)
>>    summary(g1)
>>    summary(g2)
>>    options(digits = 3)
>>    confint(g2)
>>    summary(g1)
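
For readers without the performance package, the same kind of check on the fitted Poisson model can be sketched in base R as the Pearson chi-square divided by the residual degrees of freedom. This is only an illustration using the data posted further down the thread; the object names `pearson_chisq`, `dispersion` and `p_value` are made up here, and it is not claimed to reproduce check_overdispersion() exactly.

```r
## Data as posted by the OP
Catch  <- c(25,2,7,6,75,5,1,4,66,15,9,25,40,8,7,4,36,11,1,14,141,9,74,38,126,3)
Effort <- c(258,258,258,258,258,258,258,254,252,252,252,252,252,252,252,252,
            252,252,252,248,246,246,246,246,246,246)
xx1 <- c(0.8,0.5,1.2,0.5,1.1,1.1,1.0,0.6,0.9,0.5,1.2,0.6,1.2,0.7,1.0,0.6,1.6,
         0.7,0.8,0.6,1.7,0.9,1.1,0.5,1.4,0.5)
xx2 <- c(1.7,1.6,2.7,2.6,1.5,1.5,2.8,2.5,1.7,1.9,2.2,2.4,1.6,1.4,3.0,2.4,1.4,
         1.5,2.2,2.3,1.7,1.7,1.9,1.9,1.4,1.4)
xx3 <- c(188,40,2,10,210,102,117,14,141,28,48,15,220,115,10,14,320,20,3,10,
         400,150,145,160,460,66)
edata <- data.frame(Catch, Effort, xx1, xx2, xx3)

## Poisson fit with log(Effort) as offset (same model as g0 above)
g0 <- glm(Catch ~ xx1 + xx2 + xx3 + offset(log(Effort)),
          family = poisson, data = edata)

## Pearson chi-square statistic of the *fitted* model, not of the raw counts
pearson_chisq <- sum(residuals(g0, type = "pearson")^2)

## Dispersion ratio: values well above 1 suggest overdispersion
dispersion <- pearson_chisq / df.residual(g0)

## Approximate chi-square test of the null hypothesis "dispersion = 1"
p_value <- pchisq(pearson_chisq, df = df.residual(g0), lower.tail = FALSE)

c(dispersion = dispersion, p_value = p_value)
```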
>>    On 2023-10-28 3:30 a.m., 유준택 wrote:
>>     > Colleagues,
>>     >
>>     >
>>     >
>>     > I have a dataset that includes five variables.
>>     >
>>     > - Catch: the catch number counted in some species (ind.)
>>     >
>>     > - Effort: fishing effort (the number of fishing vessels)
>>     >
>>     > - xx1, xx2, xx3: some environmental factors
>>     >
>>     > As an overdispersion test on the “Catch” variable, I modeled with a
>>     > negative binomial distribution using a GLM. The “Effort” variable showed a
>>     > gradually decreasing trend during the study period. I was able to get the
>>     > results I wanted when I used “Effort” as weights in the negative binomial
>>     > regression, as follows:
>>     >
>>     >
>>     >
>>     > library(qcc)
>>     >
>>     > Catch=c(25,2,7,6,75,5,1,4,66,15,9,25,40,8,7,4,36,11,1,14,141,9,74,38,126,3)
>>     >
>>     > Effort=c(258,258,258,258,258,258,258,254,252,252,252,252,252,252,252,252,252,252,252,248,246,246,246,246,246,246)
>>     >
>>     > xx1=c(0.8,0.5,1.2,0.5,1.1,1.1,1.0,0.6,0.9,0.5,1.2,0.6,1.2,0.7,1.0,0.6,1.6,0.7,0.8,0.6,1.7,0.9,1.1,0.5,1.4,0.5)
>>     >
>>     > xx2=c(1.7,1.6,2.7,2.6,1.5,1.5,2.8,2.5,1.7,1.9,2.2,2.4,1.6,1.4,3.0,2.4,1.4,1.5,2.2,2.3,1.7,1.7,1.9,1.9,1.4,1.4)
>>     >
>>     > xx3=c(188,40,2,10,210,102,117,14,141,28,48,15,220,115,10,14,320,20,3,10,400,150,145,160,460,66)
>>     >
>>     > #
>>     >
>>     > edata <- data.frame(Catch, Effort, xx1, xx2, xx3)
>>     >
>>     > #
>>     >
>>     > qcc.overdispersion.test(edata$Catch, type="poisson")
>>     >
>>     > #
>>     >
>>     > summary(glm.nb(Catch~xx1+xx2+xx3, weights=Effort, data=edata))
>>     >
>>     > summary(glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata))
>>     >
>>     >
>>     >
>>     > I am not sure that applying weights to the negative binomial
>>     > regression is correct. I also wonder whether there is a better way
>>     > of doing this. Can anyone help?
>>     >
>>     >
>>     > ______________________________________________
>>     > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>     > https://stat.ethz.ch/mailman/listinfo/r-help
>>     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>     > and provide commented, minimal, self-contained, reproducible code.
>> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
