I think it is more clear-cut than so, at least if the Poisson situation is something to go by.
There, you can do either of these and get equivalent results > fit.lung <- glm(cases ~ age + city, offset=log(pop), + family=poisson, data=lungcancer) > fit.lung2 <- glm(cases/pop ~ age + city, weights=pop, + family=poisson, data=lungcancer) There were 12 warnings (use warnings() to see them) (Except for the warnings about non-integer responses, which have annoyed some epidemiologists trying to work with non-log link fuctions.) The point is that you need to convert to rates on the LHS and then compensate for the fact that this has a smaller variance when the population is larger. Counts on the LHS combined with weights wouldn't be right. So I would expect that the weighted version of the OP's code should model Catch/Effort, although I'm not quite sure how glm.nb reacts to non-integer responses. -pd > On 31 Oct 2023, at 17:59 , Ben Bolker <bbol...@gmail.com> wrote: > > [Please keep r-help in the cc: list] > > I don't quite know how to interpret the difference between specifying effort > as an offset vs. as weights; I would have to spend more time thinking about > it/working through it than I have available at the moment. > > I don't know that specifying effort as weights is *wrong*, but I don't know > that it's right or what it is doing: if I were the reviewer of a paper (for > example) I would require you to explain what the difference is and convince > me that it was appropriate. (Furthermore, "I want to do it this way because > it gives me significant effects" is automatically suspicious.) > > This would be a good question for CrossValidated > (https://stats.stackexchange.com), you could try posting it there (I would be > interested in the answer!) > > cheers > Ben Bolker > > > On 2023-10-30 8:19 p.m., 유준택 wrote: >> Dear Mr. Bolker, >> Thank you for the fast response. >> I also know that a poisson (or negative binomial ) regression of glm is >> generally modelled using an offset variable. >> In this case, when a weights term instead of the offset is used, this gave >> me significant coefficients of covariance. >> I understand that the weights function for exponential family distributions >> in glm affects the variance of response variable. >> I was just wondering whether my first model is a completely wrong model and >> the use of offset variable is valid in the case that >> response variable is not proportional to offset variable such as my dataset. >> Sincerely, >> Joon-Taek >> 2023년 10월 29일 (일) 오전 3:25, Ben Bolker <bbol...@gmail.com >> <mailto:bbol...@gmail.com>>님이 작성: >> Using an offset of log(Effort) as in your second model is the more >> standard way to approach this problem; it corresponds to assuming that >> catch is strictly proportional to effort. Adding log(Effort) as a >> covariate (as illustrated below) tests whether a power-law model (catch >> propto (Effort)^(b+1), b!=0) is a better description of the data. (In >> this case it is not, although the confidence intervals on b are very >> wide, indicating that we have very little information -- this is not >> surprising since the proportional range of effort is very small >> (246-258) in this data set. >> In general you should *not* check overdispersion of the raw data >> (i.e., the *marginal distribution* of the data, you should check >> overdispersion of a fitted (e.g. Poisson) model, as below. >> cheers >> Ben Bolker >> edata <- data.frame(Catch, Effort, xx1, xx2, xx3) >> ## graphical exploration >> library(ggplot2); theme_set(theme_bw()) >> library(tidyr) >> edata_long <- edata |> pivot_longer(names_to="var", cols =-c("Catch", >> "Effort")) >> ggplot(edata_long, aes(value, Catch)) + >> geom_point(alpha = 0.2, aes(size = Effort)) + >> facet_wrap(~var, scale="free_x") + >> geom_smooth(method = "glm", method.args = list(family = >> "quasipoisson")) >> # >> library(MASS) >> g1 <- glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata) >> g2 <- update(g1, . ~ . + log(Effort)) >> g0 <- glm(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata, >> family = poisson) >> performance::check_overdispersion(g0) >> summary(g1) >> summary(g2) >> options(digits = 3) >> confint(g2) >> summary(g1) >> On 2023-10-28 3:30 a.m., 유준택 wrote: >> > Colleagues, >> > >> > >> > >> > I have a dataset that includes five variables. >> > >> > - Catch: the catch number counted in some species (ind.) >> > >> > - Effort: fishing effort (the number of fishing vessels) >> > >> > - xx1, xx2, xx3: some environmental factors >> > >> > As an overdispersion test on the “Catch” variable, I modeled with >> negative >> > binomial distribution using a GLM. The “Effort” variable showed a >> gradually >> > decreasing trend during the study period. I was able to get the >> results I >> > wanted when considered “Effort” function as a weights function in the >> > negative binomial regression as follows: >> > >> > >> > >> > library(qcc) >> > >> > >> >> Catch=c(25,2,7,6,75,5,1,4,66,15,9,25,40,8,7,4,36,11,1,14,141,9,74,38,126,3) >> > >> > >> >> Effort=c(258,258,258,258,258,258,258,254,252,252,252,252,252,252,252,252,252,252,252,248,246,246,246,246,246,246) >> > >> > >> >> xx1=c(0.8,0.5,1.2,0.5,1.1,1.1,1.0,0.6,0.9,0.5,1.2,0.6,1.2,0.7,1.0,0.6,1.6,0.7,0.8,0.6,1.7,0.9,1.1,0.5,1.4,0.5) >> > >> > >> >> xx2=c(1.7,1.6,2.7,2.6,1.5,1.5,2.8,2.5,1.7,1.9,2.2,2.4,1.6,1.4,3.0,2.4,1.4,1.5,2.2,2.3,1.7,1.7,1.9,1.9,1.4,1.4) >> > >> > >> >> xx3=c(188,40,2,10,210,102,117,14,141,28,48,15,220,115,10,14,320,20,3,10,400,150,145,160,460,66) >> > >> > # >> > >> > edata <- data.frame(Catch, Effort, xx1, xx2, xx3) >> > >> > # >> > >> > qcc.overdispersion.test(edata$Catch, type="poisson") >> > >> > # >> > >> > summary(glm.nb(Catch~xx1+xx2+xx3, weights=Effort, data=edata)) >> > >> > summary(glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata)) >> > >> > >> > >> > I am not sure the application of the weights function to the negative >> > binomial regression is correct. Also I wonder if there is a >> better way >> > doing this. Can anyone help? >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > R-help@r-project.org <mailto:R-help@r-project.org> mailing list >> -- To UNSUBSCRIBE and more, see >> > https://stat.ethz.ch/mailman/listinfo/r-help >> <https://stat.ethz.ch/mailman/listinfo/r-help> >> > PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> <http://www.R-project.org/posting-guide.html> >> > and provide commented, minimal, self-contained, reproducible code. >> ______________________________________________ >> R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- >> To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> <https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> <http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.