Re: [R] glmer with non integer weights
hello, krebs (1995) states MH as prob., but yes it's rather a ratio of probs. at each site i had 4 blocks with 2 treatments (treat vs. control) - after treating i looked for similarity between each of those pairs. it is of interest if changes in similarity due to treatment differ between stages. hope this clarifies the thing a bit. greetings, kay -- View this message in context: http://r.789695.n4.nabble.com/glmer-with-non-integer-weights-tp1837179p2062412.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
Sorry for this late answer (I've had a seriously nonmaskable interrupt). Since I have technical questions not related to R, I take the liberty to follow this up by e-mail. I might post a followup summary if another R problem arises... Emmanuel Charpentier Le lundi 19 avril 2010 à 23:40 -0800, Kay Cichini a écrit : hello, it's the Morisita Horn Index, which is an ecological index for similarity between two multivariate objects (vegetation samples with species and its abundance) where a value of one indicates completely same relative importance of species in both samples and 0 denotes total absence of any same species. it can be expressed as a probability: (prob. that an individual drawn from sample j and one from sample k belong to the same species) - = MH-INdex (prob. that two ind. drawn from either sample will belong to same species) [ Technical rambling ] Hmmm ... that's the *ratio* of two probabilities, not a probability. According to http://www.tnstate.edu/ganter/B412%20Ch%201516% 20CommMetric.html, that I found in the first page of answers to a simple google query, in can also be thought of the ratio of two distances between sites : (maximal distnce - actual distance) / maximal distance (with a massive (over-?) simplification). There is no reason to think a priori that the logit transformation (or the asin(sqrt()) transformation has better properties for this index than any other mapping from [0 1] to R. (BTW, a better mapping might be from [0 1] to [0 Inf] or conversely, but negative distances have no (obvious) meaning. Here asin(sqrt()) might make more sense that qlogis().) [ End of technical rambling ] But I have trouble understanding how a similarity or distance index can characterize *one* site... Your data clearly associate a MH.Index to each site : what distance or similarity do you measure at this site ? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
hello, it's the Morisita Horn Index, which is an ecological index for similarity between two multivariate objects (vegetation samples with species and its abundance) where a value of one indicates completely same relative importance of species in both samples and 0 denotes total absence of any same species. it can be expressed as a probability: (prob. that an individual drawn from sample j and one from sample k belong to the same species) - = MH-INdex (prob. that two ind. drawn from either sample will belong to same species) it is also covered in library(vegan);?vegdist here it is its complement: 1-MH, which then is a dissimilarity measure best regards, kay -- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p2017027.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
hi emmanuel, thanks a lot for your extensive answer. do you think using the asin(sqrt()) transf. can be justified for publishing prurpose or do i have to expect criticism. naivly i excluded that possibility, because of violated anova-assumptions, but if i did get you right the finite range rather posses a problem here. why is it in this special case an advantage? greetings, kay -- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p2015732.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
Le lundi 19 avril 2010 à 03:00 -0800, Kay Cichini a écrit : hi emmanuel, thanks a lot for your extensive answer. do you think using the asin(sqrt()) transf. can be justified for publishing prurpose or do i have to expect criticism. Hmmm ... depends of your reviewers. But if an half-asleep dental surgeon caught that after an insomnia, you might expect that a fully caffeinated reviewer will. Add Murphy's law to the mix and ... boom ! naivly i excluded that possibility, because of violated anova-assumptions, but if i did get you right the finite range rather posses a problem here. No. your problem is that you model a probability as a smooth (linear) finite function of finite variables. Under those assumptions, you can't get a *certitude* (probability 0 or 1). Your model is *intrinsically* inconsistent with your data. In other word, I'm unable to believe both your model (linear whathyoumaycallit regression) and your data (wich include certainties) *simultaneously*. I'd reconsider your 0 or 1, as meaning *censored* quantities (i. e. no farther than some epsilon from 0 or 1), with *hard* data (i. e. not a cooked-up estimate such as the ones i used) to estimate epsilon. There are *lots* of ways to fit models with censored dependent variables. why is it in this special case an advantage? It's bloody hell *not* a specific advantage : if you want to fit a linear model to a a probability, you *need* some function mapping R to the open ]0 1[ (i. e. all reals strictly superior to 0 and strictly inferior to 1 ; I thing that's denoted (0 1) in English/American usage). Asin(sqrt()) does that. However, (asin(sqrt()))^-1 has a big problem (mapping back [0 1] i. e. *including* 0 and 1, *not* (0 1), to R) which *hides* the (IMHO bigger) problem of the inadequacy of your model to your data ! In other words, it lets you shoot yourself in the foot after a nice sciatic nerve articaïne block making the operation painless (but still harmful). On the other hand, logit (or, as pointed by Martin Maechler, qlogis), is kind enough to choke on this (i. e. returning back Inf values, which will make the regression program choke). So please quench my thirst : what exactly is MH.Index supposed to be ? How is it measured, estimated, guessed or divined ? HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
Le vendredi 16 avril 2010 à 00:15 -0800, Kay Cichini a écrit : thanks thierry, i considered this transformations already, but variance is not stabilized and/or normality is neither achieved. i guess i'll have to look out for non-parametrics? Or (maybe) a model based on a non-Gaussian likelihood ? A beta distribution comes to mind, either fitted by maximum likelihood or (if relevant prior information is available) in a Bayesian framework ? But beware : you have a not-so-small problem ... Your data have zeroes and ones, which, if you have no information on a sample size, are sharp zeroes and ones, and there therefore theoretically bound to infinite linear predictors (in plain English : bloody unlikely). These values make a fixed effect analysis impossible : these points at infinite will make regression essentially impossible. Consider : logit-function(x)log(x/(1-x)) ilogit-function(x)1/(1+exp(-x)) ilogit(coef(lm(logit(MH.Index)~0+stage,data=similarity))) Erreur dans lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf dans un appel à une fonction externe (argument 4) You need to have some information on the precision of your zeroes and ones nd use it to get some possible values of MH.Index One might be tempted to unround them by a small amount (representing a reasonable guess on your precision : epsilon=0.01 ilogit(coef(lm(logit(MH.Index)~0 +stage,data=within(similarity,{MH.Index-pmax(epsilon,pmin(1-epsilon,MH.Index))} stageAstageBstageCstageD 0.6490997 0.2914323 0.5087639 0.5721789 BUT : epsilon=0.01 ilogit(coef(lm(logit(MH.Index)~0 +stage,data=within(similarity,{MH.Index-pmax(epsilon,pmin(1-epsilon,MH.Index))} stageAstageBstageCstageD 0.6588222 0.1020177 0.5087639 0.5721789 The estimation for stageB depends critically of the unrounding amount chosen. I tried a small fixed effect logit model in JAGS : it won't initialize with the original data (O and 1 are effectively impossible with possible values of the beta coefficients), and seems to exhibit, the same kind of sensitivity to unrounding amount that the linear model : LogitModFix-local({ Modele-function(){ for(k in 1:nobs) { ## logit(MH.Index[k])-lpi.i[k] lpi.i[k]~dnorm(lpi[stage[k]], tau.lpi[stage[k]]) } for(i in 1:nstage) { lpi[i]~dnorm(0,1.0E-6) tau.lpi[i]-pow(sigma.lpi[i],-2) sigma.lpi[i]~dunif(0,100) pi[i]-ilogit(lpi[i]) } } Data-function() { for (i in 1:nobs) { lpi.i[i]-logit(MH.Index[i]) } } tmpf-tempfile() ## write.model has been shoplifted from R2WinBUGS and adapted to JAGS ## by allowing a data argument for transformations write.model(Modele,tmpf, data=Data) epsilon-0.1 Modele.jags-jags.model(tmpf, data=with(similarity, list(MH.Index=pmax(epsilon,pmin(1-epsilon,MH.Index)), ## MH.Index=MH.Index, stage=stage, nobs=nrow(similarity), nstage=nlevels(stage))), n.chains=3) unlink(tmpf) Modele.coda-coda.samples(Modele.jags, variable.names=c(deviance, pi, sigma.lpi), n.iter=1000) list(Modele.jags, Modele.coda) }) ## Convergence (not shown) is quite acceptable summary(LogitModFix[[2]]) Iterations = 1001:2000 Thinning interval = 1 Number of chains = 3 Sample size per chain = 1000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE deviance 660.5656 4.71697 0.0861196 0.1672079 pi[1] 0.6304 0.21666 0.0039557 0.0040347 pi[2] 0.1506 0.08036 0.0014671 0.0014460 pi[3] 0.5082 0.03956 0.0007222 0.0006777 pi[4] 0.5696 0.07473 0.0013644 0.0012952 sigma.lpi[1] 6.9409 0.86442 0.0157821 0.0287626 sigma.lpi[2] 4.2775 0.49281 0.0089974 0.0132247 sigma.lpi[3] 1.0875 0.11506 0.0021006 0.0026048 sigma.lpi[4] 0.8801 0.29371 0.0053623 0.0118630 2. Quantiles for each variable: 2.5% 25% 50% 75%97.5% deviance 654.0216 657.13069 659.6788 662.9535 672.2101 pi[1] 0.1695 0.47600 0.6681 0.8107 0.9485 pi[2] 0.0413 0.09206 0.1345 0.1914 0.3564 pi[3] 0.4320 0.48079 0.5078 0.5366 0.5851 pi[4] 0.4154 0.52518 0.5722 0.6168 0.7135 sigma.lpi[1] 5.5553 6.33366 6.8287 7.4412 8.8207 sigma.lpi[2] 3.5082 3.93440 4.2274 4.5598 5.3497 sigma.lpi[3] 0.8833 1.00397 1.0805 1.1565 1.3412 sigma.lpi[4] 0.5304 0.67740 0.8189 1.0067 1.5987 The same model re-fitted with epsilon=0.01 gives : summary(LogitModFix[[2]]) Iterations = 1001:2000 Thinning interval = 1 Number of chains = 3 Sample size per chain = 1000 1. Empirical mean and
Re: [R] glmer with non integer weights
Addendum to my previous answer : In that special case, the limited range of the asin(sqrt()) transformation, which is a shortcoming, turns out to be useful. The fixed-effect doefficients seem semi-reasonable (except for stageB) : (sin(coef(lm(asin(sqrt(MH.Index))~0+stage, data=similarity^2 stageAstageBstageCstageD 0.6164870 0.3389430 0.5083574 0.5672021 The random effects being nested in the fixed efect, one can't afford to be lazy in the parametrization of the corresponding random effect : summary(lmer(asin(sqrt(MH.Index))~stage+(stage|site), data=similarity)) Linear mixed model fit by REML Formula: asin(sqrt(MH.Index)) ~ stage + (stage | site) Data: similarity AIC BIC logLik deviance REMLdev 155.3 199 -62.65111.8 125.3 Random effects: Groups NameVariance Std.Dev. Corr site (Intercept) 0.043579 0.20876 stageB 0.033423 0.18282 -0.999 stageC 0.043580 0.20876 -1.000 0.999 stageD 0.043575 0.20875 -1.000 0.999 1.000 Residual 0.128403 0.35833 Number of obs: 136, groups: site, 39 Fixed effects: Estimate Std. Error t value (Intercept) 0.930360.08431 11.035 stageB -0.308790.10079 -3.064 stageC -0.136600.09981 -1.369 stageD -0.077550.14620 -0.530 Correlation of Fixed Effects: (Intr) stageB stageC stageB -0.836 stageC -0.845 0.707 stageD -0.577 0.482 0.487 v-fixef(lmer(asin(sqrt(MH.Index))~stage+(stage|site), data=similarity)) v[2:4]-v[1]+v[2:4] names(v)[1]-stageA (sin(v))^2 stageAstageBstageCstageD 0.6429384 0.3390903 0.5083574 0.5672021 But again, we're exploiting a shortcoming of the asin(sqrt()) transformation. HTH, Emmanuel Charpentier Le vendredi 16 avril 2010 à 00:15 -0800, Kay Cichini a écrit : thanks thierry, i considered this transformations already, but variance is not stabilized and/or normality is neither achieved. i guess i'll have to look out for non-parametrics? best regards, kay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
thanks thierry, i considered this transformations already, but variance is not stabilized and/or normality is neither achieved. i guess i'll have to look out for non-parametrics? best regards, kay -- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p1965623.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
thank you thomas for the helpful hint! yours, kay -- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p1965827.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
Dear Kay, There is a R list about mixed models. Which is a better place for your questions. The (quasi)binomial family is used with binary data or a ratio that originates from binary data. In case of a ratio you need to provide the number of trials through the weights argument. Further I would suggest to drop stage from either the random effects or the fixed effects. You are trying to estimate the same effect twice in a model. HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Kay Cichini Verzonden: maandag 12 april 2010 16:12 Aan: r-help@r-project.org Onderwerp: [R] glmer with non integer weights hello, i'd appreciate help with my glmer. i have a dependent which is an index (MH.index) ranging from 0-1. this index can also be considered as a propability. as i have a fixed factor (stage) and a nested random factor (site) i tried to model with glmer. i read that it's possible to use a quasibinomial distribution, for this kind of data, which i than actually did - but firstly (1) i'm not quite sure if that's appropiate for my data, secondly (2) i wondered if the model can be correct when variance of then main and nested factor are zero. (3) also i could not yield p-values for that model. here's data, call and output: ## #call: ## glmer(MH~stage+(1|stage/site),family=quasibinomial) ## #output: ## #Generalized linear mixed model fit by the Laplace approximation #Formula: MH ~ stage + (1 | stage/site) # AIC BIC logLik deviance # 66.03 86.47 -26.0152.03 #Random effects: # Groups NameVariance Std.Dev. # site:stage (Intercept) 0.00 0.000 # stage (Intercept) 0.00 0.000 # Residual 0.076175 0.276 # Number of obs: 137, groups: site:stage, 39; stage, 4 #Fixed effects: #Estimate Std. Error t value #(Intercept) 0.392050.09009 4.352 #stageB -0.872140.12498 -6.978 #stageC -0.361530.12202 -2.963 #stageD -0.098840.19811 -0.499 #Correlation of Fixed Effects: # (Intr) stageB stageC #stageB -0.721 #stageC -0.738 0.532 #stageD -0.455 0.328 0.336 ## #my data: ## similarity-data.frame(list(structure(list(stage = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c(A, B, C, D), class = factor), site = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 18L, 18L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 32L, 32L, 32L, 32L, 33L, 33L, 33L, 33L, 34L, 34L, 34L, 34L, 35L, 35L, 35L, 35L, 36L, 36L, 36L, 36L, 37L, 37L, 38L, 38L, 38L, 38L, 39L, 39L, 39L ), .Label = c(A11, A12, A14, A15, A16, A17, A18, A19, A20, A5, A7, A8, B1, B12, B13, B14, B15, B17, B18, B2, B4, B7, B8, B9, C1, C10, C11, C15, C17, C18, C19, C2, C20, C3, C4, C6, D1, D4, D7), class = factor),
Re: [R] glmer with non integer weights
thanks thierry, my problem is that the index is a propability which is not derived from incidents per nr. of observations, thus i don't have those numbers but only the plain index, which i want to test. greatings, kay -- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p1838268.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
So your respons variable behaves like a continuous variable except that is range is limited to the 0-1 interval. In such a case I would transform the respons variable (e.g. logit, sqrt(arcsin())) and use a gaussian model. HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Kay Cichini Verzonden: dinsdag 13 april 2010 13:37 Aan: r-help@r-project.org Onderwerp: Re: [R] glmer with non integer weights thanks thierry, my problem is that the index is a propability which is not derived from incidents per nr. of observations, thus i don't have those numbers but only the plain index, which i want to test. greatings, kay -- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p 1838268.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Druk dit bericht a.u.b. niet onnodig af. Please do not print this message unnecessarily. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
On Tue, 13 Apr 2010, ONKELINX, Thierry wrote: So your respons variable behaves like a continuous variable except that is range is limited to the 0-1 interval. In such a case I would transform the respons variable (e.g. logit, sqrt(arcsin())) and use a gaussian model. A logit-Normal has variance roughly mu^2(1-mu)^2 and a quasibinomial logistic uses mu(1-mu), with the parameters having the same interpretations. The question of which variance function best approximates the data should really be an empirical one, not an a prioiri one. There is an example of exactly this in the quasilikelihood chapter in McCullagh and Nelder, where the observations are the proportion of damage on a set of leaves. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.