[R-sig-eco] PhD position in ecology and statistics at NTNU, Norway
(the deadline for this is the 31st of October, sorry about the short notice! If you apply, please mention that you saw this on r-sig-ecology) We have a vacancy for a position as a PhD research fellow at the Department of Mathematical Sciences, for someone interested in statistical modelling and applying it to problems of biodiversity and sustainability. I can promise the chance to work with a cross-disciplinary team: there will also be a student working on the ecological modelling, and we also have a very active group of students working with citizen science data. The successful applicant will work as part of an inter-disciplinary team to develop methods to aid in the understanding of the effects of water resource management on biodiversity. They will join a team working on statistical methods to apply ecological models to messy data. The project will focus on developing the statistical models to integrate different data from different sources (e.g. different surveys, and reports from citizen scientists) to model the biodiversity of Norwegian rivers and use these models to predict the impacts of resource management decisions. Duties of the position • make a workflow to bring data in from a variety of sources • develop models, based on recent work at NTNU, to model the distributions of riverine species • work with other members of the project to look at the effects of water resource management, such as the effects of dams. Required selection criteria The PhD-position's main objective is to qualify for work in research positions. The qualification requirement is that you have completed a master’s degree or second degree (equivalent to 120 credits) with a strong academic background in statistics or equivalent education with a grade of B or better in terms of NTNU’s grading scale. If you do not have letter grades from previous studies, you must have an equally good academic foundation. If you are unable to meet these criteria you may be considered only if you can document that you are particularly suitable for education leading to a PhD degree. The appointment is to be made in accordance with the regulations in force concerning State Employees and Civil Servants and national guidelines for appointment as PhD, post doctor and research assistant. Required selection criteria: • ability to work in a cross-disciplinary team • interest in applying statistical methods to ecological problems • good knowledge of statistical modeling methods Preferred selection criteria • experience with Bayesian methods • skills in hierarchical modelling • good written and oral English language skills More details here (including how to apply, and the formalities): https://www.jobbnorge.no/en/available-jobs/job/194344/phd-research-fellow-at-the-department-of-mathematical-sciences If any potential applicant have questions, they should feel free to ask me. Bob -- Bob O'Hara Institutt for matematiske fag NTNU 7491 Trondheim Norway Mobile: +47 915 54 416 Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] CI for stratified Cox PH model
Is there any reason you can#'t use confint()? e.g. test1 <- list(time=c(4,3,1,1,2,2,3), status=c(1,1,1,0,1,1,0), x=c(0,2,1,1,1,0,0), sex=c(0,0,0,0,1,1,1)) confint(coxph(Surv(time, status) ~ x + strata(sex), test1)) Another trick is to use predict(), with new data at all of hte levels you are interested in: pred1 <- list(x=c(0:2, 0:2), sex=c(0,0,0,1,1,1)) predict(coxph(Surv(time, status) ~ x + strata(sex), test1), newdata=pred1, type="lp" , se.fit=TRUE) (it doesn't provide CIs, but mean +/-1.96s.e. should work OK for the linear predictor scale) Bob On 05/06/18 18:47, Bertolo, Andrea wrote: Hi everyone, I have a doubt about the way to calculate 95% CI for coefficients in the stratified Cox proportional hazard models and your help is welcomed . Say that I have a variable of interest Imi and a stratifying variable UV in a interaction model (since the interaction between Imi and UV is of interest for me and the interaction model has a better fit than the no-interaction model: library(survival) model1 <- coxph(Surv(start,stop, Status.time) ~ Imi + Imi:UV + strata(UV) + cluster(ID), weights = NB_Event, data=Data.unfold) Whereas it is pretty straightforward to calculate the coefficients (and associated HR) for each combination of Imi and UV, I am not sure about how to calculate the associated CI (note that, of course, I got the CI for the estimate of "Imi:UV" from the output of model1). Is it correct to calculate separately a model for each UV level and use the CI for the Imi variable to get the CI for the two levels of UV (see below) ? # 2 models (one per UV level) data.Low <- subset(Data.unfold,UV=="low") model2.1 <- coxph(Surv(start,stop, Status.time) ~ Imi + cluster(ID), weights = NB_Event, data=data.Low) data.High <- subset(Data.unfold,UV=="high") model2.2 <- coxph(Surv(start,stop, Status.time) ~ Imi + cluster(ID), weights = NB_Event, data=data.High) Alternatively, is there a way to get the CI directly from the output of the stratified model ? Many thanks Andrea Bertolo Université du Québec à Trois-Rivières 3351, bd des Forges C.P.500, Trois-Rivières (Québec) Canada G9A 5H7 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Institutt for matematiske fag NTNU 7491 Trondheim Norway Mobile: +49 1515 888 5440 Homepage: http://www.ntnu.edu/employees/bob.ohara Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] correlating three time-series in R
You can pass the columns to ccf() directly: df <- data.frame(x=rnorm(6), y=rnorm(6)) ccf(df$x, df$y) print(ccf(df$x, df$y)) You should probably also check the time series task view: <https://cran.r-project.org/web/views/TimeSeries.html>, in particular the zoo package, to see what can be done with irregular time series. But with 6 data points I'd be surprised if you have the power to detect anything that doesn't jump out when you simply plot the data. Bob On 26/07/17 11:07, Tania Bird wrote: I have three data sets of abundances through time for plants, insects and reptiles. There are 6 samples over a ten year period (all taxa sampled at the same time). I recognise this is a small data set for time series. I would like to correlate the time series to see if a) increases in abundance of one taxon are correlated to another, and b) to see if the correlation between plants:insects is greater than plants:reptiles. I thought to use the cross-correlation function in R e.g. ccf(insects, reptiles) Currently the data is in one dataframe with time as one column and abundance of each taxa is the next three columns. How do I convert the data to a time.series format as given in the R example? How can I compare the two ccf outputs? Thanks Tania Tania Bird MSc *"There is a sufficiency in the world for man's need but not for man's greed" ~ Mahatma Gandhi* [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara NOTE: this email will die at some point, so please update you records to bob.oh...@ntnu.no Institutt for matematiske fag NTNU 7491 Trondheim Norway Mobile: +49 1515 888 5440 Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] mixed effects in ordination (?)
This doesn't sound like an ordination problem: it's more similar to a standard mixed model regression. There are several ways to approach this, for example Ben Bolker has some (old) notes here: <https://rpubs.com/bbolker/3336> I'd suggest you start by running the analyses on each trait individually, as that way you can make sure you have the model sorted correctly. The step up to the multivariate model just needs some work understanding what the relevant R package wants. HTH Bob On 12/01/17 14:12, Martin Weiser wrote: Dear friends, Could you please help me with analysis? I am afraid that it involves crossed random effects in the mixed-effect constrained ordination setting, so to say. Goal: Show an effect of the species trait (single one) and treatment (four levels, quantitative scale) on parameters. Trait x treatment interaction is possible. If possible, show changes through time. Data: Individuals of 7 species, subjected to 4 treatment levels (fully factorial) - from 6 to 12 individuals in each combination. Each individual scored 4 times (same times: 1st wk, 2nd wk, 3rd wk, 4th wk). Several (10) parameters scored every time on each individual. What I did: In order to avoid multiple testing (the parameters are likely to be correlated to each other), I decided to use multivariate analysis (RDA). I am by far much more accustomed to vegan than ade4, so excuse me if I use some "veganisms". Predictors: time, trait, treatment (possibly with interactions), conditioned on individual identity to avoid treating records from the same plant as independent. Variance partitioning. Here comes the problem: how to set permutations in order to be able to report p_vals (some people just are not happy without them)? Since individuals of the same species share the same trait value, maybe the right way is to: shift observations within individual (if time is among predictors for the particular model) and permute trait value among species. Is it so? Is this treatment of the pseudoreplication at the species level (i.e. only in the significance testing, not in the ordination per se) ok? I also tried to use different approach: I averaged all params individual-wise (getting rid of temporal pseudoreplication, but also time effects), and subsequently I averaged the result within treatment x species levels. I assume that I can go for simple free permutations this way? Pity is that this way, I cannot see development in time. And another way: I averaged params for species x treatment x time groups, ignoring interdependence of records from the same individual, hoping that the effect of an individual "dissolves" in the average. Is that meaningful? If yes, what is the appropriate permutation structure in this case? But maybe I miss something and there are better ways how to deal with this problem? Any suggestions (ok: not any, just those made in an attempt to help :-) ) are appreciated. With the best regards, Martin Weiser ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara NEW EMAIL: bob.oh...@ntnu.no NOTE NEW ADDRESS!!! Institutt for matematiske fag NTNU 7491 Trondheim Norway Mobile: +49 1515 888 5440 Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] LMER: applying a random term to only one level of a factor
On 06/04/16 20:12, Aislinn Pearson wrote: Hi, I've tried googling this but haven't been very successful. Essentially, I'd like to know what is the most statistically valid way of dealing with a random term which doesn't apply to every level of fixed-effect factor. I have a mixed effect model that looks like this Disease level <- weight + Flown +(1|DateFlown) Either I flew my insects on a flight mill (which can be thought of as a 'treadmill' for flying insects) or I didn't, hence flown is a two level factor (Yes or No) and I want to understand how this affects the amount of disease in my insect. To get as many replicates as I could on a single day, I had two different banks of flight mills (A & B), each bank containing 16 individual insect treadmills. The insects were randomly assigned to one of the two sets of 16 flight mills. Previous studies tell me there are differences between these two sets of flight mills, so I would like to account for them as a random term in my model. As a practical matter, it's not worth setting a level with two levels as random: you don't gain anything in the analysis and the variance component is really poorly estimated. In practice, this might actually make things cleaner, as you will have to look a bit more at the flight mill effects, so you should get a better feel for what's happening. However, I can't run this in LMER. When I tried I got the error; Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels Which I imagine means that one of my factors (i.e. Flown) doesn't include any levels for the random term mill set (i.e. for all unflown insects the value in the mill set column is NA) Is it possible to include this form of experimental design in LMER (the package I know best) or, alternatively, nlme (which I am a lot less accustomed to using)? I can think of two ways of doing this: either set up a factor with three levels (Flight mill A, Flight mill B, Not Flown) or set the Not Flown to one of the flight mill levels. The first way feels less confusing, but you might have to set up some contrasts to estimate the differences. But hopefully your insects will cooperate nicely and make the difference between the flight ills will be much smaller than between flight mills and not flown. HTH Bob I would be really grateful if anyone has any insight. Many thanks Rothamsted Research is a company limited by guarantee, registered in England at Harpenden, Hertfordshire, AL5 2JQ under the registration number 2393175 and a not for profit charity number 802038. [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] AIC in R: back-transforming standardized model parameters (slopes)
-1.07202 0.26102 -4.11 z.time:c.air.ice -0.38008 0.13722 -2.77 [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] NMDS axes scores
ally for further analysis. I therefore would like to include both of my NMDS site scores as a response into a GLM model simultaneously. Unfortunately, I couldn't find any advice on how to actually do this. I found a couple of papers using NMDS scores in GLMs, but they all seem to use them individually, fitting separate models to each of the ordination axes. I'm a bit at a loss here and any advice is very much appreciated, Conny [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- -- Pokud je tento e-mail součástí obchodního jednání, Přírodovědecká fakulta Univerzity Karlovy v Praze: a) si vyhrazuje právo jednání kdykoliv ukončit a to i bez uvedení důvodu, b) stanovuje, že smlouva musí mít písemnou formu, c) vylučuje přijetí nabídky s dodatkem či odchylkou, d) stanovuje, že smlouva je uzavřena teprve výslovným dosažením shody na všech náležitostech smlouvy. ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] best choice of GLMM for seed set data
On 27/08/15 15:33, Mehdi Abedi wrote: > Thanks Bob for reply. > I have problem that when you have results of effect size i am not sure > to how get only main effect without their levels data. I previously > asked this simple question but i couldn't get clear answer. The > question is simple: how we can have only temp, light and their > interactions(similar like output of anova() without their levels. You can't: that's what the effect sizes are! For more on how to interpret them, see any good stats text, or this paper: <http://dx.doi.org/10.5735/086.046.0205> Bob > *Call:* > glm(formula = cbind(A..hierochuntica, A..hierochunticano) ~ temp * > light, family = binomial) > > Deviance Residuals: > Min 1Q Median 3Q Max > -4.4473 -1.1993 0.9904 2.0101 3.6663 > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) 2.666160.14341 18.591 < 2e-16 *** > temp20/30 0.085380.20671 0.413 0.679596 > temp25/35-0.673730.18001 -3.743 0.000182 *** > lightlight0.511890.23048 2.221 0.026350 * > temp20/30:lightlight 0.628390.37291 1.685 0.091965 . > temp25/35:lightlight 2.090800.43729 4.781 1.74e-06 *** > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 334.88 on 47 degrees of freedom > Residual deviance: 209.30 on 42 degrees of freedom > AIC: 331.56 > > Number of Fisher Scoring iterations: 6 > > this is by *anova(model,test="Chi")* > Analysis of Deviance Table > > Model: binomial, link: logit > > Response: cbind(A..hierochuntica, A..hierochunticano) > > Terms added sequentially (first to last) > > >Df Deviance Resid. Df Resid. Dev Pr(>Chi) > NULL 47 334.88 > temp2 10.14445 324.73 0.006271 ** > light 1 86.86044 237.87 < 2.2e-16 *** > temp:light 2 28.56942 209.30 6.255e-07 *** > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > > > *anova(model, test="F")* > Analysis of Deviance Table > > Model: binomial, link: logit > > Response: cbind(A..hierochuntica, A..hierochunticano) > > Terms added sequentially (first to last) > > >Df Deviance Resid. Df Resid. Dev F Pr(>F) > NULL 47 334.88 > temp2 10.14445 324.73 5.0718 0.006271 ** > light 1 86.86044 237.87 86.8597 < 2.2e-16 *** > temp:light 2 28.56942 209.30 14.2847 6.255e-07 *** > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > Warning message: > In anova.glm(model, test = "F") : > using F test with a 'binomial' family is inappropriate > > Warm regards, > Mehdi > > > On Thu, Aug 27, 2015 at 2:38 PM, Bob O'Hara <mailto:boh...@senckenberg.de>> wrote: > > On 27/08/15 11:57, Mehdi Abedi wrote: > > Dear Thierry, > Yes i am using (success, failure) but in this case i need to > change all > data frame. I was thinking to use codes which is not necessary > to create > new column when you have a ll of species. Because we know > success(germinated seeds) and we know failure (Total seeds - > success(germinated seeds)). > > Yes i used codes with ANOVA but there is no P- value for study. > > model2<- glmer(cbind(germinated, Nongerminated) ~ temp *light + > (1|Replication ), data=growthdata, > + family=binomial) > > anova(model2) > > Analysis of Variance Table > Df Sum Sq Mean Sq F value > temp2 30.600 15.300 15.300 > light 1 46.231 46.231 46.231 > temp:light 2 22.877 11.439 11.439 > > p-values are difficult. See here: > <http://glmm.wikidot.com/faq> > > Better to stick to reporting your effect sizes: your analysis of > deviance only tells you if you have enough data to see a > difference, not how big the differences are. > > Also, if Replication is 1:nrow(growthdata), you could use a simple > GLM and estimate your over-dispersion term (the residual deviance > divided by the residual sum of squares should be OK). You can use > this to correct the standard errors with summary(glm.obj, > dispersion=overdisp). > > Bob > > > Warm regards, > Mehdi > > > On Thu, Aug 27, 201
Re: [R-sig-eco] best choice of GLMM for seed set data
[[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Mehdi Abedi Department of Range Management Faculty of Natural Resources & Marine Sciences Tarbiat Modares University (TMU) 46417-76489, Noor Mazandaran, IRAN mehdi.ab...@modares.ac.ir Homepage Tel: +98-122-6253101 Fax: +98-122-6253499 -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Zero infalted model- T50
On 08/07/15 09:56, Mehdi Abedi wrote: Hi, I want to calculate time to 50 percent reduction of seed viability. Mainly researcher calculate P50 using probit model for calculation T50. In some treatments seed viability decline quit fast therefore i have only value 100 and 0 values . Can i calculate T50 for this data using probit?! Could we apply Zero inflated model for this conditions with 0? All the best, Mehdi Time(weed) Seed viability 0 100 8 0 16 0 24 0 32 0 40 0 You can use a probit, but it'll be unhappy. The problem is that the curve is steep compared to the times, so you don't see it. The only real solution is to change the assays, so you observe more times. But I appreciate this might not be practical. I don't think zero inflation will help. There isn't a unique maximum likelihood estimate for this, but I think a sensible estimate would be mid way between the times where you observe 100 and the first 0. If you want to be Bayesian, this would be the posterior mean. The best confidence interval is probably the central 95% of the interval that the T50 is in (i.e. assuming the likelihood is flat). It's probably a little bit wider than it should be, but I doubt that will be a big problem. I used to get data where the data would flip between 0 and 100 over a range of doses. In the end I gave up with those problems and became an ecologist Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] mvabund SIMPER alternative
On 02/06/15 15:38, carvin90 wrote: Hi community, I am new to stats to this level in ecology. I am trying to compare DNA and RNA libraries with thousands of OTUs. I summarized taxa to get the most abundant species, but I can obtain only relative abundances. I was thinking to use SIMPER as I read in several paper to test which species differ the most per station between DNA and RNA based libraries. However I found that SIMPER is a more or less robust test. I was wondering if the manyglm was also an alternative for my question or if you suggest another function. Thank you for your help! Yes, you can use mvabund for OTUs: I've helped a colleague do this (e.g. dx.doi.org/10.1371/journal.pone.0053987). Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Association Routine?
On 02/28/2015 06:48 PM, Alexandre F. Souza wrote: Dear friends, I need to write a code to find data using one variable as reference. The code I wrote, however, is not working and I can't figure it out why. Could anyone help me? Imagine a data set with two variables, B and C. Now I have variable A, which is the same variable as variable B but the data are not in the same order nor have necessarily the same extension as B (it may be a sample of B, for example). I want to find the values of variable C that match each line in variable A using B as the association criterion. So the code should perform a loop in which it would take the first line in A, search B until it finds it there, then copy the corresponding value of C and store it in a new variable D. Do it until all lines in A have been associated to a C value. starting with... df<-data.frame(B=sample(letters[1:10],replace=FALSE), C=rnorm(10), stringsAsFactors=FALSE) A=letters[1:10] two thoughts spring to mind: (a) would merge() do what you want? e.g. df2 <- merge(df,data.frame(A=A), by.x="B", by.y="A"), and then extract the values of C with df2$C[df2$B=="f"], for example. (b) sapply(A, function(lt, DF) DF$C[DF$B==lt], DF=df) R's looping is generally more efficient when it's done internally, so it will be easier for you if you understand the R mentality, in particular vectorisation. usually if you have a for() loop, you're not writing R code efficiently. Bob Here is the code I wrote: # Considering that matrices data.ref and data.assoc have been already read, containing the # User-defined number of columns to be associated with A (I imagined that more than one variable could be associated at once) col.assoc = 20 # To assure that data will not be in a non-usable data category ref = as.matrix(data.ref) assoc = as.matrix(data.assoc) # Table where results will be stored # Number of columns = n associated variables plus one column # Reserved to receive the initial data (example column A) result = matrix(nrow = nrow(ref), ncol = col.assoc + 1) # Fulfill the first column of the result table with the original reference variable result[,1] = ref[,1] for (i in 1:nrow(ref)){ for (j in 1:nrow(assoc)) if (ref[i, 1] == assoc[j, 1]){ resultado[i, 2] == assoc[j, 2] } } col = ncol(dados) #### Any thoughts? Thanks in advance, Alexandre -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Multivariate quasi-bionomial analysis of proportion data?
On 08/02/15 12:27, Amanda Greer wrote: Hi All, I am trying to best analyse a set of foraging ecology data with >10 behaviour categories (DVs) and 3 levels of IV (season, sex, age). The time which an animal spent engaged in a behaviour was recorded and then divided by the total time spent in sight of the observer, so my data are proportional. As is typical, not all animals engaged in all behaviours and there are a large number of zeros in my dataset which is severely over-dispersed. I had initially analysed all the data using the glm function (family = quasibinomial, followed by anova. The intention was then to use the false discovery rate alpha to account for the large number of analyses. However, it was pointed out to me that a multivariate approach might be better so I have been trying to figure out (a) if it's possible to run a quasi-binomial multivariate analysis of proportion data (b) how to go about it. In the R Documentation quasi-binomial family function page ( http://artax.karlin.mff.cuni.cz/r-help/library/VGAM/html/quasibinomialff.html ) it is stated that if multivariate response = TRUE the response matrix should be binary. This seems a pretty straightforward indictment of my idea to run this analysis on my proportion data, but I am wondering why - is this just not possible, or is there a particular package that could help? If anyone could provide me with an answer or some much needed guidance on this topic I would be very grateful. Ignoring the zeroes problem for the moment, I think (quasi-)binomial distributions are a distraction: binomials are based on counts of things (see Petr Keil's post: http://www.petrkeil.com/?p=603). If you're looking at proportions of times, then it might be better to think in terms of gamma distributions, which lead to a beta distribution for the proportion of times spent doing one thing, and a Dirichlet distribution if you have several items (as you do here). Once you have to worry about the zeroes, you need to do something more, for example see this paper: <http://onlinelibrary.wiley.com/doi/10./2041-210X.12122/abstract> Bob Thanks, Amanda [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] SPEI index
On 12/23/2014 09:57 AM, ONKELINX, Thierry wrote: If the file is tab delimited then you need to use read.table(sep = "\t") instead of read.table(sep = ""). read.delim() is another option. Or save the file as a .csv and use read.csv (this makes the reading in easier if you have things like spaces in the file). Also, it's worth checking that you don't have a header row and then some empty rows. It's often worth opening up the .txt or .csv file as a text file (e.g. in Notepad) to see if there are any problems. Oh, and the gdata package has a read.xls() function that is another alternative. Bob ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey Van: R-sig-ecology [mailto:r-sig-ecology-boun...@r-project.org] Namens Manuel Esteban Lucas Borja Verzonden: dinsdag 23 december 2014 9:13 Aan: r-sig-ecology@r-project.org Onderwerp: [R-sig-eco] SPEI index Dear All, I am starting to use R software in order to calculate an aridity index (SPEI). I have installed the spei package and the library. Then, I have created a .xls file with year, month, Precip and Temp variables. The xls file has been exported to txt format (separated by tabulations). Then to try, the only output I have obtained can be see below. Could anyone of you please help me? If you want, I can send you the txt file. All the best, Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 1 NA NA NA NA NA NA NA NA NA NA NA NA 2 NA NA NA NA NA NA NA NA NA NA NA NA Script: library(SPEI) SPEI <- read.table("/Users/PALAN.txt", header=TRUE, sep="") str(SPEI) PET <- thornthwaite(SPEI$Tm,40.0) spei(SPEI$PREC-PET,1) --- Manuel Esteban Lucas Borja Universidad de Castilla La Mancha Escuela T�cnica Superior de Ingenieros Agr�nomos y de Montes Departamento de Ciencia y Tecnolog�a Agroforestal y Gen�tica Campus Universitario s/n, C.P. 02071, Albacete (Spain) T�lf.; 967599200 ext. 2818 Mail: manuelesteban.lu...@uclm.es Web:http://www.uclm.es/profesorado/manuelestebanlucas/ [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo> ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] species richness, GLM and negative values
On 10/29/2014 04:27 PM, Ludovico Frate wrote: > Dear all,I'am trying to fit a very simple linear model. I am analyzing the > differences in the number of species (DS) found in several permanent plots in > two year of observations. > Firstly, I have calculated the differences per plot (i.e. number of species > in Plot 1 in Time A - number of species in Plot 1 in Time B and so on for all > the plots).Secondly, those differences were tested for deviation from zero by > means of a linear model > M2<-lm(DC~1, data = gransasso)summary(M2)E2<-residuals(M2)qqnorm(E2, pch = > 19, col = "blue"); qqline(E2, col = "red") > The qqnorm has shown that residuals were not normally distributed, thus I > need to use a GLM. However GLM (poisson family) does not work with negative > values (DS has negative values).I've tried to add a constant value to these > differences (i.e. +100) but the result is misleading since I am testing for > deviation from zero. > Do you have any suggestions? > Regards,Ludovico Use the Poisson to model the number of species in each sample, so use data like this: Plot Time DS 1 A 4 1 B 7 2 A 0 2 B 1 3 A 23 3 B 7 ... Then you fit the model Mgood <- glm(DS ~ Plot + Time, family=poisson()) where Plot and Time are factors (use Plot <- factor(Plot), for example, if you need to). You're interested in the Time effect, which is the average difference between the numbers of species in the plots in the different times. The Plot effect controls for different plots having different numbers of species overall. If you look at summary(Mgood), the Time effect is the log of the ratio of the species richnesses in times A and B. It will be written as TimeB, which means it's log(E.B/E.A) (where E.A and E.B are the expected species richnesses at times A and B). So, for example, an estimate of 0.4 would mean that at Time B there are exp(0.4)=1.49 times more species at time B than time A. Bob > > > Ludovico > Frate > > PhD student (University of Molise - Italy) > Environmetrics Lab > http://www.distat.unimol.it/STAT/environmetrica/organico/collaboratori/ludovico-frate-1 > Department of Biosciences and Territory - DiBT > Universit� del Molise. > Contrada Fonte > Lappone, > 86090 - Pesche (IS) > ITALIA. > Cel: ++39 > 767557 > Fax: ++39 (0874) 404123 > E-mail ludovico.fr...@unimol.it > ludovicofr...@hotmail.it > > [[alternative HTML version deleted]] > > > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Question about poisson GLM
On 27/10/14 15:38, rosanna di maggio wrote: Dear all, I'm trying to perform a poisson GLM with lme4 using the number of individuals as dependent variable and i) years of study and ii) different study areas as factors... So far so good.. But the results are quite curious... The estimates and standard errors are huge.. Is is normal? I don't think so... This is an example of what happen: It's difficult to say without seeing that data, but a couple of things spring to mind: 1. If the covariates are meant to be continuous, then did you centre and scale them? 2. If they are meant to be factors, did you tell R this? e.g. anno <- factor(anno). Bob Call: glm(formula = NTOT ~ areaN * anno, family = poisson) Deviance Residuals: Min 1Q Median 3Q Max -3.7465 -1.0629 -0.5000 0.3132 6.8188 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 140.786646 27.766187 5.070 3.97e-07 *** areaN71.479294 11.876992 6.018 1.76e-09 *** anno -0.069199 0.013823 -5.006 5.56e-07 *** areaN:anno -0.035572 0.005912 -6.017 1.78e-09 *** I cannot understand the reason of these results... Someone can help me? Thank you Rox -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] EstimateR and Chao 1 standard deviation
On 10/17/2014 05:36 PM, Gavin Simpson wrote: > On 17 October 2014 06:44, Bob O'Hara <mailto:boh...@senckenberg.de>> wrote: > > On 17/10/14 13:54, "José M. Blanco Moreno" wrote: > > Dear users > > I have been checking the values returned by the function estimateR > against the formulae in the appendix to nonparametric > estimators of > species richness in EstimateS > > (http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB). > > <http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB%29.>.. > > and they do not match each other. > > Where does this line of code in vegan:::estimateR.default come > from? > > sd.Chao1 <- sqrt(a[2] * ((G^4)/4 + G^3 + (G^2)/2)) > > It is from any other reference that I should have under control? > > Unless Jari added something, the functions come from my paper from > about 10 years ago: > O’Hara RB (2005) Species richness estimators: how many species can > dance on the head of a pin? J Anim Ecol 74: 375–386. > http://doi.wiley.com/10./j.1365-2656.2005.00940.x, which > refers to Chao (1987), which is cited in the estimateR > documentation. EstimateS now uses a small sample correction: > > <http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#Chao1AndChao2> > Does this account for the discrepancy? > > Bob > > > Yep, vegan does not include the small sample correction. The Chao1 > computation is here: > > https://github.com/vegandevs/vegan/blob/master/R/estimateR.default.R#L44 > > but we can certainly add it if this is useful? (And I suppose it is if > EstimateS has included it by default now...) > Might as well. it looks like it shouldn't make a huge difference unless sample sizes are small. Bob > G > > -- > > Bob O'Hara > > Biodiversity and Climate Research Centre > Senckenberganlage 25 > D-60325 Frankfurt am Main, > Germany > > Tel: +49 69 7542 1863 > Mobile: +49 1515 888 5440 > WWW: http://www.bik-f.de/root/index.php?page_id=219 > Blog: http://blogs.nature.com/boboh > Journal of Negative Results - EEB: www.jnr-eeb.org > <http://www.jnr-eeb.org> > > > _______ > R-sig-ecology mailing list > R-sig-ecology@r-project.org <mailto:R-sig-ecology@r-project.org> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > > > > > -- > Gavin Simpson, PhD -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] EstimateR and Chao 1 standard deviation
On 17/10/14 13:54, "José M. Blanco Moreno" wrote: Dear users I have been checking the values returned by the function estimateR against the formulae in the appendix to nonparametric estimators of species richness in EstimateS (http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB)... and they do not match each other. Where does this line of code in vegan:::estimateR.default come from? sd.Chao1 <- sqrt(a[2] * ((G^4)/4 + G^3 + (G^2)/2)) It is from any other reference that I should have under control? Unless Jari added something, the functions come from my paper from about 10 years ago: O’Hara RB (2005) Species richness estimators: how many species can dance on the head of a pin? J Anim Ecol 74: 375–386. http://doi.wiley.com/10./j.1365-2656.2005.00940.x, which refers to Chao (1987), which is cited in the estimateR documentation. EstimateS now uses a small sample correction: <http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#Chao1AndChao2> Does this account for the discrepancy? Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Bayesian analysis of canopy cover/percent data
95, 0.7, 0.5, 0.7, 0.2, 0.95, 1, 1, 0.25, 0.85, 0.5, 0.8, 0.75, 0.85, 0.7, 0.95, 0.05, 0.65, 0.65, 1, 1, 1, 0.65, 0.4, 0.6, 0.9, 0.85, 0.75, 0.5, 0.65, 1, 0.65, 0.55, 0.75, 0.4, 0.9, 0.35, 1, 1, 0.4, 0.5, 0.8, 0.95, 0.95, 0.55, 0.7, 0.85, 0.8, 0.8, 0.65, 1, 0.6, 0.5, 1, 0.8, 1, 0.45, 1, 1, 0.8, 0.85, 1, 1, 1, 1, 0.5, 0.6, 0.15, 0.75, 0.6, 0.1, 0.05, 0, 1, 0.6, 0.1, 0.35, 0.9, 0.9, 0.95, 0.95, 0.9, 0.55, 0.65, 0.9, 0.4, 1, 0.65, 0.5, 0.8), .Dim = c(227L, 2L), .Dimnames = list( NULL, c("site2", "cover20p2"))) -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Binomial GLMM or GAMM with random intercept and temporal correlation
On 08/26/2014 07:34 PM, SamiC wrote: Dear ecology mailing list, I am trying to model some binomial data (0/1) as a function of sex (0/1) and DistanceToFeature (continuous km’s) with an interaction between the two. My data is nested and I therefore want to include a random intercept for InidividualID and within that I want to include an AR1 correlation structure as the data is serially/temporally auto-correlated. I understand any correlation structure should be nested within the random effect. So far I have tried running the model using glmmPQL as glmmPQL(Y ~ DistanceToFeature * Sex + (1|InidividualID), correlation=corAR1(form=~1|IndividualID/ContinuousBout), family=’binomial’, data=’birds’) (note – ContinuousBout is an ID for where there are time gaps in the data). However, although this runs, am I right in understanding that I should not use PQL estimation with binomial data as it gives biased results? Does anyone know of a way I can model this? Bernoulli responses are tricky things, so I'd be a bit worried that you might be trying to fit too complex a model. But anyway If you don't mind going Bayesian, you could try INLA (R-INLA.org). Alternatively, could you simply include the previous time step's value as a covariate? I understand that this is also the case if I wish to use GAMM (as later I will be modelling a non-linear explanatory as well)? Yes - a GAM is just a GLMM with a complicated covariate structure. Additionally I will also be running a similar set up but where the data are not equally spaced in time (and therefore an AR1 structure would not apply). Can anyone give a recommendation of a modelling framework for this also. That should come from the subject matter - I can think of a couple of approaches, but they make different assumptions about how your system is behaving. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Question on unexpected regression results
.59 Myokea 33.757.861.353.80 57.77 Myonig 34.551.655.752.77 56.52 Myooxy 40.545.747.644.98 47.09 Myorip 36 53.357.550.82 54.16 Myosim 38 NA NA 48.23 51.02 Perhaps simple linear regression is not the method to use? Thanks for any additional suggestions. Bruce [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] deviance as a goodness of fit in GLM
On 21/07/14 19:52, Samantha PameLa wrote: Good day everybody, I'm a marine biologist student, working on my bachelor thesis and I'm stucked with a statistical doubt in the process, I hope someone here could help me. My thesis aims to understand which biological and environmental factors influences the male aggressive rate of male California sea lions. For that purpose I'm using GLM’s where the response variable is the male aggressive rate. Right now I am testing the goodness of fit of the global models and for that I'm using the deviances as a goodness of fit test. I calculated pseudoR2 (Zuur, 2009) in order to know the percentage of explanation of each candidate model. However I’m not sure how to choose the “good models”; since I am not sure over which percent of explanation indicates a “model with good fit”. For my data I am working with three different scenarios, and it seems that 20%, is a good value to could indicate the best models, but I’m not sure how to choose the value. I thank you in advance for your time and the help you can give me. Best regards, Just to follow up, AIC isn't a measure of model fit either: it's a measure of model adequacy, and can be used to compare different models. For model fit, you can (usually) compare the residual deviance to the degrees of freedom (they should be approximately equal, and you can use a chi-squared test, if you feel the need to generate a p-value). This doesn't work if your response is binomial with a small N (and psuedo R2 doesn't work terribly well for this case either). A better approach to checking if the model fits is to check to see if the residuals have any gross patterns in them, by plotting them against the fitted values and also against covariates. If you'll excuse my self-promotion, PNAS recently gave me an excuse to show some residual plots to the general public: <http://www.theguardian.com/science/grrlscientist/2014/jun/04/hurricane-gender-name-bias-sexism-statistics> (my explanation for why this is not just gratuitous self-promotion is that I linked to the R code to generate the plots too, so you have something to work from). Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] probability distribution for zero-inflated, right skewed data
On 16/06/14 13:57, Johannes Björk wrote: Dear all, Im looking into how to fit a GLM model (Im using rjags) with data that are heavily right skewed. In addition, some variables also zero-inflated. The data are species area distribution measured as "total area (km^2)" which is subsetted into "area in tropical zone" and "area in temperate zone". The last two variables contain zeros. I have google zero-inflated models... and most that come up is "zero-inflated negative binomial" and zero-inflated negative poisson" for count data. I reckon I cannot use any of these distributions since my variables are not discrete. Any pointer to which distribution(s) that might fit this kind of data would be much appreciated. I think a Tweedie distribution is sometimes used, but that always makes me think of escaping chickens. Recently this was published, which might also be useful: <http://onlinelibrary.wiley.com/doi/10./2041-210X.12122/abstract>. They used BUGS, so you could ask them if the code is available. Even if it isn't, it shouldn't be too difficult to code up. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Question about GLM post hoc and chi square (Angel)
On 16/06/14 08:49, Åström, Jens wrote: Hi, Adding to what's already been said: Just in case you're not aware of this, I think you have a typo in your model. Are you looking for glm(Count~Sex+Time+Behaviour+Sex*Time+Sex*Behaviour+Time*Behaviour,family=poisson)? By the way, I think this could also be written glm(Count~Sex+Time+Behaviour+Sex:Time+Sex:Behaviour+Time:Behaviour,family=poisson) or simply glm(Count~+Sex*Time+Sex*Behaviour+Time*Behaviour,family=poisson). The "*" means both main effects and all possible interactions. FWIW, glm(Count~(Sex +Time+Behaviour)^2,family=poisson) will also work: it expands to the main effects and first order interactions. (oh, and the advice I've seen is don't use Wald tests: they are conservative because they assume the other parameters are fixed at their MLEs, so ignore any uncertainty in them. Use likelihood ratio tests instead) Bob Also, you should probably look into the issue of overdispersion. Overdispersion is very common in ecological count data and basically means that you have more variation in your data than the Poisson distribution assumes. This typically leads to anti-consevative p-values, i.e. too small p-values, and needs to be accounted for. Read more about it and potential solutions here: http://glmm.wikidot.com/faq Good luck, Jens -- Message: 1 Date: Sat, 14 Jun 2014 18:55:44 -0700 (PDT) From: Angel To: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] Question about GLM post hoc and chi square Message-ID: Content-Type: text/plain You are able to obtain Chi squared values by using a wald chi square post hoc test. To do this you can use the aod package, function wald.test. This function is perfect for generalised linear models using poisson distribution. Also, as long as you have got interaction terms in your results from GLM, you could get the wald chi square (and an associated p-value) for these terms, hence giving you the table which you are after. If this response is not clear enough, I can post some example (I am not sure of the etiquette) AA. Date: Fri, 13 Jun 2014 18:07:27 -0700 From: ml-node+s471788n757894...@n2.nabble.com To: alexander.angelo...@hotmail.com Subject: Question about GLM post hoc and chi square Dear all, I am making an analysis using a GLM using three explanatory variables and a response variable. I need to obtain a table similar to this one, http://postimg.org/image/5sau79wlt/r nevertheless, I have not been able to do it. I am having a hard time specially getting the chi square values. I would like to know how to obatin them. I also would like to know what function could help me to make ad hoc comparisons for single variables and interactions. If any of you knows how to do both estimations, I would really appreciate it. All the best!!! This is my script a=read.table("ricis3.txt",header=T) attach(a) model7=glm(Count~Sex+Time+Behaviour+Sex*Time+Sex*Behaviour+Time+Behaviour*Sex,family=poisson) summary(model7) ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Question about GLM post hoc and chi square
On 06/14/2014 03:05 AM, Luis Fernando García wrote: > Dear all, > > I am making an analysis using a GLM using three explanatory variables and a > response variable. I need to obtain a table similar to this one, > http://postimg.org/image/5sau79wlt/r > > nevertheless, I have not been able to do it. I am having a hard time > specially getting the chi square values. I would like to know how to obatin > them. Use anova(). The deviance follows a chi-squared distribution (usually - if you have overdispersion it gets a bit more complicated). > I also would like to know what function could help me to make ad hoc > comparisons for single variables and interactions. These comparisons are called contrasts. There is a contrasts() function in R, and also a contrast package (which, I'm guessing will be of more use). Googling "R contrast" might help too - there seems to be plenty of material, so hopefully one or two results will be exactly what you want. Contrasts can get esoteric, so if you can find some a page with code that gives you the comparisons you want, that should help a lot. Good luck! Bob > If any of you knows how to do both estimations, I would really appreciate > it. > > All the best!!! > > This is my script > a=read.table("ricis3.txt",header=T) > attach(a) > model7=glm(Count~Sex+Time+Behaviour+Sex*Time+Sex*Behaviour+Time+Behaviour*Sex,family=poisson) > summary(model7) > > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] mantel within mantel? multidimensional variance
Do you need to run Mantel tests at all? I don't see that you have measured any distances, as such, so I think you can do at leat as well by using standard [genrealized] linear [mixed] models. I'm not sure I understand everything, but it sounds like you run an experiment with 2 treatments, and N species, and you measure T traits. So, for each trait you could have a model y_t ~ (Treat1 + Treat2)*Species (you could run this separately for each species, if that makes things easier) and size of the Treat1:Species effect tells you how the species responds to the treatment (if you want a measure over all traits, a PCA on the coefficients should suffice). It sounds like you are then asking how these reactions, i.e. the Treat1:Species effects are related to traits which are constant within a species, i.e. a Treat1:Trait effect. So, for each y_t you could run a model y_t ~ (Treat1 + Treat2)*(Trait+Species) (or even make Species a random effect) So, now I think you just want the Treat1:Trait effects. if the Trait's are discrete classes, then you can (again) run a PCA on the effects, to visualise the effects. If you want to analyse all of the y_t's together, it gets a bit more complicated, but in principle it's the same, except you have a MANOVA-like structure. I think you could use a Seemingly Unrelated Regression approach. Bob On 06/08/2014 05:56 PM, Mgr. Martin Weiser wrote: Dear friends, I need to quantify variance within variance, and what is worse, say if it is non-random. This is the setup: in the experiment, there were 2 (partly correlated) treatments, each of six levels (but theoretically of infinite levels, so I treated them as continuous). Different species responded to them, and we measured various traits ("endotraits"). We used R2 from the per-species redundancy analyses (RDA) as a reaction norm: higher R2, more the species responds to the treatment (so traits were used as "species" in the community ecology jargon). We have twice as much RDAs as there were species (because of 2 treatments). Next, we correlated these R2 (reaction norms) with some other traits per species ("exotraits"). As there were 2 correlated treatments (lets say irrigation and fertilisation), and we sed the same reaction norms for correlation with different exotraits, the correlations were obviously non independent. To overcome this, we run Mantel test (matrix1= species x reaction norms to treatment, matrix2=species x exotraits). Next, we are interested in "endotraits" x "exotraits" correlation. For this, we used endotrait scores from the per-species RDAs on the treatment axis (constraining variable). Correlations are non-independent again, so we wanted an overall test. And here comes the problem: results of Mantel tests (matrix1=species x endotrait scores, matrix2=species x exotraits) are suprisingly weak. Some single correlations of endotrait scores x exotraits seem to be pretty afar from random, but the overall mantel... I think this is because single endotrait can not show better correlation than the overall reaction norm, which is based on them, so something like (pval of this mantel)*(1-pval of the correlation of reaction norm with exotraits) may be desirable for the overall test, but I simply do not know. Any advice? Best, Martin Weiser -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Help with a function
On 06/06/14 15:39, Rodrigues wrote: > Dear R users, > > > Iâm trying to build a function to select random samples idfâs from a > database. > So, my data frame had 2 columns and 575 rows. Follow bellow an example of my > database > Idf1 casod > 121 > 141 > 151 > 161 > 173 > 183 > 193 > 213 > 251 > 241 > 261 > 281 > 293 > 323 > 333 > 351 > 363 > 371 > 483 > > So my function is > > blinding=function(sample){ >sort=sample(idf1,10,replace=F) >return(sort2) > } Does this work? It looks like you're passing a function to the function! Also, you're not returning something that has been calculated. In practice, you might as well just use sample(idf1,10,replace=F). > It is pretty simple and I would like to add one more step in my choice. I > would like to link my choice to casod stats. Thus if casod==3 sample would > be random idfs could not be an idf with casod=1. Does someone can help me? If I understand you, you only want to sample within casod==3. So this will work: sample(idf1[casod==3],10,replace=F) Or, if you need a wrapper function, this should work: blinding=function(which.casod, df, N=10) sample(df$idf1[df$casod==which.casod], N, replace=F) e.g. SampledData=sapply(unique(Data$casod), blinding, df=Data) will sample for each value of casod Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] inference from GLMM fit using glmer
ect of the bfi. These > results seem to agree with the summary of the more general model, but they > also seem a bit extreme given our small sample size. > > Model comparisons using PBmodcomp from the pbkrtest package also seem to > support the inclusion of both fixed effects. >> mc.oc.bfi.nobfi = PBmodcomp(oc.sr.bfi, oc.sr.null, nsim = 100) >> mc.oc.yr.noyr = PBmodcomp(oc.sr.bfi.yr, oc.sr.bfi, nsim = 100) >> mc.oc.bfi.nobfi > Parametric bootstrap test; time: 26.13 sec; samples: 100 extremes: 0; > large : oc ~ bfi + (1 | dist) > small : oc ~ 1 + (1 | dist) > stat df p.value > LRT513.43 1 < 2.2e-16 *** > PBtest 513.43 0.009901 ** > --- > >> mc.oc.yr.noyr > Parametric bootstrap test; time: 39.83 sec; samples: 100 extremes: 0; > large : oc ~ bfi + lyr + (1 | dist) > small : oc ~ bfi + (1 | dist) > stat df p.value > LRT98.323 1 < 2.2e-16 *** > PBtest 98.323 0.009901 ** > --- > > We calculated bootstrap confidence intervals using "confint". They seem to > indicate that the estimated effects were unambiguous, but we were unsure > whether our sample size was adequate to apply this method. > >> confint (oc.sr.bfi.yr, method = c("boot")) > Computing bootstrap confidence intervals ... >2.5 % 97.5 % > sd_(Intercept)|dist 0.13871500 0.83969172 > (Intercept) 6.54440740 7.56234905 > bfi -0.53389761 -0.44979598 > lyr 0.02835891 0.04325712 > > Simple diagnostic plots (residuals vs fitted values and a normal QQ plot of > residuals) of the GLMM with effects of both bfi and lyr did not reveal > violations of assumptions. We would be happy to report significant effects > of food availability and a significant trend across years, but are concerned > that we may not be interpreting these results correctly. Any help or > suggestions would be greatly appreciated. > > Sincerely, > Eric > > > [[alternative HTML version deleted]] > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Error distribution for fractional response
On 30/01/14 10:58, Adhara Pardo wrote: Dear R users, I would like to fit a GLM to some plant regeneration data (see bottom of this e-mail). The dependent variable, an index of regeneration, was obtained by diviving the number of saplings by the number of adults plants present in each plot. The result is a highly skewed variable and thus, specifying, for instance, a Gaussian distribution does not seem to be appropriate. Data transformation does not help either. Do you have any suggestion on the best distribution to choose? Rather than use an index, it might be better to use the number of saplings directly, and assume they are Poisson distributed (or some form of over-dispersed Poisson). You can use the log of the number of adults as an offset: glm(saplings ~ something + offset(log(adults)) The model is saplings ~ Poisson(lambda) log(lambda) = alpha + beta*something + log(adults) where alpha and beta are the parameters being estimated (lambda is the expected number of saplings). This model is the same as lambda = adult*exp(alpha + beta*something) so it's equivalent to modelling saplings/adults: the adults have just been moved to the other side of the equation. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Bayesian analysis with MCMC
On 01/18/2014 04:39 AM, 明 wrote: Dear all I have a model with four paramters, I want to estimate the parameter uncertainty, so Bayesian analysis with MCMC method is applied. But every sigle mcmc chain seems give quite different parameter marginal distributions. In order to get the true parameter marginal distributions, I do like this: (1) I take 100 MCMC chain, and each chain has 1 iterations. (2) caculate the different parameter marginal distributions according to the frequence of paramter in step 1 sampling. The result seems reasonable. but is it right? Looking forward for your reply, Thanks in advance Han Ming [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology Have you checked for convergence? If the chains are giving different marginal distributions, this suggests they haven't converged, so they are not sampling from the correct posterior distribution (or, at least, some chains aren't). Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] This chains contains uninitialized variables and the trap report says " undefined real result"
On 28/11/13 15:09, kassahun Alemu wrote: > Dear all, > > > > When I do analysis, I have troubled with frequent pop up message: > > This chains contains uninitialized variables and the trap report says " > undefined real result" > > > > I am looking for your courage and help to continue my further analysis. My guess is that some of your mu's are getting too large, so exp(mu[]) becomes infinite. In your inits you're giving the precisions values of dgamma(1,5,10)=0.1891664 (I think you meant to use rgamma()), so the variance is about 5. With alpha and all of the betas being positive, I can easily see exp(mu) exploding. I'd recommend you use BUGS to do the debugging, because it's easier to look at the problems (e.g. here I'd use save state to see what parameter estimates you have: I'm guessing some mu's will be large). In models like this I usually set the inits for my beta's to be close to zero, so they don't explode the mu's unexpectedly. You can always relax the inits later, once it's working. Bob > > > model{ > > for (j in 1:2160) { > > y[j] ~ dpois(mu[j]) > > log(mu[j]) <- log(e[districts[j]]) + alpha + beta1*x1[j] + beta2*x2[j] + > beta3*x3[j] + beta4* x4[j] + > > beta5*x5[j] + beta6*x6[j] + beta7*x7[j] + beta8*x8[j] + > phi[districts[j]]+ delta[time[j]] + omega[time[j]] > > rho[j]<-exp(alpha + beta1*x1[j] + beta2*x2[j] + beta3*x3[j] + beta4* x4[j] + > > > beta5*x5[j] + beta6*x6[j] + beta7*x7[j] + beta8*x8[j] + > phi[districts[j]] + delta[time[j]] + omega[time[j]]) > # RR > > } > > > > for (i in 1:18) { > > e[i] ~ dnorm(0,sigma.e) > > phi[i] ~ dnorm(0,tau.phi) > > } > > > > delta[1]<-0 > > omega[1] ~ dnorm(0,tau.omega1) > # Adjusted RR in month 1 over all districts/years > > for (t in 2:120){ > > delta[t] ~ dnorm(0,tau.delta) > > omega[t] ~ dnorm(omega[t-1],tau.omega) > > } > > > > #PRIORS > > tau.e ~ dgamma(0.001,0.001) > > alpha ~ dflat() > > beta1 ~ dgamma(1,1) > > beta2 ~ dgamma(1,1) > > beta3 ~ dgamma(1,1) > > beta4 ~ dgamma(1,1) > > beta5 ~ dgamma(1,1) > > beta6 ~ dgamma(1,1) > > beta7 ~ dgamma(1,1) > > beta8 ~ dgamma(1,1) > > tau.phi ~ dgamma(1,1) > > tau.delta ~ dgamma(1,1) > > tau.omega1 ~ dgamma(1,1) > > tau.omega ~ dgamma(0.001, 0.001) > > sigma.e <- 1/sqrt(tau.e) > > sigma.phi <- sqrt(1/tau.phi) > > sigma.omega1 <- sqrt(1/tau.omega1) > > sigma.delta <- sqrt(1/tau.delta) > > sigma.omega <- sqrt(1/tau.omega) > > } > > > > #R script to call R2WinBUGS and plot density > > > > # Define the dataset > > > > library("R2WinBUGS") > > library(help="BRugs") > > library(help="R2WinBUGS") > > data1<-read.csv(file.choose("Data1.csv"), header=TRUE, sep=(",")) > > > > names(data1) > > districslist<-as.data.frame(unique(data1$districts)) > > data2<-as.data.frame(cbind((1:18),districslist)) > > names(data2)<-c("D_ID","districts") > > data3<-merge(data1, data2,by="districts") > > data3<-as.data.frame(data3) > > names(data3) > > > > data6<-NULL > > for (i in 1:18){ > > data4<-data3[data3$D_ID==i,] > > data5<-cbind(data4,seq(1,length(data4[,1]),by=1)) > > data6<-rbind(data6,data5) > > } > > districts<-data6[,13] > > time<- data6[,14] > > y<- data1$y > > x1<- data1$x1 > > x2<- data1$x2 > > x3<- data1$x3 > > x4<- data1$x4 > > x5<- data1$x5 > > x6<- data1$x6 > > x7<- data1$x7 > > x8<- data1$x8 > > data <- list(districts=districts,time=time, > y=y,x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8) > > > > # Define inits > > > > inits <- function(){ > > list(tau.e=dgamma(1,5, 10), alpha = dnorm(1,1, 1), beta1 = dgamma(1,5,10), > beta2 = dgamma(1,5,10), > > beta3 = dgamma(1,5,10), beta4 = dgamma(1,5,10), beta5 = dgamma(1,5,10), > > beta6 = dgamma(1,5,10), beta7 = dgamma(1,5,10), beta8 = dgamma(1,5,10), > > tau.phi=dgamma(1,5,10), tau.delta=dgamma(1,5,10), > > tau.omega=dgamma(1,5,10), tau.omega1=dgamma(1,5,10)) > > } > > > > # run the MCMC analysis > > > > model_G<- bugs(data, inits, model.file = "C:/model_G.txt", > > parameters = c("tau.e", "alpha", "beta1", "beta2", "beta3", "beta4", > "beta5", "beta6
Re: [R-sig-eco] Two-sided significant test for null models
On 07/23/2013 12:06 PM, Cayetano Gutiérrez Cánovas wrote: Dear all, I'd like to test the two-side significance of an observed value being lower or higher in comparison to a null distribution, but I don't know how to build this in R code. Can anybody help me with some code or any R function? Let's say 'null.dist' the null distribution, 'obs' the observed value and 'alpha' the probability of Type I error. set.seed(10) rnorm(999)->null.dist obs<-1.5 alpha<-0.05 pval=mean(obs>null.dist) min(pval, 1-pval)<0.05 Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] interpretation of interaction between explanatory variables
It's difficult to give definitive advice about what it means, but the simplest approach is to look at what it means: plot the model and see what it looks like. Basically, the interaction says that as the variables both increase, the relationship gets less negative. The details depend on the ranges of the variables, hence plotting the results is helpful. I would write some code, but that would probably mainly show that I'm out of date with R. But you could use expand.grid() to create new data to predict, use predict() to predict it, and then plot it as several lines on one plot (e.g. plot against C.abundances, with one line for each value of C.diversity). Bob On 05/08/2013 02:56 PM, "Iris Kröger" wrote: > Dear list members, > > I want to analyse the impact of a competitor community (i.e. "community > abundances" on the one hand and "community species diversity" on the other > hand) on mosquito larval populations of species A and B. Each variable on its > own has a negative impact on mosquitoes - but when both variables are > interacting, there is a positive impact... How can I interpret that? For > mosquito A only C.diversity has a significant impact - but the interaction > between C.abundances and C.diversity is significant? What does that mean? > > I used the model: > lm (mosquito ~ C.abundances * C.diversity) > output Mosquito A: > Estimate Std. Error t value Pr(>|t|) > (Intercept) -0.2120 0.1159 -1.829 0.074855 . > C.abundances -0.1277 0.1616 -0.790 0.434067 > C.diversity -0.5787 0.1385 -4.178 0.000155 *** > C.abundances:C.diversity 0.4096 0.1712 2.393 0.021476 * > > Output Mosquito B: > Estimate Std. Error t value Pr(>|t|) > (Intercept) -0.2900 0.1220 -2.377 0.02233 * > C.abundances -0.3856 0.1701 -2.266 0.02891 * > C.diversity -0.3470 0.1458 -2.381 0.02213 * > C.abundances:C.diversity 0.5367 0.1801 2.980 0.00489 ** > > Thanks a lot for your help! > Iris > > [[alternative HTML version deleted]] > > ___________ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] GLM
On 03/07/2013 04:24 PM, Mahnaz Rabbaniha wrote: Dear all I want to find regression between fish larva abundance and some abiotic factor ,i used this code: glm(formula = mychto ~ po4 + No3 + Si + Tn) result: Deviance Residuals: Min 1Q Median 3Q Max -26.586 -18.262 -12.296 -2.949 226.229 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 67.421173.9781 0.9110.371 po4 -0.2887 1.6037 -0.1800.859 No3 0.9151 4.5261 0.2020.841 Si -0.1145 0.4850 -0.2360.815 Tn -1.1568 4.4818 -0.2580.798 (Dispersion parameter for gaussian family taken to be 2444.917) Null deviance: 63156 on 29 degrees of freedom Residual deviance: 61123 on 25 degrees of freedom AIC: 325.72 my question is about the acceptable this AIC, or this result with goodness of fit? thanks AIC tells you nothing about goodness of fit, it is used to compare different models. A better way to assess goodness of fit is to look at the data and the model. It's clear that you have very skewed data (look at the distances between the quartiles of the residuals), so a Gaussian model, which assumes the residuals are symmetric, is inappropriate. It might make more sense to use a Poisson distribution, although you will probably need to account for over-dispersion (there are a few ways of doing this). Also, once you've fitted a glm, you can use plot() to get some useful model-checking plots (e.g. residuals). There is more advice in the links here: <http://r.789695.n4.nabble.com/Checking-the-assumptions-for-a-proper-GLM-model-tp1559502p1560503.html>. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 / +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] patterns in weather data that could relate to pathogen prevalence
On 11/30/2012 01:09 PM, Anto Raja wrote: Hi all I am searching for a tool that would help me to identify weather patterns that influence the prevalence of a pathogen, 'Pn'. Say, I have annual prevalence data (collected in April) and I know that the prevalence of 'Pn' is affected by the weather conditions since November. I also have daily data for different weeather variables. You have a lot of weather data, but I assume you don't have so much prevalence data. So it's going to be difficult, whatever you do... The objective is to identify the relationship between the weather from Nov-Mar and the prevalence of Pn. We know that weather has an influence on Pn. The question is to find out weather from which period is relevant or what kind of weather is relevant. It could be that the first two winter months (Nov-Dec) is the decisive factor or that a certain weather situation (like 20 consecutive days of below zero conditions) occuring at any time is important or a combination of both. I have tried correlations between prevalence and monthly means for Mar, Feb-Mar, Jan-Mar and so on and nothing definite turned up. I could also do it on a weekly basis manually. But I wonder if there is a tool that uses a moving window of different sizes (say, from a min size of 1 week to a max of 4 months) and checks correlations for each of these periods. I am thinking of ARMA, but my present intention is not to forecast but only to study. Can it still be used? Or ARMA in combination with multivariate analysis to study the relative importance of each weather variable. I don't see why an ARMA model would help you, as that assumes a covariance between times (i.e. autocorrelation) in the response (i.e. prevalence). There are methods for assuming that the response has an autocorrelation, but I don't think that's your big problem. My reaction (without seeing the data, of course) is that you might be asking too much of your data to get anything meaningful out of it. Any suugestions are welcome. I have used R for basic stats analysis but never worked with time-series data or the advanced tools of data mining. So, it could also be possible I am not thinking along the right lines. Feel free to correct if I am looking in the wrong place. It sounds like you're trying to mine your data for any pattern. To be honest, if you do that, I wouldn't trust the results unless you can validate them independently: you'll find some relationship if you try enough models, but will it make biological sense? This is particularly problematic when you have correlated variables, which you will do (especially when you start sliding windows around) I'd suggest you start by using what's known of the pathogen or its host, or of similar host-pathogen systems, to develop a smaller number of hypotheses about what sort of effects are likely. Plant ecologists often use GDD5 (Growing Degree Days above 5°C), which might be a useful way of reducing the temperature data to something smaller. Of course, another temperature than 5°C might work better for you. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 / +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Minta's (1992) tests of temporal interactions
On 09/19/2012 09:03 PM, Howe, Eric (MNR) wrote: Hello, Is anyone aware of an R package for performing the tests described by Minta (1992) Tests of spatial and temporal interaction among animals. Ecological Applications 2:178 - 188? After a quick look at the paper, it's a fairly standard contingency table analysis. I would suggest using a GLM. I think the expected areas can be fed into the model as offsets, but I haven't read the paper closely enough to be sure. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40216 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Spatial AND temporal correlation in linear mixed-effects models
On 07/26/2012 10:45 AM, Tscheulin Thomas wrote: Hi, I have a question regarding the nesting structure in linear mixed models of data, which is spatially and at the same time temporally correlated. So far I have tried to get around the problem by averaging out the temporal component but I would really like to keep everything in the model. Here is the experimental set-up: We have measured insect abundance in two different islands, each having 5 sites, each site having 4 plots, in which each we have 5 collection points (traps). The sampling was repeated 5 times. Two times in 2011 and 3 times in 2012. After averaging the temporal correlation out I composed the following model for the abundance: model<-lme(abundance~continous_explanatory_variable,random=~1|island/site/plot/trap,method="REML") This works fine but I have problems making a model that allows the temporal component to stay in. How can I do this using the function lme? I know with lmer I could do this: model2<-lmer(abundance~continous_variable+(1|island/site/distance/triplet)+(1|year/round)) but I really want to use the function lme. How can I insert multiple levels of grouping in lme? Any help is very much appreciated! Best, A bit of googling brought up this, from 10 years ago: <http://tolstoy.newcastle.edu.au/R/help/02b/2068.html> I don't think lme has changed that much. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40216 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Using residuals as dependent variables
On 06/21/2012 07:06 PM, Chris Mcowen wrote: > Dear List, > > > > I am wondering if the methodological approach I am taking is correct and > would be very grateful if people could comment and make suggestions, much > appreciated. The simple answer is no it isn't: <http://gking.harvard.edu/files/mist.pdf> > > > I have developed the best model ( AIC model selection) using oceanographic > data ( i.e. SST, chlorophyll, NPP...x6) to predict reported fisheries catch > for 52 countries. > > > > I then extract the residuals from the model and anything positive has a > higher catch than would be predicted given the level of productivity in the > country, with the opposite being true. > > > > What I want to do is: > > > > 1. Regress a suite of ecological and socioeconomic variables against > the residuals from the oceanographic model to determine which factors cause > some countries to be above and some below. I.E as trophic level increase the > residuals become increasingly negative. You could simply put everything into one big model. > 2. Ideally ( and I am unsure how or if it is possible) work out for > each country which variables/s cause the poor fit of that country to the > oceanographic model. Try using partial residuals (http://en.wikipedia.org/wiki/Partial_residual_plot). Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Multiple comparisons among predictors generated from same data
On 05/25/2012 10:18 AM, Gavin Simpson wrote: On Thu, 2012-05-24 at 15:00 -0700, J Straka wrote: Hello, I'm planning on using a regression model to describe seed set of plants (my response) using some sort of predictor based on temperature. I have a number of temperature variables calculated from the same set of data (hourly temperatures for the growing season, converted to variables such as average temperature, maximum temperature, minimum temperature, degree-days above zero Celsius, degree days above ten Celsius, etc...), and I want to decide which one should be included in my model. I know that I would ideally select one based on "prior knowledge" of the system (e.g. so-called "planned comparisons" or choosing a temperature threshold that is known to be important for the development of seeds), but not much is known about this system. What is the model for? Understanding so you want to interpret the coefficients directly as something meaningful or for prediction? If the latter I would say it doesn't really matter; choose the model that gives the best out-of-sample predictions (lowest error etc), or average predictions over a set of best/good models. Simply choosing the best model via some sort of selection procedure may result in a model with high variance (change the data a bit and different variables would be selected). If so, consider a regression method that applies shrinkage to the coefficients such as the lasso or the elastic net; this will lead to a small bit of bias in the estimates of the coefficients but should reduce the variance of the final model because you are considering the selection of variables as part of the model itself. If you want to interpret the model coefficients as something real then you have to be very careful doing any form of selection; the stepwise procedures and best subsets all can potentially lead to strong bias in the model coefficients. Be removing a variable from the model in effect you are saying that the sample estimate of the effect of that variable on the response is 0, not some small (statistically insignificant) value. This is a very tricky thing to get right and I'm not sure I know the right answer (or even if there is one!?). An additional complication here is that the variables are going to be correlated, so a model with all or most in it could be unstable. If a single temperature variable is enough, then I'd suggest either trying your best to pick one, or use what everyone else uses (GDD5?), so the study can be comparable. Once you have a model, it might be worth checking to see if the other variables tell a different story. If it's the same story but with different p-values, you might as well stick to the original analysis. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] PCA as a predictive model
On 05/23/2012 10:55 AM, Marc Taylor wrote: Hi Jari - one more question if you don't mind. Since the weights of the PCs are related to the the amount of variance that they explain in the original data - is it problematic to predict the PC scores with a second data set that has a different amount of variance (e.g. due to differing number of samples)? In both the 1st and 2nd data sets I have been using scaled values for the variables (mean=0 and sd=1 for each sample). Cheers, Marc I'll pretend to be Jari for a moment. :-) PCA just scales and rotates the data in cunning ways, so with the new data you need to scale and rotate it in the same way. If you scale the values first then you've already changed the scaling. What you need to do is either do PCA on the raw data or scale the new data using the mean and varianes of the old data. library(MASS) NVar=5; NObs=50 Sigma=matrix(c( 10,0.2, 0, 0,0.4, 0.2, 5,0.1, 0,0.6, 0,0.1,1.0, 0.2,0, 0, 0,0.2, 5, 0, 0.4,0.6, 0, 0,1), nrow=5) # simulate data Data=mvrnorm(NObs, rnorm(NVar), Sigma=Sigma) # Do PCA on scaled data Data.Sc=scale(Data) PC=princomp(Data.Sc) # Simulate new data NewData=mvrnorm(10, rnorm(NVar), Sigma=Sigma) # Do PCA on new data. First do it wrong... PC.wrong=predict(PC, newdata=scale(NewData)) # Now scale correctly NewData.Sc=scale(NewData, center=attr(Data.Sc, "scaled:center"), scale=attr(Data.Sc, "scaled:scale") PC.right=predict(PC, newdata=NewData.Sc) HTH Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] gam
On 05/17/2012 06:17 PM, Mahnaz Rabbaniha wrote: Dear all i have done GAM for data ( they are non- normal and i find regression between Sillaginidae with several hydrological factors),i done : [1] "depth""temperature" "salinity" "Sillaginidae" > pairs(sc,panel=function(x,y){points(x,y);lines(lowess(x,y))}) model<-gam(Sillaginidae~s(temperature)+s(salinity)+s(depth)) but i take this message: Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom please help me what can i do? or what is meaning this sentence? It means that you don't have a lot of different values for one of your covariates. You can check which with length(unique(depth)) length(unique(temperature)) length(unique(salinity)) Unless you have a lot of data, that the sensible thing to is probably to fit a straight line rather than a smooth curve. The other thing to do is to define the maximum flexibility of the curve, e.g. model<-gam(Sillaginidae~s(temperature, k=6)+s(salinity)+s(depth)) I hope this helps. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] data normality
On 05/06/2012 10:54 AM, Yong Shen wrote: Dear all, I have two questions about data normality. I used stepwise multiple regression to determine which variables contributed to tree growth, and want to built a model to explain tree growth. Sample size is about 50 tree species, I think it is not a large sample size, and some variables are not normal distribution. 1. Do I have to transform them to normal distributions before I perform multiple regression? No. The only area where a Normal assumption comes in is that the residuals are normally distributed. So you can happily fit the model without worrying about normality until after you've got the model. 2. Two variables can not transform to normal distributions although I used some methods (e.g log, sqrt, boxcoxfit), what should I do for the two variables? Leave them as they are. Advice that makes life simpler - always the best sort. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Which function would fit these data (Version 2)
On 04/13/2012 06:16 AM, Beutel, Terry S wrote: Hi list members Sending this again as symbols in the previous version did not translate. Apologies if this is not too specifically R related, but I am looking fit a model to some simulated data. X is distance to sample point, y is binary outcome (present/absent). I was hoping someone can suggest a (presumably) non linear function that might meet the following criteria A. 0<= y<=1 (actual responses are binary) B. xmax = infinity C. xmin<= 0 D. Where x = 0 then y>= 0 E. Where x equals Infinity then y equals 1 F. As x approaches Infinity from xmin, the slope of the function declines monotonically toward 0. my initial reaction was to think "logistic regression", but my experience with questions like this is that we'll save a lot of time if you explain the context. BTW, I'm not sure why you want "the function" to decrease to zero when x approaches infinity when at the same time y should equal 1 at infinity. It's not a big problem, though. Just use 1-y instead of y. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] ICC confidence intervals and power analysis for random effects in lmer?
On 04/13/2012 04:17 AM, Bradley Carlson wrote: Thanks for the tips everyone - I'll look into MCMC sampling for the CI. As far as the power analysis goes, I'm somewhat familiar with the criticisms regarding power analysis. I think this reviewer was curious about it because there was a small sample size (small number of levels of the random effect) and the ICC point estimate was not so low as to be biologically insignificant. It would be nice to state in the paper how much larger of a sample size would have enabled us to detect an effect given the observed variation, as though this were a pilot study for planning a bigger experiment. I'll certainly bring up the suggested citations, but I would still be interested to know if there is a method available for performing a power analysis for an LRT of a random effect. Thanks again, I'd use simulation: you can write a function to simulate data and fit the model to it, so you can run that for different sample sizes. It's not too fiddly to write, and you can leave the computer to run it over the weekend, so you can run reasonable sample sizes. Bob Brad On Thu, Apr 12, 2012 at 12:18 PM, Brian Inouye wrote: In addition to Bob O'Hara's suggestion, here is another citation you can give the reviewer/editor, as to why retrospective power analyses are a waste of time. Hoenig, J. M. and D. M. Heisey (2001). "The abuse of power: the pervasive fallacy of power calculations for data analysis." American Statistician 55(1): 19 - 24. Last year the ESA updated its author guidelines for reporting statistics, and removed a suggestion to report power analyses that had been inserted in the 1980s. -Brian Inouye Florida State University Chair, statistical ecology section of the ESA On 4/12/2012 6:00 AM, r-sig-ecology-request@r-**project.orgwrote: 2) A reviewer requested a power analysis of the ability to detect a significant random effect. Any tips on how to approach that? Report the random effect and confidence intervals. Retrospective power analyses are pretty pointless (e.g. see http://beheco.oxfordjournals.** org/content/14/3/446.full<http://beheco.oxfordjournals.org/content/14/3/446.full>), unless you're planning to repeat the experiment. Bob __**_ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/**listinfo/r-sig-ecology<https://stat.ethz.ch/mailman/listinfo/r-sig-ecology> -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] ICC confidence intervals and power analysis for random effects in lmer?
On 04/12/2012 03:50 AM, Bradley Carlson wrote: I'm performing an analysis of behavioral variation among individual tadpoles, using individual ID as a random effect and time as a continuous fixed covariate in the lmer function in lmer4 package. I'm really interested in making inferences about the random effect (i.e. the extent of variation among individuals). I'd like to do two things that I can't seem to find straightforward answers about and I'm hoping someone can help or point me to a good resource. 1) The intraclass correlation coefficient is of particular interest to me, as it describes the proportion of variation that occurs among individuals. Ideally I'd like to report a confidence interval of the ICC but I can't find any way to calculate one, other than a function in the psychometric package that appears to only work when there are no covariates in the model (random effect only). MCMC has already been mentioned and lme4 still has its mcmcsamp() function. Failing that, you could try a parametric bootstrap, which requires a little bit of coding but simulate() makes it much easier. 2) A reviewer requested a power analysis of the ability to detect a significant random effect. Any tips on how to approach that? Report the random effect and confidence intervals. Retrospective power analyses are pretty pointless (e.g. see http://beheco.oxfordjournals.org/content/14/3/446.full), unless you're planning to repeat the experiment. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Output for interactions in models that do not include all main effects
On 04/03/2012 11:31 PM, Kristen Gorman wrote: Dear all, I have R code to run AIC including multi-model inference. I am running into a problem in calling the output from models where both parameters in an interaction are not included as main effects. Why would you want to do that? Why would you (for example) expect the average of the Rlipid slope to be zero if the slope varies with the value of RFGinit? Does this make sense? (this is the sort of thing that makes statisticians splutter into their tea when they see someone do it: it rarely makes sense. Well, unless you have nested effects - which you don't have here- where the interaction is the nested effect) if you respect marginality, there won't be a problem because the main effect is always included. If you really want to include interactions without main effects, you can either write the formula "by hand", using paste(): something=Rlipid form = paste("Slipid ~ ", something, " + RFGinit:", something, sep="") lm(form, data = DataSet) and then work out how to get the order. Or you could try using update(): mod1 = lm(formula = Slipid ~ RFGinit*Rlipid, data = DataSet) mod2=update(mod1, . ~ . -RFGinit) HTH Bob In R, the interaction will be called depending on the parameter that was used as the only main effect in the model. So, I end up generating 2 different interactions (e.g., Rlipid:RFGinit vs RFGinit:Rlipid) that are actually the same. This becomes a problem in the remaining R code that requires weighted and summed values for the parameter and SE estimates. Thus, I would like to call the interaction consistently across models. See the following code: -- lm(formula = Slipid ~ Rlipid + RFGinit:Rlipid, data = DataSet) Residuals: Min 1Q Median 3Q Max -74.075 -19.047 7.233 20.445 45.391 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)120.338475.30405 22.688<2e-16 *** Rlipid 0.304930.23615 1.2910.202 Rlipid:RFGinit -0.020990.01773 -1.1840.241 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 30.88 on 60 degrees of freedom Multiple R-squared: 0.02721,Adjusted R-squared: -0.005221 F-statistic: 0.839 on 2 and 60 DF, p-value: 0.4372 lm(formula = Slipid ~ RFGinit + Rlipid:RFGinit, data = DataSet) Residuals: Min 1Q Median 3QMax -76.35 -21.63 7.09 22.46 45.71 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)131.028546 8.717104 15.031<2e-16 *** RFGinit -0.933483 0.742083 -1.2580.213 RFGinit:Rlipid 0.003926 0.009283 0.4230.674 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 30.9 on 60 degrees of freedom Multiple R-squared: 0.02586,Adjusted R-squared: -0.00661 F-statistic: 0.7964 on 2 and 60 DF, p-value: 0.4556 -- Is there a way to tell R to call the interaction based on alphabetical order of the 2 interaction terms and not based on the term that was used as a main effect? Thanks very much for any insight. Kristen Gorman ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40226 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Problem with R2WinBUGS packages
On 01/22/2012 05:50 PM, sergio vignali wrote: Hi, I have a problem with the R2WinBUGS package, when I run the bugs() function with the argument debug=TRUE, WinBUGS executes the calculations but R is blocked, instead with the option debug=FALSE it is all ok! Is there anybody who can help me? Sergio Vignali Uwe Ligges (lig...@statistik.tu-dortmund.de) is the maintainer. But isn't this expected behaviour? The help says debug if FALSE (default), WinBUGS is closed automatically when the script has fin- ished running, otherwise WinBUGS remains open for further investigation IIRC, you just need to close WinBUGS, and it'll return to R. You only need to use debug=T if BUGS is trapping or throwing errors (and have fun understanding the traps). Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40216 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Mixed model with zero truncated Poisson distributed data
On 01/11/2012 04:58 PM, Helen Ward wrote: Hello, I am a PhD student currently trying to derive a mixed effects model. I would like to describe the relationship between age and male reproductive success in a population of greater horseshoe bats. My data consists of three columns: MaleID, Age, NumberofPups (at that age). Many of the males appear multiple times in the data set, so I believe I need to derive a mixed model with MaleID as a random variable. The data is Poisson distributed, but zero-truncated. So far I have only succeeded in making a mixed model with a poisson distribution (using glmmPQL in the MASS package), and a zero truncated poisson model (using vglm in the VGAM package), but not a mixed model capable of handling zero truncated Poisson data. It has been suggested that I could just minus 1 from each value in the NumberofPups column to make a more usual Poisson distribution, so I can ignore the zero truncated bit. I have tried this and it changes the results of the model, but is this an acceptable transformation? If not, can anyone advise me on a mixed model that can handle zero truncated Poisson data please? I also intend to post this on the R-sig-mixed-models <https://stat.ethz.ch/mailman/options/r-sig-mixed-models/h.l.ward%40qmul.ac.uk> list, so apologies if you've seen it twice! If an individual would have had pups, would you have seen it? I'm wondering if you could simply add the zeroes back into the data. Bob -- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 798 40216 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology