from:"Bob O'Hara"

[R-sig-eco] PhD position in ecology and statistics at NTNU, Norway

2020-10-28 Thread Bob O';Hara

(the deadline for this is the 31st of October, sorry about the short 
notice! If you apply, please mention that you saw this on r-sig-ecology)


We have a vacancy for a position as a PhD research fellow at the 
Department of Mathematical Sciences, for someone interested in 
statistical modelling and applying it to problems of biodiversity and 
sustainability. I can promise the chance to work with a 
cross-disciplinary team: there will also be a student working on the 
ecological modelling, and we also have a very active group of students 
working with citizen science data.
The successful applicant will work as part of an inter-disciplinary team 
to develop methods to aid in the understanding of the effects of water 
resource management on biodiversity. They will join a team working on 
statistical methods to apply ecological models to messy data. The 
project will focus on developing the statistical models to integrate 
different data from different sources (e.g. different surveys, and 
reports from citizen scientists) to model the biodiversity of Norwegian 
rivers and use these models to predict the impacts of resource 
management decisions.


Duties of the position

• make a workflow to bring data in from a variety of sources
• develop models, based on recent work at NTNU, to model the 
distributions of riverine species
• work with other members of the project to look at the effects of water 
resource management, such as the effects of dams.


Required selection criteria

The PhD-position's main objective is to qualify for work in research 
positions. The qualification requirement is that you have completed a 
master’s degree or second degree (equivalent to 120 credits) with a 
strong academic background in statistics or equivalent education with a 
grade of B or better in terms of NTNU’s grading scale. If you do not 
have letter grades from previous studies, you must have an equally good 
academic foundation. If you are unable to meet these criteria you may be 
considered only if you can document that you are particularly suitable 
for education leading to a PhD degree.


The appointment is to be made in accordance with the regulations in 
force concerning State Employees and Civil Servants and national 
guidelines for appointment as PhD, post doctor and research assistant.


Required selection criteria:

• ability to work in a cross-disciplinary team
• interest in applying statistical methods to ecological problems
• good knowledge of statistical modeling methods

Preferred selection criteria

• experience with Bayesian methods
• skills in hierarchical modelling
• good written and oral English language skills

More details here (including how to apply, and the formalities):

https://www.jobbnorge.no/en/available-jobs/job/194344/phd-research-fellow-at-the-department-of-mathematical-sciences

If any potential applicant have questions, they should feel free to ask me.

Bob
--
Bob O'Hara
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +47 915 54 416
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] CI for stratified Cox PH model

2018-06-06 Thread Bob O';Hara


Is there any reason you can#'t use confint()? e.g.

 test1 <- list(time=c(4,3,1,1,2,2,3),
   status=c(1,1,1,0,1,1,0),
   x=c(0,2,1,1,1,0,0),
   sex=c(0,0,0,0,1,1,1))
confint(coxph(Surv(time, status) ~ x + strata(sex), test1))

Another trick is to use predict(), with new data at all of hte levels 
you are interested in:


pred1 <- list(x=c(0:2, 0:2),
   sex=c(0,0,0,1,1,1))
predict(coxph(Surv(time, status) ~ x + strata(sex), test1), 
newdata=pred1, type="lp" , se.fit=TRUE)


(it doesn't provide CIs, but mean +/-1.96s.e. should work OK for the 
linear predictor scale)


Bob

On 05/06/18 18:47, Bertolo, Andrea wrote:

Hi everyone,

I have a doubt about the way to calculate 95% CI for coefficients in
the stratified Cox proportional hazard models and your help is welcomed
.

Say that I have a variable of interest Imi and a stratifying variable
UV in a interaction model (since the interaction between Imi and UV is
of interest for me and the interaction model has a better fit than the
no-interaction model:

library(survival)
model1 <- coxph(Surv(start,stop, Status.time)
    ~ Imi + Imi:UV + strata(UV) + cluster(ID),
                      weights = NB_Event, data=Data.unfold)



Whereas it is pretty straightforward to calculate the coefficients (and
associated HR) for each combination of Imi and UV, I am not sure about
how to calculate the associated CI (note that, of course, I got the CI
for the estimate of "Imi:UV" from the output of model1).

Is it correct to calculate separately a model for each UV level and use
the CI for the Imi variable to get the CI for the two levels of UV (see
below) ?

# 2 models (one per UV level)
data.Low <- subset(Data.unfold,UV=="low")
model2.1 <- coxph(Surv(start,stop, Status.time)
    ~ Imi + cluster(ID),
                      weights =
NB_Event, data=data.Low)

data.High <- subset(Data.unfold,UV=="high")
model2.2 <-
coxph(Surv(start,stop, Status.time)
    ~ Imi +
cluster(ID),
                      weights = NB_Event, data=data.High)

Alternatively, is there a way to get the CI directly from the output of the 
stratified model ?
Many thanks
Andrea Bertolo



Université du Québec à Trois-Rivières
3351, bd des Forges
C.P.500, Trois-Rivières (Québec) Canada
G9A 5H7

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




--
Bob O'Hara
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Homepage: http://www.ntnu.edu/employees/bob.ohara
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] correlating three time-series in R

2017-07-26 Thread Bob O';Hara


You can pass the columns to ccf() directly:

df <- data.frame(x=rnorm(6), y=rnorm(6))
ccf(df$x, df$y)
print(ccf(df$x, df$y))

You should probably also check the time series task view: 
<https://cran.r-project.org/web/views/TimeSeries.html>, in particular 
the zoo package, to see what can be done with irregular time series.


But with 6 data points I'd be surprised if you have the power to detect 
anything that doesn't jump out when you simply plot the data.


Bob


On 26/07/17 11:07, Tania Bird wrote:

I have three data sets of abundances through time for plants, insects and
reptiles.
There are 6 samples over a ten year period (all taxa sampled at the same
time).
I recognise this is a small data set for time series.

I would like to correlate the time series to see if
a) increases in abundance of one taxon are correlated to another, and
b) to see if the correlation between plants:insects is greater than
plants:reptiles.

I thought to use the cross-correlation function in R
e.g.  ccf(insects, reptiles)

Currently the data is in one dataframe with time as one column and
abundance of each taxa is the next three columns.

How do I convert the data to a time.series format as given in the R
example?

How can I compare the two ccf outputs?

Thanks

Tania


Tania Bird MSc
*"There is a sufficiency in the world for man's need but not for man's
greed" ~ Mahatma Gandhi*

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




--
Bob O'Hara
NOTE: this email will die at some point, so please update you records to 
bob.oh...@ntnu.no


Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] mixed effects in ordination (?)

2017-01-12 Thread Bob O';Hara

This doesn't sound like an ordination problem: it's more similar to a 
standard mixed model regression. There are several ways to approach 
this, for example Ben Bolker has some (old) notes here:

<https://rpubs.com/bbolker/3336>

I'd suggest you start by running the analyses on each trait 
individually, as that way you can make sure you have the model sorted 
correctly. The step up to the multivariate model just needs some work 
understanding what the relevant R package wants.


HTH

Bob

On 12/01/17 14:12, Martin Weiser wrote:

Dear friends,

Could you please help me with analysis? I am afraid that it involves
crossed random effects in the mixed-effect constrained ordination
setting, so to say.

Goal:
Show an effect of the species trait (single one) and treatment (four
levels, quantitative scale) on parameters. Trait x treatment
interaction is possible. If possible, show changes through time.

Data:
Individuals of 7 species, subjected to 4 treatment levels (fully
factorial) - from 6 to 12 individuals in each combination. Each
individual scored 4 times (same times: 1st wk, 2nd wk, 3rd wk, 4th wk).
Several (10) parameters scored every time on each individual.

What I did:
In order to avoid multiple testing (the parameters are likely to be
correlated to each other), I decided to use multivariate analysis
(RDA). I am by far much more accustomed to vegan than ade4, so excuse
me if I use some "veganisms". Predictors: time, trait, treatment
(possibly with interactions), conditioned on individual identity to
avoid treating records from the same plant as independent. Variance
partitioning.

Here comes the problem: how to set permutations in order to be able to
report p_vals (some people just are not happy without them)? Since
individuals of the same species share the same trait value, maybe the
right way is to: shift observations within individual (if time is among
predictors for the particular model) and permute trait value among
species. Is it so? Is this treatment of the pseudoreplication at the
species level (i.e. only in the significance testing, not in the
ordination per se) ok?

I also tried to use different approach: I averaged all params
individual-wise (getting rid of temporal pseudoreplication, but also
time effects), and subsequently I averaged the result within treatment
x species levels. I assume that I can go for simple free permutations
this way? Pity is that this way, I cannot see development in time.

And another way: I averaged params for species x treatment x time
groups, ignoring interdependence of records from the same individual,
hoping that the effect of an individual "dissolves" in the average. Is
that meaningful? If yes, what is the appropriate permutation structure
in this case?

But maybe I miss something and there are better ways how to deal with
this problem?

Any suggestions (ok: not any, just those made in an attempt to help :-)
) are appreciated.

With the best regards,
Martin Weiser

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




--
Bob O'Hara

NEW EMAIL: bob.oh...@ntnu.no
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] LMER: applying a random term to only one level of a factor

2016-04-07 Thread Bob O';Hara


On 06/04/16 20:12, Aislinn Pearson wrote:

Hi,

I've tried googling this but haven't been very successful. Essentially, I'd 
like to know what is the most statistically valid way of dealing with a random 
term which doesn't apply to every level of fixed-effect factor.

I have a mixed effect model that looks like this

Disease level <- weight + Flown +(1|DateFlown)

Either I flew my insects on a flight mill (which can be thought of as a 'treadmill' 
for flying insects) or I didn't, hence flown is a two level factor (Yes or No) and 
I want to understand how this affects the amount of disease in my insect. To get as 
many replicates as I could on a single day, I had two different banks of flight 
mills (A & B), each bank containing 16 individual insect treadmills. The 
insects were randomly assigned to one of the two sets of 16 flight mills. Previous 
studies tell me there are differences between these two sets of flight mills, so I 
would like to account for them as a random term in my model.
As a practical matter, it's not worth setting a level with two levels as 
random: you don't gain anything in the analysis and the variance 
component is really poorly estimated. In practice, this might actually 
make things cleaner, as you will have to look a bit more at the flight 
mill effects, so you should get a better feel for what's happening.



However, I can't run this in LMER. When I tried I got the error;

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
   contrasts can be applied only to factors with 2 or more levels

Which I imagine means that one of my factors (i.e. Flown) doesn't include any 
levels for the random term mill set (i.e. for all unflown insects the value in 
the mill set column is NA)

Is it possible to include this form of experimental design in LMER (the package 
I know best) or, alternatively, nlme (which I am a lot less accustomed to 
using)?
I can think of two ways of doing this: either set up a factor with three 
levels (Flight mill A, Flight mill B, Not Flown) or set the Not Flown to 
one of the flight mill levels. The first way feels less confusing, but 
you might have to set up some contrasts to estimate the differences. But 
hopefully your insects will cooperate nicely and make the difference 
between the flight ills will be much smaller than between flight mills 
and not flown.


HTH

Bob


I would be really grateful if anyone has any insight.

Many thanks

Rothamsted Research is a company limited by guarantee, registered in England at 
Harpenden, Hertfordshire, AL5 2JQ under the registration number 2393175 and a 
not for profit charity number 802038.



[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] AIC in R: back-transforming standardized model parameters (slopes)

2016-01-12 Thread Bob O';Hara

-1.07202   0.26102  -4.11
z.time:c.air.ice  -0.38008   0.13722 -2.77


   [[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] NMDS axes scores

2016-01-11 Thread Bob O';Hara

ally for further analysis.

I therefore would like to include both of my NMDS site scores as a

response

into a GLM model simultaneously.  Unfortunately, I couldn't find any

advice

on how to actually do this. I found a  couple of papers using NMDS

scores in

GLMs, but they all seem to use them individually, fitting separate

models to

each of the ordination axes.



I'm a bit at a loss here and any advice is very much appreciated,

Conny


  [[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

--
Pokud je tento e-mail součástí obchodního jednání, Přírodovědecká fakulta
Univerzity Karlovy v Praze:
a) si vyhrazuje právo jednání kdykoliv ukončit a to i bez uvedení důvodu,
b) stanovuje, že smlouva musí mít písemnou formu,
c) vylučuje přijetí nabídky s dodatkem či odchylkou,
d) stanovuje, že smlouva je uzavřena teprve výslovným dosažením shody na
všech náležitostech smlouvy.

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology







--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] best choice of GLMM for seed set data

2015-08-27 Thread Bob O';Hara

On 27/08/15 15:33, Mehdi Abedi wrote:
> Thanks Bob for reply.
> I have problem that when you have results of effect size i am not sure 
> to how get only main effect without their levels data. I previously 
> asked this simple question but i couldn't get clear answer. The 
> question is simple: how we can have only temp, light and their 
> interactions(similar like output of anova() without their levels.
You can't: that's what the effect sizes are! For more on how to 
interpret them, see any good stats text, or this paper:
<http://dx.doi.org/10.5735/086.046.0205>

Bob

> *Call:*
> glm(formula = cbind(A..hierochuntica, A..hierochunticano) ~ temp *
> light, family = binomial)
>
> Deviance Residuals:
> Min   1Q   Median   3Q  Max
> -4.4473  -1.1993   0.9904   2.0101   3.6663
>
> Coefficients:
>  Estimate Std. Error z value Pr(>|z|)
> (Intercept)   2.666160.14341  18.591  < 2e-16 ***
> temp20/30 0.085380.20671   0.413 0.679596
> temp25/35-0.673730.18001  -3.743 0.000182 ***
> lightlight0.511890.23048   2.221 0.026350 *
> temp20/30:lightlight  0.628390.37291   1.685 0.091965 .
> temp25/35:lightlight  2.090800.43729   4.781 1.74e-06 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 334.88  on 47  degrees of freedom
> Residual deviance: 209.30  on 42  degrees of freedom
> AIC: 331.56
>
> Number of Fisher Scoring iterations: 6
>
> this is by *anova(model,test="Chi")*
> Analysis of Deviance Table
>
> Model: binomial, link: logit
>
> Response: cbind(A..hierochuntica, A..hierochunticano)
>
> Terms added sequentially (first to last)
>
>
>Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
> NULL  47 334.88
> temp2   10.14445 324.73  0.006271 **
> light   1   86.86044 237.87 < 2.2e-16 ***
> temp:light  2   28.56942 209.30 6.255e-07 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>
>
> *anova(model, test="F")*
> Analysis of Deviance Table
>
> Model: binomial, link: logit
>
> Response: cbind(A..hierochuntica, A..hierochunticano)
>
> Terms added sequentially (first to last)
>
>
>Df Deviance Resid. Df Resid. Dev   F  Pr(>F)
> NULL  47 334.88
> temp2   10.14445 324.73  5.0718  0.006271 **
> light   1   86.86044 237.87 86.8597 < 2.2e-16 ***
> temp:light  2   28.56942 209.30 14.2847 6.255e-07 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Warning message:
> In anova.glm(model, test = "F") :
>   using F test with a 'binomial' family is inappropriate
>
> Warm regards,
> Mehdi
>
>
> On Thu, Aug 27, 2015 at 2:38 PM, Bob O'Hara  <mailto:boh...@senckenberg.de>> wrote:
>
> On 27/08/15 11:57, Mehdi Abedi wrote:
>
> Dear Thierry,
> Yes i am using (success, failure) but in this case i need to
> change all
> data frame. I was thinking to use codes which is not necessary
> to create
> new column when you have a ll of species. Because we know
> success(germinated seeds) and we know failure (Total seeds -
> success(germinated seeds)).
>
> Yes i used codes with ANOVA but there is no P- value for study.
>
> model2<- glmer(cbind(germinated, Nongerminated) ~ temp *light +
> (1|Replication ), data=growthdata,
> + family=binomial)
>
> anova(model2)
>
> Analysis of Variance Table
> Df Sum Sq Mean Sq F value
> temp2 30.600  15.300  15.300
> light   1 46.231  46.231  46.231
> temp:light  2 22.877  11.439  11.439
>
> p-values are difficult. See here:
> <http://glmm.wikidot.com/faq>
>
> Better to stick to reporting your effect sizes: your analysis of
> deviance only tells you if you have enough data to see a
> difference, not how big the differences are.
>
> Also, if Replication is 1:nrow(growthdata), you could use a simple
> GLM and estimate your over-dispersion term (the residual deviance
> divided by the residual sum of squares should be OK). You can use
> this to correct the standard errors with summary(glm.obj,
> dispersion=overdisp).
>
> Bob
>
>
> Warm regards,
> Mehdi
>
>
> On Thu, Aug 27, 201

Re: [R-sig-eco] best choice of GLMM for seed set data

2015-08-27 Thread Bob O';Hara

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




--

Mehdi Abedi
Department of Range Management

Faculty of Natural Resources & Marine Sciences

Tarbiat Modares University (TMU)

46417-76489, Noor

Mazandaran, IRAN

mehdi.ab...@modares.ac.ir

Homepage

Tel: +98-122-6253101

Fax: +98-122-6253499






--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Zero infalted model- T50

2015-07-08 Thread Bob O';Hara


On 08/07/15 09:56, Mehdi Abedi wrote:

Hi,
I want to calculate time to 50 percent reduction of seed viability. Mainly
researcher calculate P50 using probit model for calculation T50.

In some treatments seed viability decline quit fast therefore i have only
value 100 and 0 values . Can i calculate T50 for this data using probit?!
Could we apply Zero inflated model for this conditions with 0?

All the best,
Mehdi

   Time(weed) Seed viability  0 100  8 0  16 0  24 0  32 0  40 0

You can use a probit, but it'll be unhappy. The problem is that the 
curve is steep compared to the times, so you don't see it. The only real 
solution is to change the assays, so you observe more times. But I 
appreciate this might not be practical.


I don't think zero inflation will help. There isn't a unique maximum 
likelihood estimate for this, but I think a sensible estimate would be 
mid way between the times where you observe 100 and the first 0. If you 
want to be Bayesian, this would be the posterior mean. The best 
confidence interval is probably the central 95% of the interval that the 
T50 is in (i.e. assuming the likelihood is flat). It's probably a little 
bit wider than it should be, but I doubt that will be a big problem.


I used to get data where the data would flip between 0 and 100 over a 
range of doses. In the end I gave up with those problems and became an 
ecologist


Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] mvabund SIMPER alternative

2015-06-02 Thread Bob O';Hara


On 02/06/15 15:38, carvin90 wrote:

Hi community,

I am new to stats to this level in ecology. I am trying to compare DNA and
RNA libraries with thousands of OTUs. I summarized taxa to get the most
abundant species, but I can obtain only relative abundances. I was thinking
to use SIMPER as I read in several paper to test which species differ the
most per station between DNA and RNA based libraries. However I found that
SIMPER is a more or less robust test. I was wondering if the manyglm was
also an alternative for my question or if you suggest another function.
Thank you for your help!
Yes, you can use mvabund for OTUs: I've helped a colleague do this (e.g. 
dx.doi.org/10.1371/journal.pone.0053987).


Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Association Routine?

2015-02-28 Thread Bob O';Hara


On 02/28/2015 06:48 PM, Alexandre F. Souza wrote:

Dear friends,

I need to write a code to find data using one variable as reference. The
code I wrote, however, is not working and I can't figure it out why. Could
anyone help me?

Imagine a data set with two variables, B and C. Now I have variable A,
which is the same variable as variable B but the data are not in the same
order nor have necessarily the same extension as B (it may be a sample of
B, for example).

I want to find the values of variable C that match each line in variable A
using B as the association criterion. So the code should perform a loop in
which it would take the first line in A, search B until it finds it there,
then copy the corresponding value of C and store it in a new variable D. Do
it until all lines in A have been associated to a C value.


starting with...

df<-data.frame(B=sample(letters[1:10],replace=FALSE), C=rnorm(10), 
stringsAsFactors=FALSE)

A=letters[1:10]

two thoughts spring to mind:
(a) would merge() do what you want? e.g. df2 <- 
merge(df,data.frame(A=A), by.x="B", by.y="A"), and then extract the 
values of C with df2$C[df2$B=="f"], for example.

(b) sapply(A, function(lt, DF) DF$C[DF$B==lt], DF=df)

R's looping is generally more efficient when it's done internally, so it 
will be easier for you if you understand the R mentality, in particular 
vectorisation. usually if you have a for() loop, you're not writing R 
code efficiently.


Bob



Here is the code I wrote:


# Considering that matrices data.ref and data.assoc have been already read,
containing the

# User-defined number of columns to be associated with A (I imagined that
more than one variable could be associated at once)
col.assoc = 20

# To assure that data will not be in a non-usable data category
ref = as.matrix(data.ref)
assoc = as.matrix(data.assoc)


# Table where results will be stored
#  Number of columns = n associated variables plus one column
#  Reserved to receive the initial data (example column A)

result = matrix(nrow = nrow(ref), ncol = col.assoc + 1)

# Fulfill the first column of the result table with the original reference
variable

result[,1] = ref[,1]


for (i in 1:nrow(ref)){
   for (j in 1:nrow(assoc))
if (ref[i, 1] == assoc[j, 1]){
  resultado[i, 2] == assoc[j, 2]
}
}



col = ncol(dados)

####

Any thoughts?

Thanks in advance,

Alexandre




--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Multivariate quasi-bionomial analysis of proportion data?

2015-02-09 Thread Bob O';Hara


On 08/02/15 12:27, Amanda Greer wrote:

Hi All,

I am trying to best analyse a set of foraging ecology data with >10
behaviour categories (DVs) and 3 levels of IV (season, sex, age). The time
which an animal spent engaged in a behaviour was recorded and then divided
by the total time spent in sight of the observer, so my data are
proportional. As is typical, not all animals engaged in all behaviours and
there are a large number of zeros in my dataset which is severely
over-dispersed. I had initially analysed all the data using the glm
function (family = quasibinomial, followed by anova. The intention was then
to use the false discovery rate alpha to account for the large number of
analyses. However, it was pointed out to me that a multivariate approach
might be better so I have been trying to figure out (a) if it's possible to
run a quasi-binomial multivariate analysis of proportion data  (b) how to
go about it.

In the R Documentation quasi-binomial family function page (
http://artax.karlin.mff.cuni.cz/r-help/library/VGAM/html/quasibinomialff.html
) it is stated that if multivariate response = TRUE the response matrix
should be binary. This seems a pretty straightforward indictment of my idea
to run this analysis on my proportion data, but I am wondering why - is
this just not possible, or is there a particular package that could help?
If anyone could provide me with an answer or some much needed guidance on
this topic I would be very grateful.
Ignoring the zeroes problem for the moment, I think (quasi-)binomial 
distributions are a distraction: binomials are based on counts of things 
(see Petr Keil's post: http://www.petrkeil.com/?p=603). If you're 
looking at proportions of times, then it might be better to think in 
terms of gamma distributions, which lead to a beta distribution for the 
proportion of times spent doing one thing, and a Dirichlet distribution 
if you have several items (as you do here).


Once you have to worry about the zeroes, you need to do something more, 
for example see this paper:

<http://onlinelibrary.wiley.com/doi/10./2041-210X.12122/abstract>

Bob


Thanks,

Amanda

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] SPEI index

2014-12-23 Thread Bob O';Hara


On 12/23/2014 09:57 AM, ONKELINX, Thierry wrote:

If the file is tab delimited then you need to use read.table(sep = "\t") instead of 
read.table(sep = ""). read.delim() is another option.
Or save the file as a .csv and use read.csv (this makes the reading in 
easier if you have things like spaces in the file).


Also, it's worth checking that you don't have a header row and then some 
empty rows. It's often worth opening up the .txt or .csv file as a text 
file (e.g. in Notepad) to see if there are any problems.


Oh, and the gdata package has a read.xls() function that is another 
alternative.


Bob


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

Van: R-sig-ecology [mailto:r-sig-ecology-boun...@r-project.org] Namens Manuel 
Esteban Lucas Borja
Verzonden: dinsdag 23 december 2014 9:13
Aan: r-sig-ecology@r-project.org
Onderwerp: [R-sig-eco] SPEI index

Dear All,

I am starting to use R software in order to calculate an aridity index (SPEI). 
I have installed the spei package and the library. Then, I have created a .xls 
file with year, month, Precip and Temp variables. The xls file has been 
exported to txt format (separated by tabulations). Then to try, the only output 
I have obtained can be see below. Could anyone of you please help me? If you 
want, I can send you the txt file.

All the best,

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 NA NA NA NA NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA NA

Script:
library(SPEI)
SPEI <- read.table("/Users/PALAN.txt", header=TRUE, sep="")
str(SPEI)
PET <- thornthwaite(SPEI$Tm,40.0)
spei(SPEI$PREC-PET,1)

---
Manuel Esteban Lucas Borja
Universidad de Castilla La Mancha
Escuela T�cnica Superior de Ingenieros Agr�nomos y de Montes
Departamento de Ciencia y Tecnolog�a Agroforestal y Gen�tica
Campus Universitario s/n,
C.P. 02071, Albacete (Spain)
T�lf.; 967599200 ext. 2818
Mail: manuelesteban.lu...@uclm.es
Web:http://www.uclm.es/profesorado/manuelestebanlucas/


 [[alternative HTML version deleted]]
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Disclaimer Bezoek onze website / Visit our 
website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] species richness, GLM and negative values

2014-10-29 Thread Bob O';Hara

On 10/29/2014 04:27 PM, Ludovico Frate wrote:
> Dear all,I'am trying to fit a very simple linear model. I am analyzing the 
> differences in the number of species (DS) found in several permanent plots in 
> two year of observations.
>   Firstly, I have calculated the differences per plot (i.e. number of species 
> in Plot 1 in Time A - number of species in Plot 1 in Time B and so on for all 
> the plots).Secondly, those differences were tested for deviation from zero by 
> means of a linear model
> M2<-lm(DC~1, data = gransasso)summary(M2)E2<-residuals(M2)qqnorm(E2, pch = 
> 19, col = "blue"); qqline(E2, col = "red")
> The qqnorm has shown that residuals were not normally distributed, thus I 
> need to use a GLM. However GLM (poisson family) does not work with negative 
> values (DS has negative values).I've tried to add a constant value to these 
> differences (i.e. +100) but the result is misleading  since I am testing for 
> deviation from zero.
> Do you have any suggestions?
> Regards,Ludovico
Use the Poisson to model the number of species in each sample, so use 
data like this:

Plot  Time   DS
1  A 4
1  B 7
2  A 0
2  B 1
3  A 23
3  B 7
...

Then you fit the model

Mgood <- glm(DS ~ Plot + Time, family=poisson())

where Plot and Time are factors (use Plot <- factor(Plot), for example, 
if you need to).

You're interested in the Time effect, which is the average difference 
between the numbers of species in the plots in the different times. The 
Plot effect controls for different plots having different numbers of 
species overall. If you look at summary(Mgood), the Time effect is the 
log of the ratio of the species richnesses in times A and B. It will be 
written as TimeB, which means it's log(E.B/E.A) (where E.A and E.B are 
the expected species richnesses at times A and B). So, for example, an 
estimate of 0.4 would mean that at Time B there are exp(0.4)=1.49 times 
more species at time B than time A.

Bob


>   
> 
> Ludovico
> Frate
>
> PhD student (University of Molise - Italy)
> Environmetrics Lab
> http://www.distat.unimol.it/STAT/environmetrica/organico/collaboratori/ludovico-frate-1
> Department of Biosciences and Territory - DiBT
> Universit� del Molise.
> Contrada Fonte
> Lappone,
> 86090 -  Pesche (IS)
> ITALIA.
> Cel: ++39
> 767557
> Fax: ++39 (0874) 404123
> E-mail ludovico.fr...@unimol.it
> ludovicofr...@hotmail.it
>   
>   [[alternative HTML version deleted]]
>
>
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


-- 
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Question about poisson GLM

2014-10-27 Thread Bob O';Hara


On 27/10/14 15:38, rosanna di maggio wrote:

Dear all,

I'm trying to perform a poisson GLM with lme4 using the number of
individuals as dependent variable and i) years of study and ii) different
study areas as factors... So far so good..

But the results are quite curious...  The estimates and standard errors are
huge.. Is is normal? I don't think so...


This is an example of what happen:
It's difficult to say without seeing that data, but a couple of things 
spring to mind:


1. If the covariates are meant to be continuous, then did you centre and 
scale them?
2. If they are meant to be factors, did you tell R this? e.g. anno <- 
factor(anno).


Bob



Call:
glm(formula = NTOT ~ areaN * anno, family = poisson)

Deviance Residuals:
 Min   1Q   Median   3Q  Max
-3.7465  -1.0629  -0.5000   0.3132   6.8188

Coefficients:
   Estimate Std. Error z value Pr(>|z|)
(Intercept) 140.786646  27.766187   5.070 3.97e-07 ***
areaN71.479294  11.876992   6.018 1.76e-09 ***
anno -0.069199   0.013823  -5.006 5.56e-07 ***
areaN:anno   -0.035572   0.005912  -6.017 1.78e-09 ***




I cannot understand the reason of these results... Someone can help me?



Thank you

Rox




--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] EstimateR and Chao 1 standard deviation

2014-10-17 Thread Bob O';Hara

On 10/17/2014 05:36 PM, Gavin Simpson wrote:
> On 17 October 2014 06:44, Bob O'Hara  <mailto:boh...@senckenberg.de>> wrote:
>
> On 17/10/14 13:54, "José M. Blanco Moreno" wrote:
>
> Dear users
>
> I have been checking the values returned by the function estimateR
> against the formulae in the appendix to nonparametric
> estimators of
> species richness in EstimateS
> 
> (http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB).
> 
> <http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB%29.>..
>
> and they do not match each other.
>
> Where does this line of code in vegan:::estimateR.default come
> from?
>
> sd.Chao1 <- sqrt(a[2] * ((G^4)/4 + G^3 + (G^2)/2))
>
> It is from any other reference that I should have under control?
>
> Unless Jari added something, the functions come from my paper from
> about 10 years ago:
> O’Hara RB (2005) Species richness estimators: how many species can
> dance on the head of a pin? J Anim Ecol 74: 375–386.
> http://doi.wiley.com/10./j.1365-2656.2005.00940.x, which
> refers to Chao (1987), which is cited in the estimateR
> documentation. EstimateS now uses a small sample correction:
> 
> <http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#Chao1AndChao2>
> Does this account for the discrepancy?
>
> Bob
>
>
> Yep, vegan does not include the small sample correction. The Chao1 
> computation is here:
>
> https://github.com/vegandevs/vegan/blob/master/R/estimateR.default.R#L44
>
> but we can certainly add it if this is useful? (And I suppose it is if 
> EstimateS has included it by default now...)
>
Might as well. it looks like it shouldn't make a huge difference unless 
sample sizes are small.

Bob

> G
>
> -- 
>
> Bob O'Hara
>
> Biodiversity and Climate Research Centre
> Senckenberganlage 25
> D-60325 Frankfurt am Main,
> Germany
>
> Tel: +49 69 7542 1863 
> Mobile: +49 1515 888 5440 
> WWW: http://www.bik-f.de/root/index.php?page_id=219
> Blog: http://blogs.nature.com/boboh
> Journal of Negative Results - EEB: www.jnr-eeb.org
> <http://www.jnr-eeb.org>
>
>
>     _______
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org <mailto:R-sig-ecology@r-project.org>
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>
>
>
> -- 
> Gavin Simpson, PhD


-- 
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] EstimateR and Chao 1 standard deviation

2014-10-17 Thread Bob O';Hara


On 17/10/14 13:54, "José M. Blanco Moreno" wrote:

Dear users

I have been checking the values returned by the function estimateR
against the formulae in the appendix to nonparametric estimators of
species richness in EstimateS
(http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB)... 


and they do not match each other.

Where does this line of code in vegan:::estimateR.default come from?

sd.Chao1 <- sqrt(a[2] * ((G^4)/4 + G^3 + (G^2)/2))

It is from any other reference that I should have under control?

Unless Jari added something, the functions come from my paper from about 
10 years ago:
O’Hara RB (2005) Species richness estimators: how many species can dance 
on the head of a pin? J Anim Ecol 74: 375–386. 
http://doi.wiley.com/10./j.1365-2656.2005.00940.x, which refers to 
Chao (1987), which is cited in the estimateR documentation. EstimateS 
now uses a small sample correction: 
<http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#Chao1AndChao2>

Does this account for the discrepancy?

Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Bayesian analysis of canopy cover/percent data

2014-08-27 Thread Bob O';Hara

95, 0.7, 0.5, 0.7, 0.2, 0.95, 1, 1, 0.25,
0.85, 0.5, 0.8, 0.75, 0.85, 0.7, 0.95, 0.05, 0.65, 0.65, 1, 1,
1, 0.65, 0.4, 0.6, 0.9, 0.85, 0.75, 0.5, 0.65, 1, 0.65, 0.55,
0.75, 0.4, 0.9, 0.35, 1, 1, 0.4, 0.5, 0.8, 0.95, 0.95, 0.55,
0.7, 0.85, 0.8, 0.8, 0.65, 1, 0.6, 0.5, 1, 0.8, 1, 0.45, 1, 1,
0.8, 0.85, 1, 1, 1, 1, 0.5, 0.6, 0.15, 0.75, 0.6, 0.1, 0.05,
0, 1, 0.6, 0.1, 0.35, 0.9, 0.9, 0.95, 0.95, 0.9, 0.55, 0.65,
0.9, 0.4, 1, 0.65, 0.5, 0.8), .Dim = c(227L, 2L), .Dimnames = list(
 NULL, c("site2", "cover20p2")))






--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Binomial GLMM or GAMM with random intercept and temporal correlation

2014-08-26 Thread Bob O';Hara


On 08/26/2014 07:34 PM, SamiC wrote:

Dear ecology mailing list,

I am trying to model some binomial data (0/1) as a function of sex (0/1) and
DistanceToFeature (continuous km’s) with an interaction between the two.  My
data is nested and I therefore want to include a random intercept for
InidividualID and within that I want to include an AR1 correlation structure
as the data is serially/temporally auto-correlated.  I understand any
correlation structure should be nested within the random effect.

So far I have tried running the model using glmmPQL as

glmmPQL(Y ~ DistanceToFeature * Sex + (1|InidividualID),
correlation=corAR1(form=~1|IndividualID/ContinuousBout), family=’binomial’,
data=’birds’)

(note – ContinuousBout is an ID for where there are time gaps in the data).

However, although this runs, am I right in understanding that I should not
use PQL estimation with binomial data as it gives biased results?  Does
anyone know of a way I can model this?


Bernoulli responses are tricky things, so I'd be a bit worried that you 
might be trying to fit too complex a model. But anyway


If you don't mind going Bayesian, you could try INLA (R-INLA.org).

Alternatively, could you simply include the previous time step's value 
as a covariate?



I understand that this is also the case if I wish to use GAMM (as later I
will be modelling a non-linear explanatory as well)?

Yes - a GAM is just a GLMM with a complicated covariate structure.


Additionally I will also be running a similar set up but where the data are
not equally spaced in time (and therefore an AR1 structure would not apply).
Can anyone give a recommendation of a modelling framework for this also.
That should come from the subject matter - I can think of a couple of 
approaches, but they make different assumptions about how your system is 
behaving.


Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Question on unexpected regression results

2014-07-30 Thread Bob O';Hara

.59
Myokea  33.757.861.353.80   57.77
Myonig  34.551.655.752.77   56.52
Myooxy  40.545.747.644.98   47.09
Myorip  36  53.357.550.82   54.16
Myosim  38  NA  NA  48.23   51.02


Perhaps simple linear regression is not the method to use?
Thanks for any additional suggestions.

Bruce

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] deviance as a goodness of fit in GLM

2014-07-22 Thread Bob O';Hara

On 21/07/14 19:52, Samantha PameLa wrote:

Good day everybody,

I'm a marine biologist student, working on my bachelor thesis and I'm stucked with a statistical doubt in the process, I hope someone here could help me. My thesis aims to understand which biological and environmental factors influences the male aggressive rate of male California sea lions. For that purpose I'm using GLM’s where the response variable is the male aggressive rate. Right now I am testing the goodness of fit of the global models and for that I'm using the deviances as a goodness of fit test. I calculated pseudoR2 (Zuur, 2009) in order to know the percentage of explanation of each candidate model. However I’m not sure how to choose the “good models”; since I am not sure over which percent of explanation indicates a “model with good fit”. For my data I am working with three different scenarios, and it seems that 20%, is a good value to could indicate the best models, but I’m not sure how to choose the value.

I thank you in advance for your time and the help you can give me.

Best regards,

Just to follow up, AIC isn't a measure of model fit either: it's a
measure of model adequacy, and can be used to compare different models.

For model fit, you can (usually) compare the residual deviance to the
degrees of freedom (they should be approximately equal, and you can use
a chi-squared test, if you feel the need to generate a p-value). This
doesn't work if your response is binomial with a small N (and psuedo R2
doesn't work terribly well for this case either).

A better approach to checking if the model fits is to check to see if
the residuals have any gross patterns in them, by plotting them against
the fitted values and also against covariates.

If you'll excuse my self-promotion, PNAS recently gave me an excuse to
show some residual plots to the general public:
<http://www.theguardian.com/science/grrlscientist/2014/jun/04/hurricane-gender-name-bias-sexism-statistics>
(my explanation for why this is not just gratuitous self-promotion is
that I linked to the R code to generate the plots too, so you have
something to work from).

Bob

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW: http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] probability distribution for zero-inflated, right skewed data

2014-06-16 Thread Bob O';Hara

On 16/06/14 13:57, Johannes Björk wrote:

Dear all,

Im looking into how to fit a GLM model (Im using rjags) with data that are heavily right skewed. In addition,
some variables also zero-inflated. The data are species area distribution measured as "total area
(km^2)" which is subsetted into "area in tropical zone" and "area in temperate
zone". The last two variables contain zeros.

I have google zero-inflated models... and most that come up is "zero-inflated negative
binomial" and zero-inflated negative poisson" for count data. I reckon I cannot use
any of these distributions since my variables are not discrete.

Any pointer to which distribution(s) that might fit this kind of data would be
much appreciated.
I think a Tweedie distribution is sometimes used, but that always makes
me think of escaping chickens. Recently this was published, which might
also be useful:
<http://onlinelibrary.wiley.com/doi/10./2041-210X.12122/abstract>.
They used BUGS, so you could ask them if the code is available. Even if
it isn't, it shouldn't be too difficult to code up.

Bob

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW: http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Question about GLM post hoc and chi square (Angel)

2014-06-16 Thread Bob O';Hara


On 16/06/14 08:49, Åström, Jens wrote:

Hi,

Adding to what's already been said:


Just in case you're not aware of this, I think you have a typo in your model. Are you 
looking for 
glm(Count~Sex+Time+Behaviour+Sex*Time+Sex*Behaviour+Time*Behaviour,family=poisson)? By 
the way, I think this could also be written  
glm(Count~Sex+Time+Behaviour+Sex:Time+Sex:Behaviour+Time:Behaviour,family=poisson) or 
simply glm(Count~+Sex*Time+Sex*Behaviour+Time*Behaviour,family=poisson). The 
"*" means both main effects and all possible interactions.
FWIW, glm(Count~(Sex +Time+Behaviour)^2,family=poisson) will also work: 
it expands to the main effects and first order interactions.


(oh, and the advice I've seen is don't use Wald tests: they are 
conservative because they assume the other parameters are fixed at their 
MLEs, so ignore any uncertainty in them. Use likelihood ratio tests instead)


Bob


Also, you should probably look into the issue of overdispersion. Overdispersion 
is very common in ecological count data and basically means that you have more 
variation in your data than the Poisson distribution assumes. This typically 
leads to anti-consevative p-values, i.e. too small p-values, and needs to be 
accounted for.

Read more about it and potential solutions here: http://glmm.wikidot.com/faq

Good luck,

Jens

--

Message: 1
Date: Sat, 14 Jun 2014 18:55:44 -0700 (PDT)
From: Angel 
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Question about GLM post hoc and chi square
Message-ID: 
Content-Type: text/plain

You are able to obtain Chi squared values by using a wald chi square post hoc 
test. To do this you can use the aod package, function wald.test. This function 
is perfect for generalised linear models using poisson distribution.


Also, as long as you have got
interaction terms in your results from GLM, you could get the wald chi square 
(and an associated p-value) for these terms, hence giving you the  table which 
you are after.

If this response is not clear enough, I can post some example (I am not sure of 
the etiquette)

AA.

Date: Fri, 13 Jun 2014 18:07:27 -0700
From: ml-node+s471788n757894...@n2.nabble.com
To: alexander.angelo...@hotmail.com
Subject: Question about GLM post hoc and chi square



Dear all,


I am making an analysis using a GLM using three explanatory variables and a

response variable. I need to obtain a table similar to this one,

http://postimg.org/image/5sau79wlt/r

  nevertheless, I have not been able to do it. I am having a hard time

specially getting the chi square values. I would like to know how to obatin

them.


I also would like to know what function could help me to make ad hoc

comparisons for single variables and interactions.


If any of you knows how to do both estimations, I would really appreciate

it.


All the best!!!


This is my script

a=read.table("ricis3.txt",header=T)

attach(a)

model7=glm(Count~Sex+Time+Behaviour+Sex*Time+Sex*Behaviour+Time+Behaviour*Sex,family=poisson)

summary(model7)

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Question about GLM post hoc and chi square

2014-06-14 Thread Bob O';Hara

On 06/14/2014 03:05 AM, Luis Fernando García wrote:
> Dear all,
>
> I am making an analysis using a GLM using three explanatory variables and a
> response variable. I need to obtain a table similar to this one,
> http://postimg.org/image/5sau79wlt/r
>
>   nevertheless, I have not been able to do it. I am having a hard time
> specially getting the chi square values. I would like to know how to obatin
> them.
Use anova(). The deviance follows a chi-squared distribution (usually - 
if you have overdispersion it gets a bit more complicated).
> I also would like to know what function could help me to make ad hoc
> comparisons for single variables and interactions.
These comparisons are called contrasts. There is a contrasts() function 
in R, and also a contrast package (which, I'm guessing will be of more 
use). Googling "R contrast" might help too - there seems to be plenty of 
material, so hopefully one or two results will be exactly what you want. 
Contrasts can get esoteric, so if you can find some a page with code 
that gives you the comparisons you want, that should help a lot.

Good luck!

Bob

> If any of you knows how to do both estimations, I would really appreciate
> it.
>
> All the best!!!
>
> This is my script
> a=read.table("ricis3.txt",header=T)
> attach(a)
> model7=glm(Count~Sex+Time+Behaviour+Sex*Time+Sex*Behaviour+Time+Behaviour*Sex,family=poisson)
> summary(model7)
>
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


-- 
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] mantel within mantel? multidimensional variance

2014-06-08 Thread Bob O';Hara

Do you need to run Mantel tests at all? I don't see that you have 
measured any distances, as such, so I think you can do at leat as well 
by using standard [genrealized] linear [mixed] models.


I'm not sure I understand everything, but it sounds like you run an 
experiment with 2 treatments, and N species, and you measure T traits. 
So, for each trait you could have a model


y_t ~ (Treat1 + Treat2)*Species
(you could run this separately for each species, if that makes things 
easier)


and size of the Treat1:Species effect tells you how the species responds 
to the treatment (if you want a measure over all traits, a PCA on the 
coefficients should suffice).


It sounds like you are then asking how these reactions, i.e. the 
Treat1:Species effects are related to traits which are constant within a 
species, i.e. a Treat1:Trait effect.


So, for each y_t you could run a model

y_t ~ (Treat1 + Treat2)*(Trait+Species)
(or even make Species a random effect)

So, now I think you just want the Treat1:Trait effects. if the Trait's 
are discrete classes, then you can (again) run a PCA on the effects, to 
visualise the effects.


If you want to analyse all of the y_t's together, it gets a bit more 
complicated, but in principle it's the same, except you have a 
MANOVA-like structure. I think you could use a Seemingly Unrelated 
Regression approach.


Bob

On 06/08/2014 05:56 PM, Mgr. Martin Weiser wrote:

Dear friends,

I need to quantify variance within variance, and what is worse, say if
it is non-random.
This is the setup: in the experiment, there were 2 (partly correlated)
treatments, each of six levels (but theoretically of infinite levels,
so I treated them as continuous).
Different species responded to them, and we measured various traits
("endotraits").

We used R2 from the per-species redundancy analyses (RDA) as a
reaction norm: higher R2, more the species responds to the treatment
(so traits were used as "species" in the community ecology jargon). We
have twice as much RDAs as there were species (because of 2
treatments).

Next, we correlated these R2 (reaction norms) with some other traits
per species ("exotraits"). As there were 2 correlated treatments (lets
say irrigation and fertilisation), and we sed the same reaction norms
for correlation with different exotraits, the correlations were
obviously non independent. To overcome this, we run Mantel test
(matrix1= species x reaction norms to treatment, matrix2=species x
exotraits).

Next, we are interested in "endotraits" x "exotraits" correlation. For
this, we used endotrait scores from the per-species RDAs on the
treatment axis (constraining variable). Correlations are
non-independent again, so we wanted an overall test. And here comes
the problem: results of Mantel tests (matrix1=species x endotrait
scores, matrix2=species x exotraits) are suprisingly weak. Some single
correlations of endotrait scores x exotraits seem to be pretty afar
from random, but the overall mantel...

I think this is because single endotrait can not show better
correlation than the overall reaction norm, which is based on them, so
something like (pval of this mantel)*(1-pval of the correlation of
reaction norm with exotraits) may be desirable for the overall test,
but I simply do not know.

Any advice?

Best,
Martin Weiser




--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Help with a function

2014-06-06 Thread Bob O';Hara

On 06/06/14 15:39, Rodrigues wrote:
> Dear R users,
>
>
> Iâm trying to build a function to select random samples idfâs from a
> database.
> So, my data frame had 2 columns and 575 rows. Follow bellow an example of my
> database
> Idf1  casod
> 121
> 141
> 151
> 161
> 173
> 183
> 193
> 213
> 251
> 241
> 261
> 281
> 293
> 323
> 333
> 351
> 363
> 371
> 483
>
> So my function is
>
> blinding=function(sample){
>sort=sample(idf1,10,replace=F)
>return(sort2)
> }
Does this work? It looks like you're passing a function to the function! 
Also, you're not returning something that has been calculated. In 
practice, you might as well just use sample(idf1,10,replace=F).

> It is pretty simple and I would like to add one more step in my choice. I
> would like to link my choice to casod stats. Thus if casod==3 sample would
> be random idfs could not be an idf with casod=1. Does someone can help me?
If I understand you, you only want to sample within casod==3. So this 
will work:

sample(idf1[casod==3],10,replace=F)

Or, if you need a wrapper function, this should work:

blinding=function(which.casod, df, N=10) 
sample(df$idf1[df$casod==which.casod], N, replace=F)

e.g. SampledData=sapply(unique(Data$casod), blinding, df=Data)
will sample for each value of casod

Bob

-- 

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] inference from GLMM fit using glmer

2014-03-03 Thread Bob O';Hara

ect of the bfi.  These 
> results seem to agree with the summary of the more general model, but they 
> also seem a bit extreme given our small sample size.
>
> Model comparisons using PBmodcomp from the pbkrtest package also seem to 
> support the inclusion of both fixed effects.
>> mc.oc.bfi.nobfi = PBmodcomp(oc.sr.bfi, oc.sr.null, nsim = 100)
>> mc.oc.yr.noyr = PBmodcomp(oc.sr.bfi.yr, oc.sr.bfi, nsim = 100)
>> mc.oc.bfi.nobfi
> Parametric bootstrap test; time: 26.13 sec; samples: 100 extremes: 0;
> large : oc ~ bfi + (1 | dist)
> small : oc ~ 1 + (1 | dist)
>   stat df   p.value
> LRT513.43  1 < 2.2e-16 ***
> PBtest 513.43 0.009901 **
> ---
>
>> mc.oc.yr.noyr
> Parametric bootstrap test; time: 39.83 sec; samples: 100 extremes: 0;
> large : oc ~ bfi + lyr + (1 | dist)
> small : oc ~ bfi + (1 | dist)
>   stat df   p.value
> LRT98.323  1 < 2.2e-16 ***
> PBtest 98.323 0.009901 **
> ---
>
> We calculated bootstrap confidence intervals using "confint".  They seem to 
> indicate that the estimated effects were unambiguous, but we were unsure 
> whether our sample size was adequate to apply this method.
>
>> confint (oc.sr.bfi.yr, method = c("boot"))
> Computing bootstrap confidence intervals ...
>2.5 %  97.5 %
> sd_(Intercept)|dist  0.13871500  0.83969172
> (Intercept)  6.54440740  7.56234905
> bfi -0.53389761 -0.44979598
> lyr  0.02835891  0.04325712
>
> Simple diagnostic plots (residuals vs fitted values and a normal QQ plot of 
> residuals) of the GLMM with effects of both bfi and lyr did not reveal 
> violations of assumptions.  We would be happy to report significant effects 
> of food availability and a significant trend across years, but are concerned 
> that we may not be interpreting these results correctly.  Any help or 
> suggestions would be greatly appreciated.
>
> Sincerely,
> Eric
>
>
>   [[alternative HTML version deleted]]
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


-- 

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Error distribution for fractional response

2014-01-30 Thread Bob O';Hara


On 30/01/14 10:58, Adhara Pardo wrote:

Dear R users,

I would like to fit a GLM to some plant regeneration data (see bottom
of this e-mail). The dependent variable, an index of regeneration, was
obtained by diviving the number of saplings by the number of adults
plants present in each plot. The result is a highly skewed variable and
thus, specifying, for instance, a
Gaussian distribution does not seem to be appropriate. Data
transformation does not help either. Do you have any suggestion on the
best distribution to choose?
Rather than use an index, it might be better to use the number of 
saplings directly, and assume they are Poisson distributed (or some form 
of over-dispersed Poisson). You can use the log of the number of adults 
as an offset:


glm(saplings ~ something + offset(log(adults))

The model is

saplings ~ Poisson(lambda)
log(lambda) = alpha + beta*something + log(adults)

where alpha and beta are the parameters being estimated (lambda is the 
expected number of saplings). This model is the same as


lambda = adult*exp(alpha + beta*something)

so it's equivalent to modelling saplings/adults: the adults have just 
been moved to the other side of the equation.


Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Bayesian analysis with MCMC

2014-01-18 Thread Bob O';Hara


On 01/18/2014 04:39 AM, 明 wrote:

Dear all
  I have a model with four paramters, I want to estimate the parameter 
uncertainty, so Bayesian analysis with MCMC method is applied.
  But every sigle mcmc chain seems give quite different parameter marginal 
distributions.
  In order to get the true parameter marginal distributions, I do like this:
  (1) I take 100 MCMC chain, and each chain has 1 iterations.
  (2) caculate the different parameter marginal distributions according to 
the frequence of paramter in step 1 sampling.
  The result seems reasonable. but is it right?
  Looking forward for your reply, Thanks in advance
   
Han Ming

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Have you checked for convergence? If the chains are giving different 
marginal distributions, this suggests they haven't converged, so they 
are not sampling from the correct posterior distribution (or, at least, 
some chains aren't).


Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] This chains contains uninitialized variables and the trap report says " undefined real result"

2013-11-28 Thread Bob O';Hara

On 28/11/13 15:09, kassahun Alemu wrote:
> Dear all,
>
>   
>
> When I do analysis, I have troubled with frequent pop up message:
>
> This chains contains uninitialized variables and the trap report says "
> undefined real result"
>
>   
>
> I am looking for your courage and help to continue my further analysis.
My guess is that some of your mu's are getting too large, so exp(mu[]) 
becomes infinite. In your inits you're giving the precisions values of 
dgamma(1,5,10)=0.1891664 (I think you meant to use rgamma()), so the 
variance is about 5. With alpha and all of the betas being positive, I 
can easily see exp(mu) exploding.

I'd recommend you use BUGS to do the debugging, because it's easier to 
look at the problems (e.g. here I'd use save state to see what parameter 
estimates you have: I'm guessing some mu's will be large).

In models like this I usually set the inits for my beta's to be close to 
zero, so they don't explode the mu's unexpectedly. You can always relax 
the inits later, once it's working.

Bob

>   
>
> model{
>
> for (j in 1:2160) {
>
> y[j] ~ dpois(mu[j])
>
> log(mu[j]) <- log(e[districts[j]]) + alpha + beta1*x1[j] + beta2*x2[j] +
> beta3*x3[j] + beta4* x4[j] +
>
>   beta5*x5[j] + beta6*x6[j] + beta7*x7[j] + beta8*x8[j] +
> phi[districts[j]]+ delta[time[j]] + omega[time[j]]
>
> rho[j]<-exp(alpha + beta1*x1[j] + beta2*x2[j] + beta3*x3[j] + beta4* x4[j] +
>
>
> beta5*x5[j] + beta6*x6[j] + beta7*x7[j] + beta8*x8[j] +
> phi[districts[j]] + delta[time[j]] + omega[time[j]])
> # RR
>
> }
>
>   
>
> for (i in 1:18) {
>
> e[i] ~ dnorm(0,sigma.e)
>
> phi[i] ~ dnorm(0,tau.phi)
>
> }
>
>   
>
> delta[1]<-0
>
> omega[1] ~ dnorm(0,tau.omega1)
> # Adjusted RR in month 1 over all districts/years
>
> for (t in 2:120){
>
> delta[t] ~ dnorm(0,tau.delta)
>
> omega[t] ~ dnorm(omega[t-1],tau.omega)
>
> }
>
>   
>
> #PRIORS
>
> tau.e ~ dgamma(0.001,0.001)
>
> alpha ~ dflat()
>
> beta1 ~ dgamma(1,1)
>
> beta2 ~ dgamma(1,1)
>
> beta3 ~ dgamma(1,1)
>
> beta4 ~ dgamma(1,1)
>
> beta5 ~ dgamma(1,1)
>
> beta6 ~ dgamma(1,1)
>
> beta7 ~ dgamma(1,1)
>
> beta8 ~ dgamma(1,1)
>
> tau.phi ~ dgamma(1,1)
>
> tau.delta ~ dgamma(1,1)
>
> tau.omega1 ~ dgamma(1,1)
>
> tau.omega ~ dgamma(0.001,  0.001)
>
> sigma.e <- 1/sqrt(tau.e)
>
> sigma.phi <- sqrt(1/tau.phi)
>
> sigma.omega1 <- sqrt(1/tau.omega1)
>
> sigma.delta <- sqrt(1/tau.delta)
>
> sigma.omega <- sqrt(1/tau.omega)
>
> }
>
>   
>
> #R script to call R2WinBUGS and plot density
>
>   
>
> # Define the dataset
>
>   
>
> library("R2WinBUGS")
>
> library(help="BRugs")
>
> library(help="R2WinBUGS")
>
> data1<-read.csv(file.choose("Data1.csv"), header=TRUE, sep=(","))
>
>   
>
> names(data1)
>
> districslist<-as.data.frame(unique(data1$districts))
>
> data2<-as.data.frame(cbind((1:18),districslist))
>
> names(data2)<-c("D_ID","districts")
>
> data3<-merge(data1, data2,by="districts")
>
> data3<-as.data.frame(data3)
>
> names(data3)
>
>   
>
> data6<-NULL
>
> for (i in 1:18){
>
> data4<-data3[data3$D_ID==i,]
>
> data5<-cbind(data4,seq(1,length(data4[,1]),by=1))
>
> data6<-rbind(data6,data5)
>
> }
>
> districts<-data6[,13]
>
> time<- data6[,14]
>
> y<- data1$y
>
> x1<- data1$x1
>
> x2<- data1$x2
>
> x3<- data1$x3
>
> x4<- data1$x4
>
> x5<- data1$x5
>
> x6<- data1$x6
>
> x7<- data1$x7
>
> x8<- data1$x8
>
> data <- list(districts=districts,time=time,
> y=y,x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8)
>
>   
>
> # Define inits
>
>   
>
> inits <- function(){
>
> list(tau.e=dgamma(1,5, 10), alpha = dnorm(1,1, 1), beta1 = dgamma(1,5,10),
> beta2 = dgamma(1,5,10),
>
> beta3 = dgamma(1,5,10), beta4 = dgamma(1,5,10), beta5 = dgamma(1,5,10),
>
> beta6 = dgamma(1,5,10), beta7 = dgamma(1,5,10), beta8 = dgamma(1,5,10),
>
> tau.phi=dgamma(1,5,10), tau.delta=dgamma(1,5,10),
>
> tau.omega=dgamma(1,5,10), tau.omega1=dgamma(1,5,10))
>
> }
>
>   
>
> # run the MCMC analysis
>
>   
>
> model_G<- bugs(data, inits, model.file = "C:/model_G.txt",
>
> parameters = c("tau.e", "alpha", "beta1", "beta2", "beta3", "beta4",
> "beta5", "beta6

Re: [R-sig-eco] Two-sided significant test for null models

2013-07-23 Thread Bob O';Hara


On 07/23/2013 12:06 PM, Cayetano Gutiérrez Cánovas wrote:

Dear all,

I'd like to test the two-side significance of an observed value being
lower or higher in comparison to a null distribution, but I don't know
how to build this in R code. Can anybody help me with some code or any R
function?

Let's say  'null.dist' the null distribution, 'obs' the observed value
and 'alpha' the probability of Type I error.
set.seed(10)
rnorm(999)->null.dist
obs<-1.5
alpha<-0.05

pval=mean(obs>null.dist)
min(pval, 1-pval)<0.05

Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] interpretation of interaction between explanatory variables

2013-05-10 Thread Bob O';Hara

It's difficult to give definitive advice about what it means, but the 
simplest approach is to look at what it means: plot the model and see 
what it looks like. Basically, the interaction says that as the 
variables both increase, the relationship gets less negative. The 
details depend on the ranges of the variables, hence plotting the 
results is helpful.

I would write some code, but that would probably mainly show that I'm 
out of date with R. But you could use expand.grid() to create new data 
to predict, use predict() to predict it, and then plot it as several 
lines on one plot (e.g. plot against C.abundances, with one line for 
each value of C.diversity).

Bob

On 05/08/2013 02:56 PM, "Iris Kröger" wrote:
> Dear list members,
>
> I want to analyse the impact of a competitor community (i.e. "community 
> abundances" on the one hand and "community species diversity" on the other 
> hand) on mosquito larval populations of species A and B. Each variable on its 
> own has a negative impact on mosquitoes - but when both variables are 
> interacting, there is a positive impact... How can I interpret that? For 
> mosquito A only C.diversity has a significant impact - but the interaction 
> between C.abundances and C.diversity is significant? What does that mean?
>
> I used the model:
> lm (mosquito ~ C.abundances * C.diversity)
> output Mosquito A:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.2120 0.1159 -1.829 0.074855 .
> C.abundances -0.1277 0.1616 -0.790 0.434067
> C.diversity -0.5787 0.1385 -4.178 0.000155 ***
> C.abundances:C.diversity 0.4096 0.1712 2.393 0.021476 *
>
> Output Mosquito B:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.2900 0.1220 -2.377 0.02233 *
> C.abundances -0.3856 0.1701 -2.266 0.02891 *
> C.diversity -0.3470 0.1458 -2.381 0.02213 *
> C.abundances:C.diversity 0.5367 0.1801 2.980 0.00489 **
>
> Thanks a lot for your help!
> Iris
>
>   [[alternative HTML version deleted]]
>
> ___________
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


-- 

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] GLM

2013-03-07 Thread Bob O';Hara


On 03/07/2013 04:24 PM, Mahnaz Rabbaniha wrote:

Dear all

I want to find regression between fish larva abundance and some
abiotic factor ,i used this code:

glm(formula = mychto ~ po4 + No3 + Si + Tn)


result:
Deviance Residuals:
 Min   1Q   Median   3Q  Max
-26.586  -18.262  -12.296   -2.949  226.229

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
(Intercept)  67.421173.9781   0.9110.371
po4  -0.2887 1.6037  -0.1800.859
No3   0.9151 4.5261   0.2020.841
Si   -0.1145 0.4850  -0.2360.815
Tn   -1.1568 4.4818  -0.2580.798

(Dispersion parameter for gaussian family taken to be 2444.917)

 Null deviance: 63156  on 29  degrees of freedom
Residual deviance: 61123  on 25  degrees of freedom
AIC: 325.72


my question is about the acceptable this AIC, or this result with
goodness of fit?

thanks
AIC tells you nothing about goodness of fit, it is used to compare 
different models.


A better way to assess goodness of fit is to look at the data and the 
model. It's clear that you have very skewed data (look at the distances 
between the quartiles of the residuals), so a Gaussian model, which 
assumes the residuals are symmetric, is inappropriate. It might make 
more sense to use a Poisson distribution, although you will probably 
need to account for over-dispersion (there are a few ways of doing 
this). Also, once you've fitted a glm, you can use plot() to get some 
useful model-checking plots (e.g. residuals).


There is more advice in the links here: 
<http://r.789695.n4.nabble.com/Checking-the-assumptions-for-a-proper-GLM-model-tp1559502p1560503.html>.


Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863 /  +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] patterns in weather data that could relate to pathogen prevalence

2012-11-30 Thread Bob O';Hara


On 11/30/2012 01:09 PM, Anto Raja wrote:

Hi all

I am searching for a tool that would help me to identify weather patterns
that influence the prevalence of a pathogen, 'Pn'.

Say, I have annual prevalence data (collected in April) and I know that the
prevalence of 'Pn' is affected by the weather conditions since November. I
also have daily data for different weeather variables.
You have a lot of weather data, but I assume you don't have so much 
prevalence data. So it's going to be difficult, whatever you do...

The objective is to identify the relationship between the weather from
Nov-Mar and the prevalence of Pn. We know that weather has an influence on
Pn. The question is to find out weather from which period is relevant or
what kind of weather is relevant. It could be that the first two winter
months (Nov-Dec) is the decisive factor or that a certain weather situation
(like 20 consecutive days of below zero conditions) occuring at any time is
important or a combination of both.

I have tried correlations between prevalence and monthly means for Mar,
Feb-Mar, Jan-Mar and so on and nothing definite turned up. I could also do
it on a weekly basis manually. But I wonder if there is a tool that uses a
moving window of different sizes (say, from a min size of 1 week to a max
of 4 months) and checks correlations for each of these periods.

I am thinking of ARMA, but my present intention is not to forecast but only
to study. Can it still be used? Or ARMA in combination with multivariate
analysis to study the relative importance of each weather variable.
I don't see why an ARMA model would help you, as that assumes a 
covariance between times (i.e. autocorrelation) in the response (i.e. 
prevalence). There are methods for assuming that the response has an 
autocorrelation, but I don't think that's your big problem. My reaction 
(without seeing the data, of course) is that you might be asking too 
much of your data to get anything meaningful out of it.



Any suugestions are welcome. I have used R for basic stats analysis but
never worked with time-series data or the advanced tools of data mining.
So, it could also be possible I am not thinking along the right lines. Feel
free to correct if I am looking in the wrong place.
It sounds like you're trying to mine your data for any pattern. To be 
honest, if you do that, I wouldn't trust the results unless you can 
validate them independently: you'll find some relationship if you try 
enough models, but will it make biological sense? This is particularly 
problematic when you have correlated variables, which you will do 
(especially when you start sliding windows around)


I'd suggest you start by using what's known of the pathogen or its host, 
or of similar host-pathogen systems, to develop a smaller number of 
hypotheses about what sort of effects are likely. Plant ecologists often 
use GDD5 (Growing Degree Days above 5°C), which might be a useful way of 
reducing the temperature data to something smaller. Of course, another 
temperature than 5°C might work better for you.


Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863 /  +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Minta's (1992) tests of temporal interactions

2012-09-20 Thread Bob O';Hara


On 09/19/2012 09:03 PM, Howe, Eric (MNR) wrote:

Hello,

  


Is anyone aware of an R package for performing the tests described by
Minta (1992) Tests of spatial and temporal interaction among animals.
Ecological Applications 2:178 - 188?
After a quick look at the paper, it's a fairly standard contingency 
table analysis. I would suggest using a GLM. I think the expected areas 
can be fed into the model as offsets, but I haven't read the paper 
closely enough to be sure.


Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40216
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Spatial AND temporal correlation in linear mixed-effects models

2012-07-26 Thread Bob O';Hara


On 07/26/2012 10:45 AM, Tscheulin Thomas wrote:

Hi,

I have a question regarding the nesting structure in linear mixed models of 
data, which is spatially and at the same time temporally correlated.

So far I have tried to get around the problem by averaging out the temporal 
component but I would really like to keep everything in the model.

Here is the experimental set-up:
We have measured insect abundance in two different islands, each having 5 
sites, each site having 4 plots, in which each we have 5 collection points 
(traps). The sampling was repeated 5 times. Two times in 2011 and 3 times in 
2012.

After averaging the temporal correlation out I composed the following model for 
the abundance:

model<-lme(abundance~continous_explanatory_variable,random=~1|island/site/plot/trap,method="REML")

This works fine but I have problems making a model that allows the temporal 
component to stay in.
How can I do this using the function lme?

I know with lmer I could do this:

model2<-lmer(abundance~continous_variable+(1|island/site/distance/triplet)+(1|year/round))
but I really want to use the function lme. How can I insert multiple levels of 
grouping in lme?

Any help is very much appreciated!
Best,

A bit of googling brought up this, from 10 years ago:
<http://tolstoy.newcastle.edu.au/R/help/02b/2068.html>
I don't think lme has changed that much.

Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40216
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Using residuals as dependent variables

2012-06-22 Thread Bob O';Hara

On 06/21/2012 07:06 PM, Chris Mcowen wrote:
> Dear List,
>
>
>
> I am wondering if the methodological approach I am taking is correct and
> would be very grateful if people could comment and make suggestions, much
> appreciated.
The simple answer is no it isn't:
<http://gking.harvard.edu/files/mist.pdf>
>
>
> I have developed the best model ( AIC model selection) using oceanographic
> data ( i.e. SST, chlorophyll, NPP...x6) to predict reported fisheries catch
> for 52 countries.
>
>
>
> I then extract the residuals from the model and anything positive has a
> higher catch than would be predicted given the level of productivity in the
> country, with the opposite being true.
>
>
>
> What I want to do is:
>
>
>
> 1.   Regress a suite of ecological and socioeconomic variables against
> the residuals from the oceanographic model to determine which factors cause
> some countries to be above and some below. I.E as trophic level increase the
> residuals become increasingly negative.
You could simply put everything into one big model.
> 2.   Ideally ( and I am unsure how or if it is possible) work out for
> each country which variables/s cause the poor fit of that country to the
> oceanographic model.

Try using partial residuals 
(http://en.wikipedia.org/wiki/Partial_residual_plot).

Bob

-- 

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Multiple comparisons among predictors generated from same data

2012-05-25 Thread Bob O';Hara


On 05/25/2012 10:18 AM, Gavin Simpson wrote:

On Thu, 2012-05-24 at 15:00 -0700, J Straka wrote:

Hello,

I'm planning on using a regression model to describe seed set of plants (my
response) using some sort of predictor based on temperature.  I have a
number of temperature variables calculated from the same set of data
(hourly temperatures for the growing season, converted to variables such as
average temperature, maximum temperature, minimum temperature, degree-days
above zero Celsius, degree days above ten Celsius, etc...), and I want to
decide which one should be included in my model. I know that I would
ideally select one based on "prior knowledge" of the system (e.g. so-called
"planned comparisons" or choosing a temperature threshold that is known to
be important for the development of seeds), but not much is known about
this system.

What is the model for? Understanding so you want to interpret the
coefficients directly as something meaningful or for prediction?

If the latter I would say it doesn't really matter; choose the model
that gives the best out-of-sample predictions (lowest error etc), or
average predictions over a set of best/good models. Simply choosing the
best model via some sort of selection procedure may result in a model
with high variance (change the data a bit and different variables would
be selected). If so, consider a regression method that applies shrinkage
to the coefficients such as the lasso or the elastic net; this will lead
to a small bit of bias in the estimates of the coefficients but should
reduce the variance of the final model because you are considering the
selection of variables as part of the model itself.

If you want to interpret the model coefficients as something real then
you have to be very careful doing any form of selection; the stepwise
procedures and best subsets all can potentially lead to strong bias in
the model coefficients. Be removing a variable from the model in effect
you are saying that the sample estimate of the effect of that variable
on the response is 0, not some small (statistically insignificant)
value.

This is a very tricky thing to get right and I'm not sure I know the
right answer (or even if there is one!?).
An additional complication here is that the variables are going to be 
correlated, so a model with all or most in it could be unstable. If a 
single temperature variable is enough, then I'd suggest either trying 
your best to pick one, or use what everyone else uses (GDD5?), so the 
study can be comparable.


Once you have a model, it might be worth checking to see if the other 
variables tell a different story. If it's the same story but with 
different p-values, you might as well stick to the original analysis.


Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Bob O';Hara


On 05/23/2012 10:55 AM, Marc Taylor wrote:

Hi Jari - one more question if you don't mind. Since the weights of the PCs
are related to the the amount of variance that they explain in the original
data - is it problematic to predict the PC scores with a second data set
that has a different amount of variance (e.g. due to differing number of
samples)? In both the 1st and 2nd data sets I have been using scaled values
for the variables (mean=0 and sd=1 for each sample).
Cheers,
Marc

I'll pretend to be Jari for a moment. :-)

PCA just scales and rotates the data in cunning ways, so with the new 
data you need to scale and rotate it in the same way. If you scale the 
values first then you've already changed the scaling.


What you need to do is either do PCA on the raw data or scale the new 
data using the mean and varianes of the old data.


library(MASS)

NVar=5; NObs=50
Sigma=matrix(c(
 10,0.2,   0, 0,0.4,
0.2,   5,0.1, 0,0.6,
   0,0.1,1.0, 0.2,0,
   0,   0,0.2, 5, 0,
0.4,0.6,  0, 0,1), nrow=5)

# simulate data
Data=mvrnorm(NObs, rnorm(NVar), Sigma=Sigma)
# Do PCA on scaled data
Data.Sc=scale(Data)
PC=princomp(Data.Sc)

# Simulate new data
NewData=mvrnorm(10, rnorm(NVar), Sigma=Sigma)
# Do PCA on new data. First do it wrong...
PC.wrong=predict(PC, newdata=scale(NewData))

# Now scale correctly

NewData.Sc=scale(NewData, center=attr(Data.Sc, "scaled:center"), 
scale=attr(Data.Sc, "scaled:scale")

PC.right=predict(PC, newdata=NewData.Sc)

HTH

Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] gam

2012-05-17 Thread Bob O';Hara


On 05/17/2012 06:17 PM, Mahnaz Rabbaniha wrote:

Dear all

i have done GAM for data ( they are non- normal and i find regression
between Sillaginidae with several hydrological factors),i done :

[1] "depth""temperature"  "salinity" "Sillaginidae"

  >  pairs(sc,panel=function(x,y){points(x,y);lines(lowess(x,y))})


model<-gam(Sillaginidae~s(temperature)+s(salinity)+s(depth))

but i take this message:

Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
   A term has fewer unique covariate combinations than specified
maximum degrees of freedom


please help me what can i do? or what is meaning this sentence?
It means that you don't have a lot of different values for one of your 
covariates. You can check which with


length(unique(depth))
length(unique(temperature))
length(unique(salinity))

Unless you have a lot of data, that the sensible thing to is probably to 
fit a straight line rather than a smooth curve. The other thing to do is 
to define the maximum flexibility of the curve, e.g.


model<-gam(Sillaginidae~s(temperature, k=6)+s(salinity)+s(depth))

I hope this helps.

Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] data normality

2012-05-06 Thread Bob O';Hara


On 05/06/2012 10:54 AM, Yong Shen wrote:

Dear all,
   I have two questions about data normality.
   I used stepwise multiple regression to determine which variables contributed 
to tree growth, and want to built a model to explain tree growth. Sample size 
is about 50 tree species, I think it is not a large sample size, and some 
variables are not normal distribution.
1. Do I have to transform them to normal distributions before I perform 
multiple regression?
No. The only area where a Normal assumption comes in is that the 
residuals are normally distributed. So you can happily fit the model 
without worrying about normality until after you've got the model.

2. Two variables can not transform to normal distributions although I used some 
methods (e.g log, sqrt, boxcoxfit), what should I do for the two variables?


Leave them as they are.

Advice that makes life simpler - always the best sort.

Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Which function would fit these data (Version 2)

2012-04-12 Thread Bob O';Hara


On 04/13/2012 06:16 AM, Beutel, Terry S wrote:

Hi list members

Sending this again as symbols in the previous version did not translate.

Apologies if this is not too specifically R related, but I am looking fit a 
model to some simulated data. X is distance to sample point, y is binary 
outcome (present/absent). I was hoping someone can suggest a (presumably) non 
linear function that might meet the following criteria

A.   0<= y<=1 (actual responses are binary)
B.   xmax = infinity
C.   xmin<= 0
D.   Where x = 0 then y>= 0
E.   Where x equals Infinity then y equals 1
F.   As x approaches Infinity from xmin, the slope of the function declines 
monotonically toward 0.
my initial reaction was to think "logistic regression", but  my 
experience with questions like this is that we'll save a lot of time if 
you explain the context.


BTW, I'm not sure why you want "the function" to decrease to zero when x 
approaches infinity when at the same time y should equal 1 at infinity. 
It's not a big problem, though. Just use 1-y instead of y.


Bob


--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] ICC confidence intervals and power analysis for random effects in lmer?

2012-04-12 Thread Bob O';Hara

On 04/13/2012 04:17 AM, Bradley Carlson wrote:

Thanks for the tips everyone - I'll look into MCMC sampling for the CI. As
far as the power analysis goes, I'm somewhat familiar with the criticisms
regarding power analysis. I think this reviewer was curious about it
because there was a small sample size (small number of levels of the random
effect) and the ICC point estimate was not so low as to be biologically
insignificant. It would be nice to state in the paper how much larger of a
sample size would have enabled us to detect an effect given the observed
variation, as though this were a pilot study for planning a bigger
experiment. I'll certainly bring up the suggested citations, but I would
still be interested to know if there is a method available for performing a
power analysis for an LRT of a random effect. Thanks again,
I'd use simulation: you can write a function to simulate data and fit
the model to it, so you can run that for different sample sizes. It's
not too fiddly to write, and you can leave the computer to run it over
the weekend, so you can run reasonable sample sizes.

Bob

Brad

On Thu, Apr 12, 2012 at 12:18 PM, Brian Inouye wrote:

In addition to Bob O'Hara's suggestion, here is another citation you can
give the reviewer/editor, as to why retrospective power analyses are a
waste of time.

Hoenig, J. M. and D. M. Heisey (2001). "The abuse of power: the pervasive
fallacy of power calculations for data analysis." American Statistician
55(1): 19 - 24.

Last year the ESA updated its author guidelines for reporting statistics,
and removed a suggestion to report power analyses that had been inserted in
the 1980s.

-Brian Inouye
Florida State University
Chair, statistical ecology section of the ESA

On 4/12/2012 6:00 AM,
r-sig-ecology-request@r-**project.orgwrote:

2) A reviewer requested a power analysis of the ability to detect a
significant random effect. Any tips on how to approach that?

Report the random effect and confidence intervals. Retrospective power
analyses are pretty pointless (e.g. see http://beheco.oxfordjournals.**
org/content/14/3/446.full<http://beheco.oxfordjournals.org/content/14/3/446.full>),
unless you're planning to repeat the experiment. Bob

__**_
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/**listinfo/r-sig-ecology<https://stat.ethz.ch/mailman/listinfo/r-sig-ecology>

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW: http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] ICC confidence intervals and power analysis for random effects in lmer?

2012-04-11 Thread Bob O';Hara


On 04/12/2012 03:50 AM, Bradley Carlson wrote:

I'm performing an analysis of behavioral variation among individual
tadpoles, using individual ID as a random effect and time as a continuous
fixed covariate in the lmer function in lmer4 package. I'm really
interested in making inferences about the random effect (i.e. the extent of
variation among individuals). I'd like to do two things that I can't seem
to find straightforward answers about and I'm hoping someone can help or
point me to a good resource.

1) The intraclass correlation coefficient is of particular interest to me,
as it describes the proportion of variation that occurs among individuals.
Ideally I'd like to report a confidence interval of the ICC but I can't
find any way to calculate one, other than a function in the psychometric
package that appears to only work when there are no covariates in the model
(random effect only).
MCMC has already been mentioned and lme4 still has its mcmcsamp() 
function. Failing that, you could try a parametric bootstrap, which 
requires a little bit of coding but simulate() makes it much easier.

2) A reviewer requested a power analysis of the ability to detect a
significant random effect. Any tips on how to approach that?
Report the random effect and confidence intervals. Retrospective power 
analyses are pretty pointless (e.g. see 
http://beheco.oxfordjournals.org/content/14/3/446.full), unless you're 
planning to repeat the experiment.


Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Output for interactions in models that do not include all main effects

2012-04-04 Thread Bob O';Hara


On 04/03/2012 11:31 PM, Kristen Gorman wrote:

Dear all,
I have R code to run AIC including multi-model inference. I am running into a 
problem in calling the output from models where both parameters in an 
interaction are not included as main effects.
Why would you want to do that? Why would you (for example) expect the 
average of the Rlipid slope to be zero if the slope varies with the 
value of RFGinit? Does this make sense?


(this is the sort of thing that makes statisticians splutter into their 
tea when they see someone do it: it rarely makes sense. Well, unless you 
have nested effects - which you don't have here- where the interaction 
is the nested effect)


if you respect marginality, there won't be a problem because the main 
effect is always included. If you really want to include interactions 
without main effects, you can either write the formula "by hand", using 
paste():


something=Rlipid
form = paste("Slipid ~ ", something, " + RFGinit:", something, sep="")
lm(form, data = DataSet)

and then work out how to get the order. Or you could try using update():

mod1 = lm(formula = Slipid ~ RFGinit*Rlipid, data = DataSet)
mod2=update(mod1, . ~ . -RFGinit)

HTH

Bob


In R, the interaction will be called depending on the parameter that was used 
as the only main effect in the model. So, I end up generating 2 different 
interactions (e.g., Rlipid:RFGinit vs RFGinit:Rlipid) that are actually the 
same. This becomes a problem in the remaining R code that requires weighted and 
summed values for the parameter and SE estimates. Thus, I would like to call 
the interaction consistently across models. See the following code:

--
lm(formula = Slipid ~ Rlipid + RFGinit:Rlipid, data = DataSet)

Residuals:
 Min  1Q  Median  3Q Max
-74.075 -19.047   7.233  20.445  45.391

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
(Intercept)120.338475.30405  22.688<2e-16 ***
Rlipid   0.304930.23615   1.2910.202
Rlipid:RFGinit  -0.020990.01773  -1.1840.241
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30.88 on 60 degrees of freedom
Multiple R-squared: 0.02721,Adjusted R-squared: -0.005221
F-statistic: 0.839 on 2 and 60 DF,  p-value: 0.4372


lm(formula = Slipid ~ RFGinit + Rlipid:RFGinit, data = DataSet)

Residuals:
Min 1Q Median 3QMax
-76.35 -21.63   7.09  22.46  45.71

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
(Intercept)131.028546   8.717104  15.031<2e-16 ***
RFGinit -0.933483   0.742083  -1.2580.213
RFGinit:Rlipid   0.003926   0.009283   0.4230.674
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30.9 on 60 degrees of freedom
Multiple R-squared: 0.02586,Adjusted R-squared: -0.00661
F-statistic: 0.7964 on 2 and 60 DF,  p-value: 0.4556
--


Is there a way to tell R to call the interaction based on alphabetical order of 
the 2 interaction terms and not based on the term that was used as a main 
effect?

Thanks very much for any insight.

Kristen Gorman

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Problem with R2WinBUGS packages

2012-01-22 Thread Bob O';Hara


On 01/22/2012 05:50 PM, sergio vignali wrote:

Hi,
I have a problem with the R2WinBUGS package,
when I run the bugs() function with the argument debug=TRUE, WinBUGS
executes the calculations
but R is blocked, instead with the option debug=FALSE it is all ok!
Is there anybody who can help me?

Sergio Vignali

Uwe Ligges (lig...@statistik.tu-dortmund.de) is the maintainer. But 
isn't this expected behaviour? The help says


debug if FALSE (default), WinBUGS is closed automatically when the 
script has fin-
   ished running, otherwise WinBUGS remains open for 
further investigation


IIRC, you just need to close WinBUGS, and it'll return to R.

You only need to use debug=T if BUGS is trapping or throwing errors (and 
have fun understanding the traps).


Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40216
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Mixed model with zero truncated Poisson distributed data

2012-01-11 Thread Bob O';Hara


On 01/11/2012 04:58 PM, Helen Ward wrote:

Hello,

I am a PhD student currently trying to derive a mixed effects model.

I would like to describe the relationship between age and male
reproductive success in a population of greater horseshoe bats.

My data consists of three columns: MaleID, Age, NumberofPups (at that
age). Many of the males appear multiple times in the data set, so I
believe I need to derive a mixed model with MaleID as a random variable.

The data is Poisson distributed, but zero-truncated. So far I have only
succeeded in making a mixed model with a poisson distribution (using
glmmPQL in the MASS package), and a zero truncated poisson model (using
vglm in the VGAM package), but not a mixed model capable of handling
zero truncated Poisson data.

It has been suggested that I could just minus 1 from each value in the
NumberofPups column to make a more usual Poisson distribution, so I can
ignore the zero truncated bit. I have tried this and it changes the
results of the model, but is this an acceptable transformation?

If not, can anyone advise me on a mixed model that can handle zero
truncated Poisson data please?

I also intend to post this on the R-sig-mixed-models
<https://stat.ethz.ch/mailman/options/r-sig-mixed-models/h.l.ward%40qmul.ac.uk>
list, so apologies if you've seen it twice!

If an individual would have had pups, would you have seen it?

I'm wondering if you could simply add the zeroes back into the data.

Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40216
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

48 matches

Mail list logo