On 11/30/2012 01:09 PM, Anto Raja wrote:
Hi all

I am searching for a tool that would help me to identify weather patterns
that influence the prevalence of a pathogen, 'Pn'.

Say, I have annual prevalence data (collected in April) and I know that the
prevalence of 'Pn' is affected by the weather conditions since November. I
also have daily data for different weather variables.

You have a lot of weather data, but I assume you don't have nearly as much prevalence data. So it's going to be difficult, whatever you do...

The objective is to identify the relationship between the weather from
Nov-Mar and the prevalence of Pn. We know that weather has an influence on
Pn. The question is which period's weather is relevant, or
what kind of weather is relevant. It could be that the first two winter
months (Nov-Dec) are the decisive factor, or that a certain weather situation
(like 20 consecutive days of below-zero conditions) occurring at any time is
important, or a combination of both.

I have tried correlations between prevalence and monthly means for Mar,
Feb-Mar, Jan-Mar and so on and nothing definite turned up. I could also do
it on a weekly basis manually. But I wonder if there is a tool that uses a
moving window of different sizes (say, from a min size of 1 week to a max
of 4 months) and checks correlations for each of these periods.

I am thinking of ARMA, but my present intention is not to forecast but only
to study. Can it still be used? Or ARMA in combination with multivariate
analysis to study the relative importance of each weather variable.

I don't see why an ARMA model would help you, as that models covariance between time points (i.e. autocorrelation) in the response (i.e. prevalence). There are methods that allow for autocorrelation in the response, but I don't think that's your big problem. My reaction (without seeing the data, of course) is that you might be asking too much of your data to get anything meaningful out of it.

Any suggestions are welcome. I have used R for basic stats analysis but
never worked with time-series data or the advanced tools of data mining.
So, it could also be possible I am not thinking along the right lines. Feel
free to correct me if I am looking in the wrong place.

It sounds like you're trying to mine your data for any pattern. To be honest, if you do that, I wouldn't trust the results unless you can validate them independently: you'll find some relationship if you try enough models, but will it make biological sense? This is particularly problematic when you have correlated variables, which you will have (especially when you start sliding windows around).
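
To illustrate the point about trying enough models, here is a minimal sketch in Python of a sliding-window scan run on pure noise. Everything in it is an assumption made for illustration: 15 years of data, 150 days per Nov-Mar season, window lengths from 1 week to about 4 months in weekly steps, and the helper names pearson and scan_windows. The simulated "weather" has no effect on the simulated "prevalence", so whatever the strongest correlation turns out to be, it is spurious by construction.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def scan_windows(weather, prevalence, min_len=7, max_len=120, step=7):
    """Correlate prevalence with the mean weather in every window.

    weather is a list of years, each a list of daily values.
    Returns a list of (r, start_day, window_length) tuples.
    """
    n_days = len(weather[0])
    results = []
    for length in range(min_len, max_len + 1, step):
        for start in range(0, n_days - length + 1, step):
            means = [sum(year[start:start + length]) / length for year in weather]
            results.append((pearson(means, prevalence), start, length))
    return results

random.seed(1)
N_YEARS, N_DAYS = 15, 150  # assumed sizes, not taken from the real data

# Pure noise: the simulated weather is unrelated to the simulated prevalence.
weather = [[random.gauss(0, 1) for _ in range(N_DAYS)] for _ in range(N_YEARS)]
prevalence = [random.gauss(0, 1) for _ in range(N_YEARS)]

results = scan_windows(weather, prevalence)
best_r, best_start, best_len = max(results, key=lambda t: abs(t[0]))
print(f"{len(results)} windows scanned; strongest r = {best_r:.2f} "
      f"(start day {best_start}, length {best_len} days)")
```

Re-running with different seeds gives a rough idea of how large the "best" correlation can get by chance alone, which is the baseline any real finding would have to beat.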

I'd suggest you start by using what's known of the pathogen or its host, or of similar host-pathogen systems, to develop a smaller number of hypotheses about what sort of effects are likely. Plant ecologists often use GDD5 (Growing Degree Days above 5°C), which might be a useful way of reducing the temperature data to something smaller. Of course, a threshold other than 5°C might work better for you.
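
As a rough sketch of how growing degree days collapse a daily series into one number per season, the Python snippet below sums the daily excess over a base temperature. It assumes daily mean temperatures as input (other definitions use the average of the daily minimum and maximum), and the function name and example values are made up for illustration.

```python
def growing_degree_days(daily_mean_temps, base=5.0):
    """Sum of daily mean temperature above `base`; days at or below base add 0."""
    return sum(max(0.0, t - base) for t in daily_mean_temps)

# Hypothetical daily mean temperatures (°C) for five days.
temps = [3.0, 6.5, 10.0, 4.9, 5.0]

gdd5 = growing_degree_days(temps)             # base of 5°C, i.e. GDD5
gdd0 = growing_degree_days(temps, base=0.0)   # a different threshold
print(gdd5, gdd0)
```

Computing this once per year over Nov-Mar would give a single covariate per year to relate to the April prevalence, with the base temperature chosen from what is known about the pathogen rather than tuned to the data.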

Bob

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863 /  +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
