Re: [R] distributions and glm

2008-10-21 Thread Rubén Roa-Ureta

drbn wrote:

Hello,
I have seen that some papers do this:

1.) Group data by year (e.g. 35 years) 


2.) Estimate the mean of the key variable using the distribution that fits
best (in some years a normal distribution, in others a more skewed one such
as a gamma distribution, etc.)

3.) Fit a GLM to these estimated yearly means.

I'd like to know whether it is valid to use these means in a GLM, or whether
it is a wrong idea.

Thanks in advance

David
  

David,
You can model functions of data, such as means, but you must be careful 
to carry over most of the uncertainty in the original data into the 
model. If you don't, for example if you let the model know only the 
values of the means, then you are actually assuming that these means 
were observed with absolute certainty instead of being estimated from 
the data. To carry over the uncertainty in the original data to your 
modeling you can use a Bayesian approach or you can use a marginal 
likelihood approach. A marginal likelihood is a true likelihood function, 
not of the data themselves but of functions of the data, such as maximum 
likelihood estimates. If your means per year were estimated by 
maximum likelihood (for example with fitdistr in package MASS) and your 
sample size is not too small, then you can use a normal marginal 
likelihood model for the means. Note, however, that each mean may come 
from a different distribution, so the full likelihood model for your data 
would be a mixture of normal distributions. You may not be able to use 
the pre-built glm function, so you may face the challenge of writing your 
own code.
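
The normal marginal likelihood idea above can be sketched as follows. This
is only an illustration under stated assumptions, not a definitive
implementation: the data are simulated, the linear trend in year is
hypothetical, and every yearly mean is treated as approximately normal with
a known standard error taken from fitdistr (so the possibility that
different years follow different distributions is ignored here).

```r
library(MASS)  # provides fitdistr()

set.seed(42)
years      <- 1:35
n_per_year <- sample(30:200, length(years), replace = TRUE)  # hypothetical sample sizes
true_mean  <- 5 + 0.1 * years                                # hypothetical linear trend

# Step 1: ML estimate of each yearly mean and its standard error
est <- t(sapply(seq_along(years), function(i) {
  x   <- rnorm(n_per_year[i], mean = true_mean[i], sd = 2)
  fit <- fitdistr(x, "normal")
  c(mean = unname(fit$estimate["mean"]), se = unname(fit$sd["mean"]))
}))
d <- data.frame(year = years, mean = est[, "mean"], se = est[, "se"])

# Step 2: normal marginal likelihood for the means.
# Each estimated mean is modelled as N(b0 + b1 * year, se^2), with se known.
negloglik <- function(beta) {
  mu <- beta[1] + beta[2] * d$year
  -sum(dnorm(d$mean, mean = mu, sd = d$se, log = TRUE))
}
start    <- coef(lm(mean ~ year, data = d))   # unweighted fit as starting values
opt      <- optim(start, negloglik, method = "BFGS", hessian = TRUE)
beta_hat <- opt$par
beta_se  <- sqrt(diag(solve(opt$hessian)))    # SEs from the observed information

# For this simple normal case the same point estimates come from a Gaussian
# glm with inverse-variance weights; dispersion = 1 tells summary() that the
# variances are taken as known rather than re-estimated.
fit_glm <- glm(mean ~ year, family = gaussian, data = d, weights = 1 / se^2)
glm_se  <- summary(fit_glm, dispersion = 1)$coefficients[, "Std. Error"]

print(cbind(marginal = beta_hat, glm = coef(fit_glm)))
print(cbind(marginal = beta_se,  glm = glm_se))
```

Years with few observations get large standard errors and therefore little
weight, which is how the uncertainty in the original data reaches the model.
When the link is not the identity, or the means are not all treated as
normal, the glm shortcut no longer applies and a hand-written likelihood
like negloglik is the part to adapt.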

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] distributions and glm

2008-10-21 Thread drbn

Thanks for your clear response, Ruben.

I'm trying to fit a distribution to each year's data because my data are
truncated at fixed points on the right and on the left (every year). By
fitting a truncated distribution I think I will be able to get an unbiased
estimate of the yearly mean.

But, as you said, each mean comes from a different distribution. As I'm not
able to write my own glm, I'd like to know whether some alternative
exists. Is it possible, for instance, to model each year's data with
a nonlinear function and estimate the mean and other parameters from this
function? Would these means be more appropriate to use in a glm?
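
For one year, fitting a truncated distribution by maximum likelihood might
look like the sketch below. It rests on assumptions that are not in this
thread: the underlying distribution is taken to be normal, the truncation
points lower and upper are made-up values, the data are simulated, and the
"yearly mean" is read as the mean of the untruncated distribution.

```r
set.seed(7)
lower <- 2    # hypothetical left truncation point
upper <- 12   # hypothetical right truncation point

# Simulate truncated data: draw from a normal, keep only values inside the range
raw <- rnorm(5000, mean = 6, sd = 3)
x   <- raw[raw > lower & raw < upper]

# Negative log-likelihood of a normal truncated to (lower, upper).
# par[1] is the mean; par[2] is log(sd), so that sd stays positive.
negloglik <- function(par) {
  mu    <- par[1]
  sigma <- exp(par[2])
  log_mass <- log(pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma))
  -sum(dnorm(x, mu, sigma, log = TRUE) - log_mass)
}

opt <- optim(c(mean(x), log(sd(x))), negloglik, method = "BFGS", hessian = TRUE)

mu_hat <- opt$par[1]                          # estimate of the untruncated mean
mu_se  <- sqrt(diag(solve(opt$hessian)))[1]   # approximate standard error

cat("naive sample mean of truncated data:", mean(x), "\n")
cat("ML estimate of untruncated mean    :", mu_hat, "+/-", mu_se, "\n")
```

The standard error mu_se is the quantity that would need to be carried into
any later model of the yearly means, rather than the point estimate alone.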

Thanks in advance

David





-- 
View this message in context: 
http://www.nabble.com/distributions-and-glm-tp20075826p20096379.html
Sent from the R help mailing list archive at Nabble.com.



[R] distributions and glm

2008-10-20 Thread drbn

Hello,
I have seen that some papers do this:

1.) Group data by year (e.g. 35 years) 

2.) Estimate the mean of the key variable using the distribution that fits
best (in some years a normal distribution, in others a more skewed one such
as a gamma distribution, etc.)

3.) Fit a GLM to these estimated yearly means.
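
Steps 1 and 2 could be sketched roughly as below. This is a guess at what
such papers do, not a description of any particular one: the data are
simulated, only normal and gamma candidates are compared, AIC is used as
the selection rule, and the helper fit_year is a made-up name. The standard
error of the gamma mean uses the delta method.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
# Simulated stand-in data (hypothetical): three years of positive values
dat <- data.frame(
  year  = rep(2001:2003, each = 200),
  value = c(rgamma(200, shape = 2, rate = 0.5),
            rnorm(200, mean = 10, sd = 1),
            rgamma(200, shape = 3, rate = 1))
)

# Fit normal and gamma by maximum likelihood, keep the one with the lower
# AIC, and return the estimated mean with an approximate standard error.
fit_year <- function(x) {
  fn <- fitdistr(x, "normal")
  fg <- suppressWarnings(fitdistr(x, "gamma"))
  if (AIC(fn) <= AIC(fg)) {
    data.frame(dist = "normal",
               mean = unname(fn$estimate["mean"]),
               se   = unname(fn$sd["mean"]))
  } else {
    shape <- unname(fg$estimate["shape"])
    rate  <- unname(fg$estimate["rate"])
    # Gamma mean is shape / rate; gradient of that ratio for the delta method
    grad <- c(shape = 1 / rate, rate = -shape / rate^2)
    grad <- grad[names(fg$estimate)]   # same order as the rows of fg$vcov
    data.frame(dist = "gamma",
               mean = shape / rate,
               se   = sqrt(drop(t(grad) %*% fg$vcov %*% grad)))
  }
}

by_year     <- split(dat$value, dat$year)              # step 1: group by year
yearly      <- do.call(rbind, lapply(by_year, fit_year))  # step 2: fit per year
yearly$year <- as.integer(names(by_year))
print(yearly)
```

The se column is kept on purpose: passing only the mean column to a GLM
would treat the yearly means as if they had been observed exactly.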

I'd like to know whether it is valid to use these means in a GLM, or whether
it is a wrong idea.

Thanks in advance

David
-- 
View this message in context: 
http://www.nabble.com/distributions-and-glm-tp20075826p20075826.html
Sent from the R help mailing list archive at Nabble.com.
