Re: [R-sig-eco] Question on unexpected regression results

2014-07-30 Thread Bob O'Hara
Manuel is right - you're fitting a saturated model (i.e. each data point 
has one parameter, so the model fits perfectly).
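
A minimal sketch of why the factor version saturates, assuming only the five species with recorded Fcmin values from the table in the original post. The data frame `bats` and its column layout are illustrative, not the original `Bats` object:

```r
# Five species with recorded minimum Fc, taken from the posted table
bats <- data.frame(
  Species = c("Myoalb", "Myokea", "Myonig", "Myooxy", "Myorip"),
  FA      = c(35.3, 33.7, 34.5, 40.5, 36.0),
  Fcmin   = c(45.7, 57.8, 51.6, 45.7, 53.3),
  stringsAsFactors = FALSE
)

# FA as a factor: every distinct FA value gets its own coefficient
fit_factor <- lm(Fcmin ~ as.factor(FA), data = bats)

# FA as a number: one intercept and one slope
fit_numeric <- lm(Fcmin ~ FA, data = bats)

length(coef(fit_factor))   # 5 coefficients for 5 observations
df.residual(fit_factor)    # 0 residual degrees of freedom
df.residual(fit_numeric)   # 3 residual degrees of freedom
```

With as many coefficients as observations, the factor model reproduces each observed value exactly and cannot say anything about an FA value it has not seen.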


Once you've fixed that, if you still have problems then start by 
plotting the data, and looking at it to see what's going on. It may be 
that the relationship isn't a straight line.
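
One way to do that plotting, as a rough sketch using the five species with recorded Fcmin values from the posted table (the object names here are made up for illustration):

```r
# Five species with recorded minimum Fc, taken from the posted table
bats <- data.frame(
  Species = c("Myoalb", "Myokea", "Myonig", "Myooxy", "Myorip"),
  FA      = c(35.3, 33.7, 34.5, 40.5, 36.0),
  Fcmin   = c(45.7, 57.8, 51.6, 45.7, 53.3),
  stringsAsFactors = FALSE
)

fit <- lm(Fcmin ~ FA, data = bats)

# Raw data with the fitted straight line on top
plot(Fcmin ~ FA, data = bats,
     xlab = "Forearm length (FA)", ylab = "Minimum Fc (kHz)")
text(bats$FA, bats$Fcmin, labels = bats$Species, pos = 3, cex = 0.8)
abline(fit, lty = 2)

# Residuals against fitted values: curvature here hints at a non-linear relationship
plot(fitted(fit), resid(fit),
     xlab = "Fitted Fcmin (kHz)", ylab = "Residual (kHz)")
abline(h = 0, lty = 3)
```

With so few points any pattern is only suggestive, but a species sitting far from the line is easy to spot this way.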


Bob


Re: [R-sig-eco] Question on unexpected regression results

2014-07-30 Thread Rich Shepard

On Wed, 30 Jul 2014, Bruce Miller wrote:


I have valid ranges for the known species FA (forearm measurements) and
Fc(minimum) and Fc (maximum). So I do two separate runs with the data
using the lm model, one with Fcmin ~ FA and one with Fcmax ~ FA.

The goal is to provide the predicted (estimated) values for the species
with known FA values but w/o verified Fc value ranges.



My concern is that the predicted values returned are much lower than the
true values for the verified species.  Therefore I am not confident the
predicted values for those w/o verified Fc ranges are useful.


Bruce,

  Since linear regression provides intercepts and slopes for the _mean_ value
of the response variable across the range of explanatory variables, your
conundrum might be resolved by using quantile regression (R package
quantreg).

  Quantile regression looks at the relationship between the explanatory
variable(s) and specified (your choice) quantiles of the response variable.
It might well be that the relationship of Fc to FA is quite different at the
5% Fc quantile than it is at the 50% or 95% quantile.

  A starting point is Cade and Noon. 2003. A gentle introduction to quantile
regression for ecologists. Frontiers in Ecology and the Environment 1:412-420.
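
A hedged sketch of what that could look like with `quantreg::rq()`, using the five species with recorded Fcmin values from the posted table. The data frame `bats` is illustrative, and with only five points the extreme quantiles are not really estimable, so this shows the call pattern rather than a meaningful result:

```r
# Five species with recorded minimum Fc, taken from the posted table
bats <- data.frame(
  Species = c("Myoalb", "Myokea", "Myonig", "Myooxy", "Myorip"),
  FA      = c(35.3, 33.7, 34.5, 40.5, 36.0),
  Fcmin   = c(45.7, 57.8, 51.6, 45.7, 53.3),
  stringsAsFactors = FALSE
)

# quantreg is an add-on package, so only run the fit if it is installed
if (requireNamespace("quantreg", quietly = TRUE)) {
  # One intercept and slope per requested quantile of Fcmin
  fit_q <- quantreg::rq(Fcmin ~ FA, tau = c(0.05, 0.5, 0.95), data = bats)
  print(coef(fit_q))
}
```

Comparing the slopes across the columns of `coef(fit_q)` shows whether the Fc-FA relationship differs between low and high quantiles. On a data set this small, rq() may warn that the solution is not unique.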

HTH,

Rich

--
Richard B. Shepard, Ph.D.
Applied Ecosystem Services, Inc. | Troutdale, OR 97060 USA
www.appl-ecosys.com  Voice: 503-667-4517 Fax: 503-667-8863

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Question on unexpected regression results

2014-07-30 Thread Manuel Spínola
Dear Bruce,

Besides any other problems I can see that you consider FA as a factor and
that could be a problem with your data set.

Manuel



[R-sig-eco] Question on unexpected regression results

2014-07-30 Thread Bruce Miller
Hi all,

Sorry this is a bit long, but the explanation of what I want to do needs 
to be clear, to avoid the issue raised in this quote: "It is impossible to 
speak in such a way that you cannot be misunderstood." (Karl Popper)

I am running linear regression models, but I am getting unexpected results.
I wonder what else I might try to derive an estimated value of bat 
echolocation parameters based on forearm measurements.  It is known that 
the size of the bat is negatively related to the characteristic 
frequency (Fc) of their echolocation calls (decades of my field work). 
So in general larger guys have lower frequency calls and smaller guys 
have higher frequency calls.

I have run regressions of the known, valid Fc ranges on FA (valid forearm 
measurements) for a dozen species or so, and used the lm models to 
"predict" Fc values for a few species that have FA values but have not 
yet been recorded, and hence have no valid echolocation call parameters.  
The R code used is below the discussion.

I have valid ranges for the known species FA (forearm measurements) and 
Fc (minimum) and Fc (maximum).  So I do two separate runs with the data 
using the lm model, one with Fcmin ~ FA and one with Fcmax ~ FA.

The goal is to provide the predicted (estimated) values for the species 
with known FA values but w/o verified Fc value ranges.

My concern is that the predicted values returned are much lower than the 
true values for the verified species.  Therefore I am not confident the 
predicted values for those w/o verified Fc ranges are useful.

One very helpful person looked at one simple data set I sent and showed 
that the statistical differences between the true values and predicted 
were not significant.

However, Krebs' admonishment to students eons ago, "Do not confuse 
statistical significance with ecological significance", is true here.  
The values of the predicted ranges are far lower than reality, so the 
predictions for the few species that do not have field-recorded Fc values 
are suspect.  These differences between predicted values and a true range 
will make a difference for potentially IDing the unknown calls.  A 
difference of 10 kHz in Fc generally suggests a different species, albeit 
some are much closer and may only have a 5 kHz difference.

I am looking at acoustic data sets of calls from South America and there 
are many "sonospecies."
These are clearly separate species based on echolocation call parameters 
that have not yet had "faces & voices" matched.  We know that call 
parameters are diagnostic for families and genera even when the species 
is unknown.  It is then the Fc values that assist in identifying the 
species within a cluster of calls from the same genus.

Sample of R code used:

Bats <- dget('C:/=Bat data working/Acoustic Parameters/_Working/=Vespertilionidae/Bats.robj')

> model.lm <- lm(formula = Fc ~ as.factor(FA), data = Bats, na.action = na.omit)
> Anova(model.lm, type = 'II')
Error in solve.default(L %*% V %*% t(L)) :
  system is computationally singular: reciprocal condition number = 0
> summary(model.lm)

Call:
lm(formula = Fc ~ as.factor(FA), data = Bats, na.action = na.omit)

Residuals:
ALL 5 residuals are 0: no residual degrees of freedom!

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)           53.3         NA      NA       NA
as.factor(FA)34.4     -4.6         NA      NA       NA
as.factor(FA)35.4      2.3         NA      NA       NA
as.factor(FA)35.5      9.0         NA      NA       NA
as.factor(FA)40.5     -7.3         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 4 and 0 DF,  p-value: NA

> tmp <- predict(model.lm)
> Bats[names(tmp), "predicted"] <- tmp
> rm('tmp')
> rm('model.lm')

# Same steps with FA as a numeric predictor; Anova() comes from the car package
library(car)
model.lm <- lm(formula = Fc ~ FA, data = Bats, na.action = na.omit)
Anova(model.lm, type = 'II')
summary(model.lm)
tmp <- predict(model.lm, Bats)   # predictions for every row of Bats, including rows with NA Fc
Bats[names(tmp), "Predicted.Fc"] <- tmp
rm('tmp')
rm('model.lm')

The results show that the predicted Fc values (right-hand columns) are not 
close to the true Fc values (left-hand columns), which makes me hesitant to 
accept the predictions for the 2 species with NA observed values. FYI, 
species are coded with simple 6-letter codes for genus and species.

Species  FA    Fcmin  Fcmax  FcMinpredic  FcMaxpredic
Myoalb   35.3  45.7   48.7   51.73        55.26
Myoata   37    NA     NA     49.52        52.59
Myokea   33.7  57.8   61.3   53.80        57.77
Myonig   34.5  51.6   55.7   52.77        56.52
Myooxy   40.5  45.7   47.6   44.98        47.09
Myorip   36    53.3   57.5   50.82        54.16
Myosim   38    NA     NA     48.23        51.02
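
To put the 5-10 kHz rule of thumb mentioned earlier next to these numbers, here is a small sketch that refits Fcmin on FA for the five recorded species and flags any species whose prediction misses the observed value by more than 5 kHz. The names `known` and `threshold_khz` are illustrative, and the 5 kHz cut-off is simply the smaller of the two differences quoted in the post:

```r
# Five species with recorded minimum Fc, taken from the table above
known <- data.frame(
  Species = c("Myoalb", "Myokea", "Myonig", "Myooxy", "Myorip"),
  FA      = c(35.3, 33.7, 34.5, 40.5, 36.0),
  Fcmin   = c(45.7, 57.8, 51.6, 45.7, 53.3),
  stringsAsFactors = FALSE
)

fit <- lm(Fcmin ~ FA, data = known)

# Observed minus predicted, in kHz
known$Predicted <- fitted(fit)
known$Residual  <- known$Fcmin - known$Predicted

# Assumed cut-off: a 5 kHz miss could already point to a different species
threshold_khz <- 5
known$Exceeds <- abs(known$Residual) > threshold_khz

print(known)
```

On these five rows only Myoalb is flagged, with a prediction about 6 kHz above its recorded Fcmin.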


Perhaps simple linear regression is not the method to use?
Thanks for any additional suggestions.
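
Before setting the linear model aside, it may help to see how uncertain its predictions are for the two unrecorded species. The sketch below assumes the five recorded species from the table as training data and uses `predict()` with `interval = "prediction"`; the data frames `known` and `unknown` are illustrative names:

```r
# Species with recorded minimum Fc (training data from the posted table)
known <- data.frame(
  Species = c("Myoalb", "Myokea", "Myonig", "Myooxy", "Myorip"),
  FA      = c(35.3, 33.7, 34.5, 40.5, 36.0),
  Fcmin   = c(45.7, 57.8, 51.6, 45.7, 53.3),
  stringsAsFactors = FALSE
)

# Species with a forearm measurement but no recorded calls
unknown <- data.frame(
  Species = c("Myoata", "Myosim"),
  FA      = c(37, 38),
  stringsAsFactors = FALSE
)

fit <- lm(Fcmin ~ FA, data = known)

# Point predictions plus 95% prediction intervals for the unrecorded species
pred <- predict(fit, newdata = unknown, interval = "prediction", level = 0.95)
print(cbind(unknown, round(pred, 2)))
```

The `fit` column reproduces the FcMinpredic values in the table (49.52 and 48.23). The `lwr` and `upr` columns show how wide the plausible range is when the fit rests on five points.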

Bruce
