Hi Russell,

What you see is the large uncertainty in “ancestral” states, which is part of 
the intercept here. The line that you overlaid on your data is the 
relationship predicted at the root of the tree (as if such a thing existed!). 
There is a lot of uncertainty about the intercept, but much less uncertainty 
about the slope. It looks like the slope is not affected by the inclusion or 
exclusion of monotremes. (For one possible reference on the greater precision 
of the slope versus the intercept under BM, there’s this: 
http://dx.doi.org/10.1214/13-AOS1105.)
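
To make the intercept-versus-slope point concrete, here is a minimal sketch in 
Python/NumPy. Everything in it is an assumption made for illustration: a toy 
tree with two clades separated at the root, each clade a star of tips hanging 
off a stem of length `stem`, total tree height 1, sigma^2 = 1, and made-up 
predictor values. It only computes the sampling variances of the GLS 
coefficients under BM, (X' V^-1 X)^-1, as more tips are added.

```python
import numpy as np

def bm_covariance_two_clades(n_per_clade, stem=0.8):
    """BM covariance for a toy tree (an assumption, not a real phylogeny):
    two clades split at the root, each a star of n tips on a stem of
    length `stem`, with total tree height 1."""
    block = stem * np.ones((n_per_clade, n_per_clade))
    block += (1.0 - stem) * np.eye(n_per_clade)
    zeros = np.zeros_like(block)
    return np.block([[block, zeros], [zeros, block]])

def gls_coefficient_variances(n_per_clade, stem=0.8):
    """Return (Var[intercept], Var[slope]) of the GLS estimates, sigma^2 = 1."""
    V = bm_covariance_two_clades(n_per_clade, stem)
    # Hypothetical predictor: evenly spread within each clade.
    x = np.tile(np.linspace(0.0, 1.0, n_per_clade), 2)
    X = np.column_stack([np.ones_like(x), x])
    cov = np.linalg.inv(X.T @ np.linalg.solve(V, X))
    return cov[0, 0], cov[1, 1]

variances = {n: gls_coefficient_variances(n) for n in (10, 100, 400)}
for n, (v_int, v_slope) in variances.items():
    print(f"tips per clade = {n:4d}  Var(intercept) = {v_int:.4f}  "
          f"Var(slope) = {v_slope:.4f}")
```

In this toy setting the intercept variance cannot drop below stem/2 however 
many tips are sampled, because every tip shares one of only two deep branches, 
while the slope variance keeps shrinking as tips are added.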

My second cent is that the phylogenetic predictions should be stable. The 
uncertainty in the intercept (and the large effect of including monotremes on 
the intercept) should not affect predictions, so long as you know for which 
species you want to make a prediction. If you want to make a prediction for a 
species in a small clade “far” from monotremes, say, then the prediction is 
probably quite stable, even if you include monotremes: this is because the 
phylogenetic prediction should use the phylogenetic relationships of the 
species to be predicted. A prediction that uses the linear relationship at the 
root and ignores the placement of the species would be the worst-case scenario, 
appropriate only for a mammal species with a completely unknown placement 
within mammals.
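
As a rough illustration of that stability, here is a small Python/NumPy sketch 
of phylogenetic prediction under BM. It is a toy under stated assumptions, not 
the method of any particular package: clade A (2 tips) stands in for 
monotremes, clade B (8 tips) for a therian clade, each clade is a star on a 
stem of length 0.9, the data are invented, and the new species is assumed to 
attach at the base of clade B. The prediction is the root line plus the 
conditional-expectation correction v' V^-1 (y - X beta_hat), where v holds the 
BM covariances between the new species and the observed tips.

```python
import numpy as np

def bm_cov(clade_sizes, stem):
    """BM covariance for star-shaped clades on stems of length `stem`
    (toy tree of height 1; clades are independent given the root)."""
    n = sum(clade_sizes)
    V = np.zeros((n, n))
    start = 0
    for size in clade_sizes:
        V[start:start + size, start:start + size] = stem
        start += size
    V[np.diag_indices(n)] = 1.0
    return V

def predict(x_new, cov_new, x, y, V):
    """Return (root-line prediction, phylogenetic prediction) for x_new."""
    X = np.column_stack([np.ones_like(x), x])
    Vinv_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)  # GLS estimates
    root_line = beta[0] + beta[1] * x_new
    residuals = y - X @ beta
    # Shift the root-line value by what related species tell us.
    return root_line, root_line + cov_new @ np.linalg.solve(V, residuals)

stem = 0.9
x_a = np.array([0.2, 0.8]);     y_a = 5.0 + x_a   # "monotreme-like" clade
x_b = np.linspace(0.0, 1.0, 8); y_b = 2.0 + x_b   # "therian-like" clade
x_new = 0.5
cov_b = np.full(8, stem)  # new species shares clade B's stem with B's tips

# With clade A included: the new species has zero covariance with A's tips.
root_with, blup_with = predict(
    x_new, np.concatenate([np.zeros(2), cov_b]),
    np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b]),
    bm_cov([2, 8], stem))

# With clade A excluded.
root_without, blup_without = predict(x_new, cov_b, x_b, y_b, bm_cov([8], stem))

print("root line:  with A =", root_with, " without A =", root_without)
print("phylo pred: with A =", blup_with, " without A =", blup_without)
```

In this toy example, including or excluding clade A moves the root-line 
prediction substantially, while the phylogenetic prediction for the clade B 
species barely changes.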

There are probably a number of software packages that do phylogenetic 
prediction. I know of Rphylopars and PhyloNetworks.

my 2 cents…
Cecile

---
Cécile Ané, Professor (she/her)
H. I. Romnes Faculty Fellow
Departments of Statistics and of Botany
University of Wisconsin - Madison
http://www.stat.wisc.edu/~ane/

CALS statistical consulting lab: https://calslab.cals.wisc.edu/stat-consulting/



On Jun 29, 2021, at 5:37 PM, neovenatori...@gmail.com wrote:

Dear All,

So this is the main problem I'm facing (see attached figure, which should be 
small enough to post). When I calculate the best-fit line under a Brownian 
model, the resulting line more or less bypasses the distribution of the data 
altogether. I did some testing and found that this result was driven solely by 
the presence of Monotremata: the model heavily downweights all of the 
phylogenetic variation within Theria in favor of the deep divergence between 
Monotremata and Theria. Excluding Monotremata produces a PGLS fit that's 
comparable enough to the OLS and OU model fits to be justifiable (though I 
can't just throw out Monotremata for the sake of throwing it out).

I am planning to do a more theoretical investigation into the effect of 
Monotremata on the PGLS fit in a future study, but right now what I am trying 
to do is use these data to construct a regression model that can be used to 
predict new data, which is why I am trying to use AIC to potentially justify 
going with OLS or an OU model over a Brownian model. From a practical 
perspective the Brownian model is almost unusable because it produces 
systematically biased estimates with high error rates when applied to new data 
(the error rate is roughly double that of both the OLS and OU models). This is 
especially the case because the data must be back-transformed to an arithmetic 
scale to be usable, and thus a seemingly minor difference in regression models 
results in a massive difference in predicted values. However, I need some 
objective test to show that OLS fits the data better than the Brownian model, 
which is why I was going with AIC. Overall, OLS does seem to outperform the 
Brownian model on average, but the variation in AIC is so high that it is hard 
to interpret this.
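
For reference, the AIC comparison described above can be sketched as follows 
in Python/NumPy. This is only an illustration under assumptions: the tree is a 
made-up two-clade toy, the data are invented, and both models are fitted by 
full maximum likelihood with the same response so that the two AIC values are 
on the same footing. OLS is treated as the special case where the covariance 
matrix V is the identity, and the parameter count k covers the intercept, the 
slope and sigma^2.

```python
import numpy as np

def gaussian_aic(x, y, V):
    """AIC of y = a + b*x + e with e ~ N(0, sigma^2 * V), fitted by ML."""
    n = len(y)
    X = np.column_stack([np.ones_like(x), x])
    Vinv_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)   # GLS estimates
    resid = y - X @ beta
    sigma2 = resid @ np.linalg.solve(V, resid) / n       # ML estimate
    _, logdet = np.linalg.slogdet(V)
    loglik = -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)
    k = X.shape[1] + 1                                   # coefficients + sigma^2
    return 2 * k - 2 * loglik

# Hypothetical toy data: two star-shaped clades of 5 tips, stem length 0.9.
stem, size = 0.9, 5
block = stem * np.ones((size, size)) + (1.0 - stem) * np.eye(size)
V_bm = np.block([[block, np.zeros((size, size))],
                 [np.zeros((size, size)), block]])
x = np.tile(np.linspace(0.0, 1.0, size), 2)
clade_offset = np.repeat([1.5, 0.0], size)
y = 2.0 + x + clade_offset + 0.1 * np.sin(7.0 * np.arange(2 * size))

aic_ols = gaussian_aic(x, y, np.eye(2 * size))
aic_bm = gaussian_aic(x, y, V_bm)
print("AIC (OLS):", aic_ols)
print("AIC (BM): ", aic_bm)
```

Repeating this over a sample of trees, if that is the source of the variation, 
gives one AIC difference per tree, and the spread of those differences can be 
reported directly.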

This is kind of why I am leery of assuming a null Brownian model. A Brownian 
model, if anything, does not seem to accurately model the relationship between 
variables.

This is why I am having trouble figuring out how to do model selection. Going 
just by accuracy statistics like percent error or standard error of the 
estimate, OLS is better in a purely practical sense. (It doesn't work for the 
monotreme taxa, but it turns out that estimate error in the monotremes is only 
decreased by 10% in a Brownian model when it overestimates mass by nearly 75%, 
so the improvement really isn't worth it, and using this for monotremes isn't 
recommended in the first place.) However, the reviewers are expressing 
skepticism over the fact that the Brownian model produces less usable results. 
I'm also not entirely sure of the best way to go about the PGLS if using one of 
the birth-death trees isn't ideal; perhaps what Dr. Upham says about using the 
DNA tree might work better.

Ironically, an OU model might be argued to better fit the data, despite the 
concerns that Dr. Bapst mentioned. Looking at the distribution of signal, even 
though the signal is not random, it is more accurately described as most taxa 
hewing to a stable equilibrium with rapid, high-magnitude shifts at certain 
evolutionary nodes, rather than as the covariation between the two traits 
evolving in a Brownian fashion. I did some experiments with a PSR curve and the 
results seem to favor an OU model, or other models with uneven rates of 
evolution, over a pure Brownian model.

Of course, the broader issue I am facing is trying to deal with PGLS 
succinctly; the scope of the study isn't necessarily an in-depth comparison 
between different regression models. It's more about how this variable 
correlates with body mass for practical purposes (of which considering 
phylogeny is one part). It's definitely something to consider, but I am trying 
to avoid manuscript bloat.

Sincerely,
Russell



_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
