Hi Russell,

What you see is the large uncertainty in “ancestral” states, which is part of the intercept here. The line that you overlaid on your data is the relationship predicted at the root of the tree (as if such a thing existed!). There is a lot of uncertainty about the intercept, but much less about the slope, and it looks like the slope is not affected by the inclusion or exclusion of monotremes. (For one possible reference on the greater precision in the slope versus the intercept, there’s this for the BM: http://dx.doi.org/10.1214/13-AOS1105)
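To make the intercept-versus-slope contrast concrete, here is a sketch in standard PGLS notation. The symbols (X, V, t_1, t_2) are not from this thread and are introduced only for illustration; the bound assumes a bifurcating root with branches of length t_1 and t_2 leading to its two daughter clades.

```latex
% Regression with Brownian-motion (BM) errors on the tree
\[
  y = X\beta + \varepsilon, \qquad
  \varepsilon \sim \mathcal{N}(0, \sigma^2 V), \qquad
  V_{ij} = \text{branch length shared by species } i \text{ and } j \text{, measured from the root}
\]
% GLS estimate and its sampling variance
\[
  \hat\beta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y, \qquad
  \operatorname{Var}(\hat\beta) = \sigma^2 (X^\top V^{-1} X)^{-1}
\]
% Intercept-only case: \hat\mu is the estimated state at the root
\[
  \operatorname{Var}(\hat\mu) = \frac{\sigma^2}{\mathbf{1}^\top V^{-1} \mathbf{1}}
  \;\ge\; \sigma^2 \, \frac{t_1 t_2}{t_1 + t_2}
\]
```

The lower bound does not shrink as more species are sampled, which is one way to see why the root-level intercept stays uncertain when a long branch separates two deeply diverged clades, while the slope can still be estimated with much better precision.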
My second cent is that the phylogenetic predictions should be stable. The uncertainty in the intercept, and the large effect of including monotremes on the intercept, should not affect predictions, so long as you know for which species you want to make a prediction. If you want to make a prediction for a species in a small clade “far” from monotremes, say, then the prediction is probably quite stable even if you include monotremes: this is because the phylogenetic prediction should use the phylogenetic relationships of the species to be predicted. A prediction that uses the linear relationship at the root and ignores the placement of the species would be the worst-case scenario: that of a mammal species with a completely unknown placement within mammals. There are probably a number of software packages that do phylogenetic prediction. I know of Rphylopars and PhyloNetworks.

My 2 cents…
Cecile

---
Cécile Ané, Professor (she/her)
H. I. Romnes Faculty Fellow
Departments of Statistics and of Botany
University of Wisconsin - Madison
http://www.stat.wisc.edu/~ane/
CALS statistical consulting lab: https://calslab.cals.wisc.edu/stat-consulting/

On Jun 29, 2021, at 5:37 PM, neovenatori...@gmail.com wrote:

Dear All,

So this is the main problem I'm facing (see attached figure, which should be small enough to post). When I calculate the best-fit line under a Brownian model, the line more or less bypasses the distribution of the data altogether. I did some testing and found that this result was driven solely by the presence of Monotremata, which causes the model to heavily downweight all of the phylogenetic variation within Theria in favor of the deep divergence between Monotremata and Theria. Excluding Monotremata produces a PGLS fit that's comparable enough to the OLS and OU model fits to be justifiable (though I can't just throw out Monotremata for the sake of throwing it out).
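As a rough illustration of the difference between predicting from the root-level line and predicting for a species with a known placement, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the four-species tree (three ingroup species A, B, C and a deeply diverged outgroup O), the trait values, and the helper names `solve`, `gls_fit` and `predict`. It is not the API of Rphylopars or PhyloNetworks, and it assumes a single predictor and a Brownian covariance matrix V of shared branch lengths.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gls_fit(x, y, V):
    """GLS intercept and slope for y = b0 + b1*x with error covariance V."""
    ones = [1.0] * len(y)
    Vi1, Vix = solve(V, ones), solve(V, x)   # V^-1 1 and V^-1 x
    lhs = [[dot(ones, Vi1), dot(ones, Vix)],
           [dot(ones, Vix), dot(x, Vix)]]    # X' V^-1 X
    rhs = [dot(Vi1, y), dot(Vix, y)]         # X' V^-1 y (V is symmetric)
    return solve(lhs, rhs)

def predict(x_new, c_new, x, y, V, beta):
    """Return (root-line prediction, phylogenetic prediction).

    c_new[i] is the branch length the new species shares with species i.
    """
    root_line = beta[0] + beta[1] * x_new
    resid = [yi - (beta[0] + beta[1] * xi) for xi, yi in zip(x, y)]
    return root_line, root_line + dot(c_new, solve(V, resid))

# Hypothetical toy data, species order: A, B, C (ingroup), O (outgroup).
x = [1.0, 1.2, 2.0, 0.5]
y = [2.1, 2.3, 3.2, 4.0]
V = [[1.0, 0.9, 0.8, 0.0],
     [0.9, 1.0, 0.8, 0.0],
     [0.8, 0.8, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
beta = gls_fit(x, y, V)

# New species placed as sister to C, far from the outgroup.
root_pred, phylo_pred = predict(1.8, [0.8, 0.8, 0.9, 0.0], x, y, V, beta)
print("intercept, slope:", beta)
print("root-line prediction:", root_pred)
print("phylogenetic prediction:", phylo_pred)
```

The second value returned by `predict` adds a correction based on the residuals of the relatives of the new species, so it depends on where the species sits in the tree. When the species shares no history with any sampled taxon (all entries of `c_new` are zero), the correction vanishes and the prediction falls back to the root-level line, which is the worst-case scenario described in the reply.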
I am planning to do a more theoretical investigation into the effect of Monotremata on the PGLS fit in a future study, but right now what I am trying to do is use this data to construct a regression model that can be used to predict new data, which is why I am trying to use AIC to potentially justify going with OLS or an OU model over a Brownian model. From a practical perspective the Brownian model is almost unusable because it produces systematically biased estimates with high error rates when applied to new data (the error rate is roughly double that of both the OLS and OU models). This is especially the case because the data must be back-transformed to an arithmetic scale to be usable, and thus a seemingly minor difference between regression models results in a massive difference in predicted values. However, I need some objective test to show that OLS fits the data better than the Brownian model, which is why I was going with AIC. Overall, OLS does seem to outperform the Brownian model on average, but the variation in AIC is so high that it is hard to interpret. This is kind of why I am leery of assuming a null Brownian model. A Brownian model, if anything, does not seem to accurately model the relationship between the variables. This is why I am having trouble figuring out how to do model selection. Just going by accuracy statistics like percent error or standard error of the estimate, OLS is better in a purely practical sense (it doesn't work for the monotreme taxa, but it turns out that estimate error in the monotremes is only decreased by 10% in a Brownian model when it overestimates mass by nearly 75%, so the improvement really isn't worth it, and using this for monotremes isn't recommended in the first place), but the reviewers are expressing skepticism over the fact that the Brownian model produces less usable results.
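The sensitivity to back-transformation mentioned above can be sketched numerically. The snippet below is only an illustration under stated assumptions: it assumes the regression is fit on natural-log-transformed values (the thread does not say which base is used), the true mass of 1000 is a made-up number, and the function names are hypothetical.

```python
import math

def back_transform(log_value):
    """Convert a natural-log prediction back to the arithmetic scale."""
    return math.exp(log_value)

def percent_error(predicted, observed):
    """Signed percent error on the arithmetic scale."""
    return 100.0 * (predicted - observed) / observed

def percent_error_from_log_offset(delta):
    """Percent error implied by a log-scale offset; it does not depend on the true value."""
    return 100.0 * (math.exp(delta) - 1.0)

true_mass = 1000.0  # hypothetical value, for illustration only
for delta in (0.05, 0.10, 0.30, math.log(1.75)):
    predicted = back_transform(math.log(true_mass) + delta)
    print(f"log offset {delta:.3f} -> {percent_error(predicted, true_mass):.1f}% error")
```

Because the error on the arithmetic scale grows exponentially with the offset on the log scale, two regression lines that look close on a log-log plot can give quite different mass estimates. Under the natural-log assumption, an overestimate of nearly 75% corresponds to a log-scale offset of about ln(1.75).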
And I'm not entirely sure of the best way to go about the PGLS if using one of the birth-death trees isn't ideal; perhaps what Dr. Upham says about using the DNA tree might work better. Ironically, an OU model might be argued to better fit the data, despite the concerns that Dr. Bapst mentioned. Looking at the distribution of signal, even though signal is not random, it is more accurately described as most taxa hewing to a stable equilibrium with rapid, high-magnitude shifts at certain evolutionary nodes, rather than the covariation between the two traits evolving in a Brownian fashion. I did some experiments with a PSR curve and the results seem to favor an OU model, or other models with uneven rates of evolution, over a pure Brownian model. Of course, the broader issue I am facing is trying to deal with PGLS succinctly; the scope of the study isn't necessarily an in-depth comparison between different regression models, it's more looking at how this variable correlates with body mass for practical purposes (of which considering phylogeny is one part). It's definitely something to consider, but I am trying to avoid manuscript bloat.

Sincerely,
Russell

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/