Glad to help, and thanks for following up! More of us should follow your example of looking at parameter uncertainty and other issues with models.
Best,
Brian

From: R-sig-phylo <[email protected]> on behalf of cy_jiang <[email protected]>
Date: Wednesday, February 4, 2026 at 07:02:25
To: [email protected] <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [R-sig-phylo] Interpreting boundary-hitting rates in corHMM hidden-state models

Dear Brian,

Thank you very much for this detailed roadmap. I am sorry for the late reply; it took me quite some time to implement your suggestions across 200 trees, but I wanted to confirm that your points about data limitations and uncertainty were spot on. I followed your advice to assess the uncertainty of the boundary estimates (using dentist) and explored simplifying the model by using a custom rate matrix that better matches the information in the trees. These steps, combined with assessing model adequacy, helped clarify that the boundary hits reflected weak identifiability of rate magnitudes rather than model failure. This has allowed me to interpret the model fits more robustly.

Thanks again for the guidance!

Best regards,
Emma

At 2025-12-19 16:49:59, "Brian O'Meara" <[email protected]> wrote:

Dear Emma,

Very good to question this! One thing to consider is that there isn't much data for these models. If one state occurs in only a single clade, for example, the best estimate for its loss rate is zero: even though it can be lost biologically, it wasn't lost on this tree. High rates can also reflect uncertainty: if a trait seemingly changed once on a very short branch, the confidence interval for the rate could be 0.2 to 100,000 changes per million years, but you're only seeing the point estimate. Very short terminal branches that differ in state can drive this, too.

Things I'd do:

* Look at uncertainty in the point estimates: run dentist or similar.
* Consider using tip fog: there may be uncertainty or errors in your states that would tend to drive up rates.
* Make sure your branch lengths are in the right units. "100" is a high rate on branches in millions of years for many kinds of characters, but not if the branch lengths mean something else.
* Look for near-zero terminal branch lengths that differ in state from their neighbors: this isn't necessarily wrong, but it essentially tells the model that a change must have happened in almost no time.
* Consider collapsing some states or rates. If some of your states are rare on the tree, there's little information about changes between them. Can your question be answered by lumping "eats jellyfish" and "eats salps" into a single "eats living jello" state? Does it make sense to have a single "loss of predation" rate but multiple "gain of predation" rates? The default models (ARD, ER) tend to use the same complexity for everything, but that might not be the best fit for your data or questions: make a custom rate matrix.

Hope this helps,
Brian

On Fri, Dec 19, 2025 at 5:44 AM cy_jiang <[email protected]> wrote:

Dear R-sig-phylo list,

I am fitting hidden-rate discrete trait models using corHMM (v2.8) across a posterior tree sample (200 trees of the same set of species), and I would appreciate guidance on how to interpret frequent boundary-hitting ML estimates. In particular, I observe two patterns:

Lower-bound hits (rates → 0): Some transition rates repeatedly hit the lower bound across many trees. In a hidden-state setting, this can effectively shut off transitions or simplify the state graph. Is it reasonable to treat such fits as degenerate or unreliable for process-level interpretation, and to exclude models that frequently exhibit this behavior?

Upper-bound hits (rates → large): Other rates, primarily within the fast regime, often hit the upper bound. Increasing the bound (e.g. 20 → 30 → 100) typically causes the same or another within-regime rate to peg the new limit, while regime decoding and downstream summaries remain stable. Is it reasonable to interpret this as weak estimability of rate magnitude (i.e. “very fast” but not precisely estimable), and to focus inference on derived quantities rather than on the rate values themselves?

More generally, is boundary behavior across tree uncertainty a reasonable criterion for model reliability, alongside likelihood or AIC, when choosing among hidden-state models?

Any advice or relevant references would be greatly appreciated. I am happy to provide a minimal example if useful.

Best regards,
Emma

_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/
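The lower- and upper-bound behavior discussed in this thread can be reproduced in a toy setting. The Python sketch below is not corHMM and not dentist. It assumes a symmetric two-state Mk model with a single rate q in both directions, one branch of hypothetical length t = 0.1 (in the same time units as q), and hypothetical bounds of 1e-6 below and 20, 30 or 100 above. Over one branch, P(different end states) = (1 - exp(-2qt))/2 rises with q, while P(same end states) = (1 + exp(-2qt))/2 falls with q, so a grid search lands on the upper or lower bound respectively. The sketch also reports the range of q within 1.92 log-likelihood units of the best value. That is a conventional 95% profile-likelihood cutoff, used here only as a crude stand-in for the kind of uncertainty check suggested above, and it shows why the point estimate alone says little.

```python
import math
from typing import Callable


def p_change(q: float, t: float) -> float:
    """P(end state differs from start state) on one branch of length t under a
    symmetric two-state Mk model with rate q in each direction."""
    return 0.5 * (1.0 - math.exp(-2.0 * q * t))


def p_same(q: float, t: float) -> float:
    """P(end state equals start state) under the same model."""
    return 0.5 * (1.0 + math.exp(-2.0 * q * t))


def rate_grid(lower: float, upper: float, n: int = 1000) -> list[float]:
    """n + 1 candidate rates from lower to upper; the upper bound is included exactly."""
    return [lower + (upper - lower) * i / n for i in range(n)] + [upper]


def fit(lik: Callable[[float, float], float], t: float,
        lower: float, upper: float) -> tuple[float, float, tuple[float, float]]:
    """Grid-search ML estimate of q within [lower, upper], its log-likelihood,
    and the range of q whose log-likelihood is within 1.92 units of the best."""
    qs = rate_grid(lower, upper)
    logls = [math.log(lik(q, t)) for q in qs]
    best = max(range(len(qs)), key=lambda i: logls[i])
    inside = [q for q, ll in zip(qs, logls) if ll >= logls[best] - 1.92]
    return qs[best], logls[best], (min(inside), max(inside))


T = 0.1       # hypothetical branch length, in the same time units as q
LOWER = 1e-6  # hypothetical lower bound on q (not a corHMM default)

# Upper-bound hit: the branch's two ends differ, so the likelihood keeps rising
# with q and the estimate lands on whatever upper bound is set.
for upper in (20.0, 30.0, 100.0):
    q_hat, logl, (lo, hi) = fit(p_change, T, LOWER, upper)
    print(f"bound {upper:5.0f}: q_hat = {q_hat:6.2f}, logL = {logl:.4f}, "
          f"interval = ({lo:.2f}, {hi:.2f})")

# Lower-bound hit: the two ends agree, so the likelihood falls with q and the
# estimate lands on the lower bound.
q_hat, logl, (lo, hi) = fit(p_same, T, LOWER, 100.0)
print(f"no change: q_hat = {q_hat:g}, interval = ({lo:g}, {hi:g})")
```

In this toy case, raising the bound from 20 to 100 moves the estimate to each new bound while the log-likelihood improves by less than 0.02 units, and the interval runs from roughly 0.8 up to whatever bound is set: a "very fast" but poorly pinned-down rate. In the no-change case the estimate sits at the lower bound, yet every rate up to 100 stays inside the interval, because a single branch carries very little information. Real trees add many branches and hidden states, so this only illustrates the shape of the problem.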
