Glad to help, and thanks for following up! More of us should follow your 
example of looking at parameter uncertainty and other issues with models.

Best,
Brian


From: R-sig-phylo <[email protected]> on behalf of cy_jiang 
<[email protected]>
Date: Wednesday, February 4, 2026 at 07:02:25
To: [email protected] <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [R-sig-phylo] Interpreting boundary-hitting rates in corHMM 
hidden-state models

Dear Brian,

Thank you very much for this detailed roadmap. I am sorry for the late reply; 
it took me quite some time to implement your suggestions across 200 trees, but 
I wanted to confirm that your points about data limitations and uncertainty 
were spot on.

I followed your advice to assess the uncertainty of the boundary estimates 
(using dentist) and explored simplifying the model by using a custom rate 
matrix that better matches the information in the trees.

These steps, combined with an assessment of model adequacy, helped clarify 
that the boundary hits were indeed reflecting weak identifiability of rate 
magnitudes rather than model failure. This has allowed me to interpret the 
model fits more robustly.

Thanks again for the guidance!

Best regards,

Emma

At 2025-12-19 16:49:59, "Brian O'Meara" <[email protected]> wrote:

Dear Emma,


Very good to question this!


So one thing to consider is that there's often not much information in the 
data for these models. If one state occurs only in a single clade, for example, 
the best estimate for its loss rate is zero: even though biologically it can be 
lost, it wasn't lost on this tree. High rates can also reflect uncertainty: if 
a trait seemingly changed once on a very short branch, the confidence interval 
for the rate could be 0.2 to 100000 changes per million years, but you're only 
seeing the point estimate. Very short terminal branches that differ in state 
can drive this, too.
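
To make the short-branch point concrete, here is a minimal sketch (in Python, 
not corHMM) of the simplest possible case: a symmetric two-state Mk model on a 
two-tip tree whose tips differ in state. The function names are hypothetical, 
and the 1.92 log-likelihood cutoff is the usual chi-square approximation for a 
roughly 95% interval on one parameter.

```python
import math


def cherry_loglik(q, t):
    """Log-likelihood of a symmetric two-state Mk model (rate q in both
    directions) for two sister tips in different states, each on a terminal
    branch of length t, with equal root-state probabilities.

    P_same(t) = 0.5 + 0.5*exp(-2qt) and P_diff(t) = 0.5 - 0.5*exp(-2qt);
    summing over the root state gives L(q) = P_same * P_diff
    = 0.25 * (1 - exp(-4qt)), which never decreases as q grows.
    """
    return math.log(0.25 * (1.0 - math.exp(-4.0 * q * t)))


def profile_range(t, lower=1e-3, upper=1e5, n=2001, delta=1.92):
    """Smallest and largest rates on a log-spaced grid between `lower` and
    `upper` whose log-likelihood is within `delta` of the best grid value,
    plus a flag saying whether the curve never decreases along the grid."""
    grid = [lower * (upper / lower) ** (i / (n - 1)) for i in range(n)]
    ll = [cherry_loglik(q, t) for q in grid]
    best = max(ll)
    inside = [q for q, value in zip(grid, ll) if value >= best - delta]
    non_decreasing = all(b >= a for a, b in zip(ll, ll[1:]))
    return min(inside), max(inside), non_decreasing


for t in (1.0, 0.01):  # branch length, e.g. in millions of years
    lo, hi, flat_top = profile_range(t)
    print(f"t = {t}: plausible q from {lo:.3g} to {hi:.3g}; "
          f"non-decreasing = {flat_top}")
```

Because the curve never turns down, an optimizer will push the estimate to 
whatever upper bound it is given. The lower limit of the plausible range scales 
as about 0.04/t, so shrinking the branches from t = 1 to t = 0.01 moves it from 
roughly 0.04 to roughly 4, while the upper limit is simply the bound itself.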


Things I'd do:


* Look at uncertainty in the point estimates: run dentist or similar.
* Consider using tip fog: there may be uncertainty or errors in your states 
that would tend to drive up rates.
* Make sure your branch lengths are in the right units. "100" is a high rate on 
branches in millions of years for many kinds of characters, but not if the 
branch lengths mean something else.
* Look for near-zero terminal branch lengths whose tips differ in state from 
their neighbors: not necessarily wrong, but it's essentially telling the model 
that a change must have happened over near-zero time.
* Consider collapsing some states or rates. If some of your states are rare on 
the tree, there's little information about changes between them. Can your 
question be answered by lumping "eats jellyfish" and "eats salps" into a 
single "eats living jello" state? Does it make sense to have a single "loss of 
predation" rate but multiple "gain of predation" rates? The default models 
(ARD, ER) tend to use the same complexity for everything, but that might not be 
the best fit for your data or questions: make a custom rate matrix.
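
The "single loss rate, multiple gain rates" idea can be written as an integer 
index matrix, where each positive integer names a free rate parameter and 0 
marks a disallowed transition. corHMM's custom rate matrices use a similar 
integer-index idea, but the sketch below is plain Python with hypothetical 
names and states, not corHMM code; check the corHMM documentation for its 
exact conventions.

```python
def build_q(index, params):
    """Build a continuous-time rate matrix Q from an integer index matrix.

    index[i][j] == 0 means the i -> j transition is not allowed;
    index[i][j] == k > 0 means it uses free parameter params[k - 1].
    Diagonal entries are set so that every row sums to zero.
    """
    n = len(index)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and index[i][j] > 0:
                q[i][j] = params[index[i][j] - 1]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Hypothetical states: 0 = no predation, 1 = eats jellyfish, 2 = eats salps.
# Two "gain of predation" rates (parameters 1 and 2) but one shared "loss of
# predation" rate (parameter 3); switching directly between prey is disallowed.
INDEX = [
    [0, 1, 2],
    [3, 0, 0],
    [3, 0, 0],
]

n_free = len({k for row in INDEX for k in row if k > 0})
print("free rate parameters:", n_free)  # 3, versus 6 under a full ARD model

# Illustrative parameter values only, not estimates from any data.
Q = build_q(INDEX, params=[0.5, 0.2, 1.0])
for row in Q:
    print(row)
```

Sharing the loss rate halves the number of free rates relative to ARD, and the 
single loss rate pools information from losses of both diets rather than 
splitting it across two separately estimated rates.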


Hope this helps,
Brian

On Fri, Dec 19, 2025 at 5:44 AM cy_jiang <[email protected]> wrote:

Dear R-sig-phylo list,

I am fitting hidden-rate discrete trait models using corHMM (v2.8) across a 
posterior tree sample (200 trees of the same set of species), and I would 
appreciate guidance on how to interpret frequent boundary-hitting ML estimates.

In particular, I observe two patterns:

Lower-bound hits (rates → 0):
Some transition rates repeatedly hit the lower bound across many trees. In a 
hidden-state setting, this can effectively shut off transitions or simplify the 
state graph. Is it reasonable to treat such fits as degenerate or unreliable 
for process-level interpretation, and to exclude models that frequently exhibit 
this behavior?

Upper-bound hits (rates → large):
Other rates, primarily within the fast regime, often hit the upper bound. 
Increasing the bound (e.g. 20 → 30 → 100) typically causes the same or another 
within-regime rate to peg the new limit, while regime decoding and downstream 
summaries remain stable. Is it reasonable to interpret this as weak 
estimability of rate magnitude (i.e. “very fast” but not precisely estimable), 
and to focus inference on derived quantities rather than the rate values 
themselves?

More generally, is boundary behavior across tree uncertainty a reasonable 
criterion for model reliability, alongside likelihood or AIC, when choosing 
among hidden-state models?
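
One way to put "boundary behavior across tree uncertainty" into practice is to 
tabulate, for each rate, how often its estimate lands on a bound across the 
posterior tree sample. The sketch below is Python and does not call corHMM: it 
assumes the per-tree estimates have already been extracted into dictionaries, 
and the rate names, bound values, and tolerance are hypothetical placeholders 
to be replaced with the ones actually passed to the optimizer.

```python
def boundary_hit_rates(fits, lower, upper, rel_tol=1e-3):
    """For each rate parameter, return the fraction of trees whose estimate
    sits at (within rel_tol of) the lower bound and the upper bound.

    fits: list of dicts, one per tree, mapping a rate name to its estimate.
    """
    summary = {}
    names = sorted({name for fit in fits for name in fit})
    for name in names:
        values = [fit[name] for fit in fits if name in fit]
        at_lower = sum(v <= lower * (1 + rel_tol) for v in values)
        at_upper = sum(v >= upper * (1 - rel_tol) for v in values)
        summary[name] = (at_lower / len(values), at_upper / len(values))
    return summary


# Toy estimates for three trees (made-up numbers, not real corHMM output).
fits = [
    {"q12": 1e-9, "q21": 0.8, "fast_q": 100.0},
    {"q12": 0.03, "q21": 0.7, "fast_q": 100.0},
    {"q12": 1e-9, "q21": 0.9, "fast_q": 64.0},
]
for name, (lo, hi) in boundary_hit_rates(fits, lower=1e-9, upper=100.0).items():
    print(f"{name}: at lower bound {lo:.0%}, at upper bound {hi:.0%}")
```

A table like this flags which rates are repeatedly pinned; it does not by 
itself say whether a pinned rate reflects weak identifiability or a poorly 
specified model, which is where interval estimates and adequacy checks come in.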

Any advice or relevant references would be greatly appreciated. I am happy to 
provide a minimal example if useful.

Best regards,

Emma

_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/
