Dear Emma,

Very good to question this!

One thing to consider is that there's not a lot of data for these
models. If a state occurs in only a single clade, for example, the best
estimate for its loss rate is zero: even though biologically it can be
lost, it was never lost on this tree. High rates can also reflect
uncertainty: if a trait seemingly changed once on a very short branch, the
confidence interval for the rate could be 0.2 to 100000 changes per
million years, but you're only seeing the point estimate. Very short
terminal branches that differ in state can drive this, too.
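
To make the short-branch point concrete, here is a toy sketch in plain Python (not corHMM). Everything in it is an assumption for illustration: a symmetric two-state Markov model, a single branch of length 0.01, and the function names. It shows that when one change is observed on a very short branch, the log-likelihood keeps rising with the rate and then flattens, so an optimizer simply runs to whatever upper bound it is given.

```python
import math

def change_probability(rate, branch_length):
    """P(end state differs from start state) under a symmetric two-state Markov model."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))

def log_likelihood(rate, branch_length):
    """Log-likelihood of seeing a state difference across one branch."""
    return math.log(change_probability(rate, branch_length))

BRANCH_LENGTH = 0.01  # assumed: a very short branch, e.g. in millions of years
RATES = [0.2, 1, 10, 100, 1000, 100000]  # assumed grid of candidate rates

# The log-likelihood climbs with the rate, then becomes essentially flat.
for rate in RATES:
    print(f"rate={rate:>8}: logL={log_likelihood(rate, BRANCH_LENGTH):.4f}")

def best_rate_under_bound(upper_bound, branch_length, steps=1000):
    """Grid search for the best rate in (0, upper_bound]."""
    grid = [upper_bound * i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda r: log_likelihood(r, branch_length))

# Raising the bound just moves the estimate to the new bound.
for bound in (20, 30, 100):
    print(f"upper bound {bound}: best rate {best_rate_under_bound(bound, BRANCH_LENGTH)}")
```

A real fit combines many branches, so the surface won't be this extreme, but a few branches like this can dominate a rate that has little other information behind it.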

Things I'd do:

* Look at uncertainty in the point estimates: run dentist or similar.
* Consider using tip fog: there may be uncertainty or errors in your states
that would tend to drive up rates.
* Make sure your branch lengths are in the right units. "100" is a high
rate on branches in millions of years for many kinds of characters, but not
if the branch lengths mean something else.
* Look for near-zero terminal branch lengths that differ in state from
their neighbors: not necessarily wrong, but this essentially tells the
model that a change must have happened over near-zero time.
* Consider collapsing some states or rates. If some of your states are rare
on the tree, there's little information about changes between them. Can
your question be answered by lumping "eats jellyfish" and "eats salps"
into a single "eats living jello" state? Does it make sense to have a
single "loss of predation" rate but multiple "gain of predation" rates? The
default models (ARD, ER) tend to use the same complexity for everything,
but that might not be the best fit for your data or questions. If it
isn't, make a custom rate matrix.
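
As a rough sketch of the lumping and custom-matrix idea (plain Python, not corHMM code): the tip names, the "no predation" and "eats fish" states, and both helper functions below are hypothetical. The matrix uses the common index convention where 0 means "transition not allowed" and equal positive integers mean "same free parameter". I believe that is the general shape of a custom rate matrix in corHMM, but check the package documentation before relying on it.

```python
# Hypothetical tip data; only the two jelly-like diets come from the text above.
TIP_STATES = {
    "sp_A": "eats jellyfish",
    "sp_B": "eats salps",
    "sp_C": "no predation",
    "sp_D": "eats fish",
}

# Rare states to merge into one broader state.
LUMP = {
    "eats jellyfish": "eats living jello",
    "eats salps": "eats living jello",
}

def lump_states(tip_states):
    """Replace each lumped state with its broader label; leave others as-is."""
    return {tip: LUMP.get(state, state) for tip, state in tip_states.items()}

def build_rate_index(states, nonpredator):
    """Rows are 'from', columns are 'to'.

    All losses of predation share parameter 1; each gain gets its own
    parameter. Direct switches between predation states are left at 0
    (not allowed), which is an assumption of this sketch.
    """
    n = len(states)
    index = [[0] * n for _ in range(n)]
    base = states.index(nonpredator)
    next_param = 2
    for i in range(n):
        if i == base:
            continue
        index[i][base] = 1           # shared loss rate
        index[base][i] = next_param  # separate gain rate per predation state
        next_param += 1
    return index

lumped = lump_states(TIP_STATES)
states = sorted(set(lumped.values()))
rate_index = build_rate_index(states, "no predation")

print(states)
for row in rate_index:
    print(row)

n_free = max(max(row) for row in rate_index)
print(f"free rates: {n_free} (an all-rates-different model on "
      f"{len(states)} states would have {len(states) * (len(states) - 1)})")
```

Here four observed states become three, and the model estimates three rates instead of the six an ARD model would need for three states.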

Hope this helps,
Brian





On Fri, Dec 19, 2025 at 5:44 AM cy_jiang <[email protected]> wrote:

> Dear R-sig-phylo list,
>
>
>
> I am fitting hidden-rate discrete trait models using corHMM (v2.8) across
> a posterior tree sample (200 trees of the same set of species), and I would
> appreciate guidance on how to interpret frequent boundary-hitting ML
> estimates.
>
>
>
> In particular, I observe two patterns:
>
> Lower-bound hits (rates → 0):
> Some transition rates repeatedly hit the lower bound across many trees. In
> a hidden-state setting, this can effectively shut off transitions or
> simplify the state graph. Is it reasonable to treat such fits as degenerate
> or unreliable for process-level interpretation, and to exclude models that
> frequently exhibit this behavior?
>
> Upper-bound hits (rates → large):
> Other rates—primarily within the fast regime—often hit the upper bound.
> Increasing the bound (e.g. 20 → 30 → 100) typically causes the same or
> another within-regime rate to peg the new limit, while regime decoding and
> downstream summaries remain stable. Is it reasonable to interpret this as
> weak estimability of rate magnitude (i.e. “very fast” but not precisely
> estimable), and to focus inference on derived quantities rather than the
> rate values themselves?
>
> More generally, is boundary behavior across tree uncertainty a reasonable
> criterion for model reliability, alongside likelihood or AIC, when choosing
> among hidden-state models?
>
> Any advice or relevant references would be greatly appreciated. I am happy
> to provide a minimal example if useful.
>
> Best regards,
>
> Emma
>
> _______________________________________________
> R-sig-phylo mailing list - [email protected]
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/[email protected]/
>
