Klaus Schliep wrote --

> I suspect that some elements in this distance matrix are close to 0.75,
> so pairwise distances for the bootstrap samples are likely to be
> equal to or greater than 0.75. Most models ("K80", "JC69" etc.) are not
> defined for distances >=0.75 and will return Inf or NaN (the 0.75 can
> vary a bit, depending on the substitution model). bionj of course does
> not like building trees from infinite values as input. With short
> sequences the variances are of course larger and you are more likely
> to observe this; that's why your larger data set works fine.
> However, in these cases NaN or Inf are the correct results!

I often have to deal with users of my PHYLIP package who
are upset at this happening with large distances when
the sequences are bootstrapped.  (In my package the distance is
set to -1.0 in that case, and it should not be used to make a tree).
NaN is not the correct value, but Inf is -- the correct distance
for (say) the Jukes-Cantor model or the Kimura 2-parameter model
when the sequences differ by more than 75% is (positive) infinity,
since these are inferred to be unrelated sequences.
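
To make the Jukes-Cantor case concrete, here is a minimal sketch in
Python (not the PHYLIP or ape code; the function name jc69_distance is
made up for illustration). It assumes the usual formula
d = -(3/4) ln(1 - 4p/3), where p is the proportion of sites that
differ, and returns positive infinity rather than NaN once p reaches
0.75:

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    # Argument of the logarithm; it reaches zero at p = 0.75.
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        # Saturated: the sequences look unrelated, so the distance is
        # reported as +infinity instead of taking the log of a
        # non-positive number (which would give NaN or an error).
        return math.inf
    return -0.75 * math.log(arg)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.74, 0.75, 0.8):
        print(p, jc69_distance(p))
```

The distance grows without bound as p approaches 0.75, which is why a
bootstrap sample of short sequences can tip a borderline pair over the
edge.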

> It would be nice to catch this "error" with a try and use only trees
> from finite distance matrices or set infinite values to a large value.
> But one should return a warning as these samples are likely to be
> biased.

Exactly.
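
A rough sketch of that filtering step, again in Python and with
made-up names (usable_replicates is not a function in any package).
It assumes each bootstrap replicate's distance matrix is a plain list
of rows, keeps only matrices whose entries are all finite and
non-negative (so a sentinel such as -1.0 is rejected too), and warns
when any replicate is dropped:

```python
import math
import warnings


def usable_replicates(matrices):
    """Return the distance matrices that are safe to build trees from.

    Hypothetical helper: a matrix is kept only if every entry is
    finite and non-negative.
    """
    kept = [
        m for m in matrices
        if all(math.isfinite(d) and d >= 0.0 for row in m for d in row)
    ]
    dropped = len(matrices) - len(kept)
    if dropped:
        # Dropping replicates is not neutral: the surviving samples
        # are a biased subset, so the user should be told.
        warnings.warn(
            f"{dropped} of {len(matrices)} bootstrap replicates had "
            "undefined or infinite distances and were skipped; "
            "support values may be biased",
            RuntimeWarning,
        )
    return kept


if __name__ == "__main__":
    ok = [[0.0, 0.4], [0.4, 0.0]]
    saturated = [[0.0, math.inf], [math.inf, 0.0]]
    print(len(usable_replicates([ok, saturated])))
```

Replacing infinite entries with a large finite value would be the
other option mentioned above; either way the warning is the important
part.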

J.F.
----
Joe Felsenstein         j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
