Hello, I have been reading the documentation at
<http://samtools.github.io/bcftools/call-m.pdf>, which details the math
behind how bcftools calls a genotype, and I am not sure whether equation (1)
is correct. The document says the following equation is used to calculate
allele frequencies:

f_x = sum_k( Q_k^x ) / sum_{k,y}( Q_k^y )

As stated in the document, Q is defined as the (base) quality and D_x is
defined as the number of times a base x has been seen. Given those
definitions, I would have expected the allele frequency to be calculated as

f_x = D_x / sum_y( D_y )

Hopefully my notation conveys the idea I'm trying to make. Does
samtools/bcftools calculate allele frequency using the base quality formula,
as the document says, or does it assume the base calls are 100% correct
and calculate the allele frequency using the count-based formula I wrote
above? If the latter is true, then I think there is a typo in the
documentation.
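
To make the difference between the two formulas concrete, here is a minimal
Python sketch. It is not bcftools code: the function names and the toy pileup
are made up for illustration, and I am assuming Q_k^x means the quality of the
k-th read that shows base x.

```python
from collections import defaultdict

def freq_by_quality(pileup):
    """f_x = sum_k( Q_k^x ) / sum_{k,y}( Q_k^y ): weight each base by its quality."""
    totals = defaultdict(float)
    for base, qual in pileup:
        totals[base] += qual
    grand_total = sum(totals.values())  # assumes a non-empty pileup with positive qualities
    return {base: q / grand_total for base, q in totals.items()}

def freq_by_count(pileup):
    """f_x = D_x / sum_y( D_y ): every observed base counts equally."""
    counts = defaultdict(int)
    for base, _qual in pileup:
        counts[base] += 1
    depth = sum(counts.values())  # assumes a non-empty pileup
    return {base: c / depth for base, c in counts.items()}

# Hypothetical pileup column: three high-quality A's and one low-quality C.
pileup = [("A", 40), ("A", 40), ("A", 40), ("C", 10)]

print(freq_by_quality(pileup))  # A gets 120/130, C gets 10/130
print(freq_by_count(pileup))    # A gets 3/4, C gets 1/4
```

The two formulas agree only when every base has the same quality; otherwise
the quality-weighted version pulls the frequency toward the alleles supported
by high-quality bases.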

Josh Bradley
------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help