Thanks for your reply,
You are right, the problem seems to be at the transition between two
overlapping amplicons. The overlap is from chr12:25,398,225-25,398,278 and this
region has over double the coverage of the non-overlapped flanking regions. The
locus I'm interested in (chr12:25398285) is to the right of the overlap.
Here's the depth reported by standard samtools 0.1.19 mpileup for a sample
where we successfully found the KRAS mutation. Mpileup is maxed out in the
overlap region and then coverage drops after 25398279 reflecting the true
coverage of 25398279-25398285.
25398270 8004 <- maxed out mpileup
25398271 8004
25398272 8004
25398273 8004
25398274 8004
25398275 8004
25398276 8004
25398277 8001
25398278 7881
25398279 2151 <- End of overlap, coverage decreases
25398280 2137
25398281 2137
25398282 2136
25398283 2135
25398284 2135
25398285 2137 <- mutation found here
And here's one that failed (a different sample):
Mpileup reports that coverage is much lower than it is in the bam file (4000x
in this case).
25398270 8021 <- maxed out mpileup
25398271 8021
25398272 8021
25398273 8020
25398274 8020
25398275 8020
25398276 8018
25398277 8011
25398278 7867
25398279 16 <- coverage transition point
25398280 9
25398281 9
25398282 9
25398283 10
25398284 13
25398285 17 <- Not enough depth to call a mutation
The two samples are roughtly the same, so I don't understand why the mpileup
gets truncated in the second sample but not the first.
Thanks again for your help,
S
On Wednesday, 18 June 2014, 8:14, Thomas W. Blackwell <[email protected]> wrote:
Since this is amplicon sequencing, is it possible that the primers
for a second amplicon fall right adjacent to the site of interest ?
If so, then widening the region slightly would suddenly double the
maximum depth found within the region and go over the 8000 limit.
This is just a guess. Could be something quite else.
- tom blackwell -
On Fri, 13 Jun 2014, Scott Newman wrote:
> Truncated output from mpileup:
Hi All,
We've got Illumina paired end reads from amplicon sequencing aligned with BWA.
We noticed that Varscan 2 was missing some very important cancer mutations in
KRAS codon 12.
We initially thought that it was VarScan's fault, but it turned out that the
problem was with the intermediate data coming from mpileup.
If you mpileup genome-wide and then see what you get at position chr12:25398285
(where we know there's a mutation). The coverage seems quite sparse when in
reality we have around 8000x.
>samtools mpileup -B -f $hg19 tmp.bam | grep 25398285
>chr12 25398285 C 17 A.......,,,,,^],^],^],^], HHHHHFAFFF0CFF000
I've tried single end, paired ends, A flag, B flag, samtools 0.18, samtools
0.19, and patched versions of samtools that allows >8000x coverage and I get
the same truncated result. The confusing thing is that when you confine mpileup
to a single basepair, you get the full 8000x coverage that you were expecting:
>samtools mpileup -r chr12:25398285-25398285 -B -f $hg19 tmp.bam | grep 25398285
>chr12 25398285 C 7999
>AAAAAAAA,,,,...,a,,aa,,,a,............A.A..AA... etc etc etc
HHGHHHHHHHHHH0HHHHHHHHFHHHHHHGHFHHHHHFCHHH0HHHHHHHHHGHHHGHFHHHH etc etc etc
But if you widen the region slighly, it truncates again.
samtools mpileup -r chr12:25398277-25398285 -B -f $hg19 tmp.bam | grep 25398285
chr12 25398285 C 17 A.......,,,,,^],^],^],^], HHHHHFAFFF0CFF000
We have 90 of these samples an the problem occurs in 1/3 of them.
Thanks for you help,
S
------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help