Hello,
    I'm attempting to get nucleotide frequency information at a number of
positions across a number of samples, and am having difficulty interpreting
some output.  Any insights would be appreciated.

I'm running the following command:

samtools mpileup  -BQ0 -d10000000 -l VariantBed.bed -uf $refFile $bam |
bcftools view -bcg - | bcftools view - > ${sampleName}_validation.vcf

I notice that this command creates an output file with
an unpredictable number of rows.  Running the command using the same bed
file on a set of different .bam files creates a set of output vcf files
with a wide distribution in numbers of rows.

I presumed that the difference in row numbers means that some positions
drop out on some .bam files because those samples lacked coverage where
other samples had coverage.
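
A minimal sketch of how this presumption could be checked: list the positions in the bed file that have no row in a given vcf. The function name missing_positions is made up for illustration, and the sketch assumes the file passed to -l is a true BED file (0-based start, end-exclusive) with whitespace-separated columns and no track or browser header lines.

```shell
# Hypothetical helper: print every BED position (converted to 1-based
# VCF coordinates) that has no row in the given VCF.
# Usage: missing_positions VariantBed.bed sample_validation.vcf
missing_positions() {
    awk '
        FILENAME == ARGV[1] {            # first file: the VCF
            if ($0 !~ /^#/) seen[$1 ":" $2] = 1
            next
        }
        {                                # second file: the BED
            # BED start is 0-based and end is exclusive, so the 1-based
            # positions covered by an interval are start+1 .. end.
            for (p = $2 + 1; p <= $3; p++)
                if (!(($1 ":" p) in seen)) print $1 "\t" p
        }
    ' "$2" "$1"
}
```

Comparing this list between samples would show whether the same positions drop out everywhere or only in particular .bam files.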

If that's the case, though, I don't know what to make of lines like the
following one:

1       2160881 .       G       .       28.2    .       DP=0;VDB=0.0003;;AC1=2;FQ=-30   PL      0

Here DP=0, yet the position is still reported in the vcf output.  I also
don't see AC1 in the legend for INFO tags on the samtools specification
page, so I don't know what to make of a value of 2.

So, I am confused.  Positions with a positive value of DP and DP4 make
sense to me.  But why are some positions completely omitted from the vcf
output, while other positions are reported with DP=0?
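
To put numbers on this, here is a rough sketch that tallies the rows of one output vcf by their DP value. The function name count_depth_classes is made up for illustration, and it assumes the INFO field is the eighth whitespace-separated column, as in the line quoted above.

```shell
# Hypothetical helper: count VCF rows with DP=0, DP>0, or no DP tag.
# Usage: count_depth_classes sample_validation.vcf
count_depth_classes() {
    awk '
        /^#/ { next }                    # skip header lines
        {
            n = split($8, kv, ";")       # INFO is column 8
            dp = ""
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^DP=/) dp = substr(kv[i], 4)   # does not match DP4=
            if (dp == "")          nodp++
            else if (dp + 0 == 0)  zero++
            else                   pos++
        }
        END { printf "DP=0: %d\nDP>0: %d\nno DP: %d\n", zero, pos, nodp }
    ' "$1"
}
```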

Thanks for any advice.

Best regards,
Jonathan