This does not look like output from samtools 1.0: version 1.0 writes VCFv4.2, whereas the header you pasted says VCFv4.1.

Petr
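One quick way to check which format version a file declares is to read its `##fileformat` header line. The following is a minimal sketch in Python, not part of samtools or bcftools; the function name `vcf_fileformat` is made up for illustration, and the only fact it relies on is the one stated above, that the 1.0 tools write VCFv4.2.

```python
from typing import Iterable, Optional


def vcf_fileformat(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the ##fileformat header line, or None if absent."""
    for line in lines:
        line = line.strip()
        if line.startswith("##fileformat="):
            return line.split("=", 1)[1]
        if line and not line.startswith("#"):
            break  # reached the data records, so the header is over
    return None


# A few header lines copied from the output quoted below.
pasted_header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">',
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT",
]

version = vcf_fileformat(pasted_header)
print("declared format:", version)
print("is VCFv4.2:", version == "VCFv4.2")
```

If the declared format is not VCFv4.2, it is worth checking which `samtools` and `bcftools` binaries are first on the `PATH`, since an older installation may still be the one being run.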

On Mon, 2014-09-22 at 14:53 -0400, jbr950 wrote:
> Hi Petr,
> Thanks for your reply. I grabbed the
> samtools-bcftools-htslib-1.0_x64-linux binary and tried again.
>
> When I'd used Samtools 0.1.18, the issue I emailed about was that the
> number of lines in output varied by .bam file, and I didn't understand
> why lines were being omitted, and why not in a common manner.
>
> Using version 1.0 and the same command (I checked on the older
> samtools and it is producing output), I get no output at all. Mpileup
> writes a header and then stops. I have copied and pasted the output.
>
> My .bed is 3 columns: chr start stop
> with no header.
>
> Any advice appreciated, thanks!
>
> Jonathan
>
> [mpileup] 1 samples in 1 input files
> (mpileup) Max depth is above 1M. Potential memory hog!
> [bcf_sync] incorrect number of fields (0 != 5) at 0:0
> [afs] 0:0.000
> ##fileformat=VCFv4.1
> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
> ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
> ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
> ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
> ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
> ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
> ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
> ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
> ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
> ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
> ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
> ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
> ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
> ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
> ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
> ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
> ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
> ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
> ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
> ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
>
> On Thu, Sep 4, 2014 at 5:20 AM, Petr Danecek <[email protected]> wrote:
> > Hi Jonathan,
> >
> > these are good questions. Could you please try with the latest
> > release? I am happy to answer any remaining issues not solved by the
> > upgrade.
> >
> > Cheers,
> > Petr
> >
> > On Sun, 2014-08-17 at 01:34 -0400, Jo R wrote:
> > > Hello,
> > > I'm attempting to get nucleotide frequency information at a number
> > > of positions across a number of samples, and am having difficulty
> > > interpreting some output. Any insights would be appreciated.
> > >
> > > I'm running the following command:
> > >
> > > samtools mpileup -BQ0 -d10000000 -l VariantBed.bed -uf $refFile $bam | bcftools view -bcg - | bcftools view - > ${sampleName}_validation.vcf
> > >
> > > I notice that this command creates an output file with
> > > an unpredictable number of rows. Running the command using the same
> > > bed file on a set of different .bam files creates a set of output vcf
> > > files with a wide distribution in numbers of rows.
> > >
> > > I presumed that the difference in row numbers means that some
> > > positions drop out on some .bam files because those samples lacked
> > > coverage where other samples had coverage.
> > >
> > > If that's the case, though, I don't know what to make of lines like
> > > the following one:
> > >
> > > 1 2160881 . G . 28.2 . DP=0;VDB=0.0003;;AC1=2;FQ=-30 PL 0
> > >
> > > Here, it looks like DP=0, but this position still got reported in the
> > > vcf output. I also don't see AC1 in the legend for INFO tags in the
> > > samtools specification page, so I don't know what to make of a value
> > > of 2.
> > >
> > > So, I am confused. Positions with a positive value of DP and DP4 make
> > > sense to me. But why are some positions completely omitted from the
> > > vcf output, while other positions are reported with DP=0?
> > >
> > > Thanks for any advice.
> > >
> > > Best regards,
> > > Jonathan
> > >
> > > ------------------------------------------------------------------------------
> > > _______________________________________________
> > > Samtools-help mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/samtools-help
> >
> > --
> > The Wellcome Trust Sanger Institute is operated by Genome Research
> > Limited, a charity registered in England with number 1021457 and a
> > company registered in England with number 2742969, whose registered
> > office is 215 Euston Road, London, NW1 2BE.
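
To see exactly which target positions are absent from a given VCF, and which are present but report `DP=0`, the BED file and the VCF can be compared directly. The sketch below is only a way to tabulate the two cases raised in the original question; it does not explain why mpileup emits or skips a site. The helper names (`parse_info`, `bed_positions`, `vcf_depths`) and the example BED interval are hypothetical. It relies on BED intervals being 0-based and half-open while VCF positions are 1-based, and it splits on any whitespace so that it also accepts records pasted from an email.

```python
from typing import Dict, Iterable, Set, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based position)


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a dict; flags map to an empty string."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # tolerate empty entries such as the ';;' in the quoted record
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def bed_positions(bed_lines: Iterable[str]) -> Set[Position]:
    """Expand 3-column BED intervals (0-based, half-open) to 1-based positions."""
    positions: Set[Position] = set()
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, stop = line.split()[:3]
        for pos in range(int(start) + 1, int(stop) + 1):
            positions.add((chrom, pos))
    return positions


def vcf_depths(vcf_lines: Iterable[str]) -> Dict[Position, int]:
    """Map each reported site to its INFO/DP value (-1 when DP is absent)."""
    depths: Dict[Position, int] = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split()
        info = parse_info(cols[7])
        depths[(cols[0], int(cols[1]))] = int(info.get("DP", -1))
    return depths


bed = ["1 2160880 2160882"]  # hypothetical interval covering two positions
vcf = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT",
    "1 2160881 . G . 28.2 . DP=0;VDB=0.0003;;AC1=2;FQ=-30 PL 0",
]

depths = vcf_depths(vcf)
missing = sorted(bed_positions(bed) - set(depths))
zero_depth = sorted(pos for pos, dp in depths.items() if dp == 0)
print("missing from VCF:", missing)
print("reported with DP=0:", zero_depth)
```

Running the same comparison on each sample's VCF gives a per-sample list of dropped positions, which can then be checked against the coverage in the corresponding .bam file.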
