Hi Jonathan, On Tue, 2014-09-23 at 11:05 -0400, jbr950 wrote: > Hi Petr, > The workflow documentation looks great - this effort is undoubtedly > going to help a lot of researchers. Thanks for doing it.
the credits for the workflow should not go to me, the htslib.org page has been put together by Martin Pollard, Thomas Keane and others. :-) > In this instance, I actually don't need statistics and the raw pileups > are fine. Which brings me back to the original issue. With the new > version, the vcf outputs from mpileup, using the same .bed as input, > are still of varying lengths. Do you know of a way to force mpileup > to report data at every position in the .bed file for every run? Or > alternatively, do you know why some positions report a DP=0 and other > positions are omitted altogether? These DP=0 sites are emitted at places where there is an aligned read with an indel. The indel may not appear in the output when the mpileup machinery decides it is not significant enough. I hope this helps, Petr > Thanks again, > Jonathan > > > > On Tue, Sep 23, 2014 at 9:26 AM, Petr Danecek <[email protected]> > wrote: > There is a slowly growing documentation on the new website, > please check > the variant calling workflow > http://www.htslib.org/workflow/ > > You need to use `bcftools call`, not `bcftools view`. > > petr > > > On Tue, 2014-09-23 at 09:25 -0400, jbr950 wrote: > > Good catch - I was trying to use an old version of bcftools > with the > > new version of samtools.. > > > > > > So I changed that to point to the new version of bcftools, > and now: > > > > > > samtools-bcftools-htslib-1.0_x64-linux/bin/samtools mpileup > -BQ0 > > -d10000000 -l $variantBed -uf ${ref} $bamFile > test.bcf > > > > > > > > So far so good.. > > > > > > And then: > > cat test.bcf | > samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools > > view -cg - > > Error: Could not parse --min-ac g > > > > > > > > On the other hand, calling: . > > cat test.bcf | > samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools > > view- > > > > > > > > does create a nice output, so I'm inclined to let it go with > the > > defaults. I don't see equivalent arguments for the -c and -g > flags and > > I'm not particularly attached to the use of Bayesian > inference, > > although it would be good to understand what differences I > can expect > > between the old method (using -c) and the new (sans -c)? > > > > > > Thanks for your time and assistance! > > > > > > Jonathan > > > > > > > > > > > > On Tue, Sep 23, 2014 at 2:40 AM, Petr Danecek > <[email protected]> > > wrote: > > This does not look like output from samtools 1.0 - > it outputs > > VCFv4.2 > > > > petr > > > > On Mon, 2014-09-22 at 14:53 -0400, jbr950 wrote: > > > Hi Petr, > > > Thanks for your reply. I grabbed the > > > samtools-bcftools-htslib-1.0_x64-linux binary and > tried > > again. > > > > > > > > > When I'd used Samtools 0.1.18, the issue I emailed > about was > > that the > > > number of lines in output varied by .bam file, and > I didn't > > understand > > > why lines were being ommitted, and why not in a > common > > manner. > > > > > > > > > Using version 1.0 and the same command (I checked > on the > > older > > > samtools and it is producing output), I get no > output at > > all. Mpileup > > > writes a header and then stops. I have copied and > pasted > > the output. > > > > > > > > > > > > My .bed is 3 columns: chr start stop > > > with no header. > > > > > > > > > Any advice appreciated, thanks! > > > > > > > > > Jonathan > > > > > > > > > > > > > > > > > > > > > [mpileup] 1 samples in 1 input files > > > (mpileup) Max depth is above 1M. Potential memory > hog! > > > [bcf_sync] incorrect number of fields (0 != 5) at > 0:0 > > > [afs] 0:0.000 > > > ##fileformat=VCFv4.1 > > > > ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read > > depth"> > > > > ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# > > high-quality > > > ref-forward bases, ref-reverse, alt-forward and > alt-reverse > > bases"> > > > > > > ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square > > > mapping quality of covering reads"> > > > > ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred > > probability of > > > all samples being the same"> > > > > > > ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood > > > estimate of the first ALT allele frequency > (assuming HWE)"> > > > > > > ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood > > > estimate of the first ALT allele count (no HWE > assumption)"> > > > ##INFO=<ID=G3,Number=3,Type=Float,Description="ML > estimate > > of genotype > > > frequencies"> > > > > ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based > > HWE test > > > P-value based on G3"> > > > > ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio > > of > > > genotype likelihoods with and without the > constraint"> > > > > ##INFO=<ID=UGT,Number=1,Type=String,Description="The most > > probable > > > unconstrained genotype configuration in the trio"> > > > > ##INFO=<ID=CGT,Number=1,Type=String,Description="The most > > probable > > > constrained genotype configuration in the trio"> > > > > ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for > > strand > > > bias, baseQ bias, mapQ bias and tail distance > bias"> > > > > ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates > > that the > > > variant is an INDEL."> > > > > ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred > > probability of > > > the nonRef allele frequency in group1 samples > being larger > > (,smaller) > > > than in group2."> > > > > ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior > > weighted > > > chi^2 P-value for testing the association between > group1 and > > group2 > > > samples."> > > > > ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred > > scaled > > > PCHI2."> > > > ##INFO=<ID=PR,Number=1,Type=Integer,Description="# > > permutations > > > yielding a smaller PCHI2."> > > > > ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant > > Distance > > > Bias"> > > > > ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> > > > > ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype > > Quality"> > > > > ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods > > for > > > RR,RA,AA genotypes (R=ref,A=alt)"> > > > > ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# > > high-quality > > > bases"> > > > > > > ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled > strand > > > bias P-value"> > > > > ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of > > > Phred-scaled genotype likelihoods"> > > > #CHROM POS ID REF ALT QUAL > FILTER INFO > > FORMAT > > > > > > > > > > > > On Thu, Sep 4, 2014 at 5:20 AM, Petr Danecek > > <[email protected]> wrote: > > > Hi Jonathan, > > > > > > these are good questions. Could you please > try with > > the latest > > > release, > > > I am happy to answer any remaining issues > not solved > > by the > > > upgrade. > > > > > > Cheers, > > > Petr > > > > > > On Sun, 2014-08-17 at 01:34 -0400, Jo R > wrote: > > > > Hello, > > > > I'm attemping to get nucleotide > frequency > > information at > > > a number > > > > of positions across a number of samples, > and am > > having > > > difficulty > > > > interpreting some output. Any insights > would be > > > appreciated. > > > > > > > > > > > > I'm running the following command: > > > > > > > > > > > > samtools mpileup -BQ0 -d10000000 -l > > VariantBed.bed -uf > > > $refFile $bam > > > > | bcftools view -bcg - | bcftools view - > > > > > > ${sampleName}_validation.vcf > > > > > > > > > > > > > > > > I notice that this command creates an > output file > > with > > > > an unpredictable number of rows. > Running the > > command using > > > the same > > > > bed file on a set of different .bam > files creates > > a set of > > > output vcf > > > > files with a wide distribution in > numbers of rows. > > > > > > > > > > > > I presumed that the difference in row > numbers > > means that > > > some > > > > positions drop out on some .bam files > because > > those samples > > > lacked > > > > coverage where other samples had > coverage. > > > > > > > > > > > > If that's the case, though, I don't know > what to > > make of > > > lines like > > > > the following one: > > > > > > > > > > > > 1 2160881 . G . > 28.2 . > > > > DP=0;VDB=0.0003;;AC1=2;FQ=-30 PL > 0 > > > > > > > > > > > > > > > > here, it looks like DP=0, but this > position still > > got > > > reported in the > > > > vcf output. I also don't see AC1 in the > legend > > for INFO > > > tags in the > > > > samtools specification page, so I don't > know what > > to make of > > > a value > > > > of 2. > > > > > > > > > > > > So, I am confused. Positions with a > positive > > value of DP > > > and DP4 make > > > > sense to me. But why are some positions > > completely ommitted > > > from the > > > > vcf output, and other positions > reporting a DP=0? > > > > > > > > > > > > Thanks for any advice. > > > > > > > > > > > > Best regards, > > > > Jonathan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > > > _______________________________________________ > > > > Samtools-help mailing list > > > > [email protected] > > > > > > > https://lists.sourceforge.net/lists/listinfo/samtools-help > > > > > > > > > > > > > > > -- > > > The Wellcome Trust Sanger Institute is > operated by > > Genome > > > Research > > > Limited, a charity registered in England > with > > number 1021457 > > > and a > > > company registered in England with number > 2742969, > > whose > > > registered > > > office is 215 Euston Road, London, NW1 > 2BE. > > > > > > > > > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by > Genome > > Research > > Limited, a charity registered in England with > number 1021457 > > and a > > company registered in England with number 2742969, > whose > > registered > > office is 215 Euston Road, London, NW1 2BE. > > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome > Research > Limited, a charity registered in England with number 1021457 > and a > company registered in England with number 2742969, whose > registered > office is 215 Euston Road, London, NW1 2BE. > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ------------------------------------------------------------------------------ Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk _______________________________________________ Samtools-help mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/samtools-help
