Hi Petr,
    Pardon me, I upgraded to the latest (htslib_1.0) and the -t option you
suggested didn't result in any output.  Appears that

    -t, --targets [^]<region>           similar to -r but streams rather
than index-jumps. Exclude regions with "^" prefix
    -r, --regions <region>              restrict to comma-separated list of
regions

Any thoughts?

Jonathan






On Thu, Oct 2, 2014 at 3:49 AM, Petr Danecek <[email protected]> wrote:

> Hi Jonathan,
>
> new versions of mpileup have the -t DPR,INFO/DPR option which outputs
> depth for each observed base.
>
> Cheers,
> petr
>
>
> On Wed, 2014-10-01 at 12:49 -0400, jbr950 wrote:
> > Hi Petr,
> >     Thanks for your time thus far.  I'm trying to query some bam files
> > at specified locations to have a look at the allele frequencies.  The
> > motivation is to 'validate' variants called in one dataset, in a novel
> > dataset using a more simple approach.  Ideally, I'd get the raw counts
> > of the number of reads mapping to A,T,C,G, and/or indel (if detected).
> >
> >
> > I tried mpileup two ways (with -cg in bcftools view to call variants,
> > and without), and  neither seems to provide me the data I would like
> > in the corresponding output:
> >
> > samtools mpileup  -BQ0 -d10000000 -l $variantBed -uf ${ref} $bamFile |
> > $bcftools view - > no_cg.vcf
> >
> > samtools mpileup  -BQ0 -d10000000 -l $variantBed -uf ${ref} $bamFile |
> > $bcftools view -cg -  > cg.vcf
> >
> >
> >
> > Then I check the context around 2 known variants:
> >
> >
> > Variant 1
> > grep -C 1 105213319 cg.vcf
> >
> > 14      105213318       .      A        .       117  .
> > DP=29;AF1=0;AC1=0;DP4=14,15,0,0;MQ=20;FQ=-114   PL      0
> > 14      105213319       .      G        C       25   .
> > DP=28;VDB=0.0398;AF1=0.5;AC1=1;DP4=10,9,4,5;MQ=20;FQ=28;PV4=1,1,1,1
> > GT:PL:GQ        0/1:55,0,128:58
> > 14      105213320       .      T        .       114  .
> > DP=28;AF1=0;AC1=0;DP4=14,14,0,0;MQ=20;FQ=-111   PL      0
> >
> >
> > Here, the variant was called and the DP4 field tells me the number of
> > reference and non-reference (presumably "C") alleles.  I am assuming
> > that there are 0 T's and A's.
> >
> >
> >
> >
> > grep -C 1 105213319 no_cg.vcf
> > 14      105213318       .      A        X       0    .
> > DP=29;I16=14,15,0,0,979,34421,0,0,580,11600,0,0,511,11223,0,0   PL
> >    0,87,198
> > 14      105213319       .      G        C,T,X   0    .
> >
> DP=28;I16=10,9,4,5,645,23047,338,12742,380,7600,180,3600,339,7223,177,4179;VDB=0.0398
>  PL      55,0,128,96,140,195,112,152,198,201
> > 14      105213320       .      T        X       0    .
> > DP=28;I16=14,14,0,0,924,32326,0,0,560,11200,0,0,520,11560,0,0   PL
> >    0,84,195
> >
> >
> > Here, I see that that the non-ref alleles consist of C's ant T's,
> > although I don't see any way to determine the relative proportion or
> > absolute number of each.
> >
> >
> >
> >
> >
> > Trying Variant 2:
> >  grep -C 1 154574018 cg.vcf
> > 1       154574017       .      T        .       283  .
> > DP=274;VDB=0.0365;AF1=0;AC1=0;DP4=146,128,0,0;MQ=20;FQ=-282     PL
> >    0
> > 1       154574018       .      C        .       113  .
> >
> DP=273;VDB=0.0365;AF1=0;AC1=0;DP4=110,100,31,32;MQ=20;FQ=-110;PV4=0.67,1,1,1
>   PL      0
> > 1       154574019       .      T        .       283  .
> > DP=278;VDB=0.0365;AF1=0;AC1=0;DP4=142,136,0,0;MQ=20;FQ=-282     PL
> >    0
> >
> >
> > This variant was not called at all.  I may be willing to "assume" that
> > the 63 'non-ref' reads reported in the DP4 field correspond to my
> > variant, but it would be nice to be sure.
> >
> >
> >
> >
> >
> > grep -C 1 154574018 no_cg.vcf
> > 1       154574017       .      T        X       0    .
> >
> DP=274;I16=146,128,0,0,10042,376816,0,0,5463,109209,0,0,5159,117099,0,0;VDB=0.0365
>     PL      0,255,255
> > 1       154574018       .      C        G,X     0    .
> >
> DP=273;I16=110,100,31,32,7610,283160,2311,86963,4183,83609,1260,25200,3918,88780,1242,28290;VDB=0.0365
> PL      0,83,117,255,255,220
> > 1       154574019       .      T        X       0    .
> >
> DP=278;I16=142,136,0,0,10104,376490,0,0,5543,110809,0,0,5167,116999,0,0;VDB=0.0365
>     PL      0,255,255
> >
> >
> >
> >
> > Here, the C to G is reported, but I again I only have total read depth
> > via the DP field, no breakdown by nucleotide (or am I missing it?).
> >
> >
> > It's true I'm using Samtools 0.1.18.  If upgrading to the latest would
> > give me a better option, I'm happy to do that, but after scouring the
> > available documentation, I don't see a way to do this.
> >
> >
> > In other words, I'd like to inquire at many positions the proportion
> > of A's, T's, C's, G's and/or detected indels.  Mpileup (at least how
> > I'm currently using it) gets me close but not all the way there.
> >
> >
> >
> >
> > Did it make sense?    I'd appreciate any suggestions.
> >
> >
> >
> > Best regards,
> > Jonathan
> >
> >
> >
> > On Tue, Sep 23, 2014 at 11:47 AM, Petr Danecek <[email protected]>
> > wrote:
> >         Hi Jonathan,
> >
> >         On Tue, 2014-09-23 at 11:05 -0400, jbr950 wrote:
> >         > Hi Petr,
> >         >    The workflow documentation looks great - this effort is
> >         undoubtedly
> >         > going to help a lot of researchers.  Thanks for doing it.
> >
> >         the credits for the workflow should not go to me, the
> >         htslib.org page
> >         has been put together by Martin Pollard, Thomas Keane and
> >         others.
> >         :-)
> >
> >         > In this instance, I actually don't need statistics and the
> >         raw pileups
> >         > are fine.  Which brings me back to the original issue.  With
> >         the new
> >         > version, the vcf outputs from mpileup, using the same .bed
> >         as input,
> >         > are still of varying  lengths.  Do you know of a way to
> >         force mpileup
> >         > to report data at every position in the .bed file for every
> >         run?  Or
> >         > alternatively, do you know why some positions report a DP=0
> >         and other
> >         > positions are omitted altogether?
> >
> >         These DP=0 sites are emitted at places where there is an
> >         aligned read
> >         with an indel. The indel may not appear in the output when the
> >         mpileup
> >         machinery decides it is not significant enough.
> >
> >         I hope this helps,
> >         Petr
> >
> >
> >         > Thanks again,
> >         > Jonathan
> >         >
> >         >
> >         >
> >         > On Tue, Sep 23, 2014 at 9:26 AM, Petr Danecek
> >         <[email protected]>
> >         > wrote:
> >         >         There is a slowly growing documentation on the new
> >         website,
> >         >         please check
> >         >         the variant calling workflow
> >         >         http://www.htslib.org/workflow/
> >         >
> >         >         You need to use `bcftools call`, not `bcftools
> >         view`.
> >         >
> >         >         petr
> >         >
> >         >
> >         >         On Tue, 2014-09-23 at 09:25 -0400, jbr950 wrote:
> >         >         > Good catch - I was trying to use an old version of
> >         bcftools
> >         >         with the
> >         >         > new version of samtools..
> >         >         >
> >         >         >
> >         >         > So I changed that to point to the new version of
> >         bcftools,
> >         >         and now:
> >         >         >
> >         >         >
> >         >         >
> >         samtools-bcftools-htslib-1.0_x64-linux/bin/samtools  mpileup
> >         >         -BQ0
> >         >         > -d10000000 -l $variantBed -uf ${ref} $bamFile >
> >         test.bcf
> >         >         >
> >         >         >
> >         >         >
> >         >         > So far so good..
> >         >         >
> >         >         >
> >         >         > And then:
> >         >         > cat test.bcf |
> >         >         samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools
> >         >         > view -cg -
> >         >         > Error: Could not parse --min-ac g
> >         >         >
> >         >         >
> >         >         >
> >         >         > On the other hand, calling: .
> >         >         > cat test.bcf |
> >         >         samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools
> >         >         > view-
> >         >         >
> >         >         >
> >         >         >
> >         >         > does create a nice output, so I'm inclined to let
> >         it go with
> >         >         the
> >         >         > defaults. I don't see equivalent arguments for the
> >         -c and -g
> >         >         flags and
> >         >         > I'm not particularly attached to the use of
> >         Bayesian
> >         >         inference,
> >         >         > although it would be good to understand what
> >         differences I
> >         >         can expect
> >         >         > between the old method (using -c) and the new
> >         (sans -c)?
> >         >         >
> >         >         >
> >         >         > Thanks for your time and assistance!
> >         >         >
> >         >         >
> >         >         > Jonathan
> >         >         >
> >         >         >
> >         >         >
> >         >         >
> >         >         >
> >         >         > On Tue, Sep 23, 2014 at 2:40 AM, Petr Danecek
> >         >         <[email protected]>
> >         >         > wrote:
> >         >         >         This does not look like output from
> >         samtools 1.0 -
> >         >         it outputs
> >         >         >         VCFv4.2
> >         >         >
> >         >         >         petr
> >         >         >
> >         >         >         On Mon, 2014-09-22 at 14:53 -0400, jbr950
> >         wrote:
> >         >         >         > Hi Petr,
> >         >         >         >     Thanks for your reply.  I grabbed
> >         the
> >         >         >         > samtools-bcftools-htslib-1.0_x64-linux
> >         binary and
> >         >         tried
> >         >         >         again.
> >         >         >         >
> >         >         >         >
> >         >         >         > When I'd used Samtools 0.1.18, the issue
> >         I emailed
> >         >         about was
> >         >         >         that the
> >         >         >         > number of lines in output varied by .bam
> >         file, and
> >         >         I didn't
> >         >         >         understand
> >         >         >         > why lines were being ommitted, and why
> >         not in a
> >         >         common
> >         >         >         manner.
> >         >         >         >
> >         >         >         >
> >         >         >         > Using version 1.0 and the same command
> >         (I checked
> >         >         on the
> >         >         >         older
> >         >         >         > samtools and it is producing output), I
> >         get no
> >         >         output at
> >         >         >         all.  Mpileup
> >         >         >         > writes a header and then stops.  I have
> >         copied and
> >         >         pasted
> >         >         >         the output.
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         > My .bed is 3 columns: chr   start   stop
> >         >         >         > with no header.
> >         >         >         >
> >         >         >         >
> >         >         >         > Any advice appreciated, thanks!
> >         >         >         >
> >         >         >         >
> >         >         >         > Jonathan
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         > [mpileup] 1 samples in 1 input files
> >         >         >         > (mpileup) Max depth is above 1M.
> >         Potential memory
> >         >         hog!
> >         >         >         > [bcf_sync] incorrect number of fields
> >         (0 != 5) at
> >         >         0:0
> >         >         >         > [afs] 0:0.000
> >         >         >         > ##fileformat=VCFv4.1
> >         >         >         >
> >         >         ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw
> >         read
> >         >         >         depth">
> >         >         >         >
> >         >         ##INFO=<ID=DP4,Number=4,Type=Integer,Description="#
> >         >         >         high-quality
> >         >         >         > ref-forward bases, ref-reverse,
> >         alt-forward and
> >         >         alt-reverse
> >         >         >         bases">
> >         >         >         >
> >         >         >
> >         >
> >         ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square
> >         >         >         > mapping quality of covering reads">
> >         >         >         >
> >         >         ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred
> >         >         >         probability of
> >         >         >         > all samples being the same">
> >         >         >         >
> >         >         >
> >         >
> >         ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood
> >         >         >         > estimate of the first ALT allele
> >         frequency
> >         >         (assuming HWE)">
> >         >         >         >
> >         >         >
> >         >
> >         ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood
> >         >         >         > estimate of the first ALT allele count
> >         (no HWE
> >         >         assumption)">
> >         >         >         >
> >         ##INFO=<ID=G3,Number=3,Type=Float,Description="ML
> >         >         estimate
> >         >         >         of genotype
> >         >         >         > frequencies">
> >         >         >         >
> >         >
> >          ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based
> >         >         >         HWE test
> >         >         >         > P-value based on G3">
> >         >         >         >
> >         >
> >          ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio
> >         >         >         of
> >         >         >         > genotype likelihoods with and without
> >         the
> >         >         constraint">
> >         >         >         >
> >         >         ##INFO=<ID=UGT,Number=1,Type=String,Description="The
> >         most
> >         >         >         probable
> >         >         >         > unconstrained genotype configuration in
> >         the trio">
> >         >         >         >
> >         >         ##INFO=<ID=CGT,Number=1,Type=String,Description="The
> >         most
> >         >         >         probable
> >         >         >         > constrained genotype configuration in
> >         the trio">
> >         >         >         >
> >         >
> >          ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for
> >         >         >         strand
> >         >         >         > bias, baseQ bias, mapQ bias and tail
> >         distance
> >         >         bias">
> >         >         >         >
> >         >
> >          ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates
> >         >         >         that the
> >         >         >         > variant is an INDEL.">
> >         >         >         >
> >         >
> >          ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred
> >         >         >         probability of
> >         >         >         > the nonRef allele frequency in group1
> >         samples
> >         >         being larger
> >         >         >         (,smaller)
> >         >         >         > than in group2.">
> >         >         >         >
> >         >
> >          ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior
> >         >         >         weighted
> >         >         >         > chi^2 P-value for testing the
> >         association between
> >         >         group1 and
> >         >         >         group2
> >         >         >         > samples.">
> >         >         >         >
> >         >
> >          ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred
> >         >         >         scaled
> >         >         >         > PCHI2.">
> >         >         >         >
> >         ##INFO=<ID=PR,Number=1,Type=Integer,Description="#
> >         >         >         permutations
> >         >         >         > yielding a smaller PCHI2.">
> >         >         >         >
> >         >
> >          ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant
> >         >         >         Distance
> >         >         >         > Bias">
> >         >         >         >
> >         >
> >          ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> >         >         >         >
> >         >
> >          ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype
> >         >         >         Quality">
> >         >         >         >
> >         >
> >          ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods
> >         >         >         for
> >         >         >         > RR,RA,AA genotypes (R=ref,A=alt)">
> >         >         >         >
> >         >         ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="#
> >         >         >         high-quality
> >         >         >         > bases">
> >         >         >         >
> >         >         >
> >         >
> >         ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled
> strand
> >         >         >         > bias P-value">
> >         >         >         >
> >         >
> >          ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of
> >         >         >         > Phred-scaled genotype likelihoods">
> >         >         >         > #CHROM  POS     ID      REF     ALT
> >          QUAL
> >         >         FILTER  INFO
> >         >         >         FORMAT
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         > On Thu, Sep 4, 2014 at 5:20 AM, Petr
> >         Danecek
> >         >         >         <[email protected]> wrote:
> >         >         >         >         Hi Jonathan,
> >         >         >         >
> >         >         >         >         these are good questions. Could
> >         you please
> >         >         try with
> >         >         >         the latest
> >         >         >         >         release,
> >         >         >         >         I am happy to answer any
> >         remaining issues
> >         >         not solved
> >         >         >         by the
> >         >         >         >         upgrade.
> >         >         >         >
> >         >         >         >         Cheers,
> >         >         >         >         Petr
> >         >         >         >
> >         >         >         >         On Sun, 2014-08-17 at 01:34
> >         -0400, Jo R
> >         >         wrote:
> >         >         >         >         > Hello,
> >         >         >         >         >     I'm attemping to get
> >         nucleotide
> >         >         frequency
> >         >         >         information at
> >         >         >         >         a number
> >         >         >         >         > of positions across a number
> >         of samples,
> >         >         and am
> >         >         >         having
> >         >         >         >         difficulty
> >         >         >         >         > interpreting some output.  Any
> >         insights
> >         >         would be
> >         >         >         >         appreciated.
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > I'm running the following
> >         command:
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > samtools mpileup  -BQ0
> >         -d10000000 -l
> >         >         >         VariantBed.bed -uf
> >         >         >         >         $refFile $bam
> >         >         >         >         > | bcftools view -bcg - |
> >         bcftools view -
> >         >         >
> >         >         >         >         > ${sampleName}_validation.vcf
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > I notice that this command
> >         creates an
> >         >         output file
> >         >         >         with
> >         >         >         >         > an unpredictable number of
> >         rows.
> >         >         Running the
> >         >         >         command using
> >         >         >         >         the same
> >         >         >         >         > bed file on a set of
> >         different .bam
> >         >         files creates
> >         >         >         a set of
> >         >         >         >         output vcf
> >         >         >         >         > files with a wide distribution
> >         in
> >         >         numbers of rows.
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > I presumed that the difference
> >         in row
> >         >         numbers
> >         >         >         means that
> >         >         >         >         some
> >         >         >         >         > positions drop out on
> >         some .bam files
> >         >         because
> >         >         >         those samples
> >         >         >         >         lacked
> >         >         >         >         > coverage where other samples
> >         had
> >         >         coverage.
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > If that's the case, though, I
> >         don't know
> >         >         what to
> >         >         >         make of
> >         >         >         >         lines like
> >         >         >         >         > the following one:
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > 1       2160881 .       G
> >            .
> >         >          28.2    .
> >         >         >         >         >
> >         DP=0;VDB=0.0003;;AC1=2;FQ=-30   PL
> >         >         0
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > here, it looks like DP=0, but
> >         this
> >         >         position still
> >         >         >         got
> >         >         >         >         reported in the
> >         >         >         >         > vcf output.  I also don't see
> >         AC1 in the
> >         >         legend
> >         >         >         for INFO
> >         >         >         >         tags in the
> >         >         >         >         > samtools specification page,
> >         so I don't
> >         >         know what
> >         >         >         to make of
> >         >         >         >         a value
> >         >         >         >         > of 2.
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > So, I am confused.  Positions
> >         with a
> >         >         positive
> >         >         >         value of DP
> >         >         >         >         and DP4 make
> >         >         >         >         > sense to me.  But why are some
> >         positions
> >         >         >         completely ommitted
> >         >         >         >         from the
> >         >         >         >         > vcf output, and other
> >         positions
> >         >         reporting a DP=0?
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > Thanks for any advice.
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         > Best regards,
> >         >         >         >         > Jonathan
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >         >
> >         >         >         >
> >         >         >         >         >
> >         >         >         >
> >         >         >
> >         >
> >
> ------------------------------------------------------------------------------
> >         >         >         >         >
> >         >         _______________________________________________
> >         >         >         >         > Samtools-help mailing list
> >         >         >         >         >
> >         [email protected]
> >         >         >         >         >
> >         >         >
> >         >
> >         https://lists.sourceforge.net/lists/listinfo/samtools-help
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         >
> >         >         >         >         --
> >         >         >         >          The Wellcome Trust Sanger
> >         Institute is
> >         >         operated by
> >         >         >         Genome
> >         >         >         >         Research
> >         >         >         >          Limited, a charity registered
> >         in England
> >         >         with
> >         >         >         number 1021457
> >         >         >         >         and a
> >         >         >         >          company registered in England
> >         with number
> >         >         2742969,
> >         >         >         whose
> >         >         >         >         registered
> >         >         >         >          office is 215 Euston Road,
> >         London, NW1
> >         >         2BE.
> >         >         >         >
> >         >         >         >
> >         >         >
> >         >         >
> >         >         >
> >         >         >
> >         >         >         --
> >         >         >          The Wellcome Trust Sanger Institute is
> >         operated by
> >         >         Genome
> >         >         >         Research
> >         >         >          Limited, a charity registered in England
> >         with
> >         >         number 1021457
> >         >         >         and a
> >         >         >          company registered in England with number
> >         2742969,
> >         >         whose
> >         >         >         registered
> >         >         >          office is 215 Euston Road, London, NW1
> >         2BE.
> >         >         >
> >         >         >
> >         >         >
> >         >
> >         >
> >         >
> >         >
> >         >         --
> >         >          The Wellcome Trust Sanger Institute is operated by
> >         Genome
> >         >         Research
> >         >          Limited, a charity registered in England with
> >         number 1021457
> >         >         and a
> >         >          company registered in England with number 2742969,
> >         whose
> >         >         registered
> >         >          office is 215 Euston Road, London, NW1 2BE.
> >         >
> >         >
> >         >
> >
> >
> >
> >
> >         --
> >          The Wellcome Trust Sanger Institute is operated by Genome
> >         Research
> >          Limited, a charity registered in England with number 1021457
> >         and a
> >          company registered in England with number 2742969, whose
> >         registered
> >          office is 215 Euston Road, London, NW1 2BE.
> >
> >
> >
>
>
>
>
> --
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
>
------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to