On Mon, Nov 30, 2015 at 11:29:12AM +0100, Petr Danecek wrote:
> > HISEQ2500-09:92:H8PJKADXX:1:1101:5415:2846      99      NC_024331.1     
> > 24859997        4       76M3D1M2D23M    =       24860151        255     
> > GATATTTGAGTTAATGTATGATGTGAAATGTGACTTTTTATTACATACTGTATCGATTATGGGACTATAACTCAACATCAAGTAAGGCTGCTGTCACTTA
> >     
> > CCCFFFFFHHHHHJJJIJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHHFFFFEEEEEEED
> >     AS:i:-37        XS:i:-58        XN:i:0  XM:i:2  XO:i:2  XG:i:5  NM:i:7  
> > MD:Z:48C23T3^TCA1^GG23  YS:i:-8 YT:Z:CP
> 
> > This is clearly not a uniquely mapping read (due to presence of XS
> > flag) but it has a MAPQ score of 4. Therefore, this would not be
> > filtered out using -q 1?
> 
> 
> Some (most?) aligners set mapping quality to 0 when the placement of a
> read is ambiguous. The XS flag is not checked by samtools.

IMO samtools shouldn't check XS, so I'm glad it doesn't.  XS is a
user-defined tag and not specified in the SAM standard, so while it
happens to mean one thing to Bowtie, it could mean something totally
different to another tool-chain.  (Probably won't, but could do.)  Nor
is it fair to claim the mapping quality should be 0 for ambiguous
reads, as technically it can still be higher for a completely randomly
placed read, depending on the number of alternate sites it can map
to.
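
To illustrate why XS is just another optional field, here is a minimal
Python sketch (not samtools code) that splits the optional TAG:TYPE:VALUE
columns out of a SAM line.  The function name parse_optional_fields is
made up for this example, and the record is the read quoted above with
SEQ and QUAL abbreviated to "*".  Reading "XS is present" as "the read
has another candidate placement" is an assumption that only holds for
Bowtie-style output.

```python
def parse_optional_fields(sam_line):
    """Return {tag: value} for the optional fields (column 12 onwards)."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":          # integer-typed tag
            value = int(value)
        elif typ == "f":        # float-typed tag
            value = float(value)
        tags[tag] = value
    return tags

# The read from this thread, with SEQ and QUAL replaced by "*".
record = "\t".join([
    "HISEQ2500-09:92:H8PJKADXX:1:1101:5415:2846", "99", "NC_024331.1",
    "24859997", "4", "76M3D1M2D23M", "=", "24860151", "255", "*", "*",
    "AS:i:-37", "XS:i:-58", "XN:i:0", "XM:i:2", "XO:i:2", "XG:i:5",
    "NM:i:7", "MD:Z:48C23T3^TCA1^GG23", "YS:i:-8", "YT:Z:CP",
])

tags = parse_optional_fields(record)
# Tool-specific interpretation: valid for Bowtie-style XS only.
print("XS" in tags, tags.get("AS"), tags.get("XS"))
```

A filter built on this would be tied to one aligner's conventions, which
is the reason for keeping it out of a general tool.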

E.g. a mapping quality of 4 means a 60% probability of the read being
mapped correctly (1 - 10^(-4/10) is about 0.60), so slightly above
random placement out of 2 possible locations (50%, or mapping quality
3), but it's still probably a close repeat with a 1 bp difference or
something similar.  Similarly, having 3 identical places to align to
means picking one at random is wrong 2 times out of 3, which should
give a mapping quality of approx 2.
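
The arithmetic can be checked with a short Python sketch.  It assumes
the usual Phred-scaled definition, MAPQ = -10 * log10(P(wrong)), and
the function names are made up for this example; real aligners use
their own heuristics and need not follow this exactly.

```python
import math

def mapq_to_p_correct(mapq):
    """Phred-scaled MAPQ -> probability that the placement is correct."""
    return 1.0 - 10.0 ** (-mapq / 10.0)

def mapq_for_equal_hits(n_sites):
    """MAPQ for a read placed at random among n_sites identical candidates."""
    if n_sites < 2:
        raise ValueError("need at least 2 candidate sites")
    p_wrong = (n_sites - 1) / n_sites
    return -10.0 * math.log10(p_wrong)

# Probability of a correct placement for a few low MAPQ values.
for q in (1, 4, 5):
    print("MAPQ", q, "-> P(correct)", round(mapq_to_p_correct(q), 2))

# MAPQ expected from a random pick among identical candidate sites.
for n in (2, 3):
    print(n, "sites -> MAPQ", round(mapq_for_equal_hits(n), 2))
```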

I guess the correct thing is to make sure that -q is specified high
enough, e.g. -q 5, to ensure the low mapping quality reads are filtered
out.
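
As a rough Python illustration of the threshold (a sketch, not how
samtools is implemented): -q skips alignments whose MAPQ, column 5 of
the SAM line, is smaller than the given value.  The helper name
passes_min_mapq is made up, and the record is the read quoted above
with SEQ and QUAL abbreviated to "*".

```python
def passes_min_mapq(sam_line, min_mapq):
    """True if the line is a header or its MAPQ is at least min_mapq."""
    if sam_line.startswith("@"):
        return True  # header lines are not alignments; keep them
    mapq = int(sam_line.split("\t")[4])  # column 5 is MAPQ
    return mapq >= min_mapq

# The read from this thread (MAPQ 4), with SEQ and QUAL replaced by "*".
record = "\t".join([
    "HISEQ2500-09:92:H8PJKADXX:1:1101:5415:2846", "99", "NC_024331.1",
    "24859997", "4", "76M3D1M2D23M", "=", "24860151", "255", "*", "*",
    "AS:i:-37", "XS:i:-58",
])

for threshold in (1, 5):
    print("-q", threshold, "keeps read:", passes_min_mapq(record, threshold))
```

So the read survives -q 1 but is dropped by -q 5.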

James

-- 
James Bonfield ([email protected]) | Hora aderat briligi. Nunc et Slythia Tova
                                  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
