On Mon, Nov 30, 2015 at 11:29:12AM +0100, Petr Danecek wrote:
> > HISEQ2500-09:92:H8PJKADXX:1:1101:5415:2846 99 NC_024331.1
> > 24859997 4 76M3D1M2D23M = 24860151 255
> > GATATTTGAGTTAATGTATGATGTGAAATGTGACTTTTTATTACATACTGTATCGATTATGGGACTATAACTCAACATCAAGTAAGGCTGCTGTCACTTA
> >
> > CCCFFFFFHHHHHJJJIJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHHFFFFEEEEEEED
> > AS:i:-37 XS:i:-58 XN:i:0 XM:i:2 XO:i:2 XG:i:5 NM:i:7
> > MD:Z:48C23T3^TCA1^GG23 YS:i:-8 YT:Z:CP
> >
> > This is clearly not a uniquely mapping read (due to presence of XS
> > flag) but it has a MAPQ score of 4. Therefore, this would not be
> > filtered out using -q 1?
> >
> Some (most?) aligners set mapping quality to 0 when the placement of a
> read is ambiguous. The XS flag is not checked by samtools.

IMO samtools shouldn't check XS, so I'm glad it doesn't. XS is a
user-defined tag and not specified in the SAM standard, so while it
happens to mean one thing to Bowtie, it could mean something totally
different to another tool-chain. (Probably won't, but could do.)

Nor is it fair to claim the mapping quality should be 0 for ambiguous
reads, as technically it can still be higher for a completely randomly
placed read, depending on the number of alternative sites it can map
to.

E.g. a mapping quality of 4 means a 60% probability of the read being
mapped correctly (1 - 10^(-4/10) is approx 0.60), so slightly above
random placement out of 2 possible locations (50%, i.e. a mapping
quality of approx 3), but it's still probably a close repeat with 1 bp
different or something similar. Similarly, having 3 identical places
to align to means picking one at random should give a mapping quality
of approx 2.

I guess the correct thing is to make sure that -q is specified high
enough, e.g. -q 5, to ensure the low mapping quality reads are
filtered out.

James

-- 
James Bonfield                  | Hora aderat briligi. Nunc et Slythia Tova
                                | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:   | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.
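
For anyone wanting to check the mapping-quality arithmetic in the reply above, here is a minimal sketch. It assumes only the SAM definition of MAPQ as -10 log10 of the probability that the mapping position is wrong, and it assumes the aligner picks uniformly among identical candidate sites, which real aligners are not obliged to do. The function names are hypothetical and are not part of samtools or any aligner.

```python
import math


def p_correct_from_mapq(mapq: float) -> float:
    """Probability the placement is correct, given a Phred-scaled MAPQ."""
    return 1.0 - 10.0 ** (-mapq / 10.0)


def mapq_random_choice(n_sites: int) -> float:
    """MAPQ implied by picking one of n identical sites uniformly at random.

    Assumption: all n sites are equally good, so P(wrong) = (n - 1) / n.
    """
    if n_sites < 2:
        raise ValueError("need at least 2 candidate sites for an ambiguous read")
    p_wrong = (n_sites - 1) / n_sites
    return -10.0 * math.log10(p_wrong)


if __name__ == "__main__":
    print(f"MAPQ 4 -> P(correct) = {p_correct_from_mapq(4):.3f}")
    for n in (2, 3):
        print(f"{n} identical sites -> MAPQ = {mapq_random_choice(n):.2f}")
```

Since `samtools view -q INT` skips alignments with MAPQ smaller than INT, a threshold of `-q 5` would drop the MAPQ 4 read in the example, whereas `-q 1` would keep it.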
