On Thu, May 28, 2015 at 05:07:53PM -0700, Dario Copetti wrote:
> Getting to the practical point, I took the value in col 9 of the
> reads that have either a flag = 83 or 99, and the distribution is
> similar to what I expect (between 30 and 40 kb). If I take the col 9
> values of the reads with flag 97 or 81, 99% of the hits are either
> below 5 kb or above 90 kb. aren't these still pairs of reads,
> aligned towards each other?

The proper-pair flag (2) is set by the aligner, and it is up to that
software to define precisely what it considers a proper pair. Your own
analysis suggests that it is using the insert size to distinguish
proper pairs from improper ones. Some aligners subsample: the first N
reads are aligned, the insert size distribution is estimated from
them, and that distribution is then used to classify insert sizes as
normal or abnormal.[1]

However, this really is outside the scope of the SAM spec itself, so
the aligner-specific help forums would be better placed to answer your
query.

James

[1] This is why it is recommended that any realignment task (taking an
existing set of aligned data and realigning it with new parameters) is
done with data that is not in genome-specific order (e.g. name-sorted
instead), to avoid any genomic location bias in the insert size
estimate.

-- 
James Bonfield
A Staden Package developer: https://sf.net/projects/staden/
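
For reference, the four flag values in the question differ only in the proper-pair bit. The following is a minimal sketch in Python that decodes them using the bit values from the SAM specification; the helper name decode_flag and the short bit labels are my own, not part of any tool.

```python
# FLAG bits as defined in the SAM specification (subset relevant here).
# The short labels are illustrative names, not official identifiers.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the labels of all known bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The four values from the question: 83 and 99 carry the proper-pair
# bit, 81 and 97 are the same orientations without it.
for flag in (83, 99, 81, 97):
    print(flag, decode_flag(flag))
```

So 81 and 97 are still first-in-pair reads with the mate on the opposite strand; the aligner has simply not marked them as a proper pair, which is consistent with the col 9 values falling outside the expected range.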
