James,

Thanks for the clarification; this helps.
Dario

On 05/29/2015 01:30 AM, James Bonfield wrote:
> On Thu, May 28, 2015 at 05:07:53PM -0700, Dario Copetti wrote:
>> Getting to the practical point, I took the value in col 9 of the
>> reads that have a flag of either 83 or 99, and the distribution is
>> similar to what I expect (between 30 and 40 kb). If I take the col 9
>> values of the reads with flag 97 or 81, 99% of the hits are either
>> below 5 kb or above 90 kb. Aren't these still pairs of reads,
>> aligned towards each other?
>
> The proper-pair flag (2) is set by the aligner, and it is up to that
> software to describe precisely how it defines a proper pair.
>
> Your own analysis seems to imply it is using the insert size as a tool
> for distinguishing between proper pairs and improper ones. Some aligners
> will subsample: the first N reads get aligned, the insert size
> distribution is determined from them, and normal vs abnormal insert
> sizes are then chosen.[1]
>
> However, this really is outside the scope of the SAM spec itself, so the
> aligner-specific help forums would be better suited to your query.
>
> James
>
> [1] This is why it is recommended that any realignment task (taking an
> existing set of aligned data and realigning it using new parameters) is
> done with data that is not in a genome-specific order (e.g. use name
> sorted instead) to avoid any genomic location biases on insert size.

-- 
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

Samtools-help mailing list
https://lists.sourceforge.net/lists/listinfo/samtools-help
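For anyone decoding the flag values discussed in this thread, here is a minimal Python sketch that breaks a SAM FLAG into its bits and groups |TLEN| (column 9) by the proper-pair bit (0x2). It assumes plain-text, tab-separated SAM input. The helper names (`decode_flag`, `is_proper_pair`, `tlen_by_proper_pair`) are hypothetical, and the bit names simply mirror the ones `samtools flags` prints. The sketch only reads the 0x2 decision the aligner already made; it does not reproduce any aligner's own proper-pair logic.

```python
# Bit values from the SAM FLAG field (column 2); names mirror `samtools flags`.
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_proper_pair(flag: int) -> bool:
    """True if the aligner set the 0x2 (proper pair) bit."""
    return bool(flag & 0x2)


def tlen_by_proper_pair(sam_lines):
    """Group |TLEN| of first-in-pair primary records by the 0x2 bit.

    Hypothetical helper: header lines ('@...') are skipped, secondary and
    supplementary records (0x900) are ignored, and only READ1 (0x40) records
    are kept so each pair is counted once.
    """
    groups = {True: [], False: []}
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900 or not flag & 0x40:
            continue
        groups[is_proper_pair(flag)].append(abs(int(fields[8])))
    return groups


if __name__ == "__main__":
    for flag in (99, 83, 97, 81):
        print(flag, is_proper_pair(flag), decode_flag(flag))
```

Running it shows that 97 and 81 differ from 99 and 83 only by the missing PROPER_PAIR (0x2) bit: the reads are still paired and oriented the same way, but the aligner chose not to call them proper pairs.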
