James,

Thanks for the clarification; this helps.

Dario


On 05/29/2015 01:30 AM, James Bonfield wrote:
> On Thu, May 28, 2015 at 05:07:53PM -0700, Dario Copetti wrote:
>> Getting to the practical point, I took the value in col 9 of the
>> reads that have either a flag = 83 or 99, and the distribution is
>> similar to what I expect (between 30 and 40 kb). If I take the col 9
>> values of the reads with flag 97 or 81, 99% of the hits are either
>> below 5 kb or above 90 kb. Aren't these still pairs of reads,
>> aligned towards each other?
> The proper-pair flag (2) is set by the aligner and it is up to that
> software to describe precisely how it defines a proper pair.
>
> Your own analysis seems to imply that the aligner is using the insert
> size to distinguish proper pairs from improper ones.  Some aligners
> subsample: the first N reads are aligned, the insert size distribution
> is estimated from them, and the thresholds separating normal from
> abnormal insert sizes are then chosen.[1]
>
> However, this really is outside the scope of the SAM spec itself, so
> the aligner-specific help forums would be a better place for your query.
>
> James
>
> [1] This is why it is recommended that any realignment task (taking an
> existing set of aligned data and realigning it using new parameters) is
> done with data that is not in a genome-specific order (e.g. use
> name-sorted data instead), to avoid any genomic location biases on
> insert size.
>
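
For reference, the four flag values discussed above can be decoded bit by
bit. The following is a minimal sketch in Python, not part of the original
exchange: the bit meanings are those defined in the SAM specification, while
the names SAM_FLAG_BITS and decode_flag are hypothetical helpers made up for
this illustration.

```python
# Bit values and meanings as defined for the FLAG field in the SAM spec.
# The short names on the right are labels chosen for this example only.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The four flag values mentioned in the thread.
for flag in (83, 99, 81, 97):
    print(flag, decode_flag(flag))
```

Flags 81 and 97 carry exactly the same bits as 83 and 99 respectively, minus
0x2. They are still paired reads with the same strand bits; the only
difference is that the aligner did not mark them as a proper pair.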

-- 
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5

1657 E. Helen St.
Tucson, AZ  85721, USA
www.genome.arizona.edu


_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
