Hello,
I am having trouble interpreting the alignment of a mate-pair (MP)
library to a reference. I aligned the adapter-trimmed, reverse-complemented
(RC) reads with bwa aln, then bwa sampe, and then produced the sorted BAM
file.
My goal is to determine how many pairs are aligned concordantly and at
what distance. If I count the different flags in the SAM file, I get
this table (# of reads, flag [grouped], and explanation):
# of reads  flag  explanation
3,159,234     77  both unmapped
3,159,234    141
2,150,609    113  paired, reverse, mate is reverse
2,150,608    177
2,143,666     65  paired, first
2,143,662    129
   10,713     97  paired, mate is reverse, first in pair
   10,711    145
   10,610     81  paired, reverse strand, first in pair
   10,610    161
    7,930     83  paired, mapped in proper pair, read is reversed
    7,930    163
    7,767     99  paired, mapped in proper pair, mate is reversed
    7,767    147
        4    133  paired, unmapped, second in pair
        2    149  paired, unmapped, second in pair, reverse strand
        1    181  paired, unmapped, second in pair, reverse strand, mate reverse strand
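
A tally like the one above can be reproduced directly from the SAM text. The following is only a minimal sketch: the function name count_flags and the example records are made up for illustration, and real input would be the lines of the actual SAM file.

```python
from collections import Counter

def count_flags(sam_lines):
    """Tally the FLAG field (column 2) over all alignment records."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[1])] += 1
    return counts

# Made-up records, for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:100000",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t99\tchr1\t100\t37\t4M\t=\t35100\t35004\tACGT\tIIII",
    "r2\t147\tchr1\t35100\t37\t4M\t=\t100\t-35004\tACGT\tIIII",
]

flag_counts = count_flags(example)
for flag, n in flag_counts.most_common():
    print(n, flag)
```

With a real file, the same function could be fed an open file handle, for example `count_flags(open("aln.sam"))`, where aln.sam is a placeholder name.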
Of course there are a lot of unmapped reads, and I will look into that.
My problem is that I cannot understand the difference between some
flags: 97 and 81 vs. 83 and 99, for example. Are 97 and 81 giving less
information than 83 and 99? To me the latter two are a subset of the
first two, or am I missing something?
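
One way to compare such flags is to decode them bit by bit. The sketch below assumes the bit definitions from the SAM specification; FLAG_BITS and decode_flag are illustrative names, not part of samtools or bwa.

```python
# Bit values and meanings as defined in the SAM specification.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "duplicate"),
    (0x800, "supplementary alignment"),
]

def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]

for flag in (81, 83, 97, 99):
    print(flag, ", ".join(decode_flag(flag)))

# XOR shows which bits differ between two flags.
print(decode_flag(81 ^ 83), decode_flag(97 ^ 99))
```

Decoded this way, 83 differs from 81, and 99 from 97, only in the 0x2 bit. The specification describes that bit as "each segment properly aligned according to the aligner", so the criteria behind it depend on how the aligner was run and are not shown by the flag itself.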
Getting to the practical point, I took the value in col 9 of the reads
that have a flag of either 83 or 99, and the distribution is similar to
what I expect (between 30 and 40 kb). If I take the col 9 values of the
reads with flag 97 or 81, 99% of the hits are either below 5 kb or above
90 kb. Aren't these still pairs of reads, aligned towards each other?
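
The comparison of col 9 values per flag can be sketched as follows. This is a hedged illustration only: tlen_by_flag and fraction_in_range are hypothetical helper names, the records are invented, and the 30 to 40 kb window is simply the expected range mentioned above.

```python
from collections import defaultdict

def tlen_by_flag(sam_lines, flags):
    """Collect the absolute value of column 9 for records whose FLAG is in `flags`."""
    wanted = set(flags)
    sizes = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag in wanted:
            sizes[flag].append(abs(int(fields[8])))
    return dict(sizes)

def fraction_in_range(values, low, high):
    """Fraction of values with low <= value <= high (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)

# Invented records, for illustration only.
records = [
    "a\t99\tchr1\t100\t37\t4M\t=\t35100\t35004\tACGT\tIIII",
    "b\t83\tchr1\t40000\t37\t4M\t=\t8004\t-32000\tACGT\tIIII",
    "c\t97\tchr1\t100\t37\t4M\t=\t1296\t1200\tACGT\tIIII",
    "d\t81\tchr1\t96000\t37\t4M\t=\t1004\t-95000\tACGT\tIIII",
]

sizes = tlen_by_flag(records, [81, 83, 97, 99])
for flag in sorted(sizes):
    print(flag, fraction_in_range(sizes[flag], 30000, 40000))
```

The absolute value is taken because col 9 is signed: the SAM specification gives the leftmost segment of a pair a positive value and the rightmost a negative one.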
Thanks,
Dario
--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu