Hi there,

I'm investigating a mistake I made when running BWA (a little embarrassing, but 
it happens).  I had initially failed to use the -I flag on BWA to tell it to 
convert our Illumina ASCII-64 qualities appropriately.  Now I'm taking a look 
to see how much differnce it makes in our downstream analysis.

Anyway, that's not why I'm writing - while delving into this I've found 
something unexpected about the "identical" function, and I'd love to understand 
that better if possible.

I used scanBam to read in my two BWA output files - with and without the "-I" 
option when I ran BWA (same input seqs, everything else the same). As I 
expected, with the -I option BWA converted the qualities (my 
lane1read2_bwaWithI_bamscan object), and in the other it didn't (my 
lane1read2_bwaNotI_bamscan object).  That makes sense to me - here's how the 
quals look after reading BWA output in with scanBam.

### when I use the -I option with BWA:
> head(lane1read2_bwaWithI_bamscan[[1]][["qual"]])
  A PhredQuality instance of length 6
    width seq
[1]    50 HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHFHFF
[2]    36 GFFEEHHFHHHHHHHEFEEEHHHHHFHHHEEFEE@E
[3]    50 ##################################################
[4]    31 HHHHHHHHHHHHHHHHHHHHHHEHHHHFHFH
[5]    24 HHHHHHHHHHHHHHDFHHHHHHHH
[6]    50 FFFFCD=8BGEBHGHHHHEHGFGHHDHEHFHEHHHHGHHHHHHHHHGHHH

### when I don't use the -I option with BWA:
> head(lane1read2_bwaNotI_bamscan[[1]][["qual"]]) 
  A PhredQuality instance of length 6
    width seq
[1]    50 gggggggggggggggggggggggggggggggggggggggggeggggegee
[2]    36 feeddggegggggggdedddgggggegggddedd_d
[3]    50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[4]    31 ggggggggggggggggggggggdggggegeg
[5]    24 ggggggggggggggcegggggggg
[6]    50 eeeebc\Wafdagfggggdgfefggcgdgegdggggfgggggggggfggg


The part I don't understand is when I use identical to test whether those two 
qual objects are the same, it says they are identical:

> identical( lane1read2_bwaWithI_bamscan[[1]][["qual"]], 
> lane1read2_bwaNotI_bamscan[[1]][["qual"]]) 
[1] TRUE

In one sense they're the same (if you convert from Illumina qualities to normal 
Phred qualities), but how did identical know to do that?  I might be 
misunderstanding identical - I thought that EVERYTHING about the object had to 
be identical.  

I want to understand this as it seems like a dangerous misunderstanding - I use 
identical a fair amount to make sure I'm not screwing things up.

Really I should probably have converted the "qual" item in my scamBam-generated 
list to a SolexaQuality object to keep track of the fact they're offset by 64 
and not 33, but that's OK. At this point I just want to take a look at what's 
changed about the BWA output when I use that -I flag.

Any thoughts?

thanks,

Janet

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to