Hi there, I'm investigating a mistake I made when running BWA (a little embarrassing, but it happens). I had initially failed to use the -I flag on BWA to tell it to convert our Illumina ASCII-64 qualities appropriately. Now I'm taking a look to see how much differnce it makes in our downstream analysis.
Anyway, that's not why I'm writing - while delving into this I've found something unexpected about the "identical" function, and I'd love to understand that better if possible. I used scanBam to read in my two BWA output files - with and without the "-I" option when I ran BWA (same input seqs, everything else the same). As I expected, with the -I option BWA converted the qualities (my lane1read2_bwaWithI_bamscan object), and in the other it didn't (my lane1read2_bwaNotI_bamscan object). That makes sense to me - here's how the quals look after reading BWA output in with scanBam. ### when I use the -I option with BWA: > head(lane1read2_bwaWithI_bamscan[[1]][["qual"]]) A PhredQuality instance of length 6 width seq [1] 50 HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHFHFF [2] 36 GFFEEHHFHHHHHHHEFEEEHHHHHFHHHEEFEE@E [3] 50 ################################################## [4] 31 HHHHHHHHHHHHHHHHHHHHHHEHHHHFHFH [5] 24 HHHHHHHHHHHHHHDFHHHHHHHH [6] 50 FFFFCD=8BGEBHGHHHHEHGFGHHDHEHFHEHHHHGHHHHHHHHHGHHH ### when I don't use the -I option with BWA: > head(lane1read2_bwaNotI_bamscan[[1]][["qual"]]) A PhredQuality instance of length 6 width seq [1] 50 gggggggggggggggggggggggggggggggggggggggggeggggegee [2] 36 feeddggegggggggdedddgggggegggddedd_d [3] 50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB [4] 31 ggggggggggggggggggggggdggggegeg [5] 24 ggggggggggggggcegggggggg [6] 50 eeeebc\Wafdagfggggdgfefggcgdgegdggggfgggggggggfggg The part I don't understand is when I use identical to test whether those two qual objects are the same, it says they are identical: > identical( lane1read2_bwaWithI_bamscan[[1]][["qual"]], > lane1read2_bwaNotI_bamscan[[1]][["qual"]]) [1] TRUE In one sense they're the same (if you convert from Illumina qualities to normal Phred qualities), but how did identical know to do that? I might be misunderstanding identical - I thought that EVERYTHING about the object had to be identical. I want to understand this as it seems like a dangerous misunderstanding - I use identical a fair amount to make sure I'm not screwing things up. Really I should probably have converted the "qual" item in my scamBam-generated list to a SolexaQuality object to keep track of the fact they're offset by 64 and not 33, but that's OK. At this point I just want to take a look at what's changed about the BWA output when I use that -I flag. Any thoughts? thanks, Janet _______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing