Hi Angie,

It seems that writing a script to send queries to your website would take much longer than running standalone Blat, given the site's rate limit (one query per 15 seconds). Is there any way I can modify standalone Blat to get the same output as the website, including the SNP base?
Kyle Tretina

On Tue, Mar 16, 2010 at 11:15 PM, Angie Hinrichs <[email protected]> wrote:
> Hi Kyle,
>
> You're right, there is nothing explicit in the HTML returned from that
> command that gives mismatch coordinates. You will need to parse the
> alignment section and go through base-by-base to find the nearest mismatch
> to the SNP base. In HTML, the section starts like this:
>
> <PRE><B>Alignment between genome (hg19 chr16:79237287-79237887, + strand;
> 601 bp) and dbSNP sequence (rs870; 601 bp)</B>
> ID (including gaps) 99.8%, coverage (of both) 100.0%
>
> Then there are one or more triplets of lines like this (reference,
> match/mismatch indicators, flank):
>
> 79237487 AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC 79237586
>          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000201 AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC 00000300
>
> Then the SNP base itself:
>
> <B>79237587 G 79237587
>
> 00000301 R 00000301
>
> And then more triplets (the first one beginning with </B>):
>
> </B>79237588 TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT 79237687
>          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000302 TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT 00000401
>
> As you noticed, when there is a mismatch, there is a space between the |'s.
> So one approach would be to collect the |||'s from each flank, and step
> backwards from the end of the left flank & forwards from the start of the
> right flank, looking for the closest spaces. I personally would prefer to
> collect the actual sequences, and step through reference bases vs. flanking
> bases -- I think bugs would become more obvious that way.
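For anyone who wants to try the sequence-comparison approach described above, here is a minimal Python sketch. It assumes the alignment HTML has been saved as text, that each sequence row sits on one line as `start SEQUENCE end` (as in the quoted example), that reference and flank rows alternate, that the SNP's reference row is the only sequence row beginning with `<B>`, and that `-` (if present) marks a gap. `parse_columns`, `nearest_mismatches` and `Mismatch` are hypothetical names, not part of any UCSC tool, and the `EXAMPLE` fragment is shortened, with one mismatch planted in each flank row for illustration.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical helpers; nothing here is part of a UCSC tool.
SEQ_ROW = re.compile(r"^\s*(\d+)\s+([A-Za-z-]+)\s+(\d+)\s*$")  # "start SEQUENCE end"
TAG = re.compile(r"<[^>]+>")                                     # any HTML tag
BOLD_START = re.compile(r"^\s*<B>", re.IGNORECASE)               # assumed: SNP row opens bold


class Mismatch(NamedTuple):
    distance: int    # alignment columns away from the SNP column
    ref_pos: int     # genomic coordinate of the reference base
    ref_base: str
    flank_base: str


def parse_columns(html: str) -> Tuple[List[Tuple[int, str, str]], int]:
    """Return aligned (ref_pos, ref_base, flank_base) columns and the SNP column index."""
    rows = []  # (start coordinate, sequence, row begins with <B>)
    for raw in html.splitlines():
        m = SEQ_ROW.match(TAG.sub("", raw))
        if m:  # header lines and |||-indicator lines never match
            rows.append((int(m.group(1)), m.group(2), bool(BOLD_START.match(raw))))
    if len(rows) % 2:
        raise ValueError("expected reference and flank rows in pairs")

    columns: List[Tuple[int, str, str]] = []
    snp_index: Optional[int] = None
    # Rows alternate: reference (genome) row, then flank (dbSNP) row.
    for (ref_start, ref_seq, bold), (_, flank_seq, _) in zip(rows[0::2], rows[1::2]):
        if len(ref_seq) != len(flank_seq):
            raise ValueError("reference and flank rows differ in length")
        if bold:
            snp_index = len(columns)
        pos = ref_start
        for r, f in zip(ref_seq.upper(), flank_seq.upper()):
            columns.append((pos, r, f))
            if r != "-":  # assumption: '-' is a gap that uses no coordinate
                pos += 1
    if snp_index is None:
        raise ValueError("no SNP row beginning with <B> found")
    return columns, snp_index


def nearest_mismatches(html: str) -> Tuple[Optional[Mismatch], Optional[Mismatch]]:
    """Step left, then right, from the SNP column to the closest ref/flank mismatch."""
    columns, snp = parse_columns(html)

    def scan(indices) -> Optional[Mismatch]:
        for i in indices:
            pos, r, f = columns[i]
            if r != f:
                return Mismatch(abs(i - snp), pos, r, f)
        return None

    return scan(range(snp - 1, -1, -1)), scan(range(snp + 1, len(columns)))


# Shortened fragment: reference bases copied from the quoted rs870 alignment,
# flank rows edited to plant one mismatch on each side of the SNP.
EXAMPLE = """<PRE><B>Alignment between genome (hg19 chr16:79237287-79237887, + strand; 601 bp) and dbSNP sequence (rs870; 601 bp)</B>
ID (including gaps) 99.8%, coverage (of both) 100.0%
79237577 GGCAGACTTC 79237586
         ||| ||||||
00000291 GGCTGACTTC 00000300
<B>79237587 G 79237587

00000301 R 00000301
</B>79237588 TAGAGCCATT 79237597
         ||||||| ||
00000302 TAGAGCCCTT 00000311
"""

left, right = nearest_mismatches(EXAMPLE)
print(left)   # Mismatch(distance=7, ref_pos=79237580, ref_base='A', flank_base='T')
print(right)  # Mismatch(distance=8, ref_pos=79237595, ref_base='A', flank_base='C')
```

Comparing the bases themselves, rather than hunting for spaces in the `|` lines, means an off-by-one in the parsing shows up as a run of obviously wrong mismatches. Note that an IUPAC ambiguity code in a flank (for example another known SNP) is also counted as a mismatch here; filter those out if they should be ignored.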
>
> It ain't pretty, but if I had to do it (without leveraging my experience
> with our C code base), that's how I would do it.
>
> Hope that helps,
> Angie
>

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
