Hi Angie,

It seems like writing a script that sends queries to your website would take
a lot longer than using standalone BLAT, given the rate limit imposed by the
site (one query per 15 seconds). Is there any way that I can modify
standalone BLAT to get the same output as the website (including the SNP
base)?




Kyle Tretina


On Tue, Mar 16, 2010 at 11:15 PM, Angie Hinrichs <[email protected]> wrote:

> Hi Kyle,
>
> You're right, there is nothing explicit in the HTML returned from that
> command that gives mismatch coordinates. You will need to parse the
> alignment section and go through base-by-base to find the nearest mismatch
> to the SNP base. In HTML, the section starts like this:
>
> <PRE><B>Alignment between genome (hg19 chr16:79237287-79237887, + strand;
> 601 bp) and dbSNP sequence (rs870; 601 bp)</B>
> ID (including gaps) 99.8%, coverage (of both) 100.0%
>
> Then there are one or more triplets of lines like this (reference,
> match/mismatch indicators, flank):
>
> 79237487
> AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC
> 79237586
>
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000201
> AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC
> 00000300
>
> Then the SNP base itself:
>
> <B>79237587 G 79237587
>
> 00000301 R 00000301
>
> And then more triplets (the first one beginning with </B>):
>
> </B>79237588
> TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT
> 79237687
>
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000302
> TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT
> 00000401
>
> As you noticed, when there is a mismatch, there is a space between the |'s.
>  So one approach would be to collect the |||'s from each flank, and step
> backwards from the end of the left flank & forwards from the start of the
> right flank, looking for the closest spaces.  I personally would prefer to
> collect the actual sequences, and step through reference bases vs. flanking
> bases -- I think bugs would become more obvious that way.
>
> It ain't pretty, but if I had to do it (without leveraging my experience
> with our C code base), that's how I would do it.
>
> Hope that helps,
> Angie
>
>
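
For reference, the base-by-base comparison described in the quoted message
could be sketched roughly as below. This is only an illustration, not code
from the UCSC code base: the function name nearest_mismatch and its four
arguments are hypothetical, and it assumes the reference and flank sequences
on each side of the SNP base have already been collected from the alignment
triplets and concatenated into strings of equal length. It also assumes that
case differences are not mismatches and that a gap character counts as one.

```python
def nearest_mismatch(ref_left, flank_left, ref_right, flank_right):
    """Return the signed distance from the SNP base to the closest mismatch.

    Negative values are positions to the left of the SNP base, positive
    values are positions to the right. Returns None if both flanks match
    the reference everywhere. On a tie, the left-hand mismatch is reported.
    """
    if len(ref_left) != len(flank_left) or len(ref_right) != len(flank_right):
        raise ValueError("reference and flank strings must be the same length")

    # Step backwards from the end of the left flank.
    left = None
    for dist in range(1, len(ref_left) + 1):
        if ref_left[-dist].upper() != flank_left[-dist].upper():
            left = dist
            break

    # Step forwards from the start of the right flank.
    right = None
    for dist in range(1, len(ref_right) + 1):
        if ref_right[dist - 1].upper() != flank_right[dist - 1].upper():
            right = dist
            break

    candidates = []
    if left is not None:
        candidates.append(-left)
    if right is not None:
        candidates.append(right)
    if not candidates:
        return None
    # min() keeps the first of equally close candidates, i.e. the left one.
    return min(candidates, key=abs)
```

Comparing the actual bases rather than counting |'s means a misaligned or
truncated line shows up as a length error instead of a silently wrong offset.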
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
