Hi Shaun,

This is a response from one of our engineers who looked into your question:

This 3000-base sequence aligns to the - strand.
(Note that for nucleotide queries, BLAT really just aligns the
reverse complement of the query to the positive strand.)

So that means that the first 50, 60, or 70 bases are going to be on the
right side of the screen when you display the BLAT result.
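
The strand point above can be illustrated with a minimal sketch. It is not
BLAT code; the helper name and the toy query are assumptions made up for
illustration. It only shows that the start of a query ends up at the end
(the right side) of its reverse complement:

```python
# Illustrative sketch only; reverse_complement and the toy query are
# hypothetical and not part of BLAT.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


query = "AAACCCGGGT"   # toy query, not the chr22 sequence
k = 3                  # stands in for "the first 50 bases"

full_rc = reverse_complement(query)
prefix_rc = reverse_complement(query[:k])

# The first k bases of the query map to the LAST k bases of the
# reverse-complemented query, i.e. the right-hand end on the + strand.
print(full_rc)                      # ACCCGGGTTT
print(prefix_rc)                    # TTT
print(full_rc.endswith(prefix_rc))  # True
```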

And sure enough, the repeat track shows that this is an Alu SINE.

Although we do not repeat-mask the genome itself for hgBlat/gfServer,
there is still the over-used-tile effect, which prevents over-used
sequences from becoming seed hits and starting an alignment attempt.
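
As a rough sketch of the over-used-tile idea described above (not the actual
gfServer implementation), the toy code below indexes non-overlapping tiles of
a target and drops any tile seen more than a threshold number of times. The
function names, tile size, threshold, and sequences are all assumptions chosen
for illustration:

```python
# Toy model of over-used-tile filtering; all names and values are
# hypothetical and chosen only to illustrate the effect.
from collections import Counter


def usable_tiles(target: str, k: int, max_uses: int) -> set:
    """Index non-overlapping k-mers of the target, dropping over-used ones."""
    counts = Counter(target[i:i + k] for i in range(0, len(target) - k + 1, k))
    return {tile for tile, n in counts.items() if n <= max_uses}


def seed_hits(query: str, index: set, k: int) -> list:
    """Return query offsets whose k-mer is still present in the index."""
    return [i for i in range(len(query) - k + 1) if query[i:i + k] in index]


# Toy target: a highly repeated unit followed by some unique sequence.
target = "ACGT" * 10 + "TTGACCAGGTCA"
index = usable_tiles(target, k=4, max_uses=3)   # "ACGT" is dropped

short_query = "ACGTACGT"           # lies entirely inside the repeat
long_query = "ACGTACGTTTGACCAG"    # extends into the unique flank

print(seed_hits(short_query, index, k=4))  # [] : no seed, so no alignment attempt
print(seed_hits(long_query, index, k=4))   # [8, 12] : seeds from the unique part
```

In this toy model a query made only of repeat sequence gets no seed at all,
even though it matches the target exactly, while a longer query that reaches
into less common sequence does. That would be consistent with the 50 bp and
60 bp queries failing and the 70 bp query succeeding, though the sketch does
not reproduce BLAT's real tile size or thresholds.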

If one needs to study the perfect and complete alignment of repeated
sequences, no matter how small, then BLAT simply is not the right tool
for that job.

Please contact the mailing list ([email protected]) again if you have any
further questions.

Katrina Learned
UCSC Genome Bioinformatics Group

On 9/9/11 3:48 PM, Shaun Cordes wrote:
> Hi
>
> We have identified a sequence on human chr22 that web-based BLAT struggles to 
> identify when presented as a query.  When you BLAT a large sequence around 
> this region (3000 bp), BLAT identifies the correct region on chr22.  However, 
> when you BLAT the first 50 bp of the same sequence, chr22 does not even come 
> up in the BLAT results even though it has 100% identity to the queried 
> sequence. The problem remains when you BLAT the first 60bp but is corrected 
> when you BLAT the first 70bp.
>
> Can you please explain why BLAT behaves this way? The query sequence is below.
>
> GAGGTTGCAGTGAGCTGAGATTGTGCCACTGCACTCTAGCCTGGGGAACAGAGCGAGATTTGGTCTCCAAAAACAAAACACAACAACAACAAAAAAATGGAGACAAGGGCAGCAAAAGGGAGGTCACCCACAAAGCGTGCATGTGTCTCGTGATTTTTTTTTCTTTTTTTTTCTTTGTGAGACAGAGTCTTGCTCTGTTGCCCAGGCTGGAGTACAGTGGCGCAATCTGGGTTCACTGCAAGCTCTGCCTCCCGGGTTCATGCCATTCTTCGTTACATCTAGAGAATGTCTAACGCCTCTTCTGGATAGATATTCAGGCGCCCGCCACCACCACGCTTGGCTAATTTTTTGTATTTTTTAGTAGACACAGGGTTTCACCATCTTAGCCATGATGTTCTCAATCTCCTGACCTCGTAATCTGCCCACCTCCGCCTTCCAAAGTGCTGGGATTACAGGCATGAGCCACCGTGCCTGGACCTTGTATTCTAATTGAGACAGGGTCTCACTTTGTCACCCAGACTGGAGTGCAGTGGCATGATCATGGCTGTCTGCAGCCTCGACCTCCTGGGGCTAAGCACTCCTCCCAACTAAGGCTGCTGAGTAGCTGGGACCACAGGCGTGTGCCACCACGACCAGCTAATTTTTTTTTTTTTTTTTTTTTTTTTTTAAATACAGGGTCTCGCTATCTCGCTCAGGCTGGTCTCTAACTCCTGGGCTCAAGCTATCTTCCCACCTCAGCTTCCCAATGTGCTGGGATTACAGGTGTGAGCCACTGCGCCTGGCCTGTGTCCCGTGTTTCGATTGCATGTTTACAGCGGCAGGGTGTGTGTGTGTGGGTGTGTGTGCGCGTGCGTGTGGTGGGGCAGTGGGGGGCTTCTCCTCCTTCAGACAACAAATATGTGTGGAGCTCTGACCTACCAGGCCCACTGCCAGATGCAGGAGGCACAAAGATGACCAAATCAGCCATAGTGGGGGTTGCGGAGTTCA!
 CA!
>   
> TGCCTCGGGAAACAGAGGAATCAATAGGCTCTTATTGGACAGTGTGGTTACCTGCAGGGCAGGGACAGCTTGCCTGGGGGTGTCAGGGCAGGGTTCCCAGCTGAGGCCCCAAGGAGTGGGGAAGCCAGTGAGCAAGCACAGGGGAGAATTCCAGGTAAGGGAGCCCAGGCTGGAGTCCCAGAGGGGAGAGAAGGCGTGGAAGCAGCATGTGCTTGGGAGAGAGGCGGGTGCGTGAGAGGTGTGGGGAGGGTATTGGAAGTAAGGGGCATGGCTGGAGCAGGGGCAGAGGGAAAGAGAGAACGTCCCTGAAGACCCTGGGAAGCCTGGTGCTCAGCGTGACCCCAGGGAGAAACAGGCTTGTTGGGTGAGAGGGAGAAAGAAGCTTCTCCACGTCTCACGGGATGTCCTCCCCTGGACCCCACCAGCTCCCCGTCCACACCGGGCACCCACCTCTGCAGCAGCTGCTTCAGGCTGAACACGTGCGAGGGCTCCTGGGAGGCCTCTACCTGCAGCACGTGCTCTATGGCTCTCCGCTTGTAGGTCAGCAAGTTCTCCATTTCCTAAGGAGGCACAGTTGGCTGGTGGGCAGGGATCCAGGCTGCAGACGAGGTCTCTGCCCCTCCTAGCCCTGCCCTCTGCTCTCAGCTGAGCCCCTCTCGGCAAAGGGGCACACAGCCCCAGCCCTGGCTAGTCAGCCCCACGATTCTGAACGAGTCTTTGCTTCGCCTCAGTCACAGCCAGCCTGGTCTTGCCTGCGGCCAGGGACAGTTGAAGAGATCAGGTGGGATGATGGAGAGGAAGGGGAATCTCTGTCAACTGCCAAGTGCTTTGCAAAGGCATGAAATTAGGGTCCGGAGGCCACGAACCATCTGGGGTTGGCAGTTTCATCTGGTTCAACGAATGCCATCATTCTCAGTGGCAGAAAGGGACTGGCAGAAAGGGCGCATTTCTGGGGCTTCCAAGCTCCCCCATGATGTGTGGCTCA!
 CTC!
>   TGTGAGGACAGCCTCAGGCCCACTGCCTGGGGCTGCAGAGGGGGGCATCTCCCATCTACCCATGCCTGAG
> AATGCAGGACAAGAGGACCAAATGCCCCTCTTGACTCGCCCAGGAGGAAGCCCCTGGGGGCACTGCACTTGTCAATTCAGTGGGTGAAATGCAGAAAGAAGGGAGTGTCTGCTGTCTGCCATCTGTAAATTGGATCCTTATAATAGCCCCTGAGCTAGCTGAGCTCATGCAAGGAACATTTACTGAGCCCTTACTAGGTGCCAGGATTGGGGATGTGGTGACAGAGATGGACATGATCCTTATCCTTATGGGATCTGCCGTCATTCTTTTTTTTTTTTTTTTTTTTTTTTTTGGGAGACGGAGTTTCCCTCTTGTTGCCCAGGCTGGAGTGCAATGGCGCAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCGATTCTCCTGCCTCGGCCTCCCAAGTAGCTGGGATGACAGGCATGTGCCACCACGCCCAGCTAATTTTGTACTTTCAGTAGAGACGGGATTTCTCCATGTTGGTCAGGCTGGTCTCGAAATCCTGACCTCAGGTGATCCGCCAGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCATCTGGCCGATCTGTGGTCATTCTATGGACAAGAAAACTGAAACTGATGAGACTTGTGTCACAACTGATAGGCTGAGACCTTGGCAAGAAAATGACAGGATGTCAGAGCTGGCCAGGTCATTGAGCCCAACCTCATTTTACACGTGCGGGAACTAAGGCCCAGAGAGGCAAAGTAACTTGCCAGGGGTCACCCAGCAGGTAAACCAGGCCAGGACTCAGACGCAGAGACAGTACTGGTTATGGGTAGTGGTGGTCAGAGGGGTCAGTCCTTGTTCCTTCAGTTCACATTTAGGGCTGTCATGTGTGGTGGTGCAATGTGGACACAGTGCAAAGGCATTCGACTGGGTGTGAGTGCAGTGTGCCGGCCCAGCTGTGTGCCCTAGAA
>
>
> thanks!
>
> Shaun
>
>
> Shaun Cordes, PhD | Customer Support Scientist | Complete Genomics, Inc.
> Toll-free:  (855) 267-5358 | Direct:  (650) 943-2651
> [email protected]
>
> Complete Genomics User Community
> community.completegenomics.com
>
>
> ---
>
> ________________________________
>
> The contents of this e-mail and any attachments are confidential and only for 
> use by the intended recipient. Any unauthorized use, distribution or copying 
> of this message is strictly prohibited. If you are not the intended recipient 
> please inform the sender immediately by reply e-mail and delete this message 
> from your system. Thank you for your co-operation.
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

