http://genome.ucsc.edu/FAQ/FAQblat.html#blat8

QUOTE:
The formula to find the shortest query size that will guarantee a match 
(if matching tiles are not marked as overused) is:
    2 * stepSize + tileSize - 1

But this formula is for DNA sequences, where even a very short query
still requires 2 tile hits.

If you have protein, which requires only one tile hit,
with t = tileSize = 5 and s = stepSize defaulting to t,
then the minimum query size guaranteed to be found is:

    1 * stepSize + tileSize - 1

That works out to 5 + 5 - 1 = 9, but the longest run of perfect
matches in your alignment is only 7 residues.

Even with stepSize = s = 4, the minimum is still 8.
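
As a rough sketch of the arithmetic above (the function name and the
min_match parameter are my own labels for illustration, not BLAT code;
min_match stands for the number of tile hits required, 2 for DNA and 1
for protein as described above, and the generalization to other values
is an assumption drawn from those two cases):

```python
def min_guaranteed_query_size(tile_size, step_size=None, min_match=1):
    """Shortest perfect-match length guaranteed to contain min_match whole tiles.

    Hypothetical helper: generalizes the two formulas quoted above,
    2 * stepSize + tileSize - 1 (DNA) and 1 * stepSize + tileSize - 1 (protein).
    """
    if step_size is None:
        # stepSize defaults to tileSize
        step_size = tile_size
    return min_match * step_size + tile_size - 1


# Protein case from this thread: tileSize = 5, one tile hit required.
print(min_guaranteed_query_size(5))               # stepSize = 5
print(min_guaranteed_query_size(5, step_size=4))  # stepSize = 4
```

Both values are larger than the 7 identical residues in the alignment,
which is why a hit is not guaranteed in either setting.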

However, at the limits of sensitivity you do sometimes get lucky:
a match shorter than the guaranteed minimum can just happen to be
in the right position to cover a whole tile and create a hit.
Changing from stepSize 5 to 4 can make that happen. Of course,
although you would gain some sensitivity in the general case,
some matches that used to be found with stepSize 5 would be lost.
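
To illustrate the "getting lucky" point, here is a small sketch (my own
illustration, not BLAT code; hit_offsets is a hypothetical helper, and it
assumes database tiles start at every multiple of stepSize). It lists the
positions of a perfect match, relative to the tile grid, at which a whole
tile falls inside the match:

```python
def hit_offsets(match_len, tile_size, step_size):
    """Offsets (match start modulo step_size) at which the match
    contains at least one whole database tile."""
    lucky = []
    for offset in range(step_size):
        # Tiles start at 0, step_size, 2 * step_size, ...
        # Pick the first tile that starts at or after the match start.
        first_tile = 0 if offset == 0 else step_size
        if first_tile + tile_size <= offset + match_len:
            lucky.append(offset)
    return lucky


# A run of 7 identical residues, tileSize = 5
print(hit_offsets(7, 5, 5))  # stepSize = 5 -> [0, 3, 4]
print(hit_offsets(7, 5, 4))  # stepSize = 4 -> [0, 2, 3]
```

Under these assumptions a 7-residue match is found at 3 of 5 possible
offsets with stepSize 5 and at 3 of 4 with stepSize 4. A match starting
at database position 5 (counting from 0) sits at offset 0 with stepSize 5
(hit) but at offset 1 with stepSize 4 (miss), which is the kind of case
that gets lost.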

s = 5
t = 5

for stepsize = tileSize = 5,
min guaranteed size = s - 1 + t = 5 - 1 + 5 = 9

database target tiles:

xxxxx
     xxxxx
          xxxxx

query string of minimum length 9:

123456789
 123456789
  123456789
   123456789
    123456789

for stepsize=4, tileSize = 5,
min guaranteed size = 4 - 1 + 5 = 8.

database target tiles:

xxxxx
    xxxxx
        xxxxx
            xxxxx

query string of minimum length 8:

12345678
 12345678
  12345678
   12345678
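
The diagrams above can also be checked by brute force. This is only a
sketch under the same assumption as the diagrams (tiles start at every
multiple of stepSize); the function names are hypothetical, not part
of BLAT:

```python
def contains_whole_tile(offset, length, tile_size, step_size):
    """True if some tile lies entirely inside [offset, offset + length)."""
    # Round the window start up to the next tile start on the grid.
    first_tile = -(-offset // step_size) * step_size
    return first_tile + tile_size <= offset + length


def smallest_guaranteed_length(tile_size, step_size):
    """Smallest window length that contains a whole tile at every offset."""
    length = tile_size
    while not all(
        contains_whole_tile(offset, length, tile_size, step_size)
        for offset in range(step_size)
    ):
        length += 1
    return length


print(smallest_guaranteed_length(5, 5))  # 9
print(smallest_guaranteed_length(5, 4))  # 8
```

The results agree with s - 1 + t for both settings.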


On 05/09/12 09:24, Oscar Conchillo Solé wrote:
> Hello all!
>      I've been using blat this year quite a lot, with very good results 
> for proteome comparison, but I recently found a weird case with two 
> proteins which are quite similar (I paste the alignment below) and blat 
> returns an empty file (just the field descriptions) as if they were 
> completely different.
> 
> Is this result I reproduce here normal? I found it weird because there 
> is a fragment of 7 AA which are identical and with the "-tileSize=5" 
> option (default) I got no results but with "-tileSize=4" I find a match.
> 
> Thank you very much and congratulations for your program, much easier 
> to use, port and install than NCBI blast
> 
> Supporting information:
> 
> my two proteins are (expressed in UNIPROT codes):
>      P0C054
>      Q9HZ98
> 
> my blat version:
>      blat - Standalone BLAT v. 34 fast sequence search command line tool (linux x86_32)
> 
> executed command:
>      blat -prot Q9HZ98.UP.fasta P0C054.UP.fasta P0C054+Q9HZ98.blatdefaults.psl
> 
> alignment of those two proteins with clustalw:
> CLUSTAL 2.0.10 multiple sequence alignment
> 
> 
> sp|P0C054|IBPA_ECOLI        MRN-FDLSPLYRSAIGFDRLFNHLENNQSQSNGG-YPPYNVELVDENHYR
> tr|Q9HZ98|Q9HZ98_PSEAE      MSNAFSLAPLFRHSVGFDRFNDLFESALRNEAGSTYPPYNVEKHGDDEYR
>                             * * *.*:**:* ::****: : :*.   :. *. *******  .::.**
> 
> sp|P0C054|IBPA_ECOLI        IAIAVAGFAESELEITAQDNLLVVKGAHADEQKE-RTYLYQGIAERNFER
> tr|Q9HZ98|Q9HZ98_PSEAE      IVIAAAGFQEEDLDLQVERGVLTVSGGKREKSTDNVTYLHQGIAQRAFKL
>                             *.**.*** *.:*:: .: .:*.*.*.: ::..:  ***:****:* *:
> 
> sp|P0C054|IBPA_ECOLI        KFQLAENIHVRGANLVNGLLYIDLERVIPEAKKPRRIEIN---------
> tr|Q9HZ98|Q9HZ98_PSEAE      SFRLADHIEVKAASLANGLLNIDLVRLVPEEAKPKRIAINGQRPALDNQ
>                             .*:**::*.*:.*.*.**** *** *::**  **:** **
> 
> 
> 
> 
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
