Hi, Middha!

What you found is a minor inconsistency
in the default filtering levels of the different output
types.  I ran the query using the data you gave me with
psl, blast, and blast8 output.

The psl output showed nothing,
the blast showed 2 hits as you mentioned,
and the blast8 showed 4 hits.

Wondering what could explain that,
and noticing that the high-scoring hit
has a low percent identity (about 80%),
I re-ran the 3 output types but with these flags:
  -minScore=0 -minIdentity=80
This time all 3 output types show very similar output.
If I also add -fine to the psl run, the results are even more similar.
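
For reference, the re-run can be sketched as the loop below. This is only an illustration: the input names w.db and w come from the original report, the output names w.psl, w.blast and w.blast8 are made up here, and the commands are only executed if blat and both input files are present.

```shell
# Apply the same filters to every output type so the results are comparable.
FLAGS="-minScore=0 -minIdentity=80"

run_all_formats() {
    for fmt in psl blast blast8; do
        # Output file names (w.$fmt) are hypothetical; -fine can be added for psl.
        cmd="blat w.db w $FLAGS -out=$fmt w.$fmt"
        if command -v blat >/dev/null 2>&1 && [ -f w.db ] && [ -f w ]; then
            $cmd
        else
            echo "skipped: $cmd"
        fi
    done
}

run_all_formats
```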

The alignment with the seemingly impressive e-value (4.4e-216)
is actually considered by blat to be a low-quality
alignment because of its many mismatches (113 in 603 bases).

First of all, blat at the default stepSize
is not guaranteed to find alignments that
are below 90% to 95% identity.  Although changing
stepSize and some other parameters may increase sensitivity,
a query like the one you are trying to align is
fairly divergent from the target.  Perhaps aligning
in protein space would help, but there are limits
all the same.

Secondly, psl is the preferred output if you are going to
use BLAT a lot.  The blast and blast8 outputs are just there
as a convenience.  The e-value for blast and blast8 in particular has a known
issue: it uses a hard-wired database size, that of a mammalian genome
or about 3 Gb (gigabases), despite the fact
that your database is actually only 86k bases in this example.
On the other hand, the psl format supports chaining of exons
into a full RNA alignment, which can be very useful.
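
To get a feel for how much the assumed database size inflates the significance, here is a rough sketch. It assumes the e-value is simply dbSize * 2^(-bits); that formula is an assumption for illustration, not taken from the blat source, although it roughly reproduces the e-values in the quoted blast8 output (the printed bit scores are rounded, so the match is only approximate). The function name rescale is made up.

```shell
# Assumption: E = dbSize * 2^(-bits). Compare the hard-wired size with the real one.
ASSUMED_DB=3000000000   # about 3 gigabases, the hard-wired mammalian genome size
ACTUAL_DB=86140         # total letters reported for w.db in the blast output

rescale() {
    # $1 = bit score; prints the e-value under each database size
    awk -v bits="$1" -v big="$ASSUMED_DB" -v small="$ACTUAL_DB" 'BEGIN {
        printf "bits=%s  E(3Gb)=%.1e  E(86kb)=%.1e\n", bits, big * 2^(-bits), small * 2^(-bits)
    }'
}

# Bit scores of the two hits shown in the blast report.
for bits in 61.0 13.0; do
    rescale "$bits"
done
```

Under this assumption every e-value shrinks by the same factor (3e9 / 86140, roughly 35,000) when the real database size is used, so the ranking of the hits does not change.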

Typically people use additional tools such as pslCDnaFilter or pslReps 
to post-filter the psls generated by blat, and do not try to do any 
fancy filtering with the blat commandline options.
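
As a very rough idea of what post-filtering a psl file means, the sketch below keeps only rows above a percent-identity cutoff. It is not a substitute for pslReps or pslCDnaFilter: the identity here is a simplified ratio that ignores gaps, the function name psl_min_identity is made up, and the demo rows are invented rather than real blat output.

```shell
# Keep PSL rows whose simplified percent identity is at least the given cutoff.
# PSL columns used: 1 = matches, 2 = misMatches, 3 = repMatches.
psl_min_identity() {
    awk -F'\t' -v min="$1" '
        $1 ~ /^[0-9]+$/ {                  # skips header and blank lines
            aligned = $1 + $2 + $3
            if (aligned > 0 && 100 * ($1 + $3) / aligned >= min) print
        }'
}

# Demo on two invented, truncated rows: only the first one passes an 80% cutoff.
printf '32\t2\t0\trowA\n50\t50\t0\trowB\n' | psl_min_identity 80
```

On a real file this would be used as: psl_min_identity 80 < in.psl > out.psl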

Good Luck!

-Galt

> Hi,
> 
> I had a blat issue that is really interesting. Running blat to generate
> blast like report on a query, I get the best alignment with 2e-09.
> However, running the same query and reference, using blast8 as the
> output (tabular results), I see the alignment results with 4.4e-216!
> 
> Is there something I missed here? I used the default blat command.
> Attached are the query and reference for you to reproduce the results:
> $ blat w.db w -out=blast w.b
> $ blat w.db w -out=blast8 w.b8
> 
> $ cat w.b
> BLASTN 2.2.11 [blat]
> 
> Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
> 
> Query= gi|166079852|gb|EU294160.1|
>          (963 letters)
> 
> Database: w.db 
>            1 sequences; 86,140 total letters
> 
> Searching.done
>                                                      Score     E
> Sequences producing significant alignments:          (bits)    Value
> 
> NODE_40_length_86118_cov_24.091490                      61     2e-09
> NODE_40_length_86118_cov_24.091490                      13     3e+05
> 
> 
> 
>> NODE_40_length_86118_cov_24.091490 
>           Length = 86140
> 
>  Score = 61 bits (156), Expect = 2e-09
>  Identities = 32/34 (94%)
>  Strand = Minus / Plus
> 
> Query: 34    ttcataacggcacctttaccgaaagatttctcca 1
>              |||||||||||||||||||| |||||||| ||||
> Sbjct: 41411 ttcataacggcacctttaccaaaagatttttcca 41444
> 
> 
>  Score = 13 bits (35), Expect = 3e+05
>  Identities = 7/7 (100%)
>  Strand = Minus / Plus
> 
> Query: 68    acttgaa 62
>              |||||||
> Sbjct: 41368 acttgaa 41374
> 
>   Database: w.db
> 
> $ cat w.b8
> gi|166079852|gb|EU294160.1|  NODE_40_length_86118_cov_24.091490  80.60   603  113  1  344  946  41097  40499  4.4e-216  747.0
> gi|166079852|gb|EU294160.1|  NODE_40_length_86118_cov_24.091490  80.72   249  48   0  78   326  41367  41119  3.6e-86   315.0
> gi|166079852|gb|EU294160.1|  NODE_40_length_86118_cov_24.091490  94.12   34   2    0  1    34   41444  41411  1.7e-09   61.0
> gi|166079852|gb|EU294160.1|  NODE_40_length_86118_cov_24.091490  100.00  7    0    0  62   68   41374  41368  2.7e+05   13.0
> 
>  <<w>>  <<w.db>> 
> Sumit Middha | Informatics Specialist | BSI | Mayo Clinic Rochester |
> 507-284-4706 | Stabile 11-02-16
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
