Thanks Galt,

This is very helpful. Thanks also for the suggestion of LASTZ, which I see is
also a very nice tool.

Avi

--- On Tue, 3/9/10, Galt Barber <[email protected]> wrote:

> From: Galt Barber <[email protected]>
> Subject: Re: [Genome] gfServer/gfClient and -tileSize
> To: [email protected]
> Date: Tuesday, March 9, 2010, 8:44 PM
> 
> >> -stepSize=5 is less sensitive than the default stepSize.
> 
> This does not seem generally true.  Of course, it may be that blat
> sees many new things at stepSize 5 compared to 11, but misses a few
> old things that it used to see.  It is, after all, sampling every 5th
> position of the target genome instead of every 11th position.  That is all.
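Galt's point can be illustrated with a simplified model, sketched here in Python under stated assumptions. The model assumes the index keeps only the tiles that start at multiples of stepSize in the target, and that a region is found only if the query shares one whole indexed tile with it exactly. Under that model, any exact match of length tileSize + stepSize - 1 or more always covers an indexed tile, wherever it falls. The sketch ignores minMatch, repMatch masking and -oneOff, and the function names are hypothetical, not part of BLAT.

```python
def covers_indexed_tile(match_start: int, match_len: int,
                        tile_size: int, step_size: int) -> bool:
    """True if the exact match [match_start, match_start + match_len) contains a
    whole tile starting at a multiple of step_size (an "indexed" tile)."""
    # First indexed tile start at or after match_start (ceiling to a multiple).
    first_start = -(-match_start // step_size) * step_size
    return first_start + tile_size <= match_start + match_len


def min_guaranteed_match(tile_size: int, step_size: int) -> int:
    """Shortest exact match that covers an indexed tile at every possible offset."""
    length = tile_size
    # Offsets repeat with period step_size, so checking 0..step_size-1 is enough.
    while not all(covers_indexed_tile(start, length, tile_size, step_size)
                  for start in range(step_size)):
        length += 1
    return length


for step in (11, 5):
    print(f"tileSize=11, stepSize={step}: exact matches of at least "
          f"{min_guaranteed_match(11, step)} bp always cover an indexed tile")
```

For tileSize=11 this gives 21 bp at stepSize=11 and 15 bp at stepSize=5, which is why a smaller stepSize can pick up shorter conserved stretches that the default may step over.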
> 
> In general, blat is good for cDNA and RNA of the size you mentioned
> (100-5000bp).  However, as Jim pointed out, as the %Identity drops over
> greater evolutionary distance, it's harder for BLAT to find the exact
> tile hits, which reduces its sensitivity.  Lastz tends to do better for
> human-rodent distances or greater.
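To see why falling %Identity hurts, here is a rough back-of-the-envelope sketch in Python. It assumes mismatches are independent and uniformly spread (real alignments have clustered differences and indels), and asks how likely a single 11-base window is to match exactly, or with at most one mismatch, which is what -oneOff=1 is described as tolerating. The function names are hypothetical.

```python
def p_exact_tile(identity: float, tile_size: int = 11) -> float:
    """P(a tile_size-long window matches exactly), assuming independent mismatches."""
    return identity ** tile_size


def p_tile_one_off(identity: float, tile_size: int = 11) -> float:
    """P(window has at most one mismatch): exact, or exactly one mismatch anywhere."""
    mismatch = 1.0 - identity
    return identity ** tile_size + tile_size * identity ** (tile_size - 1) * mismatch


for pct in (95, 90, 85, 80, 70):
    p = pct / 100
    print(f"{pct}% identity: exact 11-mer {p_exact_tile(p):.3f}, "
          f"with one mismatch allowed {p_tile_one_off(p):.3f}")
```

Under this model roughly 31% of 11-base windows match exactly at 90% identity, but only about 2% at 70%, so far fewer tile hits reach the later alignment stages.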
> 
> You can try various things to increase BLAT's sensitivity, but you may
> find that it runs much slower at high-sensitivity settings, possibly
> 10x to 100x slower than the default.
> 
> Certainly, setting -repMatch higher may help with borderline repetitive
> regions, but again at a time cost.
> 
> Here is the default formula for repMatch:
>   repMatch = 1024 * (tileSize/stepSize).
> You can increase it from there.
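Applying the quoted formula to the two cases Avi asked about gives the starting values below. This Python sketch takes the formula literally: the 1024 constant is used for every tileSize, stepSize defaults to tileSize, and the result is truncated to an integer after multiplying. Those are assumptions; the BLAT source may use a different base value for tile sizes other than the default 11, so check it before relying on the tileSize=10 figure. The function name is hypothetical.

```python
from typing import Optional


def default_rep_match(tile_size: int = 11, step_size: Optional[int] = None) -> int:
    """repMatch = 1024 * (tileSize / stepSize), as quoted in this thread.

    Assumptions: the default stepSize equals tileSize, and the product is
    computed before a truncating integer division.
    """
    if step_size is None:
        step_size = tile_size
    return 1024 * tile_size // step_size


for tile, step in ((11, None), (11, 5), (10, 5)):
    print(f"tileSize={tile} stepSize={step or tile}: "
          f"repMatch={default_rep_match(tile, step)}")
```

This prints 1024 for the defaults, 2252 for tileSize=11 with stepSize=5, and 2048 for tileSize=10 with stepSize=5. These are only the defaults; a higher -repMatch can then be tried at some cost in runtime.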
> 
> You might also run it with or without -fine
> and see if that helps you get more exons.
> 
> You could also try these options:
>
> -oneOff=N    If set to 1, this allows one mismatch in a tile and still
>              triggers an alignment.  Default is 0.
>
> -minMatch=N  Sets the number of tile matches.  Usually set from 2 to 4.
>              Default is 2 for nucleotide, 1 for protein.
>
> -maxGap=N    Sets the size of the maximum gap between tiles in a clump.
>              Usually set from 0 to 3.  Default is 2.
>              Only relevant for minMatch > 1.
> 
> As noted before, these extra-sensitivity settings run slower:
> oneOff=1
> minMatch=1
> minMatch=2 maxGap=3
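A minimal Python sketch of how the extra-sensitivity settings listed above might be combined into a standalone blat command line. It assumes blat's usual positional order (database, query, output PSL). The preset names, the helper function, and the file names target.2bit, cdna.fa and out.psl are hypothetical. It only builds the argument list and does not run blat.

```python
from typing import Dict, List, Optional

# Hypothetical preset names for the extra-sensitivity settings quoted in the thread.
SENSITIVE_PRESETS: Dict[str, Dict[str, int]] = {
    "oneOff": {"oneOff": 1},
    "minMatch1": {"minMatch": 1},
    "minMatch2_maxGap3": {"minMatch": 2, "maxGap": 3},
}


def blat_argv(database: str, query: str, output: str, preset: str,
              step_size: Optional[int] = None, fine: bool = False) -> List[str]:
    """Build (but do not run) a blat command line for one sensitivity preset."""
    argv = ["blat"]
    if step_size is not None:  # otherwise keep blat's default (stepSize = tileSize)
        argv.append(f"-stepSize={step_size}")
    argv += [f"-{name}={value}" for name, value in SENSITIVE_PRESETS[preset].items()]
    if fine:
        argv.append("-fine")
    argv += [database, query, output]  # positional: database, query, output file
    return argv


# Print the command for each preset with stepSize=5.
for name in SENSITIVE_PRESETS:
    print(" ".join(blat_argv("target.2bit", "cdna.fa", "out.psl", name, step_size=5)))
```

Passing one of these lists to `subprocess.run(argv, check=True)` would run it, provided blat is on the PATH. Keeping the presets in one dictionary makes it easy to compare the hits and runtime of each setting on the same query set.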
> 
> -Galt
> 
> On 3/9/2010 7:59 AM, Fungazid wrote:
> > Thanks Jim,
> >
> > I am looking into LASTZ and will try to replace BLAT with it, or combine
> > the two, in my script.  I need to see whether it is fast enough for
> > large-scale searches on my computer, and whether its output can be used
> > and parsed like that of Blast and Blat.
> >
> > Still, at this point, trying to optimize Blat could be helpful for me
> > because it tends to find most hits.
> >
> > Avi
> >
> 
> 
> > --- On Tue, 3/9/10, Jim Kent <[email protected]> wrote:
> >
> >> From: Jim Kent <[email protected]>
> >> Subject: Re: [Genome] gfServer/gfClient and -tileSize
> >> To: "Fungazid" <[email protected]>
> >> Cc: [email protected], [email protected]
> >> Date: Tuesday, March 9, 2010, 3:46 PM
> >> Hi Avi - blat really is not the best tool for primate/rodent
> >> alignments.  I'd suggest you switch to lastz from Penn State
> >> University.  See
> >> http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.01.50/README.lastz-1.01.50.html.
> >>
> >>
> >>
> >> On Mar 9, 2010, at 7:58 AM, Fungazid wrote:
> >>
> >>> Thank you Galt for your detailed information,
> >>>
> >>> I understand that the optimal configuration depends on the needs.
> >>> So... my query sequences are cDNAs of 100-5000bp.  One of the goals is
> >>> to detect variations like intron retention between related mammals,
> >>> such as primates vs. rodents (therefore I need genomes as targets).
> >>> The basic configuration finds most but not all HSPs per hit
> >>> (accordingly, sometimes small exons or larger intronic regions are not
> >>> detected).  But optimization is problematic because I see that often
> >>> even -stepSize=5 is less sensitive than the default stepSize.  As far
> >>> as I understand, this can happen because repetitive sequences are
> >>> ignored if they occur too many times when sensitivity rises.  Should I
> >>> increase -repMatch to prevent it?  But what is the program's default
> >>> repMatch for [-stepSize=5,-tileSize=10] and for
> >>> [-stepSize=5,-tileSize=default]?
> >>>
> >>> thanks,
> >>> Avi
> >>>
> >>> --- On Mon, 3/8/10, Galt Barber <[email protected]> wrote:
> >>>
> >>>> From: Galt Barber <[email protected]>
> >>>> Subject: Re: [Genome] gfServer/gfClient and -tileSize
> >>>> To: [email protected]
> >>>> Date: Monday, March 8, 2010, 7:35 PM
> >>>>
> >>>> Higher tileSize increases memory, increases speed, and decreases
> >>>> sensitivity slightly.
> >>>>
> >>>> The default tileSize 11 is very good.
> >>>> On rare occasions you see 10 or 12 used.
> >>>> Smaller tileSizes tend to lead to
> >>>> dramatically longer runtime.
> >>>>
> >>>> It's a little complex to state easily in a formula because there
> >>>> are multiple phases internally, each with different characteristics.
> >>>>
> >>>> The default stepSize is just tileSize.  This means that you are
> >>>> sampling a position of the genome every stepSize bases.
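A small Python sketch of this sampling, assuming tiles start at positions 0, stepSize, 2*stepSize, ... of each target sequence. It lists which tile start positions are indexed for a 40 bp toy target and counts index entries for a hypothetical 3 Gb genome. It ignores the removal of overused tiles (repMatch) and any other index details, and the function name is hypothetical.

```python
from typing import Optional


def indexed_tile_starts(target_len: int, tile_size: int = 11,
                        step_size: Optional[int] = None) -> range:
    """Start positions of the tiles a stepSize-sampled index would hold.

    Simplified model: only whole tiles are indexed, and the default stepSize
    equals tileSize, as described in the message above.
    """
    if step_size is None:
        step_size = tile_size
    return range(0, target_len - tile_size + 1, step_size)


print(list(indexed_tile_starts(40)))               # default stepSize = tileSize = 11
print(list(indexed_tile_starts(40, step_size=5)))  # stepSize = 5

genome_len = 3_000_000_000  # hypothetical, roughly human-sized genome
default_entries = len(indexed_tile_starts(genome_len))
step5_entries = len(indexed_tile_starts(genome_len, step_size=5))
print(f"index entries: {default_entries:,} vs {step5_entries:,} "
      f"({step5_entries / default_entries:.1f}x)")
```

With tileSize=11 the toy target is indexed at [0, 11, 22] by default and at [0, 5, 10, 15, 20, 25] with stepSize=5. For the 3 Gb example, stepSize=5 yields roughly 2.2 times as many index entries, which is one reason runtime, and plausibly memory, grow as stepSize shrinks.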
> >>>>
> >>>> For PCR primer searching, we leave tileSize at 11 and lower stepSize
> >>>> to 5 for increased sensitivity.  Of course, this will also cause the
> >>>> runtime to grow.
> >>>>
> >>>> Increasing sensitivity means increasing
> >>>> the number of hits, and each hit that
> >>>> has to be explored can take a lot of
> >>>> processing.
> >>>>
> >>>> And of course, whatever generalizations one would make, the real
> >>>> power, speed, and memory required will depend on the characteristics
> >>>> of the genome and the queries, not to mention the several command-line
> >>>> switches that are available.
> >>>>
> >>>> But luckily the defaults have good performance and sensitivity
> >>>> for a wide range of applications.
> >>>>
> >>>> If you are doing short-reads then
> >>>> perhaps one of the many good freely
> >>>> available short-read aligners like
> >>>> would be useful.
> >>>>
> >>>> BLAT is free for non-commercial use.
> >>>>
> >>>> -Galt
> >>>>
> >>>> On 3/8/2010 7:03 AM, Fungazid wrote:
> >>>>> Hello people,
> >>>>>
> >>>>>
> >>>>> About gfServer/gfClient:
> >>>>>
> >>>>> I see that a higher -tileSize leads to a higher memory requirement.
> >>>>> Is a higher -tileSize expected to decrease detection power?
> >>>>> In addition, should a higher -tileSize enhance the speed of
> >>>>> gfServer/gfClient?
> >>>>>
> >>>>> And what is -stepSize, and how does it affect detection power, speed,
> >>>>> and memory requirement?
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Avi
> >>>>>
> >>>>> _______________________________________________
> >>>>> Genome maillist  -  [email protected]
> >>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome