Please note what the -mask argument says in the help message:

-mask=type  Mask out repeats.  Alignments won't be started in masked region
             but may extend through it in nucleotide searches.  Masked areas
             are ignored entirely in protein or translated searches. Types are
                  lower - mask out lower cased sequence
                  upper - mask out upper cased sequence
                  out   - mask according to database.out RepeatMasker .out file
                  file.out - mask database according to RepeatMasker file.out

Therefore, if you do not want alignments to start in masked areas of the genome,
use -mask=lower with the mm9.2bit file.  The mm9.2bit file marks masked regions
with lower-case letters.  That raises the question: why do you specifically
want to exclude alignments that start in masked areas of the genome?  Are your
query sequences designed so that they should never fall in repeats?  Have you
tried running blat with mm9.2bit as the 'database' and your sequences as the
'query', and then viewing the resulting output.psl file in a custom track?
Try it without any extra arguments first to see what you obtain.  Then use
pslFilter on your output.psl file to limit your results to your desired
criteria.  There are over 30 other psl manipulation programs available to work
with your raw output.
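To make the lower-case convention concrete, here is a small Python sketch (not part of the kent tools) that lists the soft-masked runs in a sequence string and checks whether a seed-sized tile is free of masked bases. It is a simplified model that assumes a tile containing any lower-case base cannot start an alignment when -mask=lower is given; blat's real seeding logic lives in the kent C source and is more involved. The function names and the example sequence are hypothetical.

```python
import re


def masked_intervals(seq: str) -> list[tuple[int, int]]:
    """0-based, half-open (start, end) runs of lower-case (soft-masked) bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lower case)."""
    if not seq:
        return 0.0
    return sum(c.islower() for c in seq) / len(seq)


def tile_is_unmasked(seq: str, pos: int, tile_size: int = 11) -> bool:
    """Simplified model (assumption): under -mask=lower, a tile may seed an
    alignment only if it is full length and contains no lower-case base.
    Extending an alignment through masked bases is not modelled here."""
    tile = seq[pos:pos + tile_size]
    return len(tile) == tile_size and not any(c.islower() for c in tile)


if __name__ == "__main__":
    region = "ACGTacgtacgtTTGCAnnnnGGA"     # made-up example sequence
    print(masked_intervals(region))          # [(4, 12), (17, 21)]
    print(masked_fraction(region))           # 0.5
    print(tile_is_unmasked(region, 12, 5))   # True ("TTGCA")
```

The tile check mirrors the help text only loosely: masked bases block the start of an alignment, while the extension step (not modelled) may still run through them in nucleotide searches.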

To visualize your results, submit the psl output from your blat run as custom
track input.  Then you can see all of your alignments, and you remain in
complete control of how you use blat.
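If you want to see what a filtering step does before reaching for pslFilter, the following Python sketch (not pslFilter itself; its options are not reproduced here) reads blat's PSL output, skips the psLayout header, keeps alignments whose matching bases cover at least a chosen fraction of the query, and prefixes a custom-track "track" line. The 90% cutoff, the track name and the description are hypothetical placeholders. Counting repMatches (matching bases that fall in repeats, per the PSL format description) together with matches is a choice made for this sketch.

```python
def parse_psl_lines(lines):
    """Yield PSL records (lists of 21 tab-separated fields), skipping the
    psLayout header lines, whose first field is not a number."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 21 and fields[0].isdigit():
            yield fields


def query_coverage(rec):
    """(matches + repMatches) / qSize for one PSL record."""
    matches, rep_matches, q_size = int(rec[0]), int(rec[2]), int(rec[10])
    return (matches + rep_matches) / q_size if q_size else 0.0


def filter_to_track(lines, min_cover=0.9,
                    name="blatHits", description="blat hits vs mm9"):
    """Return custom-track lines: a track line plus the records that pass.
    name/description values are hypothetical placeholders."""
    out = [f'track name={name} description="{description}"']
    for rec in parse_psl_lines(lines):
        if query_coverage(rec) >= min_cover:
            out.append("\t".join(rec))
    return out
```

For example, writing each element of filter_to_track(open("output.psl")) as one line of a new file gives you a track you can upload on the custom track page.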

On another note, if you would like to study how we run blat here, you can
see over 500 examples of blat commands in our 'make doc' files, which record
every procedure we perform here to produce the genome browsers.  Studying
these examples will give you a good idea of how blat is used and of the
different ways to combine its arguments to achieve the desired results.  In
your copy of the kent source tree, go to src/hg/makeDb/doc/ and grep for
"blat " in the *.txt files in that directory.  The matching lines show typical
blat operations.
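If you would like to go one step beyond the grep, this Python sketch walks the *.txt files in a doc directory, collects the lines containing "blat " (the same test as grep "blat " *.txt), and tallies which options appear most often. The directory you pass in depends on where your kent checkout lives; the function names are hypothetical, and the tally is a rough whitespace split of each command, not a real command-line parser.

```python
from collections import Counter
from pathlib import Path


def blat_lines(doc_dir):
    """Yield (file name, line) for every line containing 'blat ' in the
    *.txt files of doc_dir, like: grep "blat " *.txt"""
    for path in sorted(Path(doc_dir).glob("*.txt")):
        with path.open(errors="replace") as fh:
            for line in fh:
                if "blat " in line:
                    yield path.name, line.rstrip("\n")


def option_counts(lines):
    """Count option names (the part before '=') that follow 'blat ' on each line."""
    counts = Counter()
    for line in lines:
        _, _, tail = line.partition("blat ")
        for tok in tail.split():
            if tok.startswith("-") and len(tok) > 1:
                counts[tok.split("=", 1)[0]] += 1
    return counts
```

For example, option_counts(line for _, line in blat_lines("kent/src/hg/makeDb/doc")).most_common(10) lists the ten most frequent options, assuming the checkout sits in a directory named kent.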

--Hiram

Peng Yu wrote:
> On Fri, Jun 18, 2010 at 10:34 AM, Hiram Clawson <[email protected]> wrote:
>> You do not need Repeat Masker .out files to run blat.
>> Use the argument -mask=lower if you want to exclude
>> the already masked areas of the target sequence.
> 
> I know that I can run blat without Repeat Masker. But I want to mask
> the repeat regions.
> 
>> If you want to run a blat against the entire genome, simply
>> give the mm9.2bit file as the target sequence  (the 'database' argument).
>> You will need a machine with at least 8 Gb of memory to run
>> the entire genome at once.  Depending upon how often you need
>> to run blat, it may be better to set up gfServer and use the
>> gfClient to run repeated blat operations.
> 
> I'm still confused. If I need RepeatMasker masking, then no matter whether
> mm9.2bit or a list.txt file (which includes the paths to all the chromosome
> fasta files) is the database argument, I will get the masked results as long
> as I use the -mask=lower option, right?
> 
> All I need is the blat psl output. The query fasta files in total have
> a few hundred short sequences (30nt - 75nt). Visualization of a number
> of matches on the browser is secondary. Therefore, I don't see
> a clear reason why my application needs gfServer and gfClient. Would
> you please let me know what types of applications need gfServer and
> gfClient?
> 
> 
> 
>> --Hiram
>>
>> ----- Original Message -----
>> From: "Peng Yu" <[email protected]>
>> To: "Hiram Clawson" <[email protected]>
>> Cc: [email protected]
>> Sent: Friday, June 18, 2010 8:24:53 AM GMT -08:00 Tijuana / Baja California
>> Subject: Re: [Genome] RepeatMasker .out and file.out file for mm9
>>
>> On Fri, Jun 18, 2010 at 12:17 AM, Hiram Clawson <[email protected]> wrote:
>>> What is it that you are trying to mask ?  The mm9.2bit file
>>> is already masked.
>> I want to mask the same regions as those in the genome browser's
>> RepeatMasker track. The description on the bigZips URL doesn't say
>> mm9.2bit is masked.
>>
>> mm9.2bit - contains the complete mm9 Mouse Genome
>>    in the 2bit format.
>>
>> Just to double-check that I understand you correctly: do you mean that I
>> should just use something like the following command without the
>> -mask=type option, since mm9.2bit already includes the mask? Or, even
>> with mm9.2bit, do I still need to use -mask=type? There are multiple .out
>> files in chromOut.tar.gz; would you please show me how to let blat use
>> all of them, if I want to blat against all the chromosomes rather than
>> just one chromosome at a time?
>>
>> blat -t=dna -q=dna -tileSize=11 -stepSize=5 -minScore=20 -ooc=11.ooc \
>>     -out=psl mm9.2bit query.fasta query.psl
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
