I'm not sure I follow what you described, although I thought I understood.

Suppose I have 10 million short sequences to be aligned to the human
genome. It makes sense to split the 10 million sequences into 10
files (1 million each) and then run 10 blat commands simultaneously.
Each blat command will load all of the chromosomes. Are you suggesting
that I break all_human_chromosomes.list into a number of smaller lists
as well?

blat -t=dna -q=dna -tileSize=11 -stepSize=5 all_human_chromosomes.list short_seq0.fa short_seq0.psl
...
blat -t=dna -q=dna -tileSize=11 -stepSize=5 all_human_chromosomes.list short_seq9.fa short_seq9.psl
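
(To be concrete, this is roughly how I would produce and run the 10
query files; only a minimal sketch, assuming faSplit from the kent
tools is available, and all file names below are illustrative:)

# split the 10 million queries into ~10 pieces (e.g. short_seq000.fa, ...)
faSplit sequence all_short_seqs.fa 10 short_seq
# one blat process per query piece, each loading the full target list
for q in short_seq*.fa
do
  blat -t=dna -q=dna -tileSize=11 -stepSize=5 \
    all_human_chromosomes.list $q ${q%.fa}.psl &
done
wait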


On Tue, Jun 15, 2010 at 7:14 PM, Hiram Clawson <[email protected]> wrote:
> No, this is not what I described.  Only a tiny portion of the
> target genome and a tiny portion of the query genome are loaded
> by each process.  Nothing is duplicated between processes.  We
> regularly do this with genomes here and can get perhaps 100,000
> processes running on a 1,000-CPU-core supercomputer, completing
> a whole genome-to-genome alignment in a few hours.  This is much
> simpler and more efficient than trying to write a complicated
> parallel program that would be difficult to operate on a variety
> of operating systems.  The operating system itself is optimized
> to manage the separate threads of the individual processes it
> manages.  We don't have to duplicate that complication.
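
If that is the idea, would the job list look roughly like the sketch
below? (Only a sketch, assuming the target has been split into one file
per chromosome, e.g. chr1.fa ... chrY.fa, and the queries into pieces as
above; the file names and the way jobList gets submitted to a cluster
are illustrative, not a recipe.)

# one small blat job per (chromosome, query piece) pair
for t in chr*.fa
do
  for q in short_seq*.fa
  do
    echo blat -t=dna -q=dna -tileSize=11 -stepSize=5 $t $q ${t%.fa}.${q%.fa}.psl >> jobList
  done
done
# hand jobList to whatever batch/cluster scheduler is available; each
# job loads only one chromosome and one query piece, and the per-job
# .psl outputs can be concatenated once everything finishes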

-- 
Regards,
Peng

