I'm not sure I follow what you described, although I thought I understood. Suppose I have 10 million short sequences to align to the human genome. It makes sense to split the 10 million sequences into 10 files (1 million each) and then run 10 blat commands simultaneously. Each blat command would load all the chromosomes. Are you suggesting that I also break all_human_chromosomes.list into a number of smaller lists? The commands I have in mind are:
blat -t=dna -q=dna -tileSize=11 -stepSize=5 all_human_chromosomes.list short_seq0.fa short_seq0.psl
...
blat -t=dna -q=dna -tileSize=11 -stepSize=5 all_human_chromosomes.list short_seq9.fa short_seq9.psl

On Tue, Jun 15, 2010 at 7:14 PM, Hiram Clawson <[email protected]> wrote:
> No, this is not what I described. Only a tiny portion of the target
> genome and a tiny portion of the query genome are loaded in each
> process; nothing is duplicated between processes. We regularly do
> this with genomes here and can get perhaps 100,000 processes running
> on a 1,000-CPU-core supercomputer, with the complete genome-to-genome
> alignment done in a few hours. This is much simpler and more efficient
> than trying to write a complicated parallel functional program that
> would be difficult to operate on a variety of operating systems. The
> operating system itself is optimized to manage the separate threads of
> the individual processes it manages; we don't have to duplicate that
> complication.

--
Regards,
Peng
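
A minimal sketch, in shell, of the job layout Hiram describes above: split both the target and the query into small pieces and give each blat process one piece of each, so nothing is duplicated between processes. The use of the UCSC faSplit utility, the chunk count of 100, and all file names below are illustrative assumptions, not commands from this thread.

# Assumption: the queries live in short_seqs.fa and the target is one
# FASTA file per chromosome (chr*.fa); adjust names to your setup.

# Split the query set into roughly 100 chunks named query*.fa.
faSplit sequence short_seqs.fa 100 query

# One blat job per (chromosome, query chunk) pair. Run serially here for
# illustration; on a cluster each command would be submitted as its own job.
# -noHead suppresses the PSL header so the outputs concatenate cleanly.
for chrom in chr*.fa; do
    for q in query*.fa; do
        blat -t=dna -q=dna -tileSize=11 -stepSize=5 -noHead \
            "$chrom" "$q" "${chrom%.fa}_${q%.fa}.psl"
    done
done

# Combine the per-job results once all jobs have finished.
cat chr*_query*.psl > all_alignments.psl

Because each process only ever holds one chromosome and one query chunk in memory, the per-job footprint stays small, and the operating system or cluster scheduler handles the parallelism, which is the point made above.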
