pslSort is used when you have hundreds of thousands of psl files from
a massive blat run on a supercomputer and you need to put all the
results back together into a single file.  The output is sorted by
chrom name (qName) and then by chromStart (qStart).  You can perform the same sort with
the unix 'sort' command:
sort -k10,10 -k12,12n
which also works as a two-stage procedure in exactly the
same manner.  See also:
http://en.wikipedia.org/wiki/Sort_algorithm
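As a rough illustration of that ordering (not pslSort's own code), the
Python sketch below sorts tab-separated PSL data lines by column 10
(qName) as text and then by column 12 (qStart) as a number, which is
what the 'sort -k10,10 -k12,12n' keys select.  The helper names
psl_sort_key and fake_psl are hypothetical, and the sample lines are
made up.  Plain Python string comparison follows byte order, so it
matches unix sort run with LC_ALL=C rather than a locale-aware collation.

```python
from __future__ import annotations


def psl_sort_key(line: str) -> tuple[str, int]:
    """Hypothetical helper: return (qName, qStart) for one PSL data line.

    PSL columns are 1-based in the text above, so column 10 (qName) is
    index 9 and column 12 (qStart) is index 11 once the line is split.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[9], int(fields[11])


def fake_psl(q_name: str, q_start: int) -> str:
    """Hypothetical helper: build a made-up 21-column PSL line where only
    qName and qStart carry real values; the rest are placeholder zeros."""
    cols = ["0"] * 21
    cols[9] = q_name
    cols[11] = str(q_start)
    return "\t".join(cols) + "\n"


lines = [fake_psl("chr2", 500), fake_psl("chr1", 10000), fake_psl("chr1", 9000)]
for line in sorted(lines, key=psl_sort_key):
    # qStart compares numerically, like the 'n' in -k12,12n, so 9000 sorts
    # before 10000 even though "10000" < "9000" as text.
    print(psl_sort_key(line))
```

Run as-is, this prints ('chr1', 9000), ('chr1', 10000) and
('chr2', 500), one per line.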

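The sort works in two stages: stage 1 combines the input files into
larger, sorted temporary files in the temporary directory, and stage 2
merges those temporary files into the final result.  Splitting the work
this way means the full set of alignments never has to be held in
memory at once.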
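To make the two stages concrete, here is a minimal Python sketch of that
kind of external merge sort.  It is not pslSort's implementation.  It
assumes data-only PSL files (no psLayout header lines) whose lines all
end in a newline, and it uses the same qName/qStart key as the unix sort
command.  The function names stage1, stage2 and psl_sort_key and the
batch_size parameter are hypothetical; batching by a fixed number of
input files is an assumption made for simplicity, where a real tool
might batch by memory use instead.

```python
import heapq
import os
import tempfile


def psl_sort_key(line):
    # Hypothetical key: (qName, qStart) from columns 10 and 12 of a
    # tab-separated PSL data line.
    fields = line.rstrip("\n").split("\t")
    return fields[9], int(fields[11])


def stage1(in_paths, temp_dir, batch_size):
    """Stage 1: read the input files batch_size at a time, sort each batch
    in memory, and write it out as one sorted temporary file in temp_dir.
    Returns the list of temporary file paths."""
    temp_paths = []
    for start in range(0, len(in_paths), batch_size):
        batch = []
        for path in in_paths[start:start + batch_size]:
            with open(path) as f:
                batch.extend(f)  # assumes every line ends in "\n"
        batch.sort(key=psl_sort_key)
        fd, temp_path = tempfile.mkstemp(suffix=".psl", dir=temp_dir)
        with os.fdopen(fd, "w") as out:
            out.writelines(batch)
        temp_paths.append(temp_path)
    return temp_paths


def stage2(temp_paths, out_path):
    """Stage 2: k-way merge of the already sorted temporary files into
    out_path.  heapq.merge pulls lines lazily, so the files are streamed
    rather than loaded whole."""
    files = [open(p) for p in temp_paths]
    try:
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*files, key=psl_sort_key))
    finally:
        for f in files:
            f.close()
```

Usage would be stage2(stage1(input_paths, temp_dir, 1000), out_path),
where the batch size of 1000 files is an arbitrary illustrative value.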

The complete usage message explains this procedure:
> pslSort - merge and sort psCluster .psl output files
> usage:
>   pslSort dirs[1|2] outFile tempDir inDir(s)
> This will sort all of the .psl files in the directories
> inDirs in two stages - first into temporary files in tempDir
> and second into outFile.  The device on tempDir needs to have
> enough space (typically 15-20 gigabytes if processing whole genome)
>   pslSort g2g[1|2] outFile tempDir inDir(s)
> This will sort a genome to genome alignment, reflecting the
> alignments across the diagonal.
> 
> Adding 1 or 2 after the dirs or g2g will limit the program to
> only the first or second pass repectively of the sort

--Hiram

Peng Yu wrote:
> I tried some examples and want to understand what pslSort bases its
> sorting on. So far, it is not clear to me what the sorting criterion is.
> 
> The help page is too sparse for me to understand. Would you please let me
> know how psl files are sorted?
> 
> What are the two stages for? Why are there two stages rather than one
> stage? I think that the g2g option sorts the alignments based on genomic
> location. But is it the start (or end, middle of the alignment)? What
> is the 'dirs' option based on?

_______________________________________________
Genome maillist
https://lists.soe.ucsc.edu/mailman/listinfo/genome
