pslSort is used when you have hundreds of thousands of psl files from a massive blat run on a supercomputer and you need to put all the results back together into a single file. The output is sorted by query name (qName, column 10) and query start (qStart, column 12). You can perform the same sort with the unix 'sort' command: sort -k10,10 -k12,12n which, for large inputs, likewise works in two stages (sorted temporary files, then a merge). See also: http://en.wikipedia.org/wiki/Sort_algorithm
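As a rough illustration of sorting by qName and then qStart in two stages, here is a minimal Python sketch. It is not pslSort's actual implementation: the function names (psl_key, is_psl_row, stage1, stage2), the chunk_size parameter, and the way header lines are skipped are all assumptions made for the example.

```python
import heapq
import os


def psl_key(line):
    """Sort key (qName, qStart): PSL columns 10 and 12, counting from 1."""
    cols = line.rstrip("\n").split("\t")
    return (cols[9], int(cols[11]))


def is_psl_row(line):
    """Assumed filter: an alignment row has 21 tab-separated columns
    and starts with a number; header and blank lines do not."""
    cols = line.rstrip("\n").split("\t")
    return len(cols) >= 21 and cols[0].isdigit()


def stage1(in_paths, temp_dir, chunk_size=4):
    """Stage 1: sort groups of small input files into larger temp files."""
    temp_paths = []
    for i in range(0, len(in_paths), chunk_size):
        rows = []
        for path in in_paths[i:i + chunk_size]:
            with open(path) as f:
                # Normalize line endings so merged output stays one row per line.
                rows.extend(l.rstrip("\n") + "\n" for l in f if is_psl_row(l))
        rows.sort(key=psl_key)
        temp_path = os.path.join(temp_dir, f"tmp{len(temp_paths)}.psl")
        with open(temp_path, "w") as f:
            f.writelines(rows)
        temp_paths.append(temp_path)
    return temp_paths


def stage2(temp_paths, out_path):
    """Stage 2: merge the already-sorted temp files into one output file."""
    files = [open(p) for p in temp_paths]
    try:
        with open(out_path, "w") as out:
            # heapq.merge reads the sorted inputs lazily, one row at a time.
            out.writelines(heapq.merge(*files, key=psl_key))
    finally:
        for f in files:
            f.close()
```

Only one group of input files is held in memory during stage 1, and stage 2 streams through the temporary files, which is why the temporary directory needs enough disk space for a full copy of the data.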
The sort works by first writing larger, sorted temporary files into a temporary directory (stage 1), then merging those files into the final result (stage 2). The complete usage message explains this procedure:

> pslSort - merge and sort psCluster .psl output files
> usage:
>   pslSort dirs[1|2] outFile tempDir inDir(s)
> This will sort all of the .psl files in the directories
> inDirs in two stages - first into temporary files in tempDir
> and second into outFile. The device on tempDir needs to have
> enough space (typically 15-20 gigabytes if processing whole genome)
>   pslSort g2g[1|2] outFile tempDir inDir(s)
> This will sort a genome to genome alignment, reflecting the
> alignments across the diagonal.
>
> Adding 1 or 2 after the dirs or g2g will limit the program to
> only the first or second pass respectively of the sort

--Hiram

Peng Yu wrote:
> I tried some examples and want to understand what pslSort bases its
> sorting on. So far, I don't see clearly what the sorting criterion is.
>
> The help page is too sparse for me to understand. Would you please let me
> know how psl files are sorted?
>
> What are the two stages for? Why are there two stages rather than one
> stage? I think that the g2g option sorts the alignments based on genomic
> location. But is it the start (or the end, or the middle) of the alignment?
> What is the option 'dirs' based on?
