Hi Hiram,

That makes a lot of sense. Thank you very much for the clarification and example code.

Cheers,
Dave

On Sat, 04 Sep 2010 02:09:50 +0900, Hiram Clawson <[email protected]> wrote:

> Good Morning Dave:
>
> You do not want to use one 11.ooc file from one genome
> on a different genome. You can simply construct the 11.ooc
> file for your genome:
>
> $ blat yourGenome.2bit \
>     /dev/null /dev/null -tileSize=11 -makeOoc=yourGenome.11.ooc \
>     -repMatch=1024
>
> Adjust the repMatch number based on the size of your genome.
> 1024 is used for human sequence. For example, given a 'faSize'
> measurement of your genome:
> $ twoBitToFa yourGenome.2bit stdout | faSize stdin
> 2914958544 bases (162452744 N's 2752505800 real 1439244378 upper
> 1313261422 lower
> using the "real" bases measurement of 2752505800, calculate
> the ratio to hg19 "real" bases of 2897310462:
> awk 'BEGIN{printf "%.6f\n", 2752505800 / 2897310462 * 1024}'
> 972.821510
>
> Round down the answer to the nearest 50: repMatch=950 in this example.
>
> --Hiram
>
> Dave Tang wrote:
>> Dear list,
>> The blatSuite.zip file (downloaded from
>> http://genome-test.cse.ucsc.edu/~kent/exe/linux/) comes with an 11.ooc
>> file. I couldn't find any information regarding which genome was used
>> to generate this file. Richard asked a similar question here
>> https://lists.soe.ucsc.edu/pipermail/genome/2004-February/003964.html:
>> 1) What's the difference going to between genome versions?
>> Is it worth re-creating a new version or will the ooc file
>> produce similar results?
>> 2) Does it make sense to run the ooc file for the human on
>> the mouse genome?
>> Additionally on the FAQ
>> (http://genome.ucsc.edu/FAQ/FAQblat.html#blat6), it was mentioned that
>> "The 11.ooc file contains sequences determined to be over-represented
>> in the genome sequence."
>> So it was a bit confusing to me; do all genomes have these
>> over-represented sequences, hence the default 11.ooc file that comes
>> with the blatSuite.zip? Or should I generate my own ooc file as has
>> been pointed out in previous emails from this mailing list?
>> Thank you very much for your help.
>> Best,
>> Dave

--
Dave Tang
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
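
For anyone finding this thread later, the repMatch arithmetic from Hiram's message (scale 1024 by the ratio of your genome's "real" bases to hg19's, then round down to the nearest 50) can be wrapped in a small shell function. This is only a sketch: the function name rep_match and the variable names are made up here and are not part of blat. The hg19 base count and the human repMatch of 1024 are the figures quoted in the message.

```shell
#!/bin/sh
# Figures quoted in Hiram's message; the variable names are illustrative only.
HG19_REAL=2897310462    # hg19 "real" (non-N) bases
HUMAN_REPMATCH=1024     # repMatch used for human sequence

# rep_match REAL_BASES
# Hypothetical helper: scales the human repMatch by REAL_BASES / HG19_REAL
# and rounds the result down to the nearest multiple of 50.
rep_match() {
    awk -v real="$1" -v ref="$HG19_REAL" -v base="$HUMAN_REPMATCH" \
        'BEGIN { scaled = real / ref * base; printf "%d\n", int(scaled / 50) * 50 }'
}

# The example from the thread: 2752505800 real bases scales to 972.82.
rep_match 2752505800    # prints 950
```

The argument is the "real" figure from the faSize output. Note that the function applies no lower bound, so a very small genome would come out as 0; what to use in that case is not covered in this thread.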
