Hello Yuan,

The data in hg19/chromosomes/*fa.gz is not masked. It is the sequence exactly as published by the data source, as described in the README file in this directory.
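If you would like to confirm the masking state of a downloaded file yourself, the following is a minimal Python sketch rather than an official UCSC tool. It assumes the common FASTA convention that soft-masked repeats appear in lower case and hard-masked repeats are replaced by N. The function names `masking_summary` and `summarize_fasta_gz` are made up for illustration.

```python
import gzip


def masking_summary(lines):
    """Count upper-case, lower-case, and N bases in FASTA-formatted lines.

    Assumption: lower case indicates soft-masking and N indicates either
    hard-masking or an assembly gap.
    """
    counts = {"upper": 0, "lower": 0, "n": 0}
    for line in lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines such as ">chr1".
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base in "Nn":
                counts["n"] += 1
            elif base.islower():
                counts["lower"] += 1
            else:
                counts["upper"] += 1
    return counts


def summarize_fasta_gz(path):
    """Run masking_summary over a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return masking_summary(handle)
```

Calling `summarize_fasta_gz` on a local copy of one chromosome file returns the three counts. A non-zero lower-case count suggests soft-masking. Note that N bases also mark assembly gaps, so the N count alone does not prove hard-masking; comparing it between two versions of the same chromosome is more informative.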
Both hg19/bigZips/chromFa.tar.gz and hg19/bigZips/chromFaMasked.tar.gz are masked, as described in the README file in the bigZips directory.

The file hg19/bigZips/hg19.2bit contains unmasked sequence. It is the same sequence as in /chromosomes/*fa.gz, merged into one file and converted to 2bit format.

The 2bit format is explained at:
http://genome.ucsc.edu/FAQ/FAQformat.html#format7

Hopefully this resolves the content questions, but please let us know if anything is still unclear.

Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/6/10 7:12 AM, Yuan Hao wrote:
> Dear list,
>
> May I ask whether the human genome assembly 19 available at
> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
> is repeat-masked or not? If masked, in which way? I know there is
> another directory to get the same genome sequence
> (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/), which
> contains two assemblies that differ in the way repeats are masked,
> i.e. chromFa.tar.gz and chromFaMasked.tar.gz. I am not sure which one
> corresponds to the genome under the /chromosomes directory. Thank you
> very much in advance!
>
> Regards,
> Yuan
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
