Hello Yuan,

The data in hg19/chromosomes/*fa.gz is not masked. It is the sequence 
exactly as published by the data source, as described in the README file 
in the hg19/chromosomes directory.

Both hg19/bigZips/chromFa.tar.gz and hg19/bigZips/chromFaMasked.tar.gz 
are masked, as described in the README file in the hg19/bigZips 
directory.

The file hg19/bigZips/hg19.2bit contains unmasked sequence. It holds the 
same sequence as hg19/chromosomes/*fa.gz, merged into one file and 
converted to 2bit format.

2bit format explained:
http://genome.ucsc.edu/FAQ/FAQformat.html#format7
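
If you would like to check the masking of a downloaded FASTA file
yourself, the short Python sketch below may help. It assumes the usual
convention that soft-masked repeats appear in lower case and hard-masked
repeats are replaced with N. Note that N also marks assembly gaps, so a
nonzero N count alone does not indicate hard masking. The function name
base_case_counts is illustrative only and is not part of any UCSC tool.

```python
import gzip
import sys
from collections import Counter


def base_case_counts(fasta_path):
    """Count upper-case, lower-case, and N bases in a FASTA file.

    Works on plain or gzip-compressed files (chosen by the .gz suffix).
    N/n is counted separately and excluded from the other two totals.
    """
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    counts = Counter(upper=0, lower=0, n=0)
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip FASTA header lines
            seq = line.strip()
            n_bases = seq.count("N") + seq.count("n")
            # lower-case letters other than 'n' (soft-masked by convention)
            lower = sum(1 for c in seq if c.islower()) - seq.count("n")
            counts["n"] += n_bases
            counts["lower"] += lower
            counts["upper"] += len(seq) - n_bases - lower
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to a locally downloaded file, e.g. one chromosome.
    result = base_case_counts(sys.argv[1])
    total = sum(result.values()) or 1
    for key in ("upper", "lower", "n"):
        print(f"{key}: {result[key]} ({result[key] / total:.1%})")
```

A file with no lower-case bases at all is not soft-masked. A file with a
large lower-case fraction is soft-masked under the convention assumed
above.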

Hopefully this resolves the content questions, but please let us know if 
anything is still unclear.

Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/6/10 7:12 AM, Yuan Hao wrote:
> Dear list,
>
> May I ask a question about the human genome assembly hg19 available at
> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/ :
> is this genome sequence repeat-masked or not? If masked,
> in which way? I know there is another directory with the same genome
> sequence (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/),
> which contains two assemblies that differ in how repeats are masked,
> i.e. chromFa.tar.gz and chromFaMasked.tar.gz. I am not sure which one
> corresponds to the genome under the /chromosomes directory. Thank you
> very much in advance!
>
> Regards,
> Yuan
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome