Hello,

I would like to download the human genome from your website and run it through
a tool that scans the genome for matches to position-specific scoring matrices
of transcription factor binding sites. Unfortunately, the tool will not accept
complete genome files (at least not on my machine). Therefore, as a start, I
downloaded the chromosome 1 FASTA file "chr1.fa" from
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/

What I do not understand is why this file starts with several thousand "N"
bases (telomere?). In contrast, when I use the "DNA" link to download
subsequences of the human genome directly, I get something that looks
like this:

>hg18_dna range=chr1:1-1000 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
CCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCC

and when I use BLAT with this sequence as input, I do get a hit starting at
absolute coordinate 1.
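The length of the leading N run can be checked directly. A minimal sketch in Python, assuming a standard FASTA file (plain or gzip-compressed); `open_fasta` and `count_leading_ns` are hypothetical helper names. If every character in the file, N placeholders included, occupies one coordinate, the first real base would sit at position count + 1:

```python
import gzip
from typing import Iterable, TextIO


def open_fasta(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTA file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_leading_ns(lines: Iterable[str]) -> int:
    """Count N/n bases before the first non-N base of the first FASTA record."""
    count = 0
    in_record = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if in_record:
                break  # reached the next record: the first one was all N
            in_record = True
            continue
        for base in line:
            if base in "Nn":
                count += 1
            else:
                return count  # first real base found
    return count
```

For example, `count_leading_ns(open_fasta("chr1.fa"))` reports the run length for the downloaded file; the file is read line by line, so memory use stays small.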

Unfortunately, I do not understand much about how the human genome assembly was
carried out. My guess is that the table in the file

chromAgp.tar.gz 

might explain the observed discrepancies. In any case, what I need are plain
FASTA files of individual chromosomes whose absolute coordinates match
what is displayed in the UCSC browser.
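The AGP files in chromAgp.tar.gz describe how each chromosome is built from sequence components and gaps. Assuming they follow the standard tab-separated AGP layout (columns 2 and 3 are 1-based start and end, column 5 is the component type with N or U marking a gap, and column 7 is the gap type), a minimal sketch that lists the gaps, for instance to see whether a gap sits at the very start of a chromosome, could look like this; `Gap` and `agp_gaps` are hypothetical names:

```python
from typing import Iterable, List, NamedTuple


class Gap(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive
    gap_type: str  # e.g. "telomere", "centromere", "contig"


def agp_gaps(lines: Iterable[str]) -> List[Gap]:
    """Collect gap rows (component type N or U) from AGP-formatted lines."""
    gaps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and fields[4] in ("N", "U"):
            gaps.append(Gap(fields[0], int(fields[1]), int(fields[2]), fields[6]))
    return gaps
```

Usage, assuming the archive unpacks to a per-chromosome file such as a hypothetical `chr1.agp`: `for gap in agp_gaps(open("chr1.agp")): print(gap)`.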
Unfortunately, when I try to retrieve the full chromosome sequence with the
"DNA" subsequence retrieval tool, my computer hangs (for the larger
chromosomes).
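Once a chromosome file is available locally, a region can be cut out by browser-style coordinates without loading the whole chromosome into memory. A minimal sketch, assuming the browser's 1-based, inclusive convention (chr1:1-1000 means the first 1000 bases) and assuming every character in the file, N placeholders included, counts as one position; `extract_region` is a hypothetical helper:

```python
from typing import Iterable, List


def extract_region(lines: Iterable[str], start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of the first FASTA record.

    Reads line by line and stops once the region is complete.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    pieces: List[str] = []
    pos = 0  # number of bases seen so far
    in_record = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if in_record:
                break  # only the first record is used
            in_record = True
            continue
        line_start = pos + 1  # 1-based coordinate of this line's first base
        pos += len(line)
        if pos < start:
            continue  # line lies entirely before the region
        # Overlap of [line_start, pos] with [start, end] as 0-based slice offsets.
        lo = max(start, line_start) - line_start
        hi = min(end, pos) - line_start + 1
        pieces.append(line[lo:hi])
        if pos >= end:
            break
    return "".join(pieces)
```

For example, `extract_region(open("chr1.fa"), 1, 1000)` returns the first 1000 characters of the file's sequence, which can then be compared with the output of the "DNA" link for the same range.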

I would be grateful if somebody could suggest an alternative method to retrieve such
files from UCSC for the most recent genome assembly.

Thanks.

Marek

-- 

Dr. Marek Bartkuhn 

Institut für Genetik
Justus-Liebig-Universität Giessen

Heinrich-Buff-Ring 58
35392 Giessen

Germany


tel.: +49-641-9935479
fax:  +49-641-9935469

_______________________________________________
Genome maillist
https://lists.soe.ucsc.edu/mailman/listinfo/genome