Hello, I would like to download the human genome from your website and run it through a tool that scans the genome for matches to position-specific scoring matrices of transcription factor binding sites. Unfortunately, the tool does not accept complete genome files (at least not on my machine). Therefore, for a start, I downloaded the FASTA file for chromosome 1, "chr1.fa", from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
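If only a whole-genome FASTA file is at hand, it can be split into one file per chromosome before it goes to such a tool. Below is a minimal Python sketch that assumes a multi-record FASTA file in which each record starts with a header line such as ">chr1". The function name split_fasta and all file names are hypothetical, and this is not a UCSC-provided tool. The sketch streams the input line by line, so the whole genome never has to fit in memory.

```python
import os


def split_fasta(path, out_dir):
    """Write each record of the FASTA file at `path` to `<out_dir>/<name>.fa`.

    Assumption: every header line starts with ">" followed by a name
    (e.g. ">chr1"); the first word of the header becomes the file name.
    Returns the list of files written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A new record begins: close the previous output file.
                    if out is not None:
                        out.close()
                    name = line[1:].split()[0]
                    out_path = os.path.join(out_dir, name + ".fa")
                    out = open(out_path, "w")
                    written.append(out_path)
                # Copy header and sequence lines unchanged into the current file.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

For example, split_fasta("genome.fa", "chromosomes") would write chromosomes/chr1.fa, chromosomes/chr2.fa and so on, one file per header found.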
What I do not understand is why this file starts with several thousand "N" bases (telomere?). In contrast, when I use the "DNA" link to download subsequences of the human genome directly, I get something like this:

>hg18_dna range=chr1:1-1000 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
CCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCC

When I use BLAT with this sequence as input, I indeed get a hit starting at absolute coordinate 1. Unfortunately, I do not understand much about how the human genome assembly was carried out. My guess is that the table in the file chromAgp.tar.gz might explain the observed discrepancy.

Anyhow, what I need is plain FASTA files of individual chromosomes in which the absolute coordinates match what is shown in the UCSC browser. Unfortunately, when I try to retrieve the full chromosome sequence with the "DNA" subsequence retrieval tool, my computer hangs (for the larger chromosomes). I would be grateful if somebody knows an alternative way to retrieve such files from UCSC for the most recent genome assembly.

Thanks,
Marek

--
Dr. Marek Bartkuhn
Institut für Genetik
Justus-Liebig-Universität Giessen
Heinrich-Buff-Ring 58
35392 Giessen
Germany
tel.: +49-641-9935479
fax: +49-641-9935469
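To compare a local chromosome file with what the browser shows for a range such as chr1:1-1000, one can count the leading N bases and cut out a range directly. Below is a minimal Python sketch. It assumes three things: that chr1.fa holds a single FASTA record, that browser position 1 corresponds to the first character of that sequence, and that every N counts as a position. The function names are hypothetical. The sketch reads the whole chromosome into memory, which takes roughly one byte per base.

```python
def read_fasta_sequence(path):
    """Return the sequence of the first record in a FASTA file as one string."""
    chunks = []
    header_seen = False
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if header_seen:
                    break  # only the first record is read
                header_seen = True
                continue
            chunks.append(line.strip())
    return "".join(chunks)


def leading_n_count(seq):
    """Number of N/n characters before the first non-N base."""
    return len(seq) - len(seq.lstrip("Nn"))


def subsequence(seq, start, end):
    """Bases start..end, 1-based and inclusive (assumed browser convention)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside sequence")
    return seq[start - 1:end]
```

For example, after seq = read_fasta_sequence("chr1.fa"), leading_n_count(seq) reports how many N positions precede the first base. A call like subsequence(seq, 1, 1000) can then be compared with the output of the "DNA" link for the same assembly.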
