Hello Marek,

First of all, to assuage your curiosity, let me link you to information on how 
genomes are assembled:
http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml

The next thing I noticed is that while you are downloading the chr1.fa file for 
hg19, your output from "get DNA" is from hg18.  That is the source of the 
mismatch you have observed.  It is best to stick with one assembly or the 
other, since coordinates are not comparable between them.

The N's you are seeing at the beginning of the chromosome on hg19 can be 
visualized by turning on both the "Assembly" and "Gap" tracks.  When you click 
into the Gap track, you can read more about the telomere you are seeing.
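
If it helps, here is a minimal Python sketch of how one might check the 
leading N's in a chromosome file and pull out a range using the same 1-based, 
inclusive coordinates that the browser displays.  The function names and the 
toy record are made up for illustration; they are not part of any UCSC tool.  
The sketch assumes the file holds a single FASTA record and that it fits in 
memory.  Lowercase letters are treated like uppercase ones, since the UCSC 
files use lowercase for repeat-masked bases.

```python
import io


def read_fasta_sequence(handle):
    """Return (header, sequence) for the first record in a FASTA stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        else:
            chunks.append(line)
    return header, "".join(chunks)


def count_leading_n(sequence):
    """Count the N/n bases that precede the first non-N base."""
    return len(sequence) - len(sequence.lstrip("Nn"))


def fetch_range(sequence, start, end):
    """Return bases start..end in 1-based, inclusive coordinates."""
    return sequence[start - 1:end]


# Toy record standing in for chr1.fa (hypothetical data, not real sequence).
toy = io.StringIO(">chr1\nNNNNNNNNNN\nNNtaaccctaa\nccctaaccct\n")
header, seq = read_fasta_sequence(toy)
leading = count_leading_n(seq)
first_bases = fetch_range(seq, leading + 1, leading + 10)
print(header, leading, first_bases)
```

For a real file, the toy stream would be replaced by something like 
open("chr1.fa").  Because the N's stay in the sequence, the positions in the 
file keep lining up with the coordinates of the same assembly in the browser.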

I hope this information is helpful to you.  Please don't hesitate to contact us 
again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group


----- "Marek Bartkuhn" <[email protected]> wrote:

> Hello,
> 
> I would like to download the human genome from your website to run it
> through a tool that scans the genome for position-specific scoring
> matrices of transcription factor binding sites. Unfortunately, the tool
> does not like (at least not on my machine) to take complete genome
> files. Therefore, as a start, I downloaded the FASTA file for the first
> chromosome, "chr1.fa", at
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
> 
> What I do not understand is why this file starts with several thousand
> "N" bases (telomere?). In contrast, when I use the "DNA" link to
> directly download subsequences of the human genome, I get something
> that looks like this:
> 
> >hg18_dna range=chr1:1-1000 5'pad=0 3'pad=0 strand=+ repeatMasking=none
> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
> ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
> CCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCC
> 
> and when I use "blat" with this sequence as input I indeed get a hit
> starting with absolute coordinate "1".
> 
> Unfortunately I do not understand much of how the human genome assembly
> was carried out. My guess is that the table in the file
> 
> chromAgp.tar.gz
> 
> might explain the observed discrepancies. Anyhow, what I need is 
> plain fasta-files of individual chromosomes where absolute coordinates
> correlate with what is depicted using the UCSC browser.
> Unfortunately when I try to simply get the full chromosome sequence
> with the "DNA" subsequence retrieval tool my computer hangs up (for
> the larger chromosomes).
> 
> I would be grateful if somebody knows an alternative method to
> retrieve such a file from UCSC using the most recent genome assembly.
> 
> Thanks.
> 
> Marek
> 
> 
> 
> 
> 
> --
> 
> Dr. Marek Bartkuhn
> 
> Institut für Genetik
> Justus-Liebig-Universität Giessen
> 
> Heinrich-Buff-Ring 58
> 35392 Giessen
> 
> Germany
> 
> 
> tel.: +49-641-9935479
> fax:  +49-641-9935469
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

