Hello Marek,
First of all, to assuage your curiosity, here is a link to information on how genomes are assembled:

http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml

The next thing I noticed is that while you are downloading the chr1.fa file for hg19, your output from "get DNA" is from hg18. That is the source of the mismatch you have observed. It is best to stick with one assembly or the other.

The N's you are seeing at the beginning of the chromosome on hg19 can be visualized by turning on both the "Assembly" and "Gap" tracks. When you click into the Gap track, you can read more about the telomere you are seeing.

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

----- "Marek Bartkuhn" <[email protected]> wrote:

> Hello,
>
> I would like to download the human genome from your website to run it
> through a tool that scans the genome for position-specific scoring
> matrices of transcription factor binding sites. Unfortunately the tool
> does not like (at least not on my machine) to take complete genome
> files. Therefore I downloaded, for a start, the chromosome 1 FASTA file
> "chr1.fa" at
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
>
> What I do not understand is why this file starts with several thousand
> "N" bases (telomere?). In contrast, when I use the "DNA" link to
> directly download subsequences of the human genome, I get something
> which looks like this:
>
> >hg18_dna range=chr1:1-1000 5'pad=0 3'pad=0 strand=+ repeatMasking=none
> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
> ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
> CCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCC
>
> and when I use "blat" with this sequence as input I indeed get a hit
> starting with absolute coordinate "1".
>
> Unfortunately I do not understand much of how the human genome assembly
> was carried out. My guess is that the table in the file
>
> chromAgp.tar.gz
>
> might explain the observed discrepancies. Anyhow, what I need is
> plain FASTA files of individual chromosomes where absolute coordinates
> correlate with what is depicted in the UCSC browser.
> Unfortunately, when I try to simply get the full chromosome sequence
> with the "DNA" subsequence retrieval tool, my computer hangs (for
> the larger chromosomes).
>
> I would be grateful if somebody knows an alternative method to
> retrieve such a file from UCSC using the most recent genome assembly.
>
> Thanks.
>
> Marek
>
> --
>
> Dr. Marek Bartkuhn
>
> Institut für Genetik
> Justus-Liebig-Universität Giessen
>
> Heinrich-Buff-Ring 58
> 35392 Giessen
>
> Germany
>
> tel.: +49-641-9935479
> fax: +49-641-9935469
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
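
To check locally how a downloaded chromosome file lines up with browser coordinates, something like the following may help. It is only a minimal sketch in Python, not a UCSC tool: the function names are made up for illustration, the toy record stands in for a real chr1.fa, and it assumes the whole sequence fits in memory. It counts the run of N bases at the start of the first FASTA record and pulls out a 1-based, inclusive range of the kind written as chr1:1-1000.

```python
import io


def read_fasta_sequence(handle):
    """Return the first record of a FASTA stream as one uppercase string."""
    parts = []
    seen_header = False
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop when a second record begins
            seen_header = True
            continue
        if line:
            # Uppercase so soft-masked (lowercase) bases compare equal.
            parts.append(line.upper())
    return "".join(parts)


def leading_n_count(seq):
    """Number of consecutive N bases at the start of the sequence."""
    return len(seq) - len(seq.lstrip("N"))


def subsequence(seq, start, end):
    """Bases in a 1-based, inclusive range such as chr1:1-1000."""
    return seq[start - 1:end]


# Toy stand-in for a chromosome file; for a real download one would
# pass open("chr1.fa") instead of this in-memory stream.
demo = io.StringIO(">chr1\nNNNNNNNNNN\nNNtaaccctaa\nccctaaccc\n")
seq = read_fasta_sequence(demo)

print(leading_n_count(seq))      # 12
print(subsequence(seq, 13, 18))  # TAACCC
```

Comparing the output of such a range lookup against "get DNA" for the same assembly should give identical bases; a difference usually means the file and the browser session are on different assemblies.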
