There is a Perl script in the kent source tree that reads two files
from the Ensembl database dumps, seq_region.txt and assembly.txt,
and constructs a type of "lift" file from Ensembl GeneScaffold coordinates
to the target genome scaffold coordinates:
src/hg/utils/automation/ensGeneScaffolds.pl

http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/hg/utils/automation/ensGeneScaffolds.pl

The resulting "lift" file is used with the kent command: liftAcross

http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/liftAcross

liftAcross is run on the Ensembl GTF file dumps of the genes to "liftAcross"
those elements to scaffold coordinates.  The "structure" of the gene
deteriorates since the relationship between UTRs, transcription start sites
and the exons is no longer available.
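
To illustrate the idea of lifting coordinates, here is a minimal Python
sketch.  It is not the liftAcross implementation and it does not read the
real lift file format.  The LiftBlock record, its field names, the 0-based
half-open coordinates and the scaffold names are all assumptions made up
for this example.  It only shows how an interval on a GeneScaffold can be
translated, block by block, to scaffold coordinates.

```python
from typing import List, NamedTuple, Tuple


class LiftBlock(NamedTuple):
    """One hypothetical lift record (not the real liftAcross file layout)."""
    src_name: str   # GeneScaffold name
    src_start: int  # 0-based, inclusive
    src_end: int    # 0-based, exclusive
    dst_name: str   # target scaffold name
    dst_start: int  # 0-based start of this block on the target scaffold
    strand: str     # "+" or "-": orientation of the block on the target


def lift_interval(blocks: List[LiftBlock], name: str, start: int,
                  end: int) -> List[Tuple[str, int, int, str]]:
    """Translate [start, end) on a GeneScaffold to scaffold coordinates.

    Returns one (dst_name, dst_start, dst_end, strand) piece for each
    block the interval overlaps, so an element that spans two blocks
    comes back as two pieces.
    """
    pieces = []
    for b in blocks:
        if b.src_name != name:
            continue
        # Clip the query interval to this block.
        s = max(start, b.src_start)
        e = min(end, b.src_end)
        if s >= e:
            continue
        if b.strand == "+":
            d_start = b.dst_start + (s - b.src_start)
            d_end = b.dst_start + (e - b.src_start)
        else:
            # Reversed block: measure offsets from the block's end.
            d_start = b.dst_start + (b.src_end - e)
            d_end = b.dst_start + (b.src_end - s)
        pieces.append((b.dst_name, d_start, d_end, b.strand))
    return pieces


# Made-up example data: one GeneScaffold assembled from two scaffolds.
blocks = [
    LiftBlock("GeneScaffold_1", 0, 1000, "scaffold_10", 5000, "+"),
    LiftBlock("GeneScaffold_1", 1100, 1600, "scaffold_22", 200, "-"),
]
for piece in lift_interval(blocks, "GeneScaffold_1", 900, 1200):
    print(piece)
```

In this made-up case the single interval 900-1200 comes back as two
pieces on two different scaffolds, one of them on the reverse strand.
That is one way an element's relationship to its neighbors can be lost
after lifting.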

--Hiram

----- Original Message -----
From: "asas asasa" <[email protected]>
To: [email protected]
Cc: "Hiram Clawson" <[email protected]>
Sent: Monday, July 18, 2011 1:37:40 PM
Subject: Re: [Genome] ensembl versions in the test site


Hi Hiram and thanks, 


As felCat3 ensGene refers to version 62, and I downloaded the cDNA data of 
ensembl version 62, it is reasonable that there are no contradictions. Yet, in 
the examples we discussed, it is not clear how it is possible to map specific 
regions in the complete cDNA to specific regions in the scaffold, as the sizes 
do not match (e.g. I would like to be able to say that the coordinates 
11000-11102 in a scaffold are mapped to coordinates 500-600, in the complete 
cDNA sequence). 


Best, 
Assaf 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
