There is a Perl script in the kent source tree that reads two files from the Ensembl database dumps, seq_region.txt and assembly.txt, and constructs a type of "lift" file from Ensembl GeneScaffold coordinates to the target genome scaffold coordinates: src/hg/utils/automation/ensGeneScaffolds.pl
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/hg/utils/automation/ensGeneScaffolds.pl

The resulting "lift" file is used with the kent command liftAcross:
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/liftAcross

This is used with the Ensembl GTF file dumps of the genes to "liftAcross" those elements to scaffold coordinates. The "structure" of the gene deteriorates since the relationship between UTRs, transcription start sites and the exons is no longer available.

--Hiram

----- Original Message -----
From: "asas asasa" <[email protected]>
To: [email protected]
Cc: "Hiram Clawson" <[email protected]>
Sent: Monday, July 18, 2011 1:37:40 PM
Subject: Re: [Genome] ensembl versions in the test site

Hi Hiram and thanks,

As felCat3 ensGene refers to version 62, and I downloaded the cDNA data of Ensembl version 62, it is reasonable that there are no contradictions. Yet, in the examples we discussed, it is not clear how it is possible to map specific regions in the complete cDNA to specific regions in the scaffold, as the sizes do not match (e.g. I would like to be able to say that the coordinates 11000-11102 in a scaffold are mapped to coordinates 500-600 in the complete cDNA sequence).

Best,
Assaf

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
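For illustration only, the kind of coordinate lift discussed in this thread can be sketched in Python. This is not the liftAcross implementation and does not assume its file format: the LiftBlock record, its field names, the 0-based half-open coordinates, and the example names GeneScaffold_1, scaffold_A and scaffold_B are all hypothetical. The sketch only shows the idea that each block ties a span of a GeneScaffold to a span of a real scaffold, and that an interval falling in a gap or crossing two blocks has no single scaffold position.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LiftBlock:
    """Hypothetical lift record (not the real liftAcross format)."""
    src_name: str   # GeneScaffold name
    src_start: int  # 0-based, half-open span on the GeneScaffold
    src_end: int
    dst_name: str   # scaffold name
    dst_start: int  # 0-based start of the matching span on the scaffold
    strand: str     # '+' or '-': orientation of the scaffold piece


def lift_interval(
    blocks: List[LiftBlock], src_name: str, start: int, end: int
) -> Optional[Tuple[str, int, int, str]]:
    """Lift [start, end) from GeneScaffold to scaffold coordinates.

    Returns (scaffold, start, end, strand), or None when the interval
    is not fully contained in a single block.
    """
    for b in blocks:
        if b.src_name != src_name:
            continue
        if b.src_start <= start and end <= b.src_end and start < end:
            if b.strand == "+":
                new_start = b.dst_start + (start - b.src_start)
                new_end = b.dst_start + (end - b.src_start)
            else:
                # Reverse orientation: the block is flipped end to end.
                new_start = b.dst_start + (b.src_end - end)
                new_end = b.dst_start + (b.src_end - start)
            return (b.dst_name, new_start, new_end, b.strand)
    return None


# Made-up example data: two scaffold pieces with a gap between them.
EXAMPLE_BLOCKS = [
    LiftBlock("GeneScaffold_1", 0, 500, "scaffold_A", 11000, "+"),
    LiftBlock("GeneScaffold_1", 600, 900, "scaffold_B", 200, "-"),
]

if __name__ == "__main__":
    print(lift_interval(EXAMPLE_BLOCKS, "GeneScaffold_1", 100, 200))
    print(lift_interval(EXAMPLE_BLOCKS, "GeneScaffold_1", 450, 650))
```

Returning None for an interval that crosses a block boundary is a deliberate simplification here. A fuller tool would need to decide whether to split such an interval or drop it.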
