Hi Hiram,

My problem is how to know which part of the cDNA is mapped to which part of the scaffold. In fact, this problem also exists on the main site, for example with felCat3. Attached are the cDNAs of three examples downloaded from the Ensembl 62 FTP site. The exon starts/ends are those in the felCat3 ensGene table. Accordingly:
* The transcript ENSFCAT00000010563 is mapped to 3 scaffolds, and the sum of all mapped exon sizes is 3180 bp, while the total size of the cDNA is 5409 bp. After removing the Ns we get 3150 bp, which still does not match.
* For ENSFCAT00000005360 the exon sum is 486 bp, while the cDNA is 675 bp; only after removing the Ns do we get 486 bp.
* For ENSFCAT00000015608 the total size of the cDNA is 2295 bp including an N block, which equals the sum of the exons.

Best,
Assaf

On Mon, Jul 18, 2011 at 3:25 AM, Hiram Clawson <[email protected]> wrote:
> Can you please specify an example problem?
> Preferably one that is different between the test
> site and the public site if that is possible.
>
> --Hiram
>
> ----- Original Message -----
> From: "asas asasa"
> Sent: Sunday, July 17, 2011 3:26:31 PM
> Subject: Re: [Genome] ensembl versions in the test site
>
> Hi Luvina and all,
>
> For the builds on the main site, in the ensGene table, after summing all
> exon sizes from exonStarts and exonEnds (using zero-based starts and
> one-based ends), we get exactly the total size of the transcript as it
> appears in the relevant Ensembl FASTA file.
> Yet for builds on the test site, with fragmented transcript sequences
> (large N blocks occurring within the transcripts), the exon sizes do not
> sum up correctly.
> Generally, I would like to be able to map the transcript exactly to the
> genomic sequences. How can this be done? Is there a way to do so
> despite the foregoing problem?
>
> Assaf
>
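The length check described in this thread (summing exonEnds minus exonStarts and comparing the result with the cDNA length, with and without Ns) can be sketched in Python as below. This is a minimal sketch, not a UCSC or Ensembl tool: the function names are hypothetical, and it assumes the exonStarts/exonEnds values are the comma-separated strings of a single ensGene (genePred) row, with zero-based starts and one-based ends. The hypothetical cdna_to_genome helper only makes sense when the cDNA is exactly the concatenation of the exons; it cannot place N blocks that have no exon behind them. Since each ensGene row carries a single chrom, a transcript spread over several scaffolds would presumably have to be handled row by row.

```python
"""Sketch: compare ensGene exon coordinates with an Ensembl cDNA sequence.

Assumptions (not from the thread): exonStarts/exonEnds are the comma-separated
strings of one genePred-style row, starts are 0-based and ends are 1-based
(half-open intervals), and the cDNA is given as a plain string.
All function names here are hypothetical.
"""


def parse_coords(field):
    """Parse a genePred coordinate list such as '100,300,' (trailing comma allowed)."""
    return [int(x) for x in field.split(",") if x]


def exon_sizes(exon_starts, exon_ends):
    """Exon lengths; with 0-based starts and 1-based ends, length = end - start."""
    starts, ends = parse_coords(exon_starts), parse_coords(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds have different lengths")
    return [e - s for s, e in zip(starts, ends)]


def compare_lengths(exon_starts, exon_ends, cdna):
    """Report the summed exon length next to the cDNA length with and without Ns."""
    total = sum(exon_sizes(exon_starts, exon_ends))
    no_n = len(cdna.upper().replace("N", ""))  # drop N and n characters
    return {
        "exon_sum": total,
        "cdna_len": len(cdna),
        "cdna_len_without_N": no_n,
        "matches_full": total == len(cdna),
        "matches_without_N": total == no_n,
    }


def cdna_to_genome(pos, exon_starts, exon_ends, strand):
    """Map a 0-based cDNA offset to a 0-based position on the row's scaffold.

    Only meaningful when compare_lengths(...)["matches_full"] is True,
    i.e. the cDNA is exactly the exons concatenated in transcript order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    exons = list(zip(parse_coords(exon_starts), parse_coords(exon_ends)))
    if strand == "-":
        exons.reverse()  # on the minus strand the transcript starts at the last exon
    offset = 0  # cDNA length covered by the exons already passed
    for start, end in exons:
        size = end - start
        if pos < offset + size:
            within = pos - offset
            # Plus strand counts up from the exon start; minus strand counts
            # down from the last base of the exon (end - 1).
            return start + within if strand == "+" else end - 1 - within
        offset += size
    raise ValueError("position lies beyond the summed exon length")
```

With hypothetical toy coordinates, exonStarts "100,300," and exonEnds "150,320," give exon sizes of 50 and 20 bp; an 80 bp cDNA containing a 10 bp N block then matches only after the Ns are removed, which is the pattern reported for ENSFCAT00000005360. ENSFCAT00000015608 corresponds to matches_full being True, and ENSFCAT00000010563 matches neither way.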
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
