Hi Sim,

A gene cannot be fully defined when information is missing: exons that are 
not present in the available evidence cannot be annotated. That is why a 
gene transcript is marked as "CDS is incomplete".

Gene prediction tracks are imprecise "predictions" of what might be there, 
not features that are actually in the reference sequence. The mRNA that was 
used to "predict" a gene at this location may have come from some other 
tissue whose DNA sequence differs from the reference. We can map the mRNA 
almost exactly to the reference sequence, but a few bases may be missing from 
the mapping, and so we mark the CDS as "incomplete".
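
As a rough illustration of how exonFrames might be used with such records, 
here is a minimal Python sketch. It assumes the genePred convention that each 
exonFrames value gives the codon position (0, 1 or 2) of the first coding base 
of that exon in the direction of transcription, with -1 for exons that have no 
CDS, and that the values are listed in the same order as exonStarts. The 
function names are hypothetical, and the spliced CDS sequence (already 
reverse-complemented for minus-strand genes) is assumed to be fetched 
elsewhere. Please check the result against a few known proteins before 
relying on it.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def coding_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end); keep (start, end, exon_index)."""
    blocks = []
    for i, (start, end) in enumerate(zip(exon_starts, exon_ends)):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            blocks.append((s, e, i))
    return blocks


def leading_bases_to_skip(blocks, exon_frames, strand):
    """Bases to drop from the 5' end so the CDS starts on a codon boundary."""
    if not blocks:
        return 0
    # The first coding exon in transcription order is last on the minus strand.
    first = blocks[0] if strand == "+" else blocks[-1]
    frame = exon_frames[first[2]]
    if frame < 0:
        return 0
    return (3 - frame) % 3


def trim_cds(cds_seq, skip):
    """Drop the partial leading codon, then any partial trailing codon."""
    seq = cds_seq[skip:]
    return seq[: len(seq) - len(seq) % 3]


def translate(cds_seq):
    """Translate whole codons; unknown codons (e.g. containing N) become X."""
    seq = cds_seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3)
    )


# Made-up example record: plus strand, CDS starting on the third base of a codon.
blocks = coding_blocks([100, 200], [150, 260], 120, 240)
skip = leading_bases_to_skip(blocks, [2, 2], "+")
protein = translate(trim_cds("GATGGCCTAAT", skip))
```

The resulting protein covers only the complete codons that can be recovered, 
so for an incomplete record it will be a fragment of the full protein.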

One of our engineers suggests reading papers on N-SCAN and Augustus.

If you have further questions, please don't hesitate to contact the mailing 
list: [email protected].

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group


----- Original Message -----
From: "SIM Ngak Leng" <[email protected]>
To: [email protected]
Sent: Wednesday, May 25, 2011 6:58:18 PM
Subject: [Genome] Assembling exons from incomplete genes

Greetings,

I am trying to programmatically assemble exons using the data obtained from the 
UCSC website (ensGene.txt.gz, ccdsGene.txt.gz, etc.) to form the resulting 
amino acid sequence.

I understand that some records are incomplete (i.e., cdsStartStat isn't cmpl), 
so that the length of the nucleotide sequence obtained isn't a multiple of 3.

Is there any way to generate the protein from these records using exonFrames, 
or other methods? And if so, how should I go about doing it?

Thank you in advance.

Regards,
Sim Ngak Leng
Bioinformatics Specialist
Genome Institute of Singapore

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome