Hello Alexandra,

It seems there is some confusion about what the alignment blocks (exons) 
represent versus the coordinates for txStart/End and cdsStart/End.

The exons are the regions of the entire transcript that align to the 
genome. This includes the 5' UTR, CDS, and 3' UTR.

The first exon starts at position 1467732 +1 = 1467733.

*Note about adding 1 to the start: UCSC stores coordinates in the MySQL 
tables and most files as 0-based, half-open, while the browser display 
uses 1-based, fully-closed coordinates. To convert from the first to the 
second, add 1 to the start; the end stays the same.
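
As a minimal sketch of that conversion in Python (the function name 
to_display is just an illustration, not part of any UCSC tool), using 
the txStart/txEnd and cdsStart/cdsEnd values from your refGene row:

```python
def to_display(start0, end0):
    """Convert a 0-based, half-open interval to 1-based, fully-closed.

    Only the start changes; the end is the same number in both systems.
    """
    return start0 + 1, end0


# txStart/txEnd and cdsStart/cdsEnd for NM_061576 as stored in the table
tx_display = to_display(1467732, 1469600)
cds_display = to_display(1467752, 1469560)

print(tx_display)   # (1467733, 1469600)
print(cds_display)  # (1467753, 1469560)
```

The length of a feature is simply end - start in the stored 
coordinates, which is one reason the half-open form is convenient for 
scripts.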

The cdsStart is contained within this first exon (block), starting at 
position 1467752 +1 = 1467753. The first part of this exon is 5' UTR 
and the second part is CDS (coding).
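
To handle this automatically in a script, one approach is to clip each 
exon to the cdsStart/cdsEnd range before extracting sequence. The 
following is only a sketch under that assumption, in Python; the helper 
names parse_list and cds_blocks are made up for this example, and all 
coordinates stay in the stored 0-based, half-open form:

```python
def parse_list(field):
    """Turn a comma-separated field such as '1,2,3,' into a list of ints."""
    return [int(x) for x in field.split(",") if x]


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end); skip exons that are all UTR."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


# Values copied from the NM_061576 row in refGene (ce6)
exon_starts = parse_list("1467732,1468089,1468422,1468750,1468880,1468999,1469128,")
exon_ends = parse_list("1468040,1468373,1468542,1468830,1468953,1469069,1469600,")

coding = cds_blocks(exon_starts, exon_ends, 1467752, 1469560)
cds_length = sum(e - s for s, e in coding)

print(coding[0])                   # (1467752, 1468040)
print(cds_length, cds_length % 3)  # 1347 0
```

The coding length comes out as a multiple of 3, which is a quick sanity 
check that the clipped blocks are the right ones to translate. This 
transcript is on the + strand; for a transcript on the - strand the same 
genomic blocks apply, but the joined sequence has to be 
reverse-complemented before translation.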

This can be seen in the graphical display in the browser: zoom in close 
to the position and notice the thin and thick display. Thin represents 
non-coding, thick represents coding.

If you choose to export "CDS" data from the Table Browser, in any format 
(including fasta, in batch), the output will be limited to the portion of 
the transcript defined by the CDS. For a quick way to get the protein 
sequence for a single transcript, locate it in the Genome Browser, click 
on it, scroll down a bit on the description page, and use the 
"Links to Sequences: -> Predicted Protein" link.

Some help links:
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GeneDisplay
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
http://genome.ucsc.edu/FAQ/FAQformat.html#format9
http://genome.ucsc.edu/FAQ/FAQformat.html#format1
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Sequence

If you have any follow-up questions, we would be glad to offer more 
assistance,
Jen

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/28/10 12:40 AM, Rapoport Alexandra wrote:
> Greetings!
> Debugging my script I found the following annotation, and I think there is a
> problem:
>
> C.elegans, assembly ce6, CHR
> refGene table entry:
>
> #bin	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	score	name2	cdsStartStat	cdsEndStat	exonFrames
> 596	NM_061576	chrII	+	1467732	1469600	1467752	1469560	7	1467732,1468089,1468422,1468750,1468880,1468999,1469128,	1468040,1468373,1468542,1468830,1468953,1469069,1469600,
>
> Through genome browser the first codon starts at position 1467752 and not
> at  1467732 (as in database). If I use the data from database, it is
> impossible to get the right protein sequence.
> Is there any possibility to handle such cases automatically (I mean by some
> script and not by looking at the result)? Actually, I find this one by
> accident :)
> Sincerely Yours,
>       Alexandra Rapoport
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome