Hello Maria,

The format of this table is genePred, as defined here in the FAQ:
http://genome.ucsc.edu/FAQ/FAQformat.html#format9

     uint    txStart;            "Transcription start position"
     uint    txEnd;              "Transcription end position"
     uint    cdsStart;           "Coding region start"
     uint    cdsEnd;             "Coding region end"

The query for this data is the mRNA transcript and the target is the 
reference genome. The coding region and associated protein are defined 
by the input sources and added to the alignment record and browser 
display (UCSC does not define the CDS).

For a coding transcript, the alignment would have three main parts 
defined by these four coordinates:
5' UTR
Coding region (CDS)
3' UTR
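
As a rough illustration of how the four coordinates split a coding 
transcript, here is a minimal Python sketch. The function name 
split_transcript is hypothetical (it is not part of any UCSC tool), and 
the sketch assumes the usual UCSC table convention of 0-based, half-open 
coordinates. It returns genomic spans only, so introns inside a span are 
not removed. On the minus strand the 5' UTR lies at the txEnd side.

```python
def split_transcript(tx_start, tx_end, cds_start, cds_end, strand):
    """Return genomic (start, end) spans for the 5' UTR, CDS and 3' UTR.

    Hypothetical helper for illustration; spans are not intersected
    with exons, so they may still contain intronic sequence.
    """
    if cds_start == cds_end:
        raise ValueError("no coding region annotated for this transcript")
    left = (tx_start, cds_start)    # between txStart and cdsStart
    cds = (cds_start, cds_end)
    right = (cds_end, tx_end)       # between cdsEnd and txEnd
    if strand == "+":
        five_utr, three_utr = left, right
    else:
        # Minus strand: transcription runs from txEnd down to txStart.
        five_utr, three_utr = right, left
    return {"5' UTR": five_utr, "CDS": cds, "3' UTR": three_utr}


# uc001abe.2 from the quoted message (minus strand, single exon)
parts = split_transcript(653074, 654579, 653880, 654321, "-")
for name, (start, end) in parts.items():
    print(name, start, end)
```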

However, when cdsStart and cdsEnd are equal, this signals that the 
transcript is non-coding (or at least that no coding region has been 
defined yet). In some tracks these cdsStart/cdsEnd values are equal to 
txStart (as in the UCSC Genes track), and in other gene tracks they are 
equal to txEnd (for example, the RefSeq Genes track). This is just a 
processing detail: the non-coding interpretation applies in both cases.
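
If you want to apply this rule to a downloaded table, a minimal Python 
sketch might look like the following. The is_coding helper and the rows 
list are hypothetical, with the coordinates copied from the rows in your 
message. The test compares cdsStart with cdsEnd only, so it works whether 
a track sets both values to txStart or to txEnd.

```python
# (name, txStart, txEnd, cdsStart, cdsEnd) taken from the quoted knownGene rows
rows = [
    ("uc001aaa.2", 1115, 4121, 1115, 1115),
    ("uc009vis.1", 4268, 6628, 4268, 4268),
    ("uc001abe.2", 653074, 654579, 653880, 654321),
]


def is_coding(cds_start, cds_end):
    """A transcript has an annotated CDS only if cdsStart != cdsEnd."""
    return cds_start != cds_end


labels = {
    name: "coding" if is_coding(cds_start, cds_end) else "non-coding"
    for name, _tx_start, _tx_end, cds_start, cds_end in rows
}
for name, label in labels.items():
    print(name, label)
```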

We hope this helps,
Thanks
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 2/17/10 9:28 AM, Maria Poptsova wrote:
>    Hello,
>
> I downloaded UCSC gene annotation track as follows:
>
> http://genome.ucsc.edu/cgi-bin/hgTables
> group: 'Genes and Gene Prediction Tracks'
> track: 'UCSC Genes'
> table: 'knownGene'
>
> The table looks like:
> #name   chrom   strand  txStart txEnd   cdsStart        cdsEnd
> exonCount       exonStarts      exonEnds        proteinID       alignID
> uc001aaa.2      chr1    +       1115    4121    1115    1115    3
> 1115,2475,3083, 2090,2584,4121,         uc001aaa.2
> uc009vip.1      chr1    +       1115    4272    1115    1115    2
> 1115,2475,      2090,4272,              uc009vip.1
> uc009vis.1      chr1    -       4268    6628    4268    4268    4
> 4268,4832,5658,6469,    4692,4901,5805,6628,            uc009vis.1
> uc001aag.1      chr1    -       5658    7231    5658    5658    4
> 5658,6469,6738,7095,    5810,6628,6918,7231,            uc001aag.1
>
> I have few questions:
>
> 1. Why do cdsStart and cdsEnd have the same coordinate? I would think
> that cdsStart should coincide with first exon's start and cdsEnd should
> coincide with the last exon's end.
> 2. In some cases cdsStart differs from cdsEnd but also differs from
> first exon's start, and cdsEnd differs from the last exon's end. For
> example:
>
> uc001abe.2      chr1    -       653074  654579  653880  654321  1
> 653074, 654579,         uc001abe.2
>
> Could you please explain what are cdsStart and cdsEnd?
>
> Thank you,
> Maria
>
>
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
