Hello Maria,
The format of this table is genePred, as defined here in the FAQ:
http://genome.ucsc.edu/FAQ/FAQformat.html#format9
uint txStart; "Transcription start position"
uint txEnd; "Transcription end position"
uint cdsStart; "Coding region start"
uint cdsEnd; "Coding region end"
The query for this data is the mRNA transcript, the target is the
reference genome, and the coding region and associated protein are
defined by the input sources and added to the alignment record and
browser display (UCSC does not define the CDS).
For a coding transcript, the alignment would have three main parts
defined by these four coordinates:
5' UTR
Coding region (CDS)
3' UTR
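
As a rough illustration of how the four coordinates bound those three
parts, here is a minimal Python sketch. The function name
split_transcript is hypothetical, and it assumes the usual UCSC table
convention of 0-based, half-open coordinates. The spans are genomic
extents, so for multi-exon transcripts they still include introns and
would need to be intersected with exonStarts/exonEnds. On the minus
strand the 5' UTR lies at the higher genomic coordinates.

```python
def split_transcript(strand, tx_start, tx_end, cds_start, cds_end):
    """Return genomic (start, end) spans for the 5' UTR, CDS and 3' UTR.

    Assumes 0-based, half-open coordinates; introns are not removed.
    """
    left = (tx_start, cds_start)   # genomic span before the CDS
    cds = (cds_start, cds_end)
    right = (cds_end, tx_end)      # genomic span after the CDS
    if strand == "+":
        return {"5' UTR": left, "CDS": cds, "3' UTR": right}
    # Minus strand: transcription runs from txEnd down to txStart.
    return {"5' UTR": right, "CDS": cds, "3' UTR": left}


# uc001abe.2 from your message (single exon, minus strand)
parts = split_transcript("-", 653074, 654579, 653880, 654321)
for name, (start, end) in parts.items():
    print(name, start, end, "length", end - start)
```
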
However, when cdsStart and cdsEnd are equal, this is a signal that
the transcript is non-coding (or at least that no coding region has yet
been defined). In some gene tracks both values are set to txStart (as
in the UCSC Genes track), and in others both are set to txEnd (for
example, in the RefSeq Genes track). This is just a processing detail;
the non-coding interpretation applies in both cases.
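
If you want to flag these records programmatically, a small sketch
along these lines should work. The helper name is_noncoding is
hypothetical, and the rows are copied from your message (name, txStart,
txEnd, cdsStart, cdsEnd only). The test deliberately ignores whether
the shared value is txStart or txEnd.

```python
def is_noncoding(cds_start, cds_end):
    """A genePred record with cdsStart == cdsEnd has no annotated CDS."""
    return cds_start == cds_end


# (name, txStart, txEnd, cdsStart, cdsEnd) taken from the quoted table
rows = [
    ("uc001aaa.2", 1115, 4121, 1115, 1115),
    ("uc009vis.1", 4268, 6628, 4268, 4268),
    ("uc001abe.2", 653074, 654579, 653880, 654321),
]

labels = {
    name: "non-coding" if is_noncoding(cds_start, cds_end) else "coding"
    for name, tx_start, tx_end, cds_start, cds_end in rows
}
for name, label in labels.items():
    print(name, label)
```
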
We hope this helps,
Thanks
Jennifer
---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/
On 2/17/10 9:28 AM, Maria Poptsova wrote:
> Hello,
>
> I downloaded UCSC gene annotation track as follows:
>
> http://genome.ucsc.edu/cgi-bin/hgTables
> group: 'Genes and Gene Prediction Tracks'
> track: 'UCSC Genes'
> table: 'knownGene'
>
> The table looks like:
> #name chrom strand txStart txEnd cdsStart cdsEnd
> exonCount exonStarts exonEnds proteinID alignID
> uc001aaa.2 chr1 + 1115 4121 1115 1115 3
> 1115,2475,3083, 2090,2584,4121, uc001aaa.2
> uc009vip.1 chr1 + 1115 4272 1115 1115 2
> 1115,2475, 2090,4272, uc009vip.1
> uc009vis.1 chr1 - 4268 6628 4268 4268 4
> 4268,4832,5658,6469, 4692,4901,5805,6628, uc009vis.1
> uc001aag.1 chr1 - 5658 7231 5658 5658 4
> 5658,6469,6738,7095, 5810,6628,6918,7231, uc001aag.1
>
> I have few questions:
>
> 1. Why do cdsStart and cdsEnd have the same coordinate? I would think
> that cdsStart should coincide with first exon's start and cdsEnd should
> coincide with the last exon's end.
> 2. In some cases cdsStart differs from cdsEnd but also differs from
> first exon's start, and cdsEnd differs from the last exon's end. For
> example:
>
> uc001abe.2 chr1 - 653074 654579 653880 654321 1
> 653074, 654579, uc001abe.2
>
> Could you please explain what are cdsStart and cdsEnd?
>
> Thank you,
> Maria
>
>
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome