Hello Pete,

For data in the mRNA track, the CDS information comes directly from 
the GenBank record. As you noticed, this information sometimes includes 
notational characters. These are important, especially the arrows 
(< and >), since they mean that the CDS is incomplete in the mRNA (it 
extends in the direction of the arrow to an unspecified position). NCBI 
describes how to interpret these annotations. One place to 
start: http://www.ncbi.nlm.nih.gov/collab/FT/index.html#2.3

When viewing the details page for a sequence in the mRNA track, this CDS 
data is displayed exactly as it appears in GenBank. For example, using 
hg19, search for "EF143990". Then click on the sequence name in the 
Browser display to find the description page, which lists 
"CDS: 1..>108".
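
If it helps when processing the cds.name values in bulk, here is a 
minimal Python sketch that splits a simple CDS string such as "1..>108" 
into its numbers and partial-end flags. The names parse_simple_cds and 
CdsRange are made up for this illustration, and the sketch assumes only 
the plain start..end form; anything more complex (for example a join or 
complement location) is returned as None rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class CdsRange(NamedTuple):
    """Hypothetical container for a parsed CDS string."""
    start: int            # 1-based first base, as written in the record
    end: int              # 1-based last base, inclusive
    start_partial: bool   # True if the start carried a "<" marker
    end_partial: bool     # True if the end carried a ">" marker


# Assumption: only the simple "start..end" form, with an optional "<"
# before the start and an optional ">" before the end.
_SIMPLE_CDS = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_simple_cds(text: str) -> Optional[CdsRange]:
    """Return the parsed range, or None if the string is not in simple form."""
    match = _SIMPLE_CDS.match(text.strip())
    if match is None:
        return None
    lt, start, gt, end = match.groups()
    return CdsRange(int(start), int(end), lt == "<", gt == ">")


if __name__ == "__main__":
    print(parse_simple_cds("1..>108"))
```

Keeping the partial flags next to the numbers, instead of stripping the 
arrows, preserves the fact that the coding region is incomplete in that 
mRNA.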

Are you perhaps examining other tracks (from the Genes and Gene 
Prediction group?) when you are seeing more clearly defined coding 
regions? If so, then you will need to use the tables associated with 
those tracks to extract coding regions (CDS). For a genePred table, the 
table field labels are cdsStart and cdsEnd. These are genomic 
coordinates, so they follow the UCSC coordinate rules: zero-based, half 
open, with respect to the (+) strand. To convert to 1-based, fully 
closed coordinates, add 1 to the start (the end is unchanged) to get the 
actual bases covered. For sequences aligned to the (-) strand, also 
reverse the coordinates to get the actual (-) strand coding range(s).
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
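
As a rough illustration of the conversion described above, here is a 
small Python sketch. The function name cds_one_based is hypothetical. It 
makes two assumptions that you should check against your own data: that 
"reverse the coordinates" means swapping the order so the first value is 
where the CDS begins in the direction of transcription, and that a row 
with cdsStart equal to cdsEnd has no annotated coding region.

```python
from typing import Optional, Tuple


def cds_one_based(cds_start: int, cds_end: int,
                  strand: str) -> Optional[Tuple[int, int]]:
    """Convert genePred cdsStart/cdsEnd to 1-based, fully closed values.

    Input is zero-based, half-open, on the (+) strand. The result is
    ordered in the direction of transcription (an assumption, see above).
    """
    if cds_start >= cds_end:
        # Assumption: an empty range means no coding region is annotated.
        return None
    first = cds_start + 1   # zero-based start -> 1-based first base
    last = cds_end          # half-open end already equals the last base
    if strand == "-":
        # For (-) strand items the CDS begins at the higher coordinate.
        return last, first
    return first, last


if __name__ == "__main__":
    print(cds_one_based(100, 200, "+"))
    print(cds_one_based(100, 200, "-"))
```

Note that the values stay in (+) strand genomic coordinates in both 
cases; only the order in which they are reported changes.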

Hopefully this helps to make the data more clear. If you have a specific 
example where the table data differs from the display, we would be glad 
to take a look.

1) Please note exactly how the track data was found: assembly, track, 
identifier, position (as a double check), and what cds data you see on 
the description page.

2) Please note exactly how the table data was found: assembly, track, 
identifier, position, tables used/linked and how, and what data you find 
in the cds.name field.

Follow-up questions are welcome,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 5/11/10 11:13 AM, Pete Shepard wrote:
> Dear Genome Browser Folks,
>
> I have been trying to extract mrna information from your all_mrna table. I
> would like to get the coding start and end information for each mrna,
> currently I am doing this using the cds.name field that has the start and
> stop of some of the mrna but some numbers have > or < included in this
> field. In the browser, this information seems to be available for each gene
> but not when using my method. I am wondering if there is a better way of
> getting this information?
>
> TIA
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome