Hi Vikram,

To get the list of protein-coding, canonical genes in GFF format, you 
will need to do a two-part extraction from the Table Browser 
(http://genome.ucsc.edu/cgi-bin/hgTables). The first part gets a list 
of canonical genes (i.e., no splice variants). The second part filters 
out non-coding genes by keeping only genes where cdsStart does not 
equal cdsEnd. We mark a non-coding gene by setting cdsStart equal to 
cdsEnd, as explained here: 
https://lists.soe.ucsc.edu/pipermail/genome/2009-July/019588.html

Getting a list of canonical genes:
1. Go to the Table Browser and select your genome and assembly of interest.
2. UCSC Genes should automatically be selected as the track. Select 
"knownCanonical" from the table pull-down menu.
3. Select "selected fields from primary and related tables" as the 
output format and enter a file name for the output file. Click "get output".
4. On the next page, check the box for the "transcript" field and then 
click "get output".
Please see this previous mailing list question for clarification about 
the construction of the knownCanonical table:
https://lists.soe.ucsc.edu/pipermail/genome/2005-July/008123.html

Filtering out non-coding genes:
1. Go back to the Table Browser and select "knownGene" from the table 
pull-down menu.
2. To upload your list of canonical genes from before, click "upload 
list" next to identifiers. Select your file and click "submit".
3. Make a filter by clicking "create" next to filter. For cdsStart, 
select "!=" from the pull-down menu and type "hg19.knownGene.cdsEnd" 
into the text box (replace "hg19" with the database name of your 
assembly if you are using a different one). Click "submit".
4. Select "GTF - gene transfer format" as the output format (GTF is very 
similar to GFF; see this page for more information: 
http://genome.ucsc.edu/FAQ/FAQformat#format4) and click "get output".
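
If you would rather do the same two-part filtering on downloaded tables 
outside the Table Browser, here is a minimal Python sketch of the logic 
described above. It assumes tab-separated text with a "#"-prefixed 
header line and columns named "name", "cdsStart" and "cdsEnd"; the 
function name, the sample rows and the transcript IDs are made up for 
illustration and are not real UCSC data.

```python
import csv
import io


def coding_canonical(canonical_ids_text, known_gene_text):
    """Return knownGene rows that are both canonical and protein coding."""
    # One transcript ID per line; skip blank lines and '#' header lines.
    canonical = {
        line.strip()
        for line in canonical_ids_text.splitlines()
        if line.strip() and not line.startswith("#")
    }
    # Assumption: the first line is a header such as "#name<TAB>chrom<TAB>...".
    reader = csv.DictReader(
        io.StringIO(known_gene_text.lstrip("#")), delimiter="\t"
    )
    kept = []
    for row in reader:
        # cdsStart == cdsEnd is the notation for a non-coding gene.
        is_coding = int(row["cdsStart"]) != int(row["cdsEnd"])
        if row["name"] in canonical and is_coding:
            kept.append(row)
    return kept


# Made-up sample data (hypothetical IDs and coordinates).
CANONICAL = "#hg19.knownCanonical.transcript\nuc000aaa.1\nuc000bbb.1\n"
KNOWN_GENE = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\n"
    "uc000aaa.1\tchr1\t+\t100\t900\t200\t800\n"      # canonical, coding
    "uc000bbb.1\tchr1\t-\t1000\t2000\t1000\t1000\n"  # canonical, non-coding
    "uc000ccc.1\tchr2\t+\t50\t500\t60\t400\n"        # coding, not canonical
)

if __name__ == "__main__":
    for row in coding_canonical(CANONICAL, KNOWN_GENE):
        print(row["name"], row["chrom"], row["cdsStart"], row["cdsEnd"])
```

This only reproduces the selection step; it does not write GTF, so the 
Table Browser steps above remain the simplest way to get GTF output.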

I hope this information is helpful.  Please feel free to contact the 
mail list again if you require further assistance.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 9/3/10 11:59 AM, Vikram Agarwal wrote:
>    Hello,
>
> I would like to extract the coordinates for all protein-coding gene
> models listed in UCSC genes in gff format.  In the genome browser, it
> has an option to restrict the viewing of splice variants to show only
> one gene model per gene.  I would like to extract only one model per
> gene according to the criterion that this option takes.  Is there an
> easy way to accomplish this while also removing non-coding genes?  Also,
> is there information somewhere about the criterion the genome browser
> takes to view only one gene model?
>
> Help is greatly appreciated!
>
> Thank you,
> Vikram
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>    
