Hello,

I would like to report a concern about the UCSC Genes listings:

The UCSC gene ID uc001avb.2 has no RefSeq ID cross-referenced in the
knownToRefSeq table.  In the Genome Browser, however, this UCSC gene
appears to have been derived from RefSeq ID NM_001010847.1
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=NM_001010847&doptcmdl=GenBank&tool=genome.ucsc.edu>.
 

According to the documented procedure, the gene should have been generated
as follows:
"For RefSeq transcripts the RefSeq protein prediction is used directly 
instead of this procedure."

Is there any way to fix this, or to identify all UCSC gene IDs for which 
this occurs?
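
For what it's worth, here is a minimal sketch of how the second part of the
question might be checked offline. It assumes that knownGene and the RefSeq
cross-reference table (knownToRefSeq) have each been exported from the Table
Browser as tab-separated text with the UCSC gene ID in the first column. The
function name and the sample rows (apart from uc001avb.2) are hypothetical,
made up for illustration.

```python
import csv
import io


def ids_without_refseq(known_gene_tsv, known_to_refseq_tsv):
    """Return UCSC gene IDs present in knownGene but absent from knownToRefSeq.

    Both arguments are file-like objects holding tab-separated text whose
    first column is the UCSC gene ID (an assumed export layout).
    Lines starting with '#' are treated as header lines and skipped.
    """
    def first_column(handle):
        ids = []
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            ids.append(row[0])
        return ids

    mapped = set(first_column(known_to_refseq_tsv))
    # Preserve knownGene order and report each unmapped ID only once.
    missing, seen = [], set()
    for gene_id in first_column(known_gene_tsv):
        if gene_id not in mapped and gene_id not in seen:
            seen.add(gene_id)
            missing.append(gene_id)
    return missing


# Hypothetical sample rows, for illustration only (not real table contents).
known_gene = io.StringIO(
    "#name\tchrom\n"
    "uc001avb.2\tchr1\n"
    "uc000aaa.1\tchr1\n"
)
known_to_refseq = io.StringIO(
    "#name\tvalue\n"
    "uc000aaa.1\tNM_000000\n"
)
missing = ids_without_refseq(known_gene, known_to_refseq)
print(missing)
```

Note that this would list every ID lacking a cross-reference, including
transcripts that were never derived from RefSeq, so the result would still
need to be compared against the RefSeq track.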

Best,
Vikram

On 09/07/2010 08:00 PM, Mary Goldman wrote:
> Hi Vikram,
>
> To get the list of protein coding, canonical genes in GFF format, you
> will need to do a two-part extraction from the Table Browser
> (http://genome.ucsc.edu/cgi-bin/hgTables). The first part involves
> getting a list of canonical genes (i.e., no splice variants), while the
> second part involves filtering out non-coding genes by keeping only genes
> where cdsStart does not equal cdsEnd (cdsStart equal to cdsEnd is our
> notation for a non-coding gene; see
> https://lists.soe.ucsc.edu/pipermail/genome/2009-July/019588.html).
>
> Getting a list of canonical genes:
> 1. Go to the Table Browser and select your genome and assembly of interest.
> 2. UCSC Genes should automatically be selected as the track. Select
> "knownCanonical" from the table pull down menu.
> 3. Select "selected fields from primary and related tables" as the
> output format and enter a file name for the output file. Click "get output".
> 4. Select "transcript" and then click "get output".
> Please see this previous mailing list question for clarification about
> the construction of the knownCanonical table:
> https://lists.soe.ucsc.edu/pipermail/genome/2005-July/008123.html
>
> Filtering out non-coding genes:
> 1. Go back to the Table Browser and select "knownGene" from the table
> pull down menu.
> 2. To upload our list of canonical genes from before, click "upload
> list" next to identifiers. Select your file and click "submit".
> 3. Make a filter by clicking "create" next to filter. For cdsStart,
> select "!=" from the pull down menu and type "hg19.knownGene.cdsEnd"
> into the text box. Click "submit".
> 4. Select "GTF - gene transfer format" as the output format (GTF is very
> similar to GFF; see this page for more information:
> http://genome.ucsc.edu/FAQ/FAQformat#format4) and click "get output".
>
> I hope this information is helpful.  Please feel free to contact the
> mail list again if you require further assistance.
>
> Best,
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
> On 9/3/10 11:59 AM, Vikram Agarwal wrote:
>> Hello,
>>
>> I would like to extract the coordinates of all protein-coding gene
>> models listed in UCSC Genes, in GFF format.  The Genome Browser has an
>> option to restrict the display of splice variants to one gene model
>> per gene.  I would like to extract only one model per gene, using the
>> same criterion that this option applies.  Is there an easy way to
>> accomplish this while also removing non-coding genes?  Also, is there
>> information somewhere about the criterion the Genome Browser uses to
>> show only one gene model?
>>
>> Help is greatly appreciated!
>>
>> Thank you,
>> Vikram
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
