Dear Vikram,

I've looked into this, and I see what's going on.  The current version
of UCSC Genes was built using a download of RefSeq sequences dated Aug
24, 2009.  The sequence NM_001010847.1 was added to RefSeq as a
reviewed sequence on June 30, 2010.  When NM_001010847.1 was released,
it replaced three predicted RefSeq entries: XM_059074.6, XM_943661.3,
and XM_001713942.2.  Because UCSC Genes does not link to predicted
RefSeq sequences, there were no links recorded between uc001avb.2 and
these sequences.  There were links between uc001avb.2 and the
non-RefSeq sequences that it was built from, such as LRC38_HUMAN.

We are working on updating UCSC Genes, and there should be a new
version within a few months.  In the meantime, UCSC Genes will not
contain links to any RefSeq sequences that were added after Aug 24,
2009.
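
On identifying every UCSC gene ID with no RefSeq cross-reference: the
following is only a minimal sketch, not an official UCSC tool.  It
assumes you have saved knownGene and knownToRefSeq as tab-separated
text (for example from the Table Browser), that the first column of
each file is the UCSC gene ID, and that header lines start with "#".
The function name and the toy IDs below are made up for illustration.

```python
import csv
import io


def ids_without_refseq(known_gene_lines, known_to_refseq_lines):
    """Return UCSC gene IDs that have no row in the cross-reference table.

    Both arguments are iterables of tab-separated lines whose first
    column is assumed to be the UCSC gene ID.
    """
    linked = set()
    for row in csv.reader(known_to_refseq_lines, delimiter="\t"):
        # Skip blank lines and "#"-prefixed header lines.
        if row and not row[0].startswith("#"):
            linked.add(row[0])

    missing = []
    for row in csv.reader(known_gene_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if row[0] not in linked:
            missing.append(row[0])
    return missing


# Toy data only: these rows are invented, not taken from the real tables.
known_gene = io.StringIO("#name\tchrom\nuc001avb.2\tchr1\nuc000xyz.1\tchr1\n")
known_to_refseq = io.StringIO("#name\tvalue\nuc000xyz.1\tNM_000000\n")
print(ids_without_refseq(known_gene, known_to_refseq))  # ['uc001avb.2']
```

An ID can show up in this list for other reasons too (for instance, a
gene built only from non-RefSeq evidence), so treat the output as a
list of candidates to check rather than a list of errors.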

I hope this information helps.  If you have any more questions, feel
free to reply to this mail thread.

Cheers,

Melissa


On Thu, Sep 9, 2010 at 3:42 PM, Vikram Agarwal <[email protected]> wrote:
> Hello,
>
> I would like to report a concern I have about UCSC genes listings:
>
> The UCSC gene ID: uc001avb.2 does not have any RefSeq ID
> cross-referenced in the table knownToRefSeq.  On the genome browser this
> UCSC gene was clearly derived from RefSeq ID: NM_001010847.1
> <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=NM_001010847&doptcmdl=GenBank&tool=genome.ucsc.edu>.
>
> The genes should have been generated according to the documented procedure:
> "For RefSeq transcripts the RefSeq protein prediction is used directly
> instead of this procedure."
>
> Is there any way to fix this or to identify all UCSC gene IDs in which
> this occurs?
>
> Best,
> Vikram
>
> On 09/07/2010 08:00 PM, Mary Goldman wrote:
>> Hi Vikram,
>>
>> To get the list of protein coding, canonical genes in GFF format, you
>> will need to do a two-part extraction from the Table Browser
>> (http://genome.ucsc.edu/cgi-bin/hgTables). The first part involves
>> getting a list of canonical genes (i.e., no splice variants), while the
>> second part involves filtering out non-coding genes by looking for genes
>> where the cdsStart does not equal the cdsEnd (our notation for a
>> non-coding gene
>> https://lists.soe.ucsc.edu/pipermail/genome/2009-July/019588.html).
>>
>> Getting a list of canonical genes:
>> 1. Go to the Table Browser and select your genome and assembly of interest.
>> 2. UCSC Genes should automatically be selected as the track. Select
>> "knownCanonical" from the table pull-down menu.
>> 3. Select "selected fields from primary and related tables" as the
>> output format and enter a file name for the output file. Click "get output".
>> 4. Select "transcript" and then click "get output".
>> Please see this previous mailing list question for clarification about
>> the construction of the knownCanonical table:
>> https://lists.soe.ucsc.edu/pipermail/genome/2005-July/008123.html
>>
>> Filtering out non-coding genes:
>> 1. Go back to the Table Browser and select "knownGene" from the table
>> pull-down menu.
>> 2. To upload our list of canonical genes from before, click "upload
>> list" next to identifiers. Select your file and click "submit".
>> 3. Make a filter by clicking "create" next to filter. For cdsStart,
>> select "!=" from the pull-down menu and type "hg19.knownGene.cdsEnd"
>> into the text box. Click "submit".
>> 4. Select "GTF - gene transfer format" as the output format (GTF is very
>> similar to GFF; see this page for more information:
>> http://genome.ucsc.edu/FAQ/FAQformat#format4) and click "get output".
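
The two-part extraction above can also be sketched offline.  This is
only an illustration of the same logic (keep a transcript if it is in
the canonical list and its cdsStart differs from its cdsEnd), not what
the Table Browser runs internally.  It assumes each knownGene row has
already been parsed into a dict with "name", "cdsStart" and "cdsEnd"
keys; the function name and the transcript IDs are hypothetical.

```python
def coding_canonical(rows, canonical_ids):
    """Return names of transcripts that are canonical and protein-coding.

    A transcript counts as non-coding when cdsStart equals cdsEnd,
    which is the notation described in the steps above.
    """
    keep = []
    for row in rows:
        is_coding = int(row["cdsStart"]) != int(row["cdsEnd"])
        if row["name"] in canonical_ids and is_coding:
            keep.append(row["name"])
    return keep


# Toy rows only: invented IDs and coordinates.
rows = [
    {"name": "tx1", "cdsStart": "100", "cdsEnd": "900"},  # coding, canonical
    {"name": "tx2", "cdsStart": "500", "cdsEnd": "500"},  # non-coding
    {"name": "tx3", "cdsStart": "100", "cdsEnd": "700"},  # coding, not canonical
]
print(coding_canonical(rows, {"tx1", "tx2"}))  # ['tx1']
```
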
>>
>> I hope this information is helpful.  Please feel free to contact the
>> mail list again if you require further assistance.
>>
>> Best,
>> Mary
>> ------------------
>> Mary Goldman
>> UCSC Bioinformatics Group
>>
>> On 9/3/10 11:59 AM, Vikram Agarwal wrote:
>>> Hello,
>>>
>>> I would like to extract the coordinates for all protein-coding gene
>>> models listed in UCSC Genes in GFF format.  The genome browser has an
>>> option to restrict the viewing of splice variants to show only
>>> one gene model per gene.  I would like to extract only one model per
>>> gene according to the criterion that this option uses.  Is there an
>>> easy way to accomplish this while also removing non-coding genes?  Also,
>>> is there information somewhere about the criterion the genome browser
>>> uses to show only one gene model?
>>>
>>> Help is greatly appreciated!
>>>
>>> Thank you,
>>> Vikram
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>
