Thanks for the clarification!

Andrew

On Thu, Apr 15, 2010 at 10:55 PM, Jennifer Jackson <[email protected]> wrote:

> Hello Andrew,
>
> The gene area you are examining is complex.
>
> Technically, UCSC has clustered these transcripts into two distinct genes
> (as noted by the different cluster IDs). The two clusters do not share
> exons, which is a requirement of the gene clustering algorithm used by the
> UCSC Genes processing (see the track description for all the details).
>
> Scientifically, the entire transcript set appears to be related, with the
> annotation noting that the upstream group is protein coding and regulatory
> (transcription factor) in function and the downstream group is non-coding
> with vaguely defined oncogene function noted. The two groups are not the
> same gene (in the classical sense) and they are clearly not paralogs. Given
> this data, merging these clusters together or using a single representative
> transcript would probably result in a loss of information.
>
> So, why is MYC associated with both? This is likely a result of how the gene
> symbols are brought into the processing for the track. The genes are
> related. The kgXref table can bring in associations through many sources and
> sometimes the gene symbols/labels should be interpreted to mean "associated
> with gene X" rather than "is gene X". It looks like the second, non-coding
> gene has stronger MYC annotation via RefSeq, but the best advice is to
> examine all of the evidence yourself (at UCSC and the external
> sources/literature) to flesh out the exact details.
>
> Hopefully this helps,
> Jennifer
>
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
>
> On 4/15/10 2:31 PM, Andrew Yee wrote:
>
>> When I used the knownCanonical table to find the canonical transcript
>> for MYC, I found two entries.  See below.  I also included some fields
>> from hg19.kgXref.  Is there an accepted method to determine which one
>> is the most "canonical" transcript?  Perhaps use the transcript whose
>> refseq accession has an "NM" prefix?
>>
>> Thanks,
>> Andrew
>>
>> #hg19.knownCanonical.chrom hg19.knownCanonical.chromStart hg19.knownCanonical.chromEnd hg19.knownCanonical.clusterId hg19.knownCanonical.transcript hg19.knownCanonical.protein hg19.kgXref.kgID hg19.kgXref.mRNA hg19.kgXref.spID hg19.kgXref.spDisplayID hg19.kgXref.geneSymbol hg19.kgXref.refseq
>>
>> chr8    128748314       128753678       24861   uc003ysi.2      uc003ysi.2      uc003ysi.2      NM_002467       A0N2G3  A0N2G3_HUMAN    MYC     NM_002467
>> chr8    128806778       129113498       24862   uc010mdq.2      uc010mdq.2      uc010mdq.2      NR_003367                       MYC     NR_003367
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
>
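
For anyone scripting against these tables, here is a minimal sketch of
the approach suggested in the reply: keep every cluster returned for a
gene symbol and label each one by its RefSeq accession prefix (NM_ for
mRNA, NR_ for non-coding RNA), rather than collapsing to one
"canonical" transcript. The rows are copied from the question; the
function names and the reduced column set are assumptions made for
illustration and are not part of any UCSC tool.

```python
from collections import defaultdict

# Rows copied from the question above, reduced to
# (clusterId, transcript, geneSymbol, refseq). Illustrative subset only.
ROWS = [
    (24861, "uc003ysi.2", "MYC", "NM_002467"),
    (24862, "uc010mdq.2", "MYC", "NR_003367"),
]


def refseq_class(accession):
    """Label a RefSeq accession by prefix: NM_ is mRNA, NR_ is non-coding RNA."""
    if accession.startswith("NM_"):
        return "protein-coding"
    if accession.startswith("NR_"):
        return "non-coding"
    return "other"


def clusters_by_symbol(rows):
    """Group canonical transcripts by gene symbol, keeping every cluster."""
    grouped = defaultdict(list)
    for cluster_id, transcript, symbol, refseq in rows:
        grouped[symbol].append((cluster_id, transcript, refseq_class(refseq)))
    return dict(grouped)


if __name__ == "__main__":
    for symbol, entries in clusters_by_symbol(ROWS).items():
        for cluster_id, transcript, kind in entries:
            print(symbol, cluster_id, transcript, kind)
```

The prefix label is only a first-pass hint. As noted in the reply, a
symbol in kgXref can mean "associated with gene X", so each cluster
should still be checked against the underlying evidence.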
