Thanks for the clarification!

Andrew
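
For anyone finding this thread in the archive, here is a minimal sketch of the "NM prefix" idea from the original question. It only labels each knownCanonical entry by its RefSeq accession prefix (NM_ for mRNA, NR_ for non-coding RNA) and lists the entries that share a gene symbol; it does not merge clusters or pick a single "true" gene, since the symbol may only mean "associated with gene X". The rows are copied from the table quoted in this thread, and the helper names (refseq_class, group_by_symbol, nm_candidates) are made up for illustration and are not part of any UCSC tool.

```python
from collections import defaultdict

# Rows copied from the table quoted in this thread:
# (clusterId, transcript, geneSymbol, refseq)
ROWS = [
    (24861, "uc003ysi.2", "MYC", "NM_002467"),
    (24862, "uc010mdq.2", "MYC", "NR_003367"),
]


def refseq_class(accession):
    """Label a RefSeq accession by prefix (hypothetical helper)."""
    if accession.startswith("NM_"):
        return "protein-coding"
    if accession.startswith("NR_"):
        return "non-coding"
    return "unknown"


def group_by_symbol(rows):
    """Collect the canonical entries for each gene symbol, one per cluster."""
    groups = defaultdict(list)
    for cluster_id, transcript, symbol, refseq in rows:
        groups[symbol].append(
            {
                "clusterId": cluster_id,
                "transcript": transcript,
                "refseq": refseq,
                "class": refseq_class(refseq),
            }
        )
    return dict(groups)


def nm_candidates(entries):
    """Return transcripts whose RefSeq accession starts with NM_."""
    return [e["transcript"] for e in entries if e["class"] == "protein-coding"]


by_symbol = group_by_symbol(ROWS)
for symbol, entries in by_symbol.items():
    if len(entries) > 1:
        # More than one cluster carries this symbol; review each by hand.
        print(symbol, "appears in", len(entries), "clusters")
    for e in entries:
        print(" ", e["clusterId"], e["transcript"], e["refseq"], e["class"])
    print("  NM_ candidates:", nm_candidates(entries))
```

The prefix check is only a first filter. Symbols that appear in more than one cluster still need to be checked against the evidence at UCSC and the external sources.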

On Thu, Apr 15, 2010 at 10:55 PM, Jennifer Jackson <[email protected]> wrote:
> Hello Andrew,
>
> The gene area you are examining is complex.
>
> Technically, UCSC has clustered these transcripts into two distinct genes
> (as noted by the different cluster IDs). The two clusters do not share
> exons, which is a requirement of the gene clustering algorithm used by the
> UCSC Genes processing (see the track description for all the details).
>
> Scientifically, the entire transcript set appears to be related, with the
> annotation noting that the upstream group is protein coding and regulatory
> (transcription factor) in function and the downstream group is non-coding
> with vaguely defined oncogene function noted. The two groups are not the
> same gene (in the classical sense) and they are clearly not paralogs. Given
> this data, merging these clusters together or using a single representative
> transcript would probably result in a loss of information.
>
> So, why is MYC associated with both? Likely a function of how the gene
> symbols are brought into the processing for the track. The genes are
> related. The kgXref table can bring in associations through many sources and
> sometimes the gene symbols/labels should be interpreted to mean "associated
> with gene X" rather than "is gene X". It looks like the second, non-coding
> gene has stronger MYC annotation via RefSeq, but the best advice is to
> examine all of the evidence yourself (at UCSC and the external
> sources/literature) to flesh out the exact details.
>
> Hopefully this helps,
> Jennifer
>
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
>
> On 4/15/10 2:31 PM, Andrew Yee wrote:
>
>> When I was using the knownCanonical table to find the canonical transcript
>> for MYC, I found that there are two entries. See below. I also included
>> some fields from hg19.kgXref. Is there an accepted method to
>> determine which one is the most "canonical" transcript? Perhaps use the
>> transcript where there is a "NM" as the prefix in refseq?
>>
>> Thanks,
>> Andrew
>>
>> #hg19.knownCanonical.chrom hg19.knownCanonical.chromStart
>> hg19.knownCanonical.chromEnd hg19.knownCanonical.clusterId
>> hg19.knownCanonical.transcript hg19.knownCanonical.protein
>> hg19.kgXref.kgID
>> hg19.kgXref.mRNA hg19.kgXref.spID hg19.kgXref.spDisplayID
>> hg19.kgXref.geneSymbol hg19.kgXref.refseq
>>
>> chr8 128748314 128753678 24861 uc003ysi.2 uc003ysi.2
>> uc003ysi.2 NM_002467 A0N2G3 A0N2G3_HUMAN MYC
>> NM_002467
>> chr8 128806778 129113498 24862 uc010mdq.2 uc010mdq.2
>> uc010mdq.2 NR_003367 MYC NR_003367
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
