Hi Jennifer,

Thanks for the reply; it has helped to clear things up.


I've come across another example where two clusters (95 and 96) have
overlapping CDS regions, and also share CDS and UTR exons. From a process
perspective, do you know why these haven't been merged?
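
To make the check concrete, here is a minimal sketch of the overlap and
shared-exon test described above. It is only an illustration: the Transcript
record, its field names, and the toy coordinates are hypothetical (loosely
modelled on knownGene-style columns), and treating cds_start == cds_end as
"non-coding" is an assumption. It is not the UCSC clustering code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record, not the actual UCSC table schema.
    name: str
    strand: str
    cds_start: int
    cds_end: int
    exons: tuple  # tuple of (start, end) pairs, half-open coordinates


def cds_span(cluster):
    """Return (min cds_start, max cds_end) over coding transcripts, or None."""
    # Assumption: a transcript with an empty CDS interval is non-coding.
    coding = [t for t in cluster if t.cds_end > t.cds_start]
    if not coding:
        return None
    return min(t.cds_start for t in coding), max(t.cds_end for t in coding)


def spans_overlap(a, b):
    """True if two half-open (start, end) spans intersect."""
    return a is not None and b is not None and a[0] < b[1] and b[0] < a[1]


def shared_exons(cluster_a, cluster_b):
    """Exons (same strand, identical start and end) present in both clusters."""
    exons_a = {(t.strand, e) for t in cluster_a for e in t.exons}
    exons_b = {(t.strand, e) for t in cluster_b for e in t.exons}
    return exons_a & exons_b


# Toy clusters with made-up coordinates, for illustration only.
cluster_a = [Transcript("tx_a", "+", 150, 900, ((100, 200), (400, 500), (800, 1000)))]
cluster_b = [Transcript("tx_b", "+", 450, 950, ((400, 500), (800, 1000)))]

cds_overlap = spans_overlap(cds_span(cluster_a), cds_span(cluster_b))
common = shared_exons(cluster_a, cluster_b)
```

With real data, both a CDS overlap and a non-empty set of shared exons between
two clusters would be the situation I'm asking about.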


Thanks!

On Fri, Apr 9, 2010 at 2:53 AM, Jennifer Jackson <[email protected]> wrote:

> Hello,
>
> Your examples represent the presence of genes within genes.
>
> The larger gene spans multiple exons, has both 5' and 3' UTRs, and a
> confirmed translated/coding region (CDS).
>
> The smaller gene(s) fall into the intron regions of the larger gene's
> global footprint and are generally non-coding, consisting of one or a few
> short exons.
>
> Scientifically, this is valid gene organization on the genome. From a
> process perspective, these transcripts/genes are distinct clusters in UCSC
> Genes because they do not share exons.
>
> Hopefully this helps,
> Thanks,
> Jennifer
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
>
> On 4/8/10 1:38 PM, Bio X2Y wrote:
>
>> Hi,
>>
>>
>> A number of 'clusters' in the UCSC Gene annotation overlap on the same
>> strand (if you consider the boundaries of a cluster to be the minimum
>> txstart and the maximum txend of the cluster's transcripts).
>>
>>
>> This was queried in a previous post
>> (https://lists.soe.ucsc.edu/pipermail/genome/2009-October/020325.html),
>> where Jennifer/Jim explained that clustering is driven by proteins, and
>> non-coding transcripts are merged into the cluster with which they share
>> the greatest overlap.
>>
>>
>> While this explains some of the cluster overlaps, it doesn't shed light
>> on the scenario where non-coding transcripts are allowed to exist as
>> standalone clusters, even though they fall within the boundaries of a
>> larger cluster.
>>
>>
>> For example, in the HG19 version of UCSC Genes, cluster 8145 contains 8
>> transcripts (5 coding + 3 non-coding). The annotation also contains 41
>> smaller non-coding clusters, all of which fall completely within the
>> boundaries of cluster 8145. The smaller clusters are numbered
>> consecutively from 8146 to 8186 (inclusive).
>>
>>
>> Is this scenario intentional, or is it possibly an unintended artifact of
>> the clustering process?
>>
>>
>> Thanks!
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
>