Hi,
A number of 'clusters' in the UCSC Genes annotation overlap on the same strand (if you consider the boundaries of a cluster to be the minimum txStart and the maximum txEnd of the cluster's transcripts). This was queried in a previous post (https://lists.soe.ucsc.edu/pipermail/genome/2009-October/020325.html), where Jennifer/Jim explained that clustering is driven by proteins, and that non-coding transcripts are merged into the cluster with which they share the greatest overlap.

While this explains some of the cluster overlaps, it doesn't explain why some non-coding transcripts exist as standalone clusters even though they fall within the boundaries of a larger cluster. For example, in the hg19 version of UCSC Genes, cluster 8145 contains 8 transcripts (5 coding + 3 non-coding). The annotation also contains 41 smaller non-coding clusters, numbered consecutively from 8146 to 8186 (inclusive), all of which fall completely within the boundaries of cluster 8145.

Is this scenario intentional, or is it possibly an unintended artifact of the clustering process?

Thanks!
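
For anyone wanting to reproduce the containment check described above, here is a minimal sketch in Python. It assumes each transcript has already been reduced to a (cluster id, chrom, strand, txStart, txEnd) tuple, for example by joining the cluster assignments with the transcript coordinates. The function names and the toy coordinates are hypothetical and are not taken from the actual hg19 tables.

```python
from collections import defaultdict


def cluster_bounds(transcripts):
    """Collapse transcripts into per-cluster boundaries.

    transcripts: iterable of (cluster_id, chrom, strand, tx_start, tx_end).
    Returns {cluster_id: (chrom, strand, min tx_start, max tx_end)}.
    """
    bounds = {}
    for cluster_id, chrom, strand, tx_start, tx_end in transcripts:
        if cluster_id in bounds:
            c, s, lo, hi = bounds[cluster_id]
            bounds[cluster_id] = (c, s, min(lo, tx_start), max(hi, tx_end))
        else:
            bounds[cluster_id] = (chrom, strand, tx_start, tx_end)
    return bounds


def nested_clusters(bounds):
    """Map each cluster to the clusters lying completely inside it.

    Only clusters on the same chromosome and strand are compared.
    The pairwise comparison is quadratic per (chrom, strand) group,
    which is adequate for a sketch but not tuned for speed.
    """
    by_locus = defaultdict(list)
    for cid, (chrom, strand, lo, hi) in bounds.items():
        by_locus[(chrom, strand)].append((lo, hi, cid))

    nested = defaultdict(list)
    for group in by_locus.values():
        for lo_outer, hi_outer, outer in group:
            for lo_inner, hi_inner, inner in group:
                if inner != outer and lo_outer <= lo_inner and hi_inner <= hi_outer:
                    nested[outer].append(inner)
    return {outer: sorted(inner_ids) for outer, inner_ids in nested.items()}


# Toy data with invented coordinates (not real hg19 values).
transcripts = [
    (1, "chr1", "+", 100, 500),
    (1, "chr1", "+", 200, 900),
    (2, "chr1", "+", 300, 400),  # inside cluster 1, same strand
    (3, "chr1", "-", 300, 400),  # same span, opposite strand
    (4, "chr1", "+", 850, 950),  # overlaps cluster 1 but is not contained
]
bounds = cluster_bounds(transcripts)
nested = nested_clusters(bounds)
print(nested)  # {1: [2]}
```

Run against the real annotation, a check like this should report clusters 8146 to 8186 as nested inside cluster 8145, if the boundaries are defined as in the question.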
