Hi Dan,

Thanks for the feedback. One of our engineers took a look and found that including the "start_codon" line in the GTF output is a bug in the Table Browser. We've logged it as a bug.
The 5' end of this gene is missing from the reference assembly. The GRC is aware of this problem; see GRC Incident HG-146. We do include truncated gene models (as do Ensembl and GENCODE). To find out exactly what aligned, you can use the refSeqAli table, which is in PSL format (http://genome.ucsc.edu/FAQ/FAQformat.html#format2).

Please let us know if you have any additional questions: [email protected]

-
Greg Roe
UCSC Genome Bioinformatics Group

On 1/30/12 2:29 AM, Dan Richards wrote:
> Hi,
>
> Using the hg19 RefSeq gene model (from the Table Browser, Genes+Prediction
> group; RefSeq Genes track; table: refGene; output format: GTF) returns,
> for example:
>
> chrX hg19_refGene start_codon 76709647 76709649 0.000000 + . gene_id "NM_003868"; transcript_id "NM_003868";
> chrX hg19_refGene CDS 76709647 76709751 0.000000 + 0 gene_id "NM_003868"; transcript_id "NM_003868";
> chrX hg19_refGene exon 76709647 76709751 0.000000 + . gene_id "NM_003868"; transcript_id "NM_003868";
> chrX hg19_refGene CDS 76711768 76712010 0.000000 + 0 gene_id "NM_003868"; transcript_id "NM_003868";
> chrX hg19_refGene stop_codon 76712011 76712013 0.000000 + . gene_id "NM_003868"; transcript_id "NM_003868";
> chrX hg19_refGene exon 76711768 76712013 0.000000 + . gene_id "NM_003868"; transcript_id "NM_003868";
>
> which incorrectly indicates that the start codon is the first three bases
> of the first aligned CDS exon.
>
> In fact, in cases like this, the first exon is not aligned to hg19, so the
> 'first' CDS exon that appears in the hg19 alignment is actually midway
> through the coding sequence:
>
> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=240218517&g=htcCdnaAli&i=NM_003868&c=chrX&l=76709054&r=76712605&o=76709646&aliTable=refSeqAli&table=refGene
>
> Why are such partial coding alignments included in gene models?
>
> If they are intentionally included, it seems that, at a minimum, the
> 'start_codon' entry in the gene model should be removed to avoid inaccurate
> inferences based on the assumption that the start codon is actually at that
> location. Is there a way to determine which refGene alignments do not have
> an aligned CDS start in the reference genome?
>
> Dan
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
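Dan's last question (which refGene alignments lack an aligned CDS start) can be approached with the refSeqAli PSL rows mentioned above. The following is a minimal Python sketch under several assumptions: each input is one tab-separated PSL row with the 21 standard columns (optionally preceded by UCSC's `bin` column), the alignment is nucleotide-to-nucleotide with strand `+` or `-`, and the 0-based offset of the start codon within the mRNA is supplied by the caller. PSL itself carries no CDS information, so that offset has to come from elsewhere (for example, the CDS feature of the RefSeq/GenBank record, converted from 1-based to 0-based). The function names are hypothetical and not part of any UCSC tool.

```python
from typing import List


def parse_int_list(field: str) -> List[int]:
    # PSL list fields are comma-separated with a trailing comma, e.g. "105,243,"
    return [int(x) for x in field.split(",") if x]


def start_codon_aligned(psl_line: str, cds_start: int) -> bool:
    """Hypothetical helper: True if all three start-codon bases are aligned.

    psl_line  -- one tab-separated PSL row (assumed nucleotide alignment)
    cds_start -- assumed 0-based offset of the start codon's first base in
                 the mRNA, in the mRNA's own (forward) orientation
    """
    f = psl_line.rstrip("\n").split("\t")
    if len(f) == 22:
        f = f[1:]  # drop a leading UCSC `bin` column if present
    strand = f[8]
    q_size = int(f[10])
    block_sizes = parse_int_list(f[18])
    q_starts = parse_int_list(f[19])

    def aligned(pos: int) -> bool:
        # For '-' strand alignments, PSL block qStarts are given on the
        # reverse complement of the query, so convert the position first.
        if strand.startswith("-"):
            pos = q_size - 1 - pos
        # Blocks are 0-based, half-open: [qStart, qStart + blockSize)
        return any(qs <= pos < qs + bs for qs, bs in zip(q_starts, block_sizes))

    return all(aligned(cds_start + i) for i in range(3))
```

A transcript for which `start_codon_aligned(row, cds_start)` returns `False` would be one whose "start_codon" line in the GTF export should not be trusted. Checking all three bases (rather than only the first) also catches start codons split across an unaligned boundary.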
