Dear UCSC Genome Browser Team

A question from a user of my software (CC'ed) led me to notice a
potential bug in the UCSC Genome Table Browser.

According to the GFF specs, the value in the start column of a GFF or GTF
file must never be larger than the value in the end column. However, the
Table Browser does return such lines.

Steps to reproduce:

In the Table Browser, select the "NCBI37/mm9" assembly, the "UCSC Genes"
track and the "known genes" table. Set the region to
"chr1:40547900-40548100" and request the "GTF" output format.

The output contains the following line, describing the last exon of
transcript 'uc007aug.1' (gene name Il18r1):

chr1    mm9_knownGene   CDS     40547903        40547900        0.000000        +       1       gene_id "uc007aug.1"; transcript_id "uc007aug.1";

In this line, the CDS seems to have negative length: the end lies to the
left of the start!

The other transcripts of this gene do not have such a strange exon;
rather, the exon seems to actually extend to 40548061.

Also note the two lines following the faulty one:

chr1    mm9_knownGene   stop_codon      40547901        40547903        0.000000        +       .       gene_id "uc007aug.1"; transcript_id "uc007aug.1";
chr1    mm9_knownGene   exon    40547903        40548425        0.000000        +       .       gene_id "uc007aug.1"; transcript_id "uc007aug.1";

A stop codon is listed that does not appear in the other transcripts of
the same gene that contain this exon. For example, transcript uc007auh.1
(for which this exon is not final) has its open reading frame spanning the
position of the erroneous stop codon:

chr1    mm9_knownGene   CDS     40547903        40548061        0.000000        +       2       gene_id "uc007auh.1"; transcript_id "uc007auh.1";

Paola (the user who stumbled over this when my script gave an error due to
the end being before the start) wrote that she encountered 104 such lines
in the entire mm9 GTF file.
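
For reference, a check of this kind can be sketched in a few lines of
Python. This is only a minimal sketch, not the script that actually raised
the error: the function name find_inverted_features is made up for
illustration, and the code assumes tab-separated GTF records with the start
and end coordinates in columns 4 and 5.

```python
def find_inverted_features(lines):
    """Return (line_number, feature, start, end) for each GTF record
    whose start coordinate is larger than its end coordinate.

    Hypothetical helper for illustration; assumes tab-separated GTF.
    """
    inverted = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record has nine tab-separated columns; skip anything else
        # (for example a browser "track" header line).
        if len(fields) < 9:
            continue
        try:
            start = int(fields[3])
            end = int(fields[4])
        except ValueError:
            # Coordinates are not integers; not a record we can check.
            continue
        if start > end:
            inverted.append((line_number, fields[2], start, end))
    return inverted
```

Applied to an open file handle of a downloaded GTF file, for example
len(find_inverted_features(open("mm9_knownGene.gtf"))) with a hypothetical
file name, this would give the number of records with start after end.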

Could it be that you have some bug in the treatment of prematurely
poly-adenylated transcripts?

Best regards
  Simon Anders


+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: [email protected]
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
