Hello Simon,

Thank you for reporting this problem so clearly. We are able to reproduce it and are close to a solution (we have isolated the source of the problem).
For now, please ignore CDS lines with start > end (and the immediately following stop-codon lines). There is a logic problem in our code with stop codons that span two exons. Once we have a solution, we will send you an update and make the correction to the Table Browser output tools and to the downloadable source code.

We apologize for the inconvenience that this has caused you and your colleagues, and again thank you very much for notifying us about the problem!

We will be in touch,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 5/26/10 1:27 PM, Simon Anders wrote:
> Dear UCSC Genome Browser Team
>
> A question from a user of my software (CC'ed) led me to notice a
> potential bug in the UCSC Genome Table Browser.
>
> According to the GFF specs, the value in the start column of a GFF or GTF
> file must never be larger than the value in the end column. However, the
> Table Browser does return such lines.
>
> Steps to reproduce:
>
> In the Table Browser, select the "NCBI37/mm9" assembly, the "UCSC Genes"
> track and the "known genes" table. As region, set "chr1:40547900-40548100",
> and request "GTF" output format.
>
> The output contains the following line, describing the last exon of
> transcript 'uc007aug.1' (gene name Il18r1):
>
> chr1 mm9_knownGene CDS 40547903 40547900 0.000000 + 1 gene_id "uc007aug.1"; transcript_id "uc007aug.1";
>
> In this line, the CDS seems to have negative length: the end is left of
> the start!
>
> The other transcripts of this gene do not have such a strange exon;
> rather, the exon seems to actually extend to 40548061.
>
> Also note the two lines following the faulty one:
>
> chr1 mm9_knownGene stop_codon 40547901 40547903 0.000000 + . gene_id "uc007aug.1"; transcript_id "uc007aug.1";
> chr1 mm9_knownGene exon 40547903 40548425 0.000000 + . gene_id "uc007aug.1"; transcript_id "uc007aug.1";
>
> A stop codon is listed that does not appear in the other transcripts of
> the same gene that contain this exon. For example, transcript uc007auh.1
> (for which this exon is not final) has its open reading frame spanning the
> place of the erroneous stop codon:
>
> chr1 mm9_knownGene CDS 40547903 40548061 0.000000 + 2 gene_id "uc007auh.1"; transcript_id "uc007auh.1";
>
> Paola (the user who stumbled over this when my script gave an error due to
> the end being before the start) wrote that she encountered 104 such lines
> in the entire mm9 GTF file.
>
> Could it be that you have some bug in the treatment of prematurely
> poly-adenylated transcripts?
>
> Best regards
> Simon Anders
>
>
> +---
> | Dr. Simon Anders, Dipl.-Phys.
> | European Molecular Biology Laboratory (EMBL), Heidelberg
> | office phone +49-6221-387-8632
> | preferred (permanent) e-mail: [email protected]
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
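The interim workaround from the reply in this thread (ignore CDS lines with start > end, together with the stop-codon lines that immediately follow them) could be sketched roughly as below. This is only an illustration, not the UCSC fix. It assumes a tab-separated, nine-column GTF file as downloaded from the Table Browser, and the function name filter_gtf_lines is hypothetical.

```python
def filter_gtf_lines(lines):
    """Yield GTF lines, dropping CDS records whose start exceeds their end
    and any stop_codon records that immediately follow such a record."""
    skipping_stop_codons = False
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Pass comments, blank lines and short records through untouched.
        if line.startswith("#") or len(fields) < 9:
            skipping_stop_codons = False
            yield line
            continue
        # GTF columns: seqname, source, feature, start, end, score,
        # strand, frame, attributes.
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        if feature == "CDS" and start > end:
            # Faulty record: drop it and remember to drop what follows.
            skipping_stop_codons = True
            continue
        if feature == "stop_codon" and skipping_stop_codons:
            continue
        skipping_stop_codons = False
        yield line
```

To apply it to a downloaded file, the lines of the open file can be passed in directly and the surviving lines written to a second file. Stop-codon lines that do not directly follow a faulty CDS line are kept.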
