Hi Simon,

We have isolated the problem but have not made the correction yet. We 
cannot give an exact date when this will be released.

If you want to check back in a few weeks for an update, that may be the 
best way to follow up and see where we are with the correction.

Thanks again for your patience,
Jennifer

On 5/26/10 3:40 PM, Jennifer Jackson wrote:
> Hello Simon,
>
> Thank you for reporting this problem so clearly. We are able to
> reproduce it and are close to a solution (we have isolated the source
> of the problem).
>
> For now, please ignore CDS lines with start > end (and the immediately
> following stop-codon lines). There is a logic problem in our code with
> stop codons that span across two exons.
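
The workaround described above could be sketched in Python roughly as follows. This is only an illustration, not UCSC code: the function name drop_inverted_cds is hypothetical, and it assumes the GTF input is tab-separated with the standard nine columns (the sample lines quoted in this thread show the tabs as spaces).

```python
def drop_inverted_cds(lines):
    """Yield GTF lines, skipping CDS records with start > end and any
    stop_codon records that immediately follow such a CDS record."""
    skip_stop_codons = False
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            # Comment or incomplete record: pass through unchanged.
            skip_stop_codons = False
            yield line
            continue
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        if feature == "CDS" and start > end:
            skip_stop_codons = True  # also drop the stop_codon lines that follow
            continue
        if skip_stop_codons and feature == "stop_codon":
            continue
        skip_stop_codons = False
        yield line


# The three uc007aug.1 records quoted in this thread, rebuilt with tabs.
ATTRS = 'gene_id "uc007aug.1"; transcript_id "uc007aug.1";'
sample = [
    "\t".join(["chr1", "mm9_knownGene", "CDS", "40547903", "40547900",
               "0.000000", "+", "1", ATTRS]),
    "\t".join(["chr1", "mm9_knownGene", "stop_codon", "40547901", "40547903",
               "0.000000", "+", ".", ATTRS]),
    "\t".join(["chr1", "mm9_knownGene", "exon", "40547903", "40548425",
               "0.000000", "+", ".", ATTRS]),
]

kept = list(drop_inverted_cds(sample))
print(len(kept))  # 1: only the exon record remains
```

The sketch does not check that a dropped stop_codon line belongs to the same transcript as the faulty CDS line; it relies only on the line order.
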
>
> Once we have a solution, we will send you an update and make the
> correction to the Table browser output tools and to download source code.
>
> We apologize for the inconvenience that this has caused you and your
> colleagues and again thank you very much for notifying us about the
> problem!
>
> We will be in touch,
>
> Jennifer
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
> On 5/26/10 1:27 PM, Simon Anders wrote:
>> Dear UCSC Genome Browser Team
>>
>> A question from a user of my software (CC'ed) led me to notice a
>> potential bug in the UCSC Genome Table Browser.
>>
>> According to the GFF specs, the value in the start column of a GFF or GTF
>> file must never be larger than the value in the end column. However, the
>> Table Browser does return such lines.
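
A check for the start <= end rule mentioned above could look roughly like this in Python. It is a minimal sketch under assumptions: the function name find_inverted_features is made up for illustration, and the input is assumed to be tab-separated GTF with nine columns, with the start in column 4 and the end in column 5.

```python
def find_inverted_features(lines):
    """Return (line_number, feature, start, end) for every GTF record
    whose start coordinate is larger than its end coordinate."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            bad.append((number, fields[2], start, end))
    return bad


def record(feature, start, end, frame, transcript):
    """Build one tab-separated GTF line for the example below."""
    attrs = 'gene_id "%s"; transcript_id "%s";' % (transcript, transcript)
    return "\t".join(["chr1", "mm9_knownGene", feature, str(start), str(end),
                      "0.000000", "+", frame, attrs])


# Records taken from the Table Browser output quoted in this message.
sample = [
    record("CDS", 40547903, 40547900, "1", "uc007aug.1"),
    record("stop_codon", 40547901, 40547903, ".", "uc007aug.1"),
    record("exon", 40547903, 40548425, ".", "uc007aug.1"),
    record("CDS", 40547903, 40548061, "2", "uc007auh.1"),
]

print(find_inverted_features(sample))
# [(1, 'CDS', 40547903, 40547900)]
```

Run over a whole GTF file (for example with open(path) as the lines argument), the length of the returned list gives the number of offending records.
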
>>
>> Steps to reproduce:
>>
>> In the Table Browser, select the "NCBI37/mm9" assembly, the "UCSC Genes"
>> track and the "known genes" table. As region, set
>> "chr1:40547900-40548100", and request the "GTF" output format.
>>
>> The output contains the following line, describing the last exon of
>> transcript 'uc007aug.1' (gene name Il18r1):
>>
>> chr1 mm9_knownGene CDS 40547903 40547900 0.000000 + 1 gene_id "uc007aug.1"; transcript_id "uc007aug.1";
>>
>> In this line, the CDS seems to have negative length: the end is left of
>> the start!
>>
>> The other transcripts of this gene do not have such a strange exon;
>> rather, the exon seems to actually extend to 40548061.
>>
>> Also note the two lines following the faulty one:
>>
>> chr1 mm9_knownGene stop_codon 40547901 40547903 0.000000 + . gene_id "uc007aug.1"; transcript_id "uc007aug.1";
>> chr1 mm9_knownGene exon 40547903 40548425 0.000000 + . gene_id "uc007aug.1"; transcript_id "uc007aug.1";
>>
>> A stop codon is listed that does not appear in the other transcripts of
>> the same gene that contain this exon. For example, transcript uc007auh.1
>> (for which this exon is not final) has its open reading frame spanning
>> the place of the erroneous stop codon:
>>
>> chr1 mm9_knownGene CDS 40547903 40548061 0.000000 + 2 gene_id "uc007auh.1"; transcript_id "uc007auh.1";
>>
>> Paola (the user who stumbled over this when my script gave an error
>> due to the end being before the start) wrote that she encountered 104
>> such lines in the entire mm9 GTF file.
>>
>> Could it be that you have some bug in the treatment of prematurely
>> poly-adenylated transcripts?
>>
>> Best regards
>> Simon Anders
>>
>>
>> +---
>> | Dr. Simon Anders, Dipl.-Phys.
>> | European Molecular Biology Laboratory (EMBL), Heidelberg
>> | office phone +49-6221-387-8632
>> | preferred (permanent) e-mail: [email protected]
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome