Hi Sutada,

We are having trouble getting your example to convert using the gff3ToGenePred program. Could you please send us the command you are using for the conversion? Also, if you are able to share the file, which would be helpful to us, please email it to me directly (off-list).
Another thing to note, which might explain some of what you are observing, is that the GFF3 and GenePred file types use different coordinate systems, which results in a one-base difference in your start coordinates.

For GFF3, the coordinates have a 1-based start; from the Data File Formats page (http://hgwdev.cse.ucsc.edu/FAQ/FAQformat.html#format3):

  start - The starting position of the feature in the sequence. The first base is numbered 1.

For GenePred, the coordinates have a 0-based start, meaning that the first base is numbered 0.

In your email, you stated:

  For aug_v1.1-nomask.15578.t1...The first position of the first exon should be 2 not 0...

However, it looks to me like your original GFF had the first position of the first exon at 1 (not 2 as you said):

  Scaffold375 AUGUSTUS exon 1 76 . - . Parent=aug_v1.1-nomask.15578.t1

So, in GenePred, it makes sense that the first exon starts at 0 rather than 1, because the gff3ToGenePred program must convert the 1-based GFF3 start coordinates into the 0-based start coordinates of the GenePred format. When GenePred is displayed in the browser, the browser displays items with a start coordinate of 0 as 1. For more information about the 0-based start, see: http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1.

If this coordinate difference doesn't explain the frameshift you were seeing, please send us the information above (the command you're using for the conversion and, if you'd like, the file).

Katrina Learned
UCSC Genome Bioinformatics Group

On 11/15/11 4:15 PM, sutada Mungpakdee wrote:
> Hi,
>
> Has anyone had this problem when converting gff3 to GenePred?
>
> When the program converts gff3 to GenePred, the positions of the exons
> change and this gives a frameshift. I would like to annotate SNPs
> (synonymous/non-synonymous for frameshift SNPs), and another program
> counts the number of bases from coding start to coding end and then divides by 3.
> With the exon positions generated by gff3ToGenePred, the total number of bases
> is not a whole number of codons; there are one or two residues too many.
>
> Here is the gff3 file:
>
> Scaffold17 AUGUSTUS gene 1 2885 0.28 - . ID=aug_v1.1-nomask.01791
> Scaffold17 AUGUSTUS transcript 1 2885 0.28 - . ID=aug_v1.1-nomask.01791.t1;Parent=aug_v1.1-nomask.01791
> Scaffold17 AUGUSTUS intron 45 470 1 - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS intron 552 1459 1 - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS CDS 1 44 1 - 0 ID=aug_v1.1-nomask.01791.t1.cds;Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS exon 1 44 . - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS CDS 471 551 1 - 0 ID=aug_v1.1-nomask.01791.t1.cds;Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS exon 471 551 . - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS CDS 1460 1477 1 - 0 ID=aug_v1.1-nomask.01791.t1.cds;Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS exon 1460 1489 . - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS start_codon 1475 1477 . - 0 Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS exon 2011 2086 . - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS exon 2377 2451 . - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS exon 2776 2885 . - . Parent=aug_v1.1-nomask.01791.t1
> Scaffold17 AUGUSTUS transcription_start_site 2885 2885 . - . Parent=aug_v1.1-nomask.01791.t1
>
> Scaffold375 AUGUSTUS gene 1 605 0.1 - . ID=aug_v1.1-nomask.15578
> Scaffold375 AUGUSTUS transcript 1 605 0.1 - . ID=aug_v1.1-nomask.15578.t1;Parent=aug_v1.1-nomask.15578
> Scaffold375 AUGUSTUS intron 77 373 0.94 - . Parent=aug_v1.1-nomask.15578.t1
> Scaffold375 AUGUSTUS CDS 1 76 0.91 - 2 ID=aug_v1.1-nomask.15578.t1.cds;Parent=aug_v1.1-nomask.15578.t1
> Scaffold375 AUGUSTUS exon 1 76 . - . Parent=aug_v1.1-nomask.15578.t1
> Scaffold375 AUGUSTUS CDS 374 500 0.6 - 0 ID=aug_v1.1-nomask.15578.t1.cds;Parent=aug_v1.1-nomask.15578.t1
> Scaffold375 AUGUSTUS exon 374 605 . - . Parent=aug_v1.1-nomask.15578.t1
> Scaffold375 AUGUSTUS start_codon 498 500 . - 0 Parent=aug_v1.1-nomask.15578.t1
> Scaffold375 AUGUSTUS transcription_start_site 605 605 . - . Parent=aug_v1.1-nomask.15578.t1
>
> Here is the GenePred I got for these genes:
>
> aug_v1.1-nomask.01791.t1 Scaffold17 - 0 2885 0 1477 6 0,470,1459,2010,2376,2775, 44,551,1489,2086,2451,2885, aug_v1.1-nomask.01791
> aug_v1.1-nomask.15578.t1 Scaffold375 - 0 605 0 500 2 0,373, 76,605, aug_v1.1-nomask.15578
>
> For aug_v1.1-nomask.15578.t1, the mRNA length = 76+127 = 203 bp, which has 2 bp
> extra. It should be 201 bp. The first position of the first exon should be 2 not 0;
> then the total length = 74+127 = 201 bp = 67 codons. Do you have any
> suggestions? I don't know how to fix the file, as almost all genes are
> frameshifted.
>
> Best regards,
> Sutada
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
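To make the coordinate point in the reply concrete, here is a minimal Python sketch using only the numbers quoted in this thread. The function names are hypothetical helpers, not part of gff3ToGenePred or any UCSC tool, and the GenePred parser assumes whitespace-separated fields in the standard order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). It shows that going from a 1-based, end-inclusive GFF3 interval to a 0-based, end-exclusive GenePred interval only shifts the start by one, so coding lengths stay the same. Note that the 76+127 = 203 bp figure in the quoted message is the sum of the two CDS records.

```python
# Minimal sketch (hypothetical helper names): GFF3 intervals are 1-based and
# end-inclusive; GenePred intervals are 0-based and end-exclusive (half-open).
from typing import List, Tuple


def gff3_to_genepred_interval(start: int, end: int) -> Tuple[int, int]:
    """Shift a GFF3 interval to GenePred coordinates; only the start changes."""
    return start - 1, end


def gff3_length(start: int, end: int) -> int:
    """Length of a GFF3 interval; both ends are included, hence the +1."""
    return end - start + 1


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-separated GenePred list such as '0,373,' (trailing comma allowed)."""
    return [int(x) for x in field.rstrip(",").split(",")]


def genepred_cds_length(line: str) -> int:
    """Count coding bases in one GenePred line.

    Assumes whitespace-separated fields in the standard order: name, chrom,
    strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    """
    fields = line.split()
    cds_start, cds_end = int(fields[5]), int(fields[6])
    starts, ends = parse_int_list(fields[8]), parse_int_list(fields[9])
    total = 0
    for s, e in zip(starts, ends):
        # Clip each exon to [cdsStart, cdsEnd); half-open, so no +1 is needed.
        total += max(0, min(e, cds_end) - max(s, cds_start))
    return total


# CDS records and GenePred lines copied from the quoted message.
GFF3_CDS_15578 = [(1, 76), (374, 500)]
GENEPRED_15578 = ("aug_v1.1-nomask.15578.t1 Scaffold375 - 0 605 0 500 2 "
                  "0,373, 76,605, aug_v1.1-nomask.15578")
GFF3_CDS_01791 = [(1, 44), (471, 551), (1460, 1477)]
GENEPRED_01791 = ("aug_v1.1-nomask.01791.t1 Scaffold17 - 0 2885 0 1477 6 "
                  "0,470,1459,2010,2376,2775, 44,551,1489,2086,2451,2885, "
                  "aug_v1.1-nomask.01791")

for name, cds, gp in [("15578.t1", GFF3_CDS_15578, GENEPRED_15578),
                      ("01791.t1", GFF3_CDS_01791, GENEPRED_01791)]:
    converted = [gff3_to_genepred_interval(s, e) for s, e in cds]
    gff3_total = sum(gff3_length(s, e) for s, e in cds)
    # prints: name, converted CDS intervals, GFF3 CDS total, GenePred CDS total
    print(name, converted, gff3_total, genepred_cds_length(gp))
```

Under these assumptions the GFF3 CDS totals and the GenePred CDS totals match (203 bp for aug_v1.1-nomask.15578.t1 and 143 bp for aug_v1.1-nomask.01791.t1). In other words, the lengths that are not a multiple of 3 are already present in the GFF3 CDS records and are not added by the coordinate conversion.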
