Hi Sutada,

We are having trouble converting your example with the 
gff3ToGenePred program. Could you please send us the exact command you 
are using for the conversion? Also, if you are willing to share the file, 
which would be helpful to us, please email it to me directly (off-list).

Another thing to note, which might explain some of what you are 
observing, is that the GFF3 and GenePred formats use different coordinate 
systems, which results in a one-base difference in your start 
coordinates.

For GFF3, the coordinates have a 1-based start; from the Data File 
Formats page (http://hgwdev.cse.ucsc.edu/FAQ/FAQformat.html#format3):

start - The starting position of the feature in the sequence. The first 
base is numbered 1.

For GenePred, the coordinates have a 0-based start, meaning that the 
first base is numbered 0.

In your email, you stated:

    For aug_v1.1-nomask.15578.t1...The first position of the first exon
    should be 2 not 0...

However, it looks to me like your original GFF had the first position of 
the first exon at 1 (not 2 as you said):

Scaffold375     AUGUSTUS        exon    1       76      .       -       .       Parent=aug_v1.1-nomask.15578.t1


So, in GenePred, it makes sense that the first exon starts at 0 rather 
than 1, because the gff3ToGenePred program must convert the GFF3 1-based 
start coordinates into 0-based start coordinates for the GenePred 
format. When GenePred is displayed in the browser, the browser will 
display items with a start coordinate of 0 as 1. For more information 
about 0-based start, see: http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1.
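
As a rough illustration of that shift (a minimal sketch only; the function 
names below are made up for this example and are not part of 
gff3ToGenePred), the start moves down by one while the end stays the 
same, so the length of the feature does not change:

```python
def gff3_to_genepred(start, end):
    """GFF3 interval (1-based start, inclusive end) -> GenePred-style
    interval (0-based start, end value unchanged)."""
    return start - 1, end


def genepred_to_display(start, end):
    """GenePred-style interval -> 1-based start as shown in the browser."""
    return start + 1, end


# First exon of aug_v1.1-nomask.15578.t1 from the GFF3 in this thread.
gp_start, gp_end = gff3_to_genepred(1, 76)
print(gp_start, gp_end)                        # 0 76
print(gp_end - gp_start)                       # 76 bases, same as 76 - 1 + 1
print(genepred_to_display(gp_start, gp_end))   # (1, 76)
```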

If this coordinate difference doesn't explain the frameshift you were 
seeing, please send us the information above (command you're using to 
get the conversion, and, if you'd like, the file).
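
One quick way to check is to total the coding bases on both sides of the 
conversion. The sketch below is only an illustration: it assumes the 
GenePred columns are as in the line you posted (cdsStart, cdsEnd, 
exonStarts, exonEnds), and the helper name cds_length is made up for 
this example.

```python
def cds_length(cds_start, cds_end, exon_starts, exon_ends):
    """Sum the exon bases that fall inside [cds_start, cds_end),
    using GenePred-style coordinates (0-based start)."""
    total = 0
    for s, e in zip(exon_starts, exon_ends):
        lo = max(s, cds_start)
        hi = min(e, cds_end)
        if hi > lo:
            total += hi - lo
    return total


# aug_v1.1-nomask.15578.t1, values copied from the GenePred line in this thread.
genepred_total = cds_length(0, 500, [0, 373], [76, 605])

# The CDS rows for the same transcript in the GFF3 (1-based, inclusive).
gff3_cds = [(1, 76), (374, 500)]
gff3_total = sum(end - start + 1 for start, end in gff3_cds)

print(genepred_total, gff3_total)   # 203 203
print(genepred_total % 3)           # 2
```

If the two totals agree, as they do here, the change in start coordinate 
has not added or removed any coding bases.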

Katrina Learned
UCSC Genome Bioinformatics Group


On 11/15/11 4:15 PM, sutada Mungpakdee wrote:
> Hi,
>
> Has anyone had this problem when converting gff3 to GenePred?
>
> When the program converts gff3 to GenePred, the positions of the exons are 
> changed, which gives a frameshift. I would like to annotate SNPs 
> (synonymous/non-synonymous for frameshift SNP), and another program will 
> count the number of bp from coding start to coding end and then divide by 3. 
> With the exon positions generated by gff3ToGenePred, the total bp is not a 
> whole number of codons; there are one or two bases extra.
>
> Here is the gff3 file:
>
>
> Scaffold17    AUGUSTUS        gene    1       2885    0.28    -       .       ID=aug_v1.1-nomask.01791
> Scaffold17    AUGUSTUS        transcript      1       2885    0.28    -       .       ID=aug_v1.1-nomask.01791.t1;Parent=aug_v1.1-nomask.01791
> Scaffold17    AUGUSTUS        intron  45      470     1       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        intron  552     1459    1       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        CDS     1       44      1       -       0       ID=aug_v1.1-nomask.01791.t1.cds;Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        exon    1       44      .       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        CDS     471     551     1       -       0       ID=aug_v1.1-nomask.01791.t1.cds;Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        exon    471     551     .       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        CDS     1460    1477    1       -       0       ID=aug_v1.1-nomask.01791.t1.cds;Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        exon    1460    1489    .       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        start_codon     1475    1477    .       -       0       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        exon    2011    2086    .       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        exon    2377    2451    .       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        exon    2776    2885    .       -       .       Parent=aug_v1.1-nomask.01791.t1
> Scaffold17    AUGUSTUS        transcription_start_site        2885    2885    .       -       .       Parent=aug_v1.1-nomask.01791.t1
>
> Scaffold375   AUGUSTUS        gene    1       605     0.1     -       .       ID=aug_v1.1-nomask.15578
> Scaffold375   AUGUSTUS        transcript      1       605     0.1     -       .       ID=aug_v1.1-nomask.15578.t1;Parent=aug_v1.1-nomask.15578
> Scaffold375   AUGUSTUS        intron  77      373     0.94    -       .       Parent=aug_v1.1-nomask.15578.t1
> Scaffold375   AUGUSTUS        CDS     1       76      0.91    -       2       ID=aug_v1.1-nomask.15578.t1.cds;Parent=aug_v1.1-nomask.15578.t1
> Scaffold375   AUGUSTUS        exon    1       76      .       -       .       Parent=aug_v1.1-nomask.15578.t1
> Scaffold375   AUGUSTUS        CDS     374     500     0.6     -       0       ID=aug_v1.1-nomask.15578.t1.cds;Parent=aug_v1.1-nomask.15578.t1
> Scaffold375   AUGUSTUS        exon    374     605     .       -       .       Parent=aug_v1.1-nomask.15578.t1
> Scaffold375   AUGUSTUS        start_codon     498     500     .       -       0       Parent=aug_v1.1-nomask.15578.t1
> Scaffold375   AUGUSTUS        transcription_start_site        605     605     .       -       .       Parent=aug_v1.1-nomask.15578.t1
>
>
> Here is the GenePred I got for these genes:
> aug_v1.1-nomask.01791.t1      Scaffold17      -       0       2885    0       1477    6       0,470,1459,2010,2376,2775,      44,551,1489,2086,2451,2885,     aug_v1.1-nomask.01791
> aug_v1.1-nomask.15578.t1      Scaffold375     -       0       605     0       500     2       0,373,  76,605, aug_v1.1-nomask.15578
>
> For aug_v1.1-nomask.15578.t1, mRNA length = 76+127 = 203 bp, which has 2 bp 
> extra. It should be 201 bp. The first position of the first exon should be 2 
> not 0, so that the total length = 74+127 = 201 bp = 67 codons. Do you have any 
> suggestions? I don't know how to fix the file, as almost all genes are 
> frameshifted.
>
> Best regards,
> Sutada
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome