Re: [Artemis-users] Export annotated Artemis data to CLC Bio
Dear Tim,

Importing via seqret was not faster than manually adapting the Artemis output file. This is how I did it (no guarantee...):

1) Replace fasta_record with source, keeping the right number of spaces (locations and qualifiers start at column 22), and add the qualifiers /organism="text" and /mol_type="genomic DNA" (these are compulsory) under each FT source line (use a Perl script, see example =a=).

2) At the top, add ID, DT, DE and OS lines (see example =b=; XX is an empty line), followed by the line
FH   Key             Location/Qualifiers
Then make an FT source line that spans the whole sequence. Add the qualifiers /organism="text" and /mol_type="genomic DNA" (these are compulsory) and /focus.

/focus is there to indicate that the sequence consists of more than one source. I would rather use the word contig, because /focus implies that the annotated genome is a contiguous sequence, which is not the case for me. Unfortunately /contig is not an official qualifier.

As far as I can judge, this is sufficient to import all features into CLC. I would only like to be able to tell CLC that it is not a contiguous sequence.

Examples

=a= Example Perl one-liner

To insert these two lines after each line with FT source:

FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"

perl -p -i -e 's/FT   source\s{10}\d+\.\.\d+/$&\nFT                   \/organism=\"Organism name\"\nFT                   \/mol_type=\"genomic DNA\"/g' SourceFileArtemis.art

=b= Header

ID   linear; genomic DNA;
XX
XX
DT   02-NOV-2010
XX
DE   Organism name genome
XX
OS   Organism name
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..500
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"
FT                   /note="text"
FT                   /focus

Hi Jack

Artemis is not really meant as a conversion tool between formats, in particular from EMBL/GenBank to GFF, although it will have a go. You could try EMBOSS (seqret) to convert.

However, it sounds like you have multiple fasta records in your file, which may cause problems when you write out EMBL files. So you may want to try writing the sequence out (File-Write-All bases). Open this single sequence file and then read your annotation into the sequence entry. Then write out the file as EMBL.

Regards
Tim

On 10/21/10 12:26 PM, Jack van de Vossenberg j.vandevossenb...@science.ru.nl wrote:

Addition:
For Artemis export to GFF, I changed fasta_record to source, which is a Feature Key in the standard nomenclature*, and added the mandatory fields /organism= and /mol_type=, but every time I get a message that the source field cannot be exported. Is that normal behaviour? Can anyone tell what goes wrong?

Cheers,
Jack

* http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

On 10/21/2010 10:27 AM, j.vandevossenb...@science.ru.nl wrote:

Dear all,

I have annotated genome data in Artemis, and I would like to import the result of that annotation into CLC Bio Genomics Workbench (http://www.clcbio.com/).

I tried direct import: I selected all entries and exported from Artemis to EMBL, GenBank and Sequin. None of these were recognized by CLC, even though it should be able to import many file formats (http://www.clcbio.com/index.php?id=426).

I tried GFF, which does not include sequence data, so I used a separate sequence file with the contigs concatenated into one large fasta sequence. CLC has a GFF import filter, which is very picky about the sequence names (read the CLC GFF import manual). I managed to let it import the GFF file, but I did not see any annotation at all, I think because all ORFs are named artemis (gff_seqname artemis). Contig names are lost in GFF, so this option may import all annotated genes, but lose the contig info. GFF does not recognize fasta_record, so I should rename this into something else (but what? I tried contig and source, but the GFF file keeps on using ORFs only, all named gff_seqname artemis).

Does anyone have experience with this? I thought of using another program as an intermediate to convert Artemis data into CLC-readable data.

Thanks for your help,
Jack

___
Artemis-users mailing list
Artemis-users@sanger.ac.uk
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
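
The manual adaptation described in steps 1) and 2) of the reply above could also be scripted instead of being done by hand. What follows is only a minimal Python sketch of that idea, not the method actually used in the thread (that was the Perl one-liner in example =a=). The function names adapt_features and header are hypothetical, the input is assumed to be a list of EMBL feature-table lines as written by Artemis, and the ID line is copied as-is from example =b= (a complete EMBL ID line normally carries more fields).

```python
import re

# Qualifier lines: "FT" followed by 19 spaces, so the qualifier starts at column 22.
QUALIFIER_INDENT = "FT" + " " * 19

# Matches an Artemis fasta_record feature line and captures its location.
FASTA_RECORD = re.compile(r"^FT   fasta_record\s+(\S+)\s*$")


def feature_line(key, location):
    """Build an FT line with the key at column 6 and the location at column 22."""
    return f"FT   {key:<16}{location}"


def qualifier_lines(organism, mol_type="genomic DNA"):
    """The two compulsory qualifiers of a source feature."""
    return [
        f'{QUALIFIER_INDENT}/organism="{organism}"',
        f'{QUALIFIER_INDENT}/mol_type="{mol_type}"',
    ]


def adapt_features(lines, organism):
    """Step 1: rename fasta_record to source and add the compulsory qualifiers."""
    out = []
    for line in lines:
        match = FASTA_RECORD.match(line)
        if match:
            out.append(feature_line("source", match.group(1)))
            out.extend(qualifier_lines(organism))
        else:
            out.append(line)
    return out


def header(organism, seq_length, date):
    """Step 2: header lines plus a source feature spanning the whole sequence."""
    lines = [
        "ID   linear; genomic DNA;",  # copied from example =b=, assumed incomplete
        "XX",
        "XX",
        f"DT   {date}",
        "XX",
        f"DE   {organism} genome",
        "XX",
        f"OS   {organism}",
        "XX",
        "FH   Key             Location/Qualifiers",
        "FH",
        feature_line("source", f"1..{seq_length}"),
    ]
    lines.extend(qualifier_lines(organism))
    lines.append(f"{QUALIFIER_INDENT}/focus")
    return lines


if __name__ == "__main__":
    features = ["FT   fasta_record    1..500", "FT   CDS             10..90"]
    for out_line in header("Organism name", 500, "02-NOV-2010"):
        print(out_line)
    for out_line in adapt_features(features, "Organism name"):
        print(out_line)
```

Padding the feature key to 16 characters keeps the location at column 22, which is the spacing the one-liner relies on with \s{10} after the six-letter key source. Whether CLC accepts the result would still need to be checked against a real file.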