Re: [Artemis-users] Export annotated Artemis data to CLC Bio
Note: the initial "total sequence" lines are not necessary (the first 5 FT lines in example =b=). That makes sense. I am not sure yet whether CLC treats the sequences as separate contigs.

Cheers, Jack

___
Artemis-users mailing list
Artemis-users@sanger.ac.uk
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
Re: [Artemis-users] Export annotated Artemis data to CLC Bio
Dear Tim,

Importing with seqret was not faster than manually adapting the Artemis output file. This is how I did it (no guarantee...):

1) Replace fasta_record with source, keeping the right number of spaces (locations and qualifiers start at column 22), and add the qualifiers /organism="text" and /mol_type="genomic DNA" (these are compulsory) under each FT source line (use a Perl script, see example =a=).

2) At the top, add ID, DT, DE and OS lines (see example =b=; XX is an empty spacer line) and add the "FH Key Location/Qualifiers" header lines. Then make an FT source line that spans the whole sequence, and add the qualifiers /organism="text" and /mol_type="genomic DNA" (again compulsory) and /focus.

/focus is meant to indicate that the sequence consists of several "sources". I would rather use the word contig, because /focus implies that the annotated genome is a contiguous sequence, which is not the case for me. Unfortunately /contig is not an official qualifier.

As far as I can judge, this is sufficient to import all features into CLC. I would only like to tell CLC that it is not a contiguous sequence.

Examples

=a= Example Perl one-liner
To insert these two lines after each FT source line:
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"

perl -p -i -e 's/FT   source\s{10}\d+\.\.\d+/$&\nFT                   \/organism=\"Organism name\"\nFT                   \/mol_type=\"genomic DNA\"/g' SourceFileArtemis.art

=b= Header
ID   linear; genomic DNA;
XX
XX
DT   02-NOV-2010
XX
DE   Organism name genome
XX
OS   Organism name
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..500
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"
FT                   /note="text"
FT                   /focus
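For longer files, steps 1 and 2 could also be scripted. The sketch below is in Python rather than Perl, purely for readability, and is untested against real Artemis output. It assumes every fasta_record key line looks like "FT   fasta_record    <start>..<end>" with a simple range, it copies the header of example =b= as-is (that ID line is not a complete EMBL ID line), and all function names are made up for illustration.

```python
import re

# EMBL feature-table columns: "FT" in columns 1-2, feature key from column 6,
# location and qualifiers from column 22.
KEY_PREFIX = "FT   "            # 5 characters, so the key starts at column 6
QUAL_PREFIX = "FT" + " " * 19   # 21 characters, so qualifiers start at column 22

# Hypothetical assumption: fasta_record lines carry a simple start..end range.
FASTA_RECORD = re.compile(r"^FT   fasta_record\s+(\S+)\s*$")


def feature_line(key: str, location: str) -> str:
    """Pad the key to 16 characters so the location starts at column 22."""
    return f"{KEY_PREFIX}{key:<16}{location}"


def qualifier_line(qualifier: str) -> str:
    return QUAL_PREFIX + qualifier


def source_block(location: str, organism: str) -> list[str]:
    """A source feature followed by the two compulsory qualifiers."""
    return [
        feature_line("source", location),
        qualifier_line(f'/organism="{organism}"'),
        qualifier_line('/mol_type="genomic DNA"'),
    ]


def convert_features(lines: list[str], organism: str) -> list[str]:
    """Step 1: replace each fasta_record key with source and add the qualifiers."""
    out: list[str] = []
    for line in lines:
        match = FASTA_RECORD.match(line)
        if match:
            out.extend(source_block(match.group(1), organism))
        else:
            out.append(line)  # every other line is passed through unchanged
    return out


def header_lines(organism: str, date: str) -> list[str]:
    """Step 2: a header mirroring example =b= (not a validated EMBL header)."""
    return [
        "ID   linear; genomic DNA;",
        "XX",
        "XX",
        f"DT   {date}",
        "XX",
        f"DE   {organism} genome",
        "XX",
        f"OS   {organism}",
        "XX",
        f"FH   {'Key':<16}Location/Qualifiers",
        "FH",
    ]


def build_entry(feature_lines, organism, date, total_length=None):
    entry = header_lines(organism, date)
    if total_length is not None:
        # Optional whole-sequence source with /focus, as in example =b=.
        entry += source_block(f"1..{total_length}", organism)
        entry.append(qualifier_line("/focus"))
    return entry + convert_features(feature_lines, organism)
```

For example, build_entry(lines, "Organism name", "02-NOV-2010") returns the header followed by the converted features. Pass total_length only if you want the whole-sequence source with /focus, which the follow-up note at the top of the thread says is not necessary.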
Re: [Artemis-users] Export annotated Artemis data to CLC Bio
Hi Jack

Artemis is not really meant as a conversion tool between formats, in particular from EMBL/GenBank to GFF, although it will have a go. You could try EMBOSS (seqret) to convert. However, it sounds like you have multiple FASTA records in your file, which may cause problems if you are writing out EMBL files. So you may want to try writing the sequence out (File->Write->All bases). Open this single-sequence file and then read your annotation into the sequence entry. Then write out the file as EMBL.

Regards
Tim

On 10/21/10 12:26 PM, "Jack van de Vossenberg" wrote:
> [snip]
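Writing all bases out as one sequence amounts to concatenating the FASTA records. A minimal Python sketch of that idea follows. It additionally records each record's 1-based start..end in the joined sequence (roughly what Artemis marks with fasta_record features), so the contig boundaries can still be recovered later. The function names are hypothetical and this is not Artemis's own code.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse multi-FASTA text into (name, sequence) pairs; name = first header word."""
    records: list[tuple[str, str]] = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def concatenate(records):
    """Join records into one sequence; also return each record's 1-based start..end."""
    spans = []
    start = 1
    for name, seq in records:
        end = start + len(seq) - 1
        spans.append((name, start, end))
        start = end + 1
    return "".join(seq for _, seq in records), spans


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a single FASTA record with lines of at most `width` bases."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

Writing to_fasta("all_contigs", seq) to a file would give the kind of single-sequence entry Tim describes; the spans list is what you would keep alongside it to know where each contig sits.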
Re: [Artemis-users] Export annotated Artemis data to CLC Bio
Addition:
For the Artemis export to GFF, I changed "fasta_record" to "source", which is a feature key in the standard nomenclature*, and added the mandatory qualifiers /organism= and /mol_type=, but every time I get a message that the source feature cannot be exported.

Is that normal behaviour? Can anyone tell me what goes wrong?

Cheers, Jack

* http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

On 10/21/2010 10:27 AM, j.vandevossenb...@science.ru.nl wrote:
> [snip]
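Before looking for other causes of the export message, it may help to confirm that every source feature really carries both compulsory qualifiers. A rough Python sketch, assuming fixed EMBL columns (key at column 6, qualifiers at column 22) and ignoring continuation lines of multi-line values. It cannot say why Artemis refuses to export source to GFF, only whether a qualifier is missing; the function names are hypothetical.

```python
REQUIRED = ("/organism=", "/mol_type=")


def parse_features(lines):
    """Group EMBL FT lines into features: key in columns 6-21, rest from column 22."""
    features = []
    for line in lines:
        if not line.startswith("FT"):
            continue
        key_field = line[5:21]
        if key_field.strip():                      # a new feature key line
            features.append({
                "key": key_field.strip(),
                "location": line[21:].strip(),
                "qualifiers": [],
            })
        elif features:
            text = line[21:].strip()
            if text.startswith("/"):               # skip continuation lines
                features[-1]["qualifiers"].append(text)
    return features


def missing_source_qualifiers(lines):
    """Return (location, [missing qualifiers]) for each incomplete source feature."""
    problems = []
    for feature in parse_features(lines):
        if feature["key"] != "source":
            continue
        missing = [q for q in REQUIRED
                   if not any(v.startswith(q) for v in feature["qualifiers"])]
        if missing:
            problems.append((feature["location"], missing))
    return problems
```

An empty result means every source feature has both qualifiers, so the export problem lies elsewhere.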
[Artemis-users] Export annotated Artemis data to CLC Bio
Dear all,

I have annotated genome data in Artemis, and I would like to import the result of that annotation into CLC Bio Genomics Workbench (http://www.clcbio.com/).

I tried direct import: I selected all entries and exported them from Artemis to EMBL, GenBank and Sequin. None of these were recognized by CLC, even though it should be able to import many file formats (http://www.clcbio.com/index.php?id=426).

I tried GFF, which does not include sequence data, so I used a separate sequence file with the contigs concatenated into one large FASTA sequence. CLC has a GFF import filter, which is very picky about the sequence names (see the CLC GFF import manual). I managed to get it to import the GFF, but I did not see any annotation at all, I think because all ORFs are named artemis ("gff_seqname artemis"). Contig names are lost in GFF, so this option may import all annotated genes but lose the contig information. GFF does not recognize "fasta record", so I should rename it to something else (but what? I tried "contig" and "source", but the GFF file keeps using ORFs only, all named "gff_seqname artemis").

Does anyone have experience with this? I thought of using another program as an intermediate to convert Artemis data into CLC-readable data.

Thanks for your help,
Jack
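One possible workaround for every feature ending up with the same sequence name ("artemis") in the GFF is to rename each feature after the contig it falls in and shift its coordinates, outside Artemis. A hedged Python sketch, assuming a plain tab-separated GFF with no embedded FASTA section and a list of contig spans (name, 1-based start, end in the concatenated sequence), for example taken from the fasta_record features. lift_gff is a hypothetical helper, not part of Artemis or CLC, and features crossing a contig boundary are set aside rather than split.

```python
import bisect


def lift_gff(gff_lines, spans):
    """Move GFF features from concatenated coordinates onto individual contigs.

    spans: (contig_name, start, end) tuples, 1-based and inclusive, giving each
    contig's position in the concatenated sequence.
    Returns (lifted_lines, skipped_lines).
    """
    spans = sorted(spans, key=lambda span: span[1])
    starts = [start for _, start, _ in spans]
    lifted, skipped = [], []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            lifted.append(line)                     # keep blank and comment lines as-is
            continue
        cols = line.rstrip("\n").split("\t")
        start, end = int(cols[3]), int(cols[4])     # GFF columns 4 and 5
        i = bisect.bisect_right(starts, start) - 1  # last contig starting at or before start
        if i < 0 or end > spans[i][2]:
            skipped.append(line)                    # outside every contig, or crosses a boundary
            continue
        name, offset, _ = spans[i]
        cols[0] = name                              # seqid column
        cols[3] = str(start - offset + 1)
        cols[4] = str(end - offset + 1)
        lifted.append("\t".join(cols))
    return lifted, skipped
```

Given how picky the CLC import is about sequence names, the contig names in the spans would need to match the names of the imported contig sequences exactly.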