Re: [Artemis-users] Export annotated Artemis data to CLC Bio

2010-11-02 Thread J . vandeVossenberg
Note:

The first "total sequence" lines are not necessary (the first five FT lines
in example =b=, i.e. the source feature spanning the whole sequence). This
makes sense. I am not sure yet whether CLC treats the sequences as separate
contigs.

Cheers, Jack


Re: [Artemis-users] Export annotated Artemis data to CLC Bio

2010-11-02 Thread J . vandeVossenberg
Dear Tim,

Importing into seqret was not faster than manually adapting the Artemis
output file. This is how I did it (no guarantees...):


1) Replace fasta_record with source, keeping the right number of spaces
(the location starts at column 22), and add the qualifiers /organism="text"
and /mol_type="genomic DNA" (these are compulsory) under each FT   source
line (use a Perl script, see example =a=).

2) At the top, add ID, DT, DE and OS lines (see example =b=; XX is an empty
spacer line) and the line FH   Key Location/Qualifiers.
Then add an FT   source line that spans the whole sequence, with the
qualifiers /organism="text", /mol_type="genomic DNA" (these are
compulsory) and /focus.

/focus indicates that the sequence consists of several "sources". I
would rather use the word contig, because /focus implies that the
annotated genome is a contiguous sequence, which is not the case for me.
Unfortunately, /contig is not an official qualifier.

As far as I can judge, this is sufficient for importing all features into
CLC. I would only like to tell CLC that it is not a contiguous sequence.


Examples

=a= example Perl one-liner
To insert these two lines after each FT   source line:
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"

perl -p -i -e 's/FT   source\s{10}\d+\.\.\d+/$&\nFT                   \/organism="Organism name"\nFT                   \/mol_type="genomic DNA"/g' SourceFileArtemis.art
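The same edit can be sketched in Python. Unlike the one-liner, this sketch also does the renaming from step 1 (fasta_record to source) in the same pass. It assumes each contig appears as a fasta_record feature with a simple range location such as 1..500; the function name fasta_records_to_sources is hypothetical.

```python
import re

# Qualifier lines start at column 22: "FT" followed by 19 spaces.
QUALIFIER_INDENT = "FT" + " " * 19

# Assumes fasta_record features use a simple range location (e.g. 1..500);
# lines with any other layout are copied unchanged.
FASTA_RECORD = re.compile(r"^FT   fasta_record\s+(\d+\.\.\d+)\s*$")


def fasta_records_to_sources(lines, organism, mol_type="genomic DNA"):
    """Rename fasta_record features to source and add the mandatory qualifiers."""
    out = []
    for line in lines:
        match = FASTA_RECORD.match(line)
        if match is None:
            out.append(line)  # every other line is copied unchanged
            continue
        # "FT   " plus the key padded to 16 characters puts the location at column 22.
        out.append("FT   " + "source".ljust(16) + match.group(1))
        out.append(f'{QUALIFIER_INDENT}/organism="{organism}"')
        out.append(f'{QUALIFIER_INDENT}/mol_type="{mol_type}"')
    return out
```

Any qualifiers already attached to a fasta_record feature (for example a /note) stay in place after the two inserted lines.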

=b= Header
ID   linear; genomic DNA;
XX
XX
DT   02-NOV-2010
XX
DE   Organism name genome
XX
OS   Organism name
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..500
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"
FT                   /note="text"
FT                   /focus
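A header like example =b= can also be generated, which keeps the column layout consistent (keys at column 6, locations and qualifiers at column 22). This is a minimal sketch; the function name embl_header and its parameters are hypothetical, and the line order simply mirrors example =b=.

```python
import datetime

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
FT_INDENT = "FT" + " " * 19  # qualifiers start at column 22


def embl_header(organism, seq_length, when: datetime.date, note=None):
    """Return the header lines of example =b=, ending with the whole-sequence source."""
    lines = [
        "ID   linear; genomic DNA;",
        "XX",
        "XX",
        # Month names are spelled out so the result does not depend on the locale.
        f"DT   {when.day:02d}-{MONTHS[when.month - 1]}-{when.year}",
        "XX",
        f"DE   {organism} genome",
        "XX",
        f"OS   {organism}",
        "XX",
        "FH   " + "Key".ljust(16) + "Location/Qualifiers",
        "FH",
        "FT   " + "source".ljust(16) + f"1..{seq_length}",
        f'{FT_INDENT}/organism="{organism}"',
        f'{FT_INDENT}/mol_type="genomic DNA"',
    ]
    if note is not None:
        lines.append(f'{FT_INDENT}/note="{note}"')
    lines.append(f"{FT_INDENT}/focus")
    return lines
```

For example, embl_header("Organism name", 500, datetime.date(2010, 11, 2), note="text") reproduces example =b=. The follow-up note at the top of this thread reports that the final whole-sequence source lines turned out not to be needed.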


Re: [Artemis-users] Export annotated Artemis data to CLC Bio

2010-10-21 Thread Tim Carver
Hi Jack

Artemis is not really meant as a conversion tool between formats, in
particular from EMBL/GenBank to GFF, although it will have a go. You could try
EMBOSS (seqret) to convert. However, it sounds like you have multiple FASTA
records in your file, which may cause problems if you are writing out EMBL
files. So you may want to try writing the sequence out (File->Write->All
bases), opening this single-sequence file and then reading your annotation
into the sequence entry. Then write out the file as EMBL.
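The bookkeeping behind a single-sequence file can be sketched as follows: the FASTA records are joined into one sequence, and the 1-based range each record occupies is recorded, which is the kind of span a fasta_record feature covers. This assumes the records are joined end to end with no spacer; it is an illustration, not a description of what Artemis does internally, and concatenate_fasta is a hypothetical name.

```python
def concatenate_fasta(text):
    """Join all FASTA records into one sequence and record where each one lies.

    Returns (sequence, ranges), where ranges holds (name, start, end) in
    1-based inclusive coordinates, the convention EMBL locations use.
    """
    names, chunks = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word of the header line.
            names.append((line[1:].split() or [""])[0])
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)
    ranges, pieces, offset = [], [], 0
    for name, parts in zip(names, chunks):
        seq = "".join(parts)
        ranges.append((name, offset + 1, offset + len(seq)))
        pieces.append(seq)
        offset += len(seq)
    return "".join(pieces), ranges
```

For ">contig1\nACGT\nAC\n>contig2\nGGG\n" this gives the sequence ACGTACGGG with contig1 at 1..6 and contig2 at 7..9.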

Regards
Tim


Re: [Artemis-users] Export annotated Artemis data to CLC Bio

2010-10-21 Thread Jack van de Vossenberg

Addition:
For Artemis export to GFF, I changed "fasta_record" to "source", which
is a feature key in the standard nomenclature*, and added the mandatory
qualifiers /organism= and /mol_type=, but every time I get a message that
the source feature cannot be exported.

Is that normal behaviour? Can anyone tell what goes wrong?

Cheers, Jack

* http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
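To rule out one possible cause, a small check can confirm that every source feature really carries both mandatory qualifiers. This is a hypothetical helper working on EMBL-style FT lines (key at column 6, qualifiers at column 22); it does not explain why Artemis refuses to export the source feature to GFF.

```python
import re

# A feature key begins at column 6; qualifier continuation lines are blank there.
FEATURE_START = re.compile(r"^FT   (\S+)\s+(\S.*)$")
MANDATORY = ("/organism=", "/mol_type=")


def parse_features(lines):
    """Group FT lines into (key, location, [qualifier lines]) tuples."""
    features = []
    for line in lines:
        if not line.startswith("FT"):
            continue
        match = FEATURE_START.match(line)
        if match:
            features.append((match.group(1), match.group(2).strip(), []))
        elif features:
            features[-1][2].append(line[21:].strip())  # text from column 22 on
    return features


def sources_missing_qualifiers(lines):
    """List (location, missing qualifiers) for each incomplete source feature."""
    report = []
    for key, location, qualifiers in parse_features(lines):
        if key != "source":
            continue
        missing = [m for m in MANDATORY
                   if not any(q.startswith(m) for q in qualifiers)]
        if missing:
            report.append((location, missing))
    return report
```

An empty result means every source feature has both /organism and /mol_type.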

[Artemis-users] Export annotated Artemis data to CLC Bio

2010-10-21 Thread J . vandeVossenberg
Dear all,

I have annotated genome data in Artemis, and I would like to import the
result of that annotation into CLC Bio Genomics Workbench
(http://www.clcbio.com/).

I tried direct import: I selected all entries and exported from Artemis to
EMBL, GenBank and Sequin. None of these were recognized by CLC, even
though it should be able to import many file formats
(http://www.clcbio.com/index.php?id=426).
I tried GFF, which does not include sequence data, so I used a separate
sequence file with the contigs concatenated into one large FASTA sequence.
CLC has a GFF import filter, which is very picky about the sequence names
(read the CLC GFF import manual). I managed to import the GFF file, but I
did not see any annotation at all, I think because all ORFs are named
artemis ("gff_seqname artemis"). Contig names are lost in GFF, so this
option may import all annotated genes but lose the contig information. The
GFF export does not recognize "fasta record", so I should rename it to
something else (but what? I tried "contig" and "source", but the GFF file
keeps using ORFs only, all named "gff_seqname artemis").

Does anyone have experience with this? I thought of using another program
as an intermediate step to convert the Artemis data into something CLC can read.
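If the annotation is lost because every feature carries the sequence name artemis, one possible workaround is to rewrite the first GFF column (the seqid) to the contig name and shift the coordinates so they are relative to that contig. This is a sketch under assumptions: tab-separated GFF lines, the name artemis sitting in the seqid column, and contig ranges already known (for example from the fasta_record features). remap_gff_line is a hypothetical name, and whether CLC then accepts the file is untested.

```python
import bisect


def remap_gff_line(line, contigs):
    """Rewrite one GFF feature from concatenated to per-contig coordinates.

    contigs: (name, start, end) tuples, 1-based inclusive and sorted by
    start, giving where each contig lies in the concatenated sequence.
    """
    cols = line.rstrip("\n").split("\t")
    start, end = int(cols[3]), int(cols[4])  # GFF columns 4 and 5
    starts = [c_start for _, c_start, _ in contigs]
    # Index of the last contig that starts at or before the feature.
    i = bisect.bisect_right(starts, start) - 1
    if i < 0:
        raise ValueError(f"feature starts before the first contig: {start}")
    name, c_start, c_end = contigs[i]
    if end > c_end:
        raise ValueError(f"feature {start}..{end} crosses the end of {name}")
    cols[0] = name  # seqid column, previously e.g. "artemis"
    cols[3] = str(start - c_start + 1)
    cols[4] = str(end - c_start + 1)
    return "\t".join(cols)
```

Features that cross a contig boundary are rejected rather than silently split, since they usually point to an annotation made on the concatenated sequence that has no meaning on a single contig.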

Thanks for your help, Jack


___
Artemis-users mailing list
Artemis-users@sanger.ac.uk
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users