Re: [ccp4bb] sequence format conversion
K-

One bioinformatics tool that converts nucleotide sequences between many formats is Biology Workbench, out of UCSD.

Dexter Kennedy, MD
Thumbed from my iPhone

On May 8, 2012, at 12:02 AM, K Singh wrote:

> Dear All
> I was looking for a script or an informatics tool enabling me to
> change the sequence from FASTA format to something like following:
>
> >FASTA FORMAT
> abcdefghijklmnopqrstuvwxyz
>
> to
>
>  1 abcde fghij
> 11 klmno pqrst
> 21 uvwxy z
>
> Many thanks in advance
>
> Regards
> Kris
Re: [ccp4bb] sequence format conversion
> A good tool should leave "b" as is: it is ASX (the standard ambiguity
> code for ASP or ASN). "j", "o" and "u" are a different matter :-)

http://www.uniprot.org/manual/non_std

"Selenocyteine [sic!] and pyrrolysine are represented in the sequence using the one-letter codes U for selenocysteine and O for pyrrolysine"

--Gerard

**
Gerard J. Kleywegt
http://xray.bmc.uu.se/gerard mailto:ger...@xray.bmc.uu.se
**
The opinions in this message are fictional. Any similarity to actual opinions, living or dead, is purely coincidental.
**
Little known gastromathematical curiosity: let "z" be the radius and "a" the thickness of a pizza. Then the volume of that pizza is equal to pi*z*z*a !
**
Re: [ccp4bb] sequence format conversion
On Tue, 2012-05-08 at 09:22 +0100, Marko Hyvonen wrote:

> PS. fasta format needs ">" as a first line with (optional) description in
> the input file. And not sure what amino acids "b" and "j" would get
> converted to :-)

A good tool should leave "b" as is: it is ASX (the standard ambiguity code for ASP or ASN). "j", "o" and "u" are a different matter :-)

Regards,
Peter.

--
Peter Keller                  Tel.: +44 (0)1223 353033
Global Phasing Ltd.,          Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom
Re: [ccp4bb] sequence format conversion
I guess you are looking for ReadSeq, with GenBank|gb set as the output format.

http://www.ebi.ac.uk/cgi-bin/readseq.cgi

However, I have no idea how to get only 10 residues per line, if that is a requirement.

best
Manish
Re: [ccp4bb] sequence format conversion
Dear Kris,

except for formatting, something like this:

    sed -e "s/\(.....\)\(.....\)/ \1 \2\n/g" test.fasta | awk '{count = (10*LN +1); print count, $0; ++LN}'

(all on one line) should do the job.

Cheers,
Tim

--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
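The one-liner above is offered as approximate ("except for formatting"). For comparison, here is a minimal sketch in plain awk that aims at exactly the layout Kris asked for. It is not from the thread: the function name fasta_to_blocks is hypothetical, and it assumes a POSIX sh and awk, a protein or nucleotide FASTA file whose header lines start with ">", and that all records in the input should be joined into one sequence.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the residues of a FASTA
# file as numbered rows of ten, each row split into two groups of five.
fasta_to_blocks() {
    awk '
        /^>/ { next }                          # skip FASTA header lines
        {
            gsub(/[ \t\r]/, "")                # drop blanks and stray CRs
            seq = seq $0                       # join wrapped sequence lines
        }
        END {
            n = length(seq)
            w = length(sprintf("%d", n))       # width of the largest index
            fmt = "%" w "d %s\n"               # right-align the row numbers
            for (i = 1; i <= n; i += 10) {
                row  = substr(seq, i, 5)       # first group of five
                rest = substr(seq, i + 5, 5)   # second group, may be short
                if (rest != "") row = row " " rest
                printf fmt, i, row
            }
        }
    ' "$@"
}
```

Called as "fasta_to_blocks test.fasta" (or with the FASTA text on standard input), it writes the numbered rows to standard output. Collecting the whole sequence before printing is a deliberate choice: it lets wrapped input lines be regrouped into rows of ten regardless of how the FASTA file was originally wrapped.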
Re: [ccp4bb] sequence format conversion
Surely sequence analysis tools are the easiest way to do it. I'd recommend EMBOSS (open source and runs nicely on most platforms - the "ccp4" of sequence analysis for me at least):

http://emboss.sourceforge.net/

Seqret (SEQuence RETurn) program:

    seqret -out test.seq -osformat gcg test.fasta

Marko

PS. fasta format needs ">" as a first line with (optional) description in the input file. And not sure what amino acids "b" and "j" would get converted to :-)

_
Marko Hyvonen
Department of Biochemistry, University of Cambridge
ma...@cryst.bioc.cam.ac.uk
http://www-cryst.bioc.cam.ac.uk/groups/hyvonen
tel: +44-(0)1223-766 044
mobile: +44-(0)7796-174 877
fax: +44-(0)1223-766 002
Re: [ccp4bb] sequence format conversion
More seriously, there is the babel command from Open Babel in case the second format you show has a known name.
Re: [ccp4bb] sequence format conversion
Hello,

The tool is called awk. There is also another tool called Perl, but I won't recommend it.

Regards,
F.