Hi Paul, thanks for your interest. I am talking about multiple protein sequence alignments generated by the program clustalW (see http://www.ebi.ac.uk/clustalw/help.html for additional information). Since the sequences to be aligned can be very long, clustalW splits each sequence into fragments in its output, and every fragment starts with the name of the aligned sequence (please check the attached file clustal.aln for an example). This is what I previously called a block. If the program parseclustal.pl (also attached) were working properly, the output would look like the file "parseclustal.aln" (see attachment), but instead the result is what you see in the attached badparseclustal.aln. I ran the program with the following command: "parseclustal.pl clustal.aln > output".
Please let me know if you find a solution, and thanks for your help.
Regards,
Pedro Reche
--
***************************************************************************
PEDRO a. RECHE gallardo, pHD            TL: 617 632 3824
Scientist, Mol.Immnunol.Foundation,     FX: 617 632 3351
Dana-Farber Cancer Institute,           EM: [EMAIL PROTECTED]
Harvard Medical School,                 URL: http://www.reche.org
44 Binney Street, D610C,
Boston, MA 02115
***************************************************************************
YPK1 SQLSWKRLLMKGYIPPYKPAVS-----NSMDTSNFDEEFTR---EKPIDSVVDEYLSESV------QKQF
YPK2 KDISWKKLLLKGYIPPYKPIVK-----SEIDTANFDQEFTK---EKPIDSVVDEYLSASI------QKQF
KPCA_HUMAN RRIDWEKLENREIQPPFKPKVC------GKGAENFDKFFTR---GQPVLTPPDQLVIANID-----QSDF
KPCZ_HUMAN RSIDWDLLEKKQALPPFQPQIT-----DDYGLDNFDTQFTS---EPVQLTPDDEDAIKRID-----QSEF
KAPA KEVVWEKLLSRNIETPYEPPIQ----QGQGDTSQFDKYPE----EDINYGVQGEDPYADL------FRDF
KAPC NEVIWEKLLARYIETPYEPPIQ----QGQGDTSQFDRYPE----EEFNYGIQGEDPYMDL------MKEF
KAPB SEVVWERLLAKDIETPYEPPIT----SGIGDTSLFDQYPE----EQLDYGIQGDDPYAEY------FQDF
KS6_HUMAN RHINWEELLARKVEPPFKPLLQ-----SEEDVSQFDSKFTR---QTPVDSP-DDSTLSESA-----NQVF
KPC1 RNINFDDILNLRVKPPYIPEIK-----SPEDTSYFEQEFTS---APPTLTPLPSVLTTSQ------QEEF
KRAC_BOVIN ASIVWQDVYEKKLSPPFKPQVT-----SETDTRYFDEEFTA---QMITITPPDQDDSMEGVDS-ERRPHF
SCH9 ADIDWEALKQKKIPPPFKPHLV-----SETDTSNFDPEFTT---ASTSYMNKHQPMMTATPLSPAMQAKF
KGP1_DROME LGFDWDGLASQLLIPPFVRPIA-----HPTDVRYFDRFPC------DLNEPPDELSGWDA--------DF
ARK2_RAT KGIDWQYVYLRKYPPPLIPPRGEVNAADAFDIGSFDEEDTKG--IKLLDCDQDLYKNFPLMISERWQQEV
DBFB AEINFETLRTS--SPPFIPQLD-----DETDAGYFDDFTNEEDMAKYADVFKRQNKLSAMVDDSAVDSKL
DBF2 ADINFSTLRSM--IPPFTPQLD-----SETDAGYFDDFTSEADMAKYADVFKRQDKLTAMVDDSAVSSKL
YPK1 SQLSWKRLLMKGYIPPYKPAVS-----NSMDTSNFDEEFTR---EKPIDSVVDEYLSESV------QKQF
YPK2 KDISWKKLLLKGYIPPYKPIVK-----SEIDTANFDQEFTK---EKPIDSVVDEYLSASI------QKQF
KPCA_HUMAN RRIDWEKLENREIQPPFKPKVC------GKGAENFDKFFTR---GQPVLTPPDQLVIANID-----QSDF
KPCZ_HUMAN RSIDWDLLEKKQALPPFQPQIT-----DDYGLDNFDTQFTS---EPVQLTPDDEDAIKRID-----QSEF
KAPA KEVVWEKLLSRNIETPYEPPIQ----QGQGDTSQFDKYPE----EDINYGVQGEDPYADL------FRDF
KAPC NEVIWEKLLARYIETPYEPPIQ----QGQGDTSQFDRYPE----EEFNYGIQGEDPYMDL------MKEF
KAPB SEVVWERLLAKDIETPYEPPIT----SGIGDTSLFDQYPE----EQLDYGIQGDDPYAEY------FQDF
KS6_HUMAN RHINWEELLARKVEPPFKPLLQ-----SEEDVSQFDSKFTR---QTPVDSP-DDSTLSESA-----NQVF
KPC1 RNINFDDILNLRVKPPYIPEIK-----SPEDTSYFEQEFTS---APPTLTPLPSVLTTSQ------QEEF
KRAC_BOVIN ASIVWQDVYEKKLSPPFKPQVT-----SETDTRYFDEEFTA---QMITITPPDQDDSMEGVDS-ERRPHF
SCH9 ADIDWEALKQKKIPPPFKPHLV-----SETDTSNFDPEFTT---ASTSYMNKHQPMMTATPLSPAMQAKF
KGP1_DROME LGFDWDGLASQLLIPPFVRPIA-----HPTDVRYFDRFPC------DLNEPPDELSGWDA--------DF
ARK2_RAT KGIDWQYVYLRKYPPPLIPPRGEVNAADAFDIGSFDEEDTKG--IKLLDCDQDLYKNFPLMISERWQQEV
DBFB AEINFETLRTS--SPPFIPQLD-----DETDAGYFDDFTNEEDMAKYADVFKRQNKLSAMVDDSAVDSKL
DBF2 ADINFSTLRSM--IPPFTPQLD-----SETDAGYFDDFTSEADMAKYADVFKRQDKLTAMVDDSAVSSKL
CLUSTAL W(1.60) multiple sequence alignment
YPK1 SQLSWKRLLMKGYIPPYKPAVS-----NSMDTSNFDEEFTR---EKPIDSVVDEYLSESV
YPK2 KDISWKKLLLKGYIPPYKPIVK-----SEIDTANFDQEFTK---EKPIDSVVDEYLSASI
KPCA_HUMAN RRIDWEKLENREIQPPFKPKVC------GKGAENFDKFFTR---GQPVLTPPDQLVIANI
KPCZ_HUMAN RSIDWDLLEKKQALPPFQPQIT-----DDYGLDNFDTQFTS---EPVQLTPDDEDAIKRI
KAPA KEVVWEKLLSRNIETPYEPPIQ----QGQGDTSQFDKYPE----EDINYGVQGEDPYADL
KAPC NEVIWEKLLARYIETPYEPPIQ----QGQGDTSQFDRYPE----EEFNYGIQGEDPYMDL
KAPB SEVVWERLLAKDIETPYEPPIT----SGIGDTSLFDQYPE----EQLDYGIQGDDPYAEY
KS6_HUMAN RHINWEELLARKVEPPFKPLLQ-----SEEDVSQFDSKFTR---QTPVDSP-DDSTLSES
KPC1 RNINFDDILNLRVKPPYIPEIK-----SPEDTSYFEQEFTS---APPTLTPLPSVLTTSQ
KRAC_BOVIN ASIVWQDVYEKKLSPPFKPQVT-----SETDTRYFDEEFTA---QMITITPPDQDDSMEG
SCH9 ADIDWEALKQKKIPPPFKPHLV-----SETDTSNFDPEFTT---ASTSYMNKHQPMMTAT
KGP1_DROME LGFDWDGLASQLLIPPFVRPIA-----HPTDVRYFDRFPC------DLNEPPDELSGWDA
ARK2_RAT KGIDWQYVYLRKYPPPLIPPRGEVNAADAFDIGSFDEEDTKG--IKLLDCDQDLYKNFPL
DBFB AEINFETLRTS--SPPFIPQLD-----DETDAGYFDDFTNEEDMAKYADVFKRQNKLSAM
DBF2 ADINFSTLRSM--IPPFTPQLD-----SETDAGYFDDFTSEADMAKYADVFKRQDKLTAM
* *.
YPK1 ------QKQF
YPK2 ------QKQF
KPCA_HUMAN D-----QSDF
KPCZ_HUMAN D-----QSEF
KAPA ------FRDF
KAPC ------MKEF
KAPB ------FQDF
KS6_HUMAN A-----NQVF
KPC1 ------QEEF
KRAC_BOVIN VDS-ERRPHF
SCH9 PLSPAMQAKF
KGP1_DROME --------DF
ARK2_RAT MISERWQQEV
DBFB VDDSAVDSKL
DBF2 VDDSAVSSKL
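The block layout in the listing above can be undone by appending each fragment to whatever has already been collected under the same sequence name. What follows is a minimal sketch of that idea, written in Python purely for illustration. It is not the attached parseclustal.pl, and the function name `join_clustal_blocks` and the shortened two-sequence sample are assumptions made for this example.

```python
def join_clustal_blocks(text):
    """Return {sequence name: full aligned sequence}, in order of first appearance."""
    sequences = {}  # plain dicts keep insertion order in Python 3.7+
    consensus_chars = set("*:. ")
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "CLUSTAL W ..." header line.
        if not stripped or stripped.startswith("CLUSTAL"):
            continue
        # Skip conservation lines, which contain only '*', ':', '.' and spaces.
        if set(stripped) <= consensus_chars:
            continue
        fields = stripped.split()
        if len(fields) < 2:
            continue
        # First field is the name, second is this block's fragment.
        # Any further field (e.g. a residue count) is ignored.
        name, fragment = fields[0], fields[1]
        sequences[name] = sequences.get(name, "") + fragment
    return sequences


# Hypothetical, shortened sample in the same layout as clustal.aln.
SAMPLE = """CLUSTAL W(1.60) multiple sequence alignment

YPK1       SQLSWKRLLM
ARK2_RAT   KGIDWQYVYL
           * *.

YPK1       ------QKQF
ARK2_RAT   MISERWQQEV
"""

if __name__ == "__main__":
    # Print one line per sequence, name followed by the joined alignment.
    for name, sequence in join_clustal_blocks(SAMPLE).items():
        print(name, sequence)
```

Keying the fragments by name, rather than printing them as each block is read, means every sequence is emitted exactly once no matter how many blocks the file contains.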
YPK1 SQLSWKRLLMKGYIPPYKPAVS-----NSMDTSNFDEEFTR---EKPIDSVVDEYLSESV------QKQF
YPK2 KDISWKKLLLKGYIPPYKPIVK-----SEIDTANFDQEFTK---EKPIDSVVDEYLSASI------QKQF
KPCA_HUMAN RRIDWEKLENREIQPPFKPKVC------GKGAENFDKFFTR---GQPVLTPPDQLVIANID-----QSDF
KPCZ_HUMAN RSIDWDLLEKKQALPPFQPQIT-----DDYGLDNFDTQFTS---EPVQLTPDDEDAIKRID-----QSEF
KAPA KEVVWEKLLSRNIETPYEPPIQ----QGQGDTSQFDKYPE----EDINYGVQGEDPYADL------FRDF
KAPC NEVIWEKLLARYIETPYEPPIQ----QGQGDTSQFDRYPE----EEFNYGIQGEDPYMDL------MKEF
KAPB SEVVWERLLAKDIETPYEPPIT----SGIGDTSLFDQYPE----EQLDYGIQGDDPYAEY------FQDF
KS6_HUMAN RHINWEELLARKVEPPFKPLLQ-----SEEDVSQFDSKFTR---QTPVDSP-DDSTLSESA-----NQVF
KPC1 RNINFDDILNLRVKPPYIPEIK-----SPEDTSYFEQEFTS---APPTLTPLPSVLTTSQ------QEEF
KRAC_BOVIN ASIVWQDVYEKKLSPPFKPQVT-----SETDTRYFDEEFTA---QMITITPPDQDDSMEGVDS-ERRPHF
SCH9 ADIDWEALKQKKIPPPFKPHLV-----SETDTSNFDPEFTT---ASTSYMNKHQPMMTATPLSPAMQAKF
KGP1_DROME LGFDWDGLASQLLIPPFVRPIA-----HPTDVRYFDRFPC------DLNEPPDELSGWDA--------DF
ARK2_RAT KGIDWQYVYLRKYPPPLIPPRGEVNAADAFDIGSFDEEDTKG--IKLLDCDQDLYKNFPLMISERWQQEV
DBFB AEINFETLRTS--SPPFIPQLD-----DETDAGYFDDFTNEEDMAKYADVFKRQNKLSAMVDDSAVDSKL
DBF2 ADINFSTLRSM--IPPFTPQLD-----SETDAGYFDDFTSEADMAKYADVFKRQDKLTAMVDDSAVSSKL
parseclustal.pl