Hello, I used tfextract on the Transfac 6.4 site.dat file so that I could use the result with tfscan, but it does not parse the file properly. The problems I saw with the Transfac 6.4 site.dat file were:
1 - Many entries have more than one motif sequence (the SQ line); these were not included in the parsed output:

AC R00018
XX
ID MOUSE$ACRD_01
XX
DT 20.06.1990 (created); ewi.
DT 24.08.1995 (updated); hiwi.
CO Copyright (C), Biobase GmbH.
XX
TY D
XX
DE AChR delta (acetylcholine receptor, delta-subunit); Gene: G000457.
XX
SQ TGCCTGG.
SQ TGCCCTTG.
SQ TGCCCTAA.
SQ TGGCAAAC.
XX
SF -148
.
.
.

2 - Some motif sequences are split across two lines, for example:

AC R00709
XX
ID HA$HMGCR_02
XX
DT 20.06.1990 (created); ewi.
DT 06.09.1995 (updated); ewi.
CO Copyright (C), Biobase GmbH.
XX
TY D
XX
DE HMGCOAR (HMG-CoA reductase); Gene: G000157.
XX
SQ TGCTGGAACTCGACCAGCTATTGGTTGGCTCGGCCGTGGTGAGAGATGGTGCGGTGCCCG
SQ TTCTCC.

Thanks in advance for fixing tfextract.

Ramil

---------------------------------
Ramil P. Mauleon
Bioinformatics Specialist
International Rice Research Institute
DAPO Box 7777, Metro Manila, Philippines
email: r.maul...@cgiar.org
phone: 632-580-5600 ext 2508; fax: 632-580-5699
---------------------------------
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
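
For illustration, the two cases reported in this message (several SQ lines per entry, and one sequence wrapped over two SQ lines) could be handled along the following lines. This is only a minimal Python sketch, not tfextract's actual code. It assumes, based solely on the two entries quoted in the message, that a sequence ends at the SQ line that finishes with a "." and that an SQ line without a trailing "." continues on the next SQ line. The function name `parse_sites` and the dictionary keys are hypothetical.

```python
def parse_sites(text):
    """Collect accession, ID and all motif sequences per site.dat-style entry."""
    entries = []
    current = None   # entry being built
    partial = ""     # sequence text accumulated across wrapped SQ lines

    def flush():
        nonlocal current, partial
        if current is not None:
            if partial:  # sequence that never got its closing "."
                current["sequences"].append(partial)
            entries.append(current)
        current, partial = None, ""

    for raw in text.splitlines():
        # The first token is the line code (AC, ID, SQ, XX, ...).
        code, _, value = raw.strip().partition(" ")
        value = value.strip()
        if code == "AC":
            flush()  # a new accession starts a new entry
            current = {"accession": value, "id": None, "sequences": []}
        elif current is None:
            continue  # ignore anything before the first AC line
        elif code == "ID":
            current["id"] = value
        elif code == "SQ":
            partial += value
            # Assumption: a trailing "." marks the end of one sequence.
            if partial.endswith("."):
                current["sequences"].append(partial[:-1])
                partial = ""
    flush()
    return entries


sample = """\
AC R00018
ID MOUSE$ACRD_01
SQ TGCCTGG.
SQ TGCCCTTG.
XX
AC R00709
ID HA$HMGCR_02
SQ TGCTGGAACTCGACCAGCTATTGGTTGGCTCGGCCGTGGTGAGAGATGGTGCGGTGCCCG
SQ TTCTCC.
"""

for entry in parse_sites(sample):
    print(entry["accession"], entry["id"], entry["sequences"])
```

With this approach the first entry keeps each SQ line as its own motif, while the second entry's two SQ lines are joined into a single sequence. All other line codes are simply skipped.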