Greetings, EMBOSS users! I have ~18,000 files, each containing Clustal-formatted protein alignments derived from Pfam-A.full. Some of these files are large (> 500 MB), and the largest alignment is 3 GB!
I need to calculate the following alignment statistics:

A. average aligned length
B. standard deviation of aligned length
C. average pairwise sequence identity (%)
D. standard deviation of pairwise sequence identity (%)

Here are the two problems I would like help with:

1. I can calculate A and C using the alistat that comes with Ubuntu, but not B or D.
2. For the really large alignments, an exact calculation is not an option because of the RAM requirements, so I have used alistat's -f (fast) option, which estimates the average %ID by sampling.

If EMBOSS has tools or tricks to report A through D with reasonable RAM and disk-usage footprints and quick processing times, please let me know. I am open to suggestions regarding other tools as well.

I look forward to your replies. Thanks in advance.

Sincerely,
Anand
_______________________________________________
EMBOSS mailing list
https://mailman.open-bio.org/mailman/listinfo/emboss
