On the BioPerl mailing list we often get requests like the following:


Within a given biosequence with length X, find substrings of min. length A and max. length B that contain the pattern P at least C times but no more than D times.

A more concrete example: find all substrings 12 characters long (A = B = 12) that contain at least 7 'I' or 'L' characters (P = [IL]; C = 7, and D = 12 implicitly).
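To pin down what counts as a hit, here is a minimal baseline sketch of the window-by-window check, written in Python purely for illustration. It assumes P matches a single character (as [IL] does); the function name `find_windows` and its parameters are made up for this example and are not part of BioPerl. It is a reference to compare cleverer solutions against, not a proposed implementation.

```python
import re
from itertools import accumulate


def find_windows(seq, pattern, min_len, max_len, min_hits, max_hits=None):
    """Yield (start, substring, hits) for every qualifying window.

    Assumption: `pattern` matches exactly one character, e.g. "[IL]".
    If `max_hits` is None, the window length is used as the upper bound (D).
    """
    char_re = re.compile(pattern)
    # prefix[i] = number of matching characters in seq[:i]
    prefix = [0] + list(
        accumulate(1 if char_re.fullmatch(ch) else 0 for ch in seq)
    )
    for length in range(min_len, max_len + 1):
        upper = length if max_hits is None else max_hits
        # range() is empty when the sequence is shorter than the window
        for start in range(len(seq) - length + 1):
            hits = prefix[start + length] - prefix[start]
            if min_hits <= hits <= upper:
                yield start, seq[start:start + length], hits


if __name__ == "__main__":
    # The concrete example: A = B = 12, C = 7, D implicit, P = [IL]
    for start, window, hits in find_windows("LIIIIIIIAAAAA", "[IL]", 12, 12, 7):
        print(start, window, hits)
```

The prefix-count array makes each window check constant time, so the whole scan costs roughly X * (B - A + 1) checks. Overlapping windows are all reported; merging them into maximal regions is left open.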

The naive approach is a "sliding window" method, but it seems to me that a pattern-matching approach would be more efficient. It also sounds like a great little challenge for the brilliant minds of FWP. The "best" version will find its way into a BioPerl module (with appropriate attribution, of course). Golfing is not the goal here (but golfed solutions are still welcome, if you must).

Enjoy,

-Aaron
