On the BioPerl mailing list we often get requests like the following:
Within a given biosequence with length X, find substrings of min. length A and max. length B that contain the pattern P at least C times but no more than D times.
A more concrete example: find all substrings 12 characters long (A = B = 12) that contain at least 7 'I' or 'L' characters (P = [IL], C = 7, and implicitly D = 12).
The naive approach is a "sliding window" method, but it seems to me that a pattern-matching approach would be more efficient. And it sounds like a great little challenge for the brilliant minds of FWP. The "best" version will find its way into a BioPerl module (with appropriate attribution, of course). Golfing is not the goal here (but golfed solutions are still welcome, if you must).
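For reference, here is a minimal sketch of the naive sliding-window baseline, written in Python purely for illustration. It assumes P matches a single character (a character class such as [IL], as in the example above); multi-character patterns would need a different counting step. The function name find_windows and its parameter names are hypothetical, not part of BioPerl.

```python
import re

def find_windows(seq, pattern, min_len, max_len, min_hits, max_hits):
    """Return (start, substring) pairs for every window of length
    min_len..max_len whose number of pattern hits is in min_hits..max_hits.

    Assumption: `pattern` matches exactly one character (e.g. "[IL]").
    """
    # prefix[i] holds the number of matching characters in seq[:i],
    # so any window's hit count is a single subtraction.
    prefix = [0]
    for ch in seq:
        prefix.append(prefix[-1] + (1 if re.fullmatch(pattern, ch) else 0))

    results = []
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            count = prefix[start + length] - prefix[start]
            if min_hits <= count <= max_hits:
                results.append((start, seq[start:start + length]))
    return results

# The concrete example: 12-character windows with at least 7 I/L residues.
for start, window in find_windows("MKILLIAILLGLIAVSTAA", "[IL]", 12, 12, 7, 12):
    print(start, window)
```

The prefix sums keep the cost of each window constant, so the total work is proportional to the sequence length times the number of window lengths tried (B - A + 1). Any pattern-matching solution would have to beat that to be worth the trouble.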
Enjoy,
-Aaron