[algogeeks] Re: DNA sequence

Gene Sun, 14 May 2006 07:33:16 -0700

metalik wrote:
> I am interested in one of the problems that I encountered in one of the
> server for programming contest but I could not solve it for a long
> time.
>
> Given small number (<=10) of DNA disease segments (of characters
> A,G,T,C) of small length <=10, ask how many DNA sequences of a given
> length (<=20000000000) do not contain those sequences. The answer
> should be the remainder of division by 100000. For details, please look
> at the link:  http://acm.pku.edu.cn/JudgeOnline/problem?id=2778.
>
> The limit is 1s.


This is a nice problem.  The fact that length can be big is a strong
hint that some kind of divide-and-conquer approach is needed.

So here's one idea:  Define F(n, P) to be the number (mod 100000) of
strings of length n that do not contain any of the set of sequences P.


Then when n is "sufficiently large", we have

F(n, P) = F(floor(n/2),  P) * F(ceiling(n/2), P) - k

where k is the number of unique ways that some pattern in P can occur
accross the "seam" between two strings that contain no pattern in P.
This is an interesting problem in its own right.

The bases case will be (I think)
  * 4^n when when n is less than the length of the shortest pattern
  * 4^n - |S| when n is equal to the length of the shortest pattern,
where |S| is the number of unique shorest patterns.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

[algogeeks] Re: DNA sequence

Reply via email to