How about keeping two pointers - startp and endp.
Keep a count of frequencies of keywords between startp and endp, both
inclusive. We can use an array / hash table for this.
Now, the minimum length substring has to start with a keyword and has to end
with a keyword.

1. Initially startp=0 and endp=1. L = infinity
2. Increment endp till you encounter a keyword or it exceeds the total
length. Update frequencies. Check if you have all the keywords. (This can be
done in O(1) using a bitmap or similar). If it exceeds the total length,
QUIT.
3. If the str(startp,endp) contains all keywords and length < L, save values
of startp and endp.
4. Now, increment startp, till you get a keyword. If the str(startp,endp)
still contains all keywords, update saved values of startp and endp.
5. Repeat step 4 till str(startp,endp) does NOT contain all keywords.
6. Goto step 2.

The final values of startp and endp should give you the minimum summary.
Since both values go from 0 to N-1, its O(N).

~Vishal

On 9/24/07, daizisheng <[EMAIL PROTECTED]> wrote:
>
>
> I think hash method is ok, at lease in expectation way it's O(n)
> why not use it? it's very effeciently
>
> I think there should be some worst case O(n) algorithm, but it will be
> more complex and not as effecient as the above one in practise
>
>
> Sticker 写道:
> > Declare: this question is originally from Google. The original form
> > is: given a document, how to find a shortest summary containing all
> > the key words. After translated, it will be: given a sequence, how to
> > find a shortest substring that contains all the items required.
> > Example: a sequence "abaccdefgacel" and a set of key words "a", "c",
> > "d" and "e". The shortest substring contianing all of the key words is
> > "accde". Find one of such shortest substrings. In the original
> > question, there is time complexity boundary O(N), where N is the
> > length of the sequence.
> >
> > To me, this problem is rather questionable. So far my solution
> > requires a hash table that gives no conflict and can locate a key word
> > in O(1) time. Otherwise the time complexity will definitely be related
> > to the number of key words.
> >
> > Does anyone have idea for a O(N) algorithm to solve this problem?
> >
> >
> > >
> >
> >
>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to