the problem is you need a hash table to maintain all the keywords,:) because they do not have to be a single characters, or you can store them in array, but then you need binary search,:)
Vishal 写道: > How about keeping two pointers - startp and endp. > Keep a count of frequencies of keywords between startp and endp, both > inclusive. We can use an array / hash table for this. > Now, the minimum length substring has to start with a keyword and has > to end with a keyword. > > 1. Initially startp=0 and endp=1. L = infinity > 2. Increment endp till you encounter a keyword or it exceeds the total > length. Update frequencies. Check if you have all the keywords. (This > can be done in O(1) using a bitmap or similar). If it exceeds the > total length, QUIT. > 3. If the str(startp,endp) contains all keywords and length < L, save > values of startp and endp. > 4. Now, increment startp, till you get a keyword. If the > str(startp,endp) still contains all keywords, update saved values of > startp and endp. > 5. Repeat step 4 till str(startp,endp) does NOT contain all keywords. > 6. Goto step 2. > > The final values of startp and endp should give you the minimum summary. > Since both values go from 0 to N-1, its O(N). > > ~Vishal > > On 9/24/07, *daizisheng* <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> wrote: > > > I think hash method is ok, at lease in expectation way it's O(n) > why not use it? it's very effeciently > > I think there should be some worst case O(n) algorithm, but it will be > more complex and not as effecient as the above one in practise > > > Sticker 写道: > > Declare: this question is originally from Google. The original form > > is: given a document, how to find a shortest summary containing all > > the key words. After translated, it will be: given a sequence, > how to > > find a shortest substring that contains all the items required. > > Example: a sequence "abaccdefgacel" and a set of key words "a", "c", > > "d" and "e". The shortest substring contianing all of the key > words is > > "accde". Find one of such shortest substrings. In the original > > question, there is time complexity boundary O(N), where N is the > > length of the sequence. > > > > To me, this problem is rather questionable. So far my solution > > requires a hash table that gives no conflict and can locate a > key word > > in O(1) time. Otherwise the time complexity will definitely be > related > > to the number of key words. > > > > Does anyone have idea for a O(N) algorithm to solve this problem? > > > > > > > > > > > > > > > > > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~----------~----~----~----~------~----~------~--~---