We might even use String Trie. The search time would be O(m) where m is the length of maximum length keyword. Since m<<N (normally), it would be as good as O(1). This would of course require preprocessing. Again, I am assuming no constraint on space.
On 9/25/07, Sticker <[EMAIL PROTECTED]> wrote: > > > To Vishal: My idea is similar to yours. I like to use hash table as > well. But I wonder which hash function can you use to insert and find > keywords with O(1) time? Keywords are not single characters. They are > normal words. That's basically what I am aftering. > > On Sep 25, 2:11 pm, Mayur <[EMAIL PROTECTED]> wrote: > > Another possibility is to first pre-process the keywords into > > automaton-like structure (Google for Aho Corasick string matcher), and > > then use it over the document. This would probably be helpful only if > > the number of keywords is much smaller than the document itself.. > > > > On 9/25/07, daizisheng <[EMAIL PROTECTED]> wrote: > > > > > > > > > Vishal 写道: > > > > Hash table should give you O(1) insertion and search complexity; > which > > > > is what we need, right? > > > > There is no constraint on space complexity, I believe. > > > > > > On 9/24/07, * daizisheng* <[EMAIL PROTECTED] > > > > <mailto:[EMAIL PROTECTED]>> wrote: > > > > > > the problem is you need a hash table to maintain all the > keywords,:) > > > > because they do not have to be a single characters, or you can > > > > store them in > > > > array, but then you need binary search,:) > > > > > > Vishal 写道: > > > > > How about keeping two pointers - startp and endp. > > > > > Keep a count of frequencies of keywords between startp and > endp, > > > > both > > > > > inclusive. We can use an array / hash table for this. > > > > > Now, the minimum length substring has to start with a keyword > > > > and has > > > > > to end with a keyword. > > > > > > > 1. Initially startp=0 and endp=1. L = infinity > > > > > 2. Increment endp till you encounter a keyword or it exceeds > the > > > > total > > > > > length. Update frequencies. Check if you have all the > keywords. > > > > (This > > > > > can be done in O(1) using a bitmap or similar). If it exceeds > the > > > > > total length, QUIT. > > > > > 3. If the str(startp,endp) contains all keywords and length < > L, > > > > save > > > > > values of startp and endp. > > > > > 4. Now, increment startp, till you get a keyword. If the > > > > > str(startp,endp) still contains all keywords, update saved > values of > > > > > startp and endp. > > > > > 5. Repeat step 4 till str(startp,endp) does NOT contain all > > > > keywords. > > > > > 6. Goto step 2. > > > > > > > The final values of startp and endp should give you the > minimum > > > > summary. > > > > > Since both values go from 0 to N-1, its O(N). > > > > > > > ~Vishal > > > > > > > On 9/24/07, *daizisheng* < [EMAIL PROTECTED] > > > > <mailto:[EMAIL PROTECTED]> > > > > > <mailto:[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>> > wrote: > > > > > > > I think hash method is ok, at lease in expectation way > it's > > > > O(n) > > > > > why not use it? it's very effeciently > > > > > > > I think there should be some worst case O(n) algorithm, > but > > > > it will be > > > > > more complex and not as effecient as the above one in > practise > > > > > > > Sticker 写道: > > > > > > Declare: this question is originally from Google. The > > > > original form > > > > > > is: given a document, how to find a shortest summary > > > > containing all > > > > > > the key words. After translated, it will be: given a > > > > sequence, > > > > > how to > > > > > > find a shortest substring that contains all the items > > > > required. > > > > > > Example: a sequence "abaccdefgacel" and a set of key > words > > > > "a", "c", > > > > > > "d" and "e". The shortest substring contianing all of > the key > > > > > words is > > > > > > "accde". Find one of such shortest substrings. In the > original > > > > > > question, there is time complexity boundary O(N), where > N > > > > is the > > > > > > length of the sequence. > > > > > > > > To me, this problem is rather questionable. So far my > solution > > > > > > requires a hash table that gives no conflict and can > locate a > > > > > key word > > > > > > in O(1) time. Otherwise the time complexity will > > > > definitely be > > > > > related > > > > > > to the number of key words. > > > > > > > > Does anyone have idea for a O(N) algorithm to solve this > > > > problem? > > > > > yes, that's we need. but seems the starter of this thread who did not > > > like hash,:) > > > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~----------~----~----~----~------~----~------~--~---