Another possibility is to first pre-process the keywords into
automaton-like structure (Google for Aho Corasick string matcher), and
then use it over the document. This would probably be helpful only if
the number of keywords is much smaller than the document itself..

On 9/25/07, daizisheng <[EMAIL PROTECTED]> wrote:
>
> Vishal 写道:
> > Hash table should give you O(1) insertion and search complexity; which
> > is what we need, right?
> > There is no constraint on space complexity, I believe.
> >
> > On 9/24/07, * daizisheng* <[EMAIL PROTECTED]
> > <mailto:[EMAIL PROTECTED]>> wrote:
> >
> >
> >     the problem is you need a hash table to maintain all the keywords,:)
> >     because they do not have to be a single characters, or you can
> >     store them in
> >     array, but then you need binary search,:)
> >
> >     Vishal 写道:
> >     > How about keeping two pointers - startp and endp.
> >     > Keep a count of frequencies of keywords between startp and endp,
> >     both
> >     > inclusive. We can use an array / hash table for this.
> >     > Now, the minimum length substring has to start with a keyword
> >     and has
> >     > to end with a keyword.
> >     >
> >     > 1. Initially startp=0 and endp=1. L = infinity
> >     > 2. Increment endp till you encounter a keyword or it exceeds the
> >     total
> >     > length. Update frequencies. Check if you have all the keywords.
> >     (This
> >     > can be done in O(1) using a bitmap or similar). If it exceeds the
> >     > total length, QUIT.
> >     > 3. If the str(startp,endp) contains all keywords and length < L,
> >     save
> >     > values of startp and endp.
> >     > 4. Now, increment startp, till you get a keyword. If the
> >     > str(startp,endp) still contains all keywords, update saved values of
> >     > startp and endp.
> >     > 5. Repeat step 4 till str(startp,endp) does NOT contain all
> >     keywords.
> >     > 6. Goto step 2.
> >     >
> >     > The final values of startp and endp should give you the minimum
> >     summary.
> >     > Since both values go from 0 to N-1, its O(N).
> >     >
> >     > ~Vishal
> >     >
> >     > On 9/24/07, *daizisheng* < [EMAIL PROTECTED]
> >     <mailto:[EMAIL PROTECTED]>
> >     > <mailto:[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>> wrote:
> >     >
> >     >
> >     >     I think hash method is ok, at lease in expectation way it's
> >     O(n)
> >     >     why not use it? it's very effeciently
> >     >
> >     >     I think there should be some worst case O(n) algorithm, but
> >     it will be
> >     >     more complex and not as effecient as the above one in practise
> >     >
> >     >
> >     >     Sticker 写道:
> >     >     > Declare: this question is originally from Google. The
> >     original form
> >     >     > is: given a document, how to find a shortest summary
> >     containing all
> >     >     > the key words. After translated, it will be: given a
> >     sequence,
> >     >     how to
> >     >     > find a shortest substring that contains all the items
> >     required.
> >     >     > Example: a sequence "abaccdefgacel" and a set of key words
> >     "a", "c",
> >     >     > "d" and "e". The shortest substring contianing all of the key
> >     >     words is
> >     >     > "accde". Find one of such shortest substrings. In the original
> >     >     > question, there is time complexity boundary O(N), where N
> >     is the
> >     >     > length of the sequence.
> >     >     >
> >     >     > To me, this problem is rather questionable. So far my solution
> >     >     > requires a hash table that gives no conflict and can locate a
> >     >     key word
> >     >     > in O(1) time. Otherwise the time complexity will
> >     definitely be
> >     >     related
> >     >     > to the number of key words.
> >     >     >
> >     >     > Does anyone have idea for a O(N) algorithm to solve this
> >     problem?
> >     >     >
> >     >     >
> >     >     > >
> >     >     >
> >     >     >
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >
> >     > >
> >
> >
> >
> >
> >
> >
> > >
> yes, that's we need. but seems the starter of this thread who did not
> like hash,:)
>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to