We might even use String Trie. The search time would be O(m) where m is the
length of maximum length keyword. Since m<<N (normally), it would be as good
as O(1).
This would of course require preprocessing. Again, I am assuming no
constraint on space.

On 9/25/07, Sticker <[EMAIL PROTECTED]> wrote:
>
>
> To Vishal: My idea is similar to yours. I like to use hash table as
> well. But I wonder which hash function can you use to insert and find
> keywords with O(1) time? Keywords are not single characters. They are
> normal words. That's basically what I am aftering.
>
> On Sep 25, 2:11 pm, Mayur <[EMAIL PROTECTED]> wrote:
> > Another possibility is to first pre-process the keywords into
> > automaton-like structure (Google for Aho Corasick string matcher), and
> > then use it over the document. This would probably be helpful only if
> > the number of keywords is much smaller than the document itself..
> >
> > On 9/25/07, daizisheng <[EMAIL PROTECTED]> wrote:
> >
> >
> >
> > > Vishal 写道:
> > > > Hash table should give you O(1) insertion and search complexity;
> which
> > > > is what we need, right?
> > > > There is no constraint on space complexity, I believe.
> >
> > > > On 9/24/07, * daizisheng* <[EMAIL PROTECTED]
> > > > <mailto:[EMAIL PROTECTED]>> wrote:
> >
> > > >     the problem is you need a hash table to maintain all the
> keywords,:)
> > > >     because they do not have to be a single characters, or you can
> > > >     store them in
> > > >     array, but then you need binary search,:)
> >
> > > >     Vishal 写道:
> > > >     > How about keeping two pointers - startp and endp.
> > > >     > Keep a count of frequencies of keywords between startp and
> endp,
> > > >     both
> > > >     > inclusive. We can use an array / hash table for this.
> > > >     > Now, the minimum length substring has to start with a keyword
> > > >     and has
> > > >     > to end with a keyword.
> >
> > > >     > 1. Initially startp=0 and endp=1. L = infinity
> > > >     > 2. Increment endp till you encounter a keyword or it exceeds
> the
> > > >     total
> > > >     > length. Update frequencies. Check if you have all the
> keywords.
> > > >     (This
> > > >     > can be done in O(1) using a bitmap or similar). If it exceeds
> the
> > > >     > total length, QUIT.
> > > >     > 3. If the str(startp,endp) contains all keywords and length <
> L,
> > > >     save
> > > >     > values of startp and endp.
> > > >     > 4. Now, increment startp, till you get a keyword. If the
> > > >     > str(startp,endp) still contains all keywords, update saved
> values of
> > > >     > startp and endp.
> > > >     > 5. Repeat step 4 till str(startp,endp) does NOT contain all
> > > >     keywords.
> > > >     > 6. Goto step 2.
> >
> > > >     > The final values of startp and endp should give you the
> minimum
> > > >     summary.
> > > >     > Since both values go from 0 to N-1, its O(N).
> >
> > > >     > ~Vishal
> >
> > > >     > On 9/24/07, *daizisheng* < [EMAIL PROTECTED]
> > > >     <mailto:[EMAIL PROTECTED]>
> > > >     > <mailto:[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>>
> wrote:
> >
> > > >     >     I think hash method is ok, at lease in expectation way
> it's
> > > >     O(n)
> > > >     >     why not use it? it's very effeciently
> >
> > > >     >     I think there should be some worst case O(n) algorithm,
> but
> > > >     it will be
> > > >     >     more complex and not as effecient as the above one in
> practise
> >
> > > >     >     Sticker 写道:
> > > >     >     > Declare: this question is originally from Google. The
> > > >     original form
> > > >     >     > is: given a document, how to find a shortest summary
> > > >     containing all
> > > >     >     > the key words. After translated, it will be: given a
> > > >     sequence,
> > > >     >     how to
> > > >     >     > find a shortest substring that contains all the items
> > > >     required.
> > > >     >     > Example: a sequence "abaccdefgacel" and a set of key
> words
> > > >     "a", "c",
> > > >     >     > "d" and "e". The shortest substring contianing all of
> the key
> > > >     >     words is
> > > >     >     > "accde". Find one of such shortest substrings. In the
> original
> > > >     >     > question, there is time complexity boundary O(N), where
> N
> > > >     is the
> > > >     >     > length of the sequence.
> >
> > > >     >     > To me, this problem is rather questionable. So far my
> solution
> > > >     >     > requires a hash table that gives no conflict and can
> locate a
> > > >     >     key word
> > > >     >     > in O(1) time. Otherwise the time complexity will
> > > >     definitely be
> > > >     >     related
> > > >     >     > to the number of key words.
> >
> > > >     >     > Does anyone have idea for a O(N) algorithm to solve this
> > > >     problem?
> >
> > > yes, that's we need. but seems the starter of this thread who did not
> > > like hash,:)
>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to