[algogeeks] Re: Google Interview Question: find shortest summary containing all key words

Andrey Sun, 30 Sep 2007 23:08:28 -0700

On 1 окт, 06:20, Sticker <[EMAIL PROTECTED]> wrote:
> En, it is the idea. You assume each keyword is a single character and
> you use a map to hash key words and their counts. Each time you extend
> the range to right hand side, you may increase the counts of some
> found key words and each time you shrink the range from the left hand
> side, you decrease the counts of some out-of-range key words.
>
Exactly! pretty simple algorithm.


> For the case that keywords are not single characters, you need a
> complex hash. Even if keywords are single characters, you need
> lg(number of keywords) to locate the position of each keyword in the
> map,which is implemented in the STL map container.


Look at
> Example: a sequence "abaccdefgacel" and a set of key words "a", "c",
> "d" and "e". The shortest substring contianing all of the key words is
> "accde".

each word is a single character! more over It makes no difference we
can numerate all the words in the world :) and rewrite the program
using numbers instead of characters.


> May I draw the final conclusion that the final time complexity "must"
> have to involve the total length of keywords and the length of the
> input text, rather than linear to the input text's length? (Of course
> you can say that in usual case the length of keywords << length of
> input text. But this assumption should not remove the length of
> keywords in the time complexity analysis thus should not sufficient to
> make the algorithm works linear to the length of text.) Am I right?
>

Let's write new algorithm based on my previous with integers instead
words.

Ruby like code :)

  @words = {}                        # Hash {word => sequence number }

1.
  for word in inputSequence
     words[word] = words.size unless words[word]   #  If word not in
hash put it there
 
#  and assign sequence number.
  end
2.
  for word in keyWords
     words[word] = words.size unless words[word]
  end

# This procedure takes N log(N) + M * log(N + M)
#       N - size of input sequence
#       M - number of keywords

3. Run previous algorithm on integers!

The total time will be
N log(N) + M * log (N + M) + N + M


So if we don't have all word already enumerated you are right.

> On Oct 1, 4:10 am, Andrey <[EMAIL PROTECTED]> wrote:
>
> > Check this program:
>
> > #include <map>
> > #include <string>
> > #include <iostream>
> > #include <algorithm>
> > #include <iterator>
>
> > using namespace std;
>
> > class Range : public pair<const char*, const char*> {
> >   public:
> >         size_t size() const { return second - first; }
> >         Range(const char* f=0, const char * s=0)
> >                 : pair<const char*, const char*>(f ,s) {}
>
> > };
>
> > ostream & operator << (ostream & out, const Range& r) {
> >         out<<"range size: "<<r.size()<<" [";
> >         copy(r.first, r.second, ostream_iterator<char>(out, ""));
> >         out<<"]";
> >         return out;
>
> > }
>
> > typedef map<char, int> SetCnt;
>
> > // array based implementation will have complexity O(m)
> > bool checkSet(const SetCnt & setCnt) {
> >         SetCnt::const_iterator sIt = setCnt.begin(), sEnd = setCnt.end();
> >         for (SetCnt::const_iterator sIt = setCnt.begin(); sIt != sEnd; +
> > +sIt ) {
> >                 if (sIt->second == 0) {
> >                         return false;
> >                 }
> >         }
> >         return true;
>
> > }
>
> > // array based implementation will have complexity O(1)
> > bool checkAndShrink(SetCnt & setCnt, char c) {
> >         SetCnt::iterator sIt = setCnt.find(c);
> >         if (sIt == setCnt.end()) {
> >                 return true;
> >         }
> >         if (sIt->second > 1) {
> >                 sIt->second--;
> >                 return true;
> >         }
> >         return false;
>
> > }
>
> > // aproximate complexity is O(n*m + m)
> > //   where n - is size of input sequence and m - is number of key
> > words.
> > Range solve(const char * strBegin, const char * strEnd,
> >                         const char * setBegin, const char * setEnd) {
> >         SetCnt setCnt;
>
> >         // complexity O(m)
> >         for (const char * cIt = setBegin; cIt != setEnd; cIt++ ) {
> >                 setCnt[*cIt] = 0;
> >         }
>
> >         bool solved = false;
> >         Range range(strBegin, strEnd), result(range);
> >         for (const char * sIt = strBegin; sIt != strEnd; sIt++ ) {
> >                 SetCnt::iterator it = setCnt.find(*sIt);
> >                 if (it != setCnt.end()) {
> >                         setCnt[*sIt] += 1;
> >                         range.second = sIt + 1;
> >                         if (checkSet(setCnt)) {
> >                                 solved = true;
> >                                 while (checkAndShrink(setCnt, 
> > *range.first)) {
> >                                         range.first++;
> >                                 }
> >                                 if (result.size() > range.size()) {
> >                                         result = range;
> >                                 }
> >                                 // Debug
> >                                 cout<<range<<endl;
> >                         }
> >                 }
> >         }
> >         return solved ? result : Range(0, 0);
>
> > }
>
> > int main() {
> >         const char * string = "abrebbqbcdfagbasdfbbaggqqebbcg--324c";
> >         const char * set = "abc";
>
> >         Range range = solve(string, string + strlen(string), set, set +
> > strlen(set));
> >         if (range.size()) {
> >                 cout<<"The shortes subsequence is: "<<range<<endl;
> >         } else {
> >                 cout<<"No solution..."<<endl;
> >         }
> >         return 0;
>
>  > }


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---
[algogeeks] Re: Google Interview Question: find shortest summary containing all key words

Reply via email to