Travers Naran wrote: > Here's the basic idea. I have a dictionary of substrings (the substrings > stored as keys). I have a list of strings. I want to find out, for each > word in the dictionary, how many times the substring occurs non-overlapping > occurrences there are in the list of strings. This is the best I could > come up with: > > for j in self.dict.keys(): > c = 0 > for i in tc.strings: > c += i.count(j) > self.dict[j]['count'] += c > > I tried Boyer-Moore-Horspool as someone else suggested in the group, but it > was even slower (probably because of the time spent setting up the > pattern). Even when I split the algorithm in two, doing the set-up before > the inner loop and only testing the pattern in the inner loop, it was still > slow. > > So, am I doing this the optimal way or am I overlooking something _really_ > simple to do? > > Statistics on the data: > The dictionary has 3000+ entries in it. > There are several thousand strings. > The strings average 10-26 characters in length.
Find a character that doesn't exist in any of self.dict.keys(). '\x00' will probably do. whole = '\x00'.join(tc.strings) for key in self.dict.keys(): self.dict[key]['count'] = whole.count(key) -- Robert Kern [EMAIL PROTECTED] "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter -- http://mail.python.org/mailman/listinfo/python-list