Efficient mechanism for str.startswith on a set.
I need to iterate through a file, checking whether each line 'startswith()' a string that belongs to a set. Normally, the most efficient way I would do this would be: strs=set(['foo','bar']) for line in file: if line.strip() in strs: print line However, for this case I need to do a startswith like this: for line in file: for s in strs: if line.startswith(s): print line This is obviously the least efficient manner to do this as I'll always be iterating over the entire 'strs'. I know I could make a binary tree out of 'strs' but that's a little more work that don't have time to do today. I know there should be something out there if I just ask nicely enough. Thanks ahead of time. -Brian -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient mechanism for str.startswith on a set.
Brian Cole wrote: >I need to iterate through a file, checking whether each line > 'startswith()' a string that belongs to a set. > > Normally, the most efficient way I would do this would be: > strs=set(['foo','bar']) > for line in file: >if line.strip() in strs: >print line > > However, for this case I need to do a startswith like this: > for line in file: > for s in strs: > if line.startswith(s): > print line > > This is obviously the least efficient manner to do this as I'll always > be iterating over the entire 'strs'. I know I could make a binary tree > out of 'strs' but that's a little more work that don't have time to do > today. I know there should be something out there if I just ask nicely > enough. the RE approach used in this MultiReplace class can be quite efficient: http://effbot.org/zone/python-replace.htm e.g. strs.sort() # lexical order strs.reverse() # use longest match first pattern = string.join(map(re.escape, strs), "|") match = re.compile(pattern).match for line in file: if match(line): print line a less general solution would be to check line[:x] against a dictionary, where x is max(len(s) for s in strs), and use a fallback test if the test fails (which up- dates the dictionary). this only works well if x is relatively small, the files are relatively regular and not too large, and most lines match. -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient mechanism for str.startswith on a set.
Brian Cole <[EMAIL PROTECTED]> writes: > This is obviously the least efficient manner to do this as I'll always > be iterating over the entire 'strs'. I know I could make a binary tree > out of 'strs' but that's a little more work that don't have time to do > today. I know there should be something out there if I just ask nicely > enough. This may be overkill but it's the right approach in principle: http://osteele.com/software/python/fsa/ -- http://mail.python.org/mailman/listinfo/python-list