Efficient mechanism for str.startswith on a set.

2006-01-10 Thread Brian Cole
I need to iterate through a file, checking whether each line
'startswith()' a string that belongs to a set.

Normally, the most efficient way I would do this would be:
strs=set(['foo','bar'])
for line in file:
if line.strip() in strs:
print line

However, for this case I need to do a startswith like this:
for line in file:
 for s in strs:
 if line.startswith(s):
 print line

This is obviously the least efficient manner to do this as I'll always
be iterating over the entire 'strs'. I know I could make a binary tree
out of 'strs' but that's a little more work that don't have time to do
today. I know there should be something out there if I just ask nicely
enough.

Thanks ahead of time.

-Brian
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient mechanism for str.startswith on a set.

2006-01-10 Thread Fredrik Lundh
Brian Cole wrote:

>I need to iterate through a file, checking whether each line
> 'startswith()' a string that belongs to a set.
>
> Normally, the most efficient way I would do this would be:
> strs=set(['foo','bar'])
> for line in file:
>if line.strip() in strs:
>print line
>
> However, for this case I need to do a startswith like this:
> for line in file:
> for s in strs:
> if line.startswith(s):
> print line
>
> This is obviously the least efficient manner to do this as I'll always
> be iterating over the entire 'strs'. I know I could make a binary tree
> out of 'strs' but that's a little more work that don't have time to do
> today. I know there should be something out there if I just ask nicely
> enough.

the RE approach used in this MultiReplace class can be quite efficient:

http://effbot.org/zone/python-replace.htm

e.g.

strs.sort() # lexical order
strs.reverse() # use longest match first
pattern = string.join(map(re.escape, strs), "|")
match = re.compile(pattern).match

for line in file:
if match(line):
print line

a less general solution would be to check line[:x] against a dictionary, where
x is max(len(s) for s in strs), and use a fallback test if the test fails 
(which up-
dates the dictionary).  this only works well if x is relatively small, the 
files are
relatively regular and not too large, and most lines match.

 



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient mechanism for str.startswith on a set.

2006-01-10 Thread Paul Rubin
Brian Cole <[EMAIL PROTECTED]> writes:
> This is obviously the least efficient manner to do this as I'll always
> be iterating over the entire 'strs'. I know I could make a binary tree
> out of 'strs' but that's a little more work that don't have time to do
> today. I know there should be something out there if I just ask nicely
> enough.

This may be overkill but it's the right approach in principle:

http://osteele.com/software/python/fsa/
-- 
http://mail.python.org/mailman/listinfo/python-list