Re: aligning a set of word substrings to sentence

Fredrik Lundh Thu, 01 Dec 2005 11:34:25 -0800

Steven Bethard wrote:

> I feel like there should be a simpler solution (maybe with the re
> module?) but I can't figure one out.  Any suggestions?


using the finditer pattern I just posted in another thread:

tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
text = '''\
She's gonna write
a book?'''

import re

tokens.sort() # lexical order
tokens.reverse() # reversed, so a longer token is tried before any token that is its prefix
pattern = "|".join(map(re.escape, tokens))
pattern = re.compile(pattern)

I get

print([m.span() for m in pattern.finditer(text)])
[(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]

which seems to match your version pretty well.
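
for reference, here is the same idea as a self-contained sketch that also reports which token matched at each span. the helper name align_tokens is made up for illustration, and it assumes the tokens appear in the text in order and that no token accidentally matches inside text that is not covered by another token:

```python
import re

def align_tokens(tokens, text):
    """Return (token, (start, end)) pairs in the order they occur in text."""
    # Reverse lexical order puts a longer token ahead of any token that is
    # its prefix, so the alternation tries the longer one first.
    # sorted() works on a copy, so the caller's list is left untouched.
    ordered = sorted(set(tokens), reverse=True)
    pattern = re.compile("|".join(map(re.escape, ordered)))
    return [(m.group(), m.span()) for m in pattern.finditer(text)]

tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
text = "She's gonna write\na book?"

for token, span in align_tokens(tokens, text):
    print(token, span)
```

note that the matches come back in text order, not token-list order, so a token that is missing from the text is silently skipped rather than reported.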

hope this helps!

</F> 



-- 
http://mail.python.org/mailman/listinfo/python-list
