On Thu, 15 Dec 2005 20:33:42 +0000, Simon Brunning <[EMAIL PROTECTED]> wrote:
>On 15 Dec 2005 12:26:07 -0800, Mystilleef <[EMAIL PROTECTED]> wrote:
>> I want a pattern that scans the entire string but avoids
>> returning duplicate matches. For example "cat", "cate",
>> "cater" may all well be valid matches, but I don't want
>> duplicate matches of any of them. I know I can filter the
>> list containing found matches myself, but that is somewhat
>> expensive for a list containing thousands of matches.
>
>Probably the cheapest way of de-duping the list would be to dump it
>straight into a set, provided that you aren't concerned about the
>order.
>
Or, if you are concerned about order, maybe try a combination like:

 >>> s = """\
 ... I want a pattern that scans the entire string but avoids
 ... returning duplicate matches. For example "cat", "cate",
 ... "cater" may all well be valid matches, but I don't want
 ... duplicate matches of any of them. I know I can filter the
 ... list containing found matches myself, but that is somewhat
 ... expensive for a list containing thousands of matches.
 ... """
 >>> import re
 >>> rxo = re.compile(r'cat(?:er|e)?')
 >>> rxo.findall(s)
 ['cate', 'cat', 'cate', 'cater', 'cate']
 >>> seen = set()
 >>> [w for w in (m.group(0) for m in rxo.finditer(s))
 ...  if w not in seen and not seen.add(w)]
 ['cate', 'cat', 'cater']

BTW, note that the longer of two ambiguous alternatives should go first in the
regex; e.g., don't write r'cat(?:e|er)?' for the above, or "cater" will only
ever match as "cate".

Regards,
Bengt Richter

-- 
http://mail.python.org/mailman/listinfo/python-list
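A self-contained sketch of the same ideas, assuming a modern Python (3.7+). It swaps in `dict.fromkeys` for the `seen`-set comprehension as the order-preserving de-dup, since dicts keep insertion order from 3.7 on. It also shows the alternation-order point: with `e` tried before `er`, "cater" is never matched whole. The helper name `unique_matches` and the abridged `TEXT` are hypothetical, chosen only for illustration.

```python
import re

# Abridged sample text from the original question (only the lines with matches).
TEXT = (
    'returning duplicate matches. For example "cat", "cate",\n'
    '"cater" may all well be valid matches, but I don\'t want\n'
    "duplicate matches of any of them.\n"
)


def unique_matches(pattern, text):
    """Hypothetical helper: each distinct match once, in order of first appearance."""
    # dict.fromkeys keeps insertion order (guaranteed since Python 3.7),
    # so this de-dups in a single pass without losing the original order.
    return list(dict.fromkeys(m.group(0) for m in re.finditer(pattern, text)))


# Longer alternative first: "cater" is matched whole.
longer_first = unique_matches(r"cat(?:er|e)?", TEXT)

# Shorter alternative first: re tries "e" before "er" and stops at the
# first alternative that succeeds, so "cater" only ever matches as "cate".
shorter_first = unique_matches(r"cat(?:e|er)?", TEXT)

# If order does not matter, dumping the matches into a set is simplest.
unordered = set(re.findall(r"cat(?:er|e)?", TEXT))

print(longer_first)       # ['cate', 'cat', 'cater']
print(shorter_first)      # ['cate', 'cat']
print(sorted(unordered))  # ['cat', 'cate', 'cater']
```

Both de-dup variants are a single linear pass over the matches, so neither should be a bottleneck even with thousands of matches.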