Re: how to remove the same words in the paragraph

Tim Chase Mon, 09 Nov 2009 04:16:12 -0800

I think a simple regex may come in handy:

  >>> import re
  >>> p = re.compile(r'(.+) .*\1')    # note the space
  >>> s = p.search("python and i love python")
  >>> s.groups()
  ('python',)


But that matches only one doubled word. Someone else could shed some light on
how to extract all the doubled words; then they can be removed from the
original paragraph.


This has multiple problems:

>>> import re
>>> p = re.compile(r'(.+) .*\1')
>>> s = p.search("python one two one two python")
>>> s.groups()
('python',)
>>> s = p.search("python one two one two python one")
>>> s.groups() # guess what happened to the 2nd "one"...
('python one',)

and even once you have the list of theoretical duplicates (by changing the regexp to r'\b(\w+)\b.*?\1' perhaps), you still have to worry about emitting the first instance but not subsequent instances.
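For the "keep the first instance, drop the rest" part, a regex may not be needed at all. Here is a minimal sketch that swaps the regex for a plain split plus a set of words already seen. It assumes words are whitespace-separated and compared case-sensitively, so "Python" and "python," count as different words, and it collapses all whitespace in the result to single spaces. The name remove_repeated_words is just a hypothetical helper for illustration:

```python
def remove_repeated_words(text):
    """Return text with every repeated word dropped, keeping its first occurrence.

    Assumptions (hypothetical helper): words are whitespace-separated,
    comparison is case-sensitive, punctuation stays attached to its word,
    and the output is re-joined with single spaces.
    """
    seen = set()   # words emitted so far
    kept = []      # words in original order, first instances only
    for word in text.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    print(remove_repeated_words("python one two one two python one"))
    # prints: python one two
```

Because each word is checked against the set as it is read, every later copy is skipped regardless of how far apart the repeats are, which is exactly the case where the single-match regex above falls over.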


-tkc




--
http://mail.python.org/mailman/listinfo/python-list
