<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

> hi
> suppose i have a string like
>
> test1?test2t-test3*test4*test5$test6#test7*test8
>
> how can i construct the regexp to get test3*test4*test5 and
> test7*test8, ie, i want to match * and the words before and after?
> thanks

I suppose this is just an example and you mean "any word" instead of
test1, test2, etc. So your pattern would be word*word*word*word, that is,
word* repeated one or more times, followed by another word.

To match a word we'll use "\w+"; to match a literal * we have to use "\*",
because * is a special character in regular expressions.
So the regexp would be: "(\w+\*)+\w+"

Since we are not interested in the () as a group by itself (it was only
there to describe the repeating pattern), we change it into non-capturing
(non-grouping) parentheses. Final version: "(?:\w+\*)+\w+"

    import re

    # One or more "word*" units followed by a final word
    rexp = re.compile(r"(?:\w+\*)+\w+")

    lines = [
        'test1?test2t-test3*test4*test5$test6#test7*test8',
        'test1?test2t-test3*test4$test6#test7_test8',
        'test1?nada-que-ver$esto.no.matchea',
        'test1?test2t-test3*test4*',
        'test1?test2t-test3*test4',
        'test1?test2t-test3*',
    ]

    for line in lines:
        print(line)
        # findall returns every non-overlapping match in the line
        for txt in rexp.findall(line):
            print('->', txt)

Test it with some corner cases and see if it does what you expect: no "*",
starting with "*", ending with "*", embedded whitespace before and after
the "*", whitespace inside a word, the very definition of "word"...

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list
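
The corner cases suggested in the reply can be checked with a short script. This is only a sketch, assuming Python 3; the strings in corner_cases and the name rexp_capturing are made up for illustration and are not from the original post. It also shows why the switch to (?:...) matters when using findall: with a single capturing group, findall returns that group's last repetition instead of the whole match.

```python
import re

# Pattern from the post: one or more "word*" units followed by a final word.
rexp = re.compile(r"(?:\w+\*)+\w+")

# Same pattern with a capturing group, kept only for comparison (hypothetical name).
rexp_capturing = re.compile(r"(\w+\*)+\w+")

sample = 'test1?test2t-test3*test4*test5$test6#test7*test8'

# With a non-capturing group, findall returns the full matches.
full_matches = rexp.findall(sample)
# With one capturing group, findall returns only the group's last repetition.
group_only = rexp_capturing.findall(sample)

# Illustrative corner cases (assumed inputs, not from the original post).
corner_cases = {
    'no_star': 'test1?test2',
    'leading_star': '*test1*test2',
    'trailing_star': 'test1*test2*',
    'spaces_around_star': 'test1 * test2',
    'underscore_in_word': 'foo_bar*baz',
}

results = {name: rexp.findall(text) for name, text in corner_cases.items()}

print('non-capturing:', full_matches)
print('capturing:    ', group_only)
for name, found in results.items():
    print(name, '->', found)
```

Note that \w also matches digits and the underscore, so foo_bar*baz counts as two words joined by a star, and any whitespace next to the * prevents a match. If that is not the intended definition of "word", the \w+ parts need to be replaced with a more specific character class.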