On Jan 18, 11:04 pm, tom <badoug...@gmail.com> wrote:
> hi...
>
> trying to figure out how to solve what should be an easy python/regex/
> wildcard/replace issue.
>
> i've tried a number of different approaches.. so i must be missing
> something...
>
> my initial sample text are:
>
> Soo Choi</span>LONGEDITBOX">Apryl Berney
> Soo Choi</span>LONGEDITBOX">Joel Franks
> Joel Franks</span>GEDITBOX">Alexander Yamato
>
> and i'm trying to get
>
> Soo Choi foo Apryl Berney
> Soo Choi foo Joel Franks
> Joel Franks foo Alexander Yamato
>
> the issue i'm facing.. is how to start at "</" and end at '">' and
> substitute inclusive of the stuff inside the regex...
>
> i've tried derivations of
>
> name=re.sub("</s[^>]*\">"," foo ",name)
>
> but i'm missing something...
>
> thoughts... thanks
>
> tom

The problem here is that </s matches itself correctly. However, [^>]*
consumes anything that's not > and then stops when it hits something
that is >. So [^>]* consumes "pan" in each case, and the pattern then
tries to match \"> at that position. That fails because the next
character is > rather than ", so the match attempt ends there. It never
makes it to the second >.

I agree with Chris Rebert: regexes are dangerous because the number of
possible cases where you can match isn't always clear (see the above
explanation :). Also, if the number of comparisons you have to do isn't
high, they can be inefficient.

However, for your limited set of examples the following should work:

    import re

    aList = ['Soo Choi</span>LONGEDITBOX">Apryl Berney',
             'Soo Choi</span>LONGEDITBOX">Joel Franks',
             'Joel Franks</span>GEDITBOX">Alexander Yamato']

    # Greedy: matches from the first < through the last > in each string.
    matcher = re.compile(r"<[\w\W]*>")

    newList = []
    for x in aList:
        newList.append(matcher.sub(" foo ", x))

    print(newList)

David
--
http://mail.python.org/mailman/listinfo/python-list
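
For anyone who wants to check the failure described above, here is a minimal sketch. It first confirms that the pattern from the question finds nothing in the sample text, then tries a non-greedy pattern that starts at "</" and stops at the first '">' that follows. The names `samples`, `original`, `lazy` and `results` and the lazy pattern itself are illustrative assumptions, not something from the posts above, and the pattern is only checked against these three sample strings.

```python
import re

samples = [
    'Soo Choi</span>LONGEDITBOX">Apryl Berney',
    'Soo Choi</span>LONGEDITBOX">Joel Franks',
    'Joel Franks</span>GEDITBOX">Alexander Yamato',
]

# Pattern from the question: after "</s", [^>]* can only reach the first ">",
# and the pattern then demands a double quote there, so no match is found.
original = re.compile(r'</s[^>]*">')
print(original.search(samples[0]))

# Hypothetical alternative: start at "</", consume as little as possible
# (.*? is non-greedy), and stop at the first '">' that follows.
lazy = re.compile(r'</.*?">')

results = [lazy.sub(" foo ", s) for s in samples]
for line in results:
    print(line)
```

Unlike the greedy <[\w\W]*> pattern, the non-greedy version stops at the first '">' rather than the last > in the string, which matters only if a line contains more than one such span.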