Re: Regexp Neg. set of chars HowTo?
In <[EMAIL PROTECTED]>, durumdara wrote:

> I found a solution my question in regexp way too:
> import re
> testtext = " minion battalion nation dion sion wion alion"
> m = re.compile("[^t^l]ion")
> print m.findall(testtext)
>
> I search for all text that not lion and tion.

And ^ion.  The first ^ in that character group "negates" that group, the
second is a literal ^, so I guess you meant "[^tl]ion".

Ciao,
        Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list
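To make the difference concrete, here is a small sketch in Python 3 syntax (the thread itself uses Python 2 print statements). The variable names and the extra "^ion" test string are illustrative assumptions, not part of the original messages. On the posted test text both patterns give the same result, because it contains no caret; they only diverge once a literal ^ precedes "ion".

```python
import re

testtext = " minion battalion nation dion sion wion alion"

# "[^t^l]" excludes three characters: t, a literal ^, and l.
with_extra_caret = re.findall(r"[^t^l]ion", testtext)

# "[^tl]" excludes only t and l, which is what was intended.
without_extra_caret = re.findall(r"[^tl]ion", testtext)

print(with_extra_caret)     # ['nion', 'dion', 'sion', 'wion']
print(without_extra_caret)  # ['nion', 'dion', 'sion', 'wion']

# Hypothetical input with a literal caret, where the two classes differ.
caret_text = "^ion"
print(re.findall(r"[^t^l]ion", caret_text))  # []
print(re.findall(r"[^tl]ion", caret_text))   # ['^ion']
```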
Re: Regexp Neg. set of chars HowTo?
Hi!

Thanks for this! I'll use that!

I found a solution my question in regexp way too:
import re
testtext = " minion battalion nation dion sion wion alion"
m = re.compile("[^t^l]ion")
print m.findall(testtext)

I search for all text that not lion and tion.

dd

Paul McGuire wrote:
> It looks like you are trying to de-hyphenate words that have been
> broken across line breaks.
>
> Well, this isn't a regexp solution, it uses pyparsing instead.  But
> I've added a number of other test cases which may be problematic for an
> re.
>
> -- Paul
Re: Regexp Neg. set of chars HowTo?
On Dec 20, 7:40 am, durumdara <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I want to replace some seqs. in a html.
> Let:
> a-
> b
> = ab
>
> but:
> xxx -
> b
> must be unchanged, because it is not word split.
>
> I want to search and replace with re, but I don't know how to neg. this
> set ['\ \n\t'].
>
> This time I use full set without these chars, but neg. is better and
> shorter.
>
> Ok, I can use [^\s], but I want to know, how to neg. set of chars.
> sNorm1= '([^[\ \t\n]]{1})\-\\n' - this is not working.
>
> Thanks for the help:
> dd
>
> sNorm1= '([%s]{1})\-\\n'
> c = range(0, 256)
> c.remove(32)
> c.remove(13)
> c.remove(10)
> c.remove(9)
> s = ["\\%s" % (hex(v).replace('00x', '')) for v in c]
> sNorm1 = sNorm1 % ("".join(s))
> print sNorm1
>
> def Normalize(Text):
>
>     rx = re.compile(sNorm1)
>     def replacer(match):
>         return match.group(1)
>     return rx.sub(replacer, Text)
>
> print Normalize('a -\nb')
> print Normalize('a-\nb')
> sys.exit()

It looks like you are trying to de-hyphenate words that have been
broken across line breaks.

Well, this isn't a regexp solution, it uses pyparsing instead.  But
I've added a number of other test cases which may be problematic for an
re.

-- Paul

from pyparsing import makeHTMLTags,Literal,Word,alphas,Suppress

brTag,brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace() # don't skip whitespace before matching this

collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
                + Word(alphas)
# define action to replace expression with the word before hyphen
# concatenated with the word after the tag
collapse.setParseAction(lambda toks: toks[0]+toks[1])

print collapse.transformString('a -\nb')
print collapse.transformString('a-\nb')
print collapse.transformString('a-\nb')
print collapse.transformString('a-\nb')
print collapse.transformString('a- \nb')
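For comparison, a rough regexp counterpart to the pyparsing grammar above might look like the sketch below (Python 3 syntax). It is not claimed to be equivalent to the pyparsing version. It assumes the split words are separated by a <br> tag, as the makeHTMLTags("br") call suggests, and the names DEHYPHENATE and collapse_hyphens as well as the sample strings are made up for illustration.

```python
import re

# Assumption: a word, a hyphen directly after it, optional whitespace,
# a <br> or <br/> tag in any letter case, optional whitespace, a word.
DEHYPHENATE = re.compile(
    r"([A-Za-z]+)-\s*<br\s*/?>\s*([A-Za-z]+)",
    re.IGNORECASE,
)

def collapse_hyphens(text):
    """Join 'word-<br>word' into one word; leave 'word -<br>word' alone."""
    return DEHYPHENATE.sub(r"\1\2", text)

print(repr(collapse_hyphens("a-<br>\nb")))   # 'ab'
print(repr(collapse_hyphens("a- <BR/>b")))   # 'ab'
print(repr(collapse_hyphens("a -<br>\nb")))  # 'a -<br>\nb' (unchanged)
```

The hyphen has to follow the first word immediately, which mirrors the leaveWhitespace() call on the hyphen; attributes inside the tag and other HTML oddities are exactly the cases where a parser is likely to hold up better than this pattern.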
Regexp Neg. set of chars HowTo?
Hi!

I want to replace some seqs. in a html.
Let:
a-
b
= ab

but:
xxx -
b
must be unchanged, because it is not word split.

I want to search and replace with re, but I don't know how to neg. this
set ['\ \n\t'].

This time I use full set without these chars, but neg. is better and
shorter.

Ok, I can use [^\s], but I want to know, how to neg. set of chars.
sNorm1= '([^[\ \t\n]]{1})\-\\n' - this is not working.

Thanks for the help:
dd

sNorm1= '([%s]{1})\-\\n'
c = range(0, 256)
c.remove(32)
c.remove(13)
c.remove(10)
c.remove(9)
s = ["\\%s" % (hex(v).replace('00x', '')) for v in c]
sNorm1 = sNorm1 % ("".join(s))
print sNorm1

def Normalize(Text):

    rx = re.compile(sNorm1)
    def replacer(match):
        return match.group(1)
    return rx.sub(replacer, Text)

print Normalize('a -\nb')
print Normalize('a-\nb')
sys.exit()
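A character set is negated by putting ^ directly after the opening bracket and listing the excluded characters with no second pair of brackets. As far as I can tell, '[^[\ \t\n]]' fails because the class ends at the first ], so the inner [ becomes one more excluded character and the second ] has to match literally. A minimal sketch in Python 3 syntax follows; SPLIT_WORD and normalize are illustrative names, and the pattern assumes the word break is a hyphen followed directly by a newline, as in the posted test strings.

```python
import re

# Negated class: any single character except space, tab, CR or LF.
# The captured character is kept; the hyphen and newline are dropped.
SPLIT_WORD = re.compile(r"([^ \t\r\n])-\n")

def normalize(text):
    """Rejoin words split as 'a-\\nb'; leave 'a -\\nb' untouched."""
    return SPLIT_WORD.sub(r"\1", text)

print(repr(normalize("a-\nb")))     # 'ab'
print(repr(normalize("a -\nb")))    # 'a -\nb'
print(repr(normalize("xxx -\nb")))  # 'xxx -\nb'
```

This replaces the 256-entry character list with one short class, and [^\s] or \S would work the same way if all whitespace is to be excluded.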