Re: Regexp Neg. set of chars HowTo?
Hi!

Thanks for this! I'll use that!

I found a regexp solution to my question too:

    import re
    testtext = "minion battalion nation dion sion wion alion"
    m = re.compile("[^t^l]ion")
    print(m.findall(testtext))

I'm searching for every "ion" that is not part of "lion" or "tion".

dd

Paul McGuire wrote:
> It looks like you are trying to de-hyphenate words that have been broken
> across line breaks.
>
> Well, this isn't a regexp solution, it uses pyparsing instead. But I've
> added a number of other test cases which may be problematic for an re.
>
> -- Paul

-- 
http://mail.python.org/mailman/listinfo/python-list
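The pattern above consumes the character before "ion", so `findall` returns four-character fragments such as `'nion'` rather than whole words. If whole words are what you are after, one option (not proposed in this thread) is a negative lookbehind, which checks the preceding character without consuming it. This is a minimal sketch that assumes a "word" is a run of `\w` characters ending in "ion":

```python
import re

testtext = "minion battalion nation dion sion wion alion"

# [^t^l]ion consumes the character before "ion", so only fragments come back.
fragments = re.findall(r"[^t^l]ion", testtext)

# Whole words ending in "ion" where "ion" is not preceded by 't' or 'l'.
# (?<![tl]) is a zero-width negative lookbehind: it inspects the character
# before "ion" without making it part of the match.
words = re.findall(r"\b\w*(?<![tl])ion\b", testtext)

print(fragments)  # ['nion', 'dion', 'sion', 'wion']
print(words)      # ['minion', 'dion', 'sion', 'wion']
```

The lookbehind version also matches a bare "ion" at the start of a word, which a character class cannot do, because `[^tl]` always needs one character to consume.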
Re: Regexp Neg. set of chars HowTo?
In [EMAIL PROTECTED], durumdara wrote:

> I found a regexp solution to my question too:
>
>     import re
>     testtext = "minion battalion nation dion sion wion alion"
>     m = re.compile("[^t^l]ion")
>     print(m.findall(testtext))
>
> I'm searching for every "ion" that is not part of "lion" or "tion".

And not "^ion" either. The first ^ in that character group negates the
group; the second is a literal ^. So I guess you meant [^tl]ion.

Ciao,
    Marc 'BlackJack' Rintsch
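Marc's point can be checked directly. The sample string here is illustrative, chosen to contain a literal "^ion"; only a `^` in the first position of `[...]` negates the set, and any later `^` is just one more excluded character:

```python
import re

sample = "minion ^ion lion tion"

# First ^ negates the set; the second ^ is an ordinary character,
# so 't', '^' and 'l' are all excluded.
with_caret = re.findall(r"[^t^l]ion", sample)

# Only 't' and 'l' are excluded here.
without_caret = re.findall(r"[^tl]ion", sample)

print(with_caret)     # ['nion']
print(without_caret)  # ['nion', '^ion']
```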
Re: Regexp Neg. set of chars HowTo?
On Dec 20, 7:40 am, durumdara [EMAIL PROTECTED] wrote:
> I want to replace some sequences in an HTML page.
> [snip]
> I want to search and replace with re, but I don't know how to negate
> this set of characters: space, \n and \t.
> [snip]

It looks like you are trying to de-hyphenate words that have been broken
across line breaks.

Well, this isn't a regexp solution, it uses pyparsing instead. But I've
added a number of other test cases which may be problematic for an re.

-- Paul

from pyparsing import makeHTMLTags, Literal, Word, alphas, Suppress

brTag, brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace()  # don't skip whitespace before matching this

collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
           + Word(alphas)

# define action to replace expression with the word before hyphen
# concatenated with the word after the BR tag
collapse.setParseAction(lambda toks: toks[0] + toks[1])

print(collapse.transformString('a -<br />\nb'))
print(collapse.transformString('a-<br />\nb'))
print(collapse.transformString('a-<br/>\nb'))
print(collapse.transformString('a-<br>\nb'))
print(collapse.transformString('a- <BR clear=all>\nb'))
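For comparison, here is a minimal plain-`re` sketch (not from the thread) intended to treat Paul's five test strings the way his `collapse` expression does. It assumes that any tag starting with `<br` is a line break and that words are ASCII letters only; the names `COLLAPSE` and `collapse` are hypothetical. It is not an HTML parser: a quoted attribute value containing `>` would defeat the `[^>]*` part, which is the kind of case Paul's caution about regexps is aimed at.

```python
import re

# Hypothetical regex counterpart to the pyparsing "collapse" expression.
COLLAPSE = re.compile(
    r"([A-Za-z]+)"      # word before the hyphen
    r"-"                # hyphen directly after the word (no space before it)
    r"\s*<br\b[^>]*>"   # <br>, <br/>, <br />, <BR clear=all>, ...
    r"\s*([A-Za-z]+)",  # word after the tag, possibly on the next line
    re.IGNORECASE,      # so <BR ...> matches as well as <br ...>
)

def collapse(text):
    # Join the two halves of a split word and drop the hyphen and tag.
    return COLLAPSE.sub(r"\1\2", text)

tests = ['a -<br />\nb', 'a-<br />\nb', 'a-<br/>\nb',
         'a-<br>\nb', 'a- <BR clear=all>\nb']
for t in tests:
    print(repr(collapse(t)))
```

The first string is left unchanged because the hyphen is preceded by a space; the other four all become `'ab'`.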
Regexp Neg. set of chars HowTo?
Hi!

I want to replace some sequences in an HTML page. For example,
'a-<br />\nb' should become 'ab', but 'xxx -<br />\nb' must be left
unchanged, because it is not a word split.

I want to search and replace with re, but I don't know how to negate this
set of characters: space, \n and \t. For now I use the full set without
these characters, but a negated set would be better and shorter. OK, I can
use [^\s], but I want to know how to negate a set of characters in general.

This is not working:

    sNorm1 = r'([^[\ \t\n]]{1})\-<br\ \/>\n'

Thanks for the help:
dd

    import re
    import sys

    # Workaround: list every byte except space, CR, LF and tab explicitly.
    sNorm1 = r'([%s]{1})\-<br\ \/>\n'
    c = list(range(0, 256))
    c.remove(32)   # space
    c.remove(13)   # CR
    c.remove(10)   # LF
    c.remove(9)    # tab
    s = ['\\x%02x' % v for v in c]
    sNorm1 = sNorm1 % ''.join(s)
    print(sNorm1)

    def Normalize(Text):
        rx = re.compile(sNorm1)
        def replacer(match):
            # Keep only the character before the hyphen.
            return match.group(1)
        return rx.sub(replacer, Text)

    print(Normalize('a -<br />\nb'))
    print(Normalize('a-<br />\nb'))
    sys.exit()
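A negated set is written with a single pair of brackets and `^` as the first character. In the attempt `[^[\ \t\n]]` the inner `[` is just a literal, so the pattern means "one character other than `[`, space, tab or newline, followed by a literal `]`". The sketch below shows that, and how the same negated class could replace the 252-character enumeration in the workaround. It assumes the tag is written exactly as `<br />`; `not_ws`, `dehyphen` and `normalize` are hypothetical names. Note that `[^\s]` (or `\S`) is not identical to `[^ \t\r\n]`, since `\s` also covers form feed, vertical tab and, for str patterns, other Unicode whitespace.

```python
import re
import warnings

# The attempt from the post: parsed as the class [^[ \t\n] plus a literal ']'.
with warnings.catch_warnings():
    # Newer Python versions may emit a "Possible nested set" FutureWarning.
    warnings.simplefilter("ignore", FutureWarning)
    broken = re.compile(r"[^[\ \t\n]]")

# One pair of brackets, ^ first: anything except space, tab, CR and LF.
not_ws = re.compile(r"[^ \t\r\n]")

# Hypothetical replacement for sNorm1 using the negated class directly.
dehyphen = re.compile(r"([^ \t\r\n])-<br />\n")

def normalize(text):
    # Keep the character before the hyphen; drop "-<br />" and the newline.
    return dehyphen.sub(r"\1", text)

print(broken.match("a"))            # None: a literal ']' must follow
print(broken.match("a]").group())   # a]
print(not_ws.findall("a b\tc\n"))   # ['a', 'b', 'c']
print(normalize("a-<br />\nb"))     # ab
print(normalize("a -<br />\nb") == "a -<br />\nb")  # True
```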