Re: Regexp Neg. set of chars HowTo?

2006-12-22 Thread durumdara
Hi!

Thanks for this! I'll use that!

I found a regexp solution to my question too:
import re
testtext = "minion battalion nation dion sion wion alion"
m = re.compile("[^t^l]ion")
print(m.findall(testtext))

I am searching for all text that is not lion or tion.
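Another way to express that search, offered as a sketch rather than anything from the thread: a negative lookbehind, (?<![tl]), checks the character before "ion" without consuming it. Putting \b\w* in front returns whole words instead of four-character fragments. Whether whole words are what is wanted here is an assumption.

```python
import re

testtext = "minion battalion nation dion sion wion alion"

# \b\w* takes the start of a word; (?<![tl]) then asserts that the
# character just before "ion" is neither "t" nor "l", without consuming it.
pattern = re.compile(r"\b\w*(?<![tl])ion\b")

words = pattern.findall(testtext)
print(words)  # ['minion', 'dion', 'sion', 'wion']
```

Unlike a character class, the lookbehind also succeeds when "ion" starts a word: pattern.findall("ion lion") gives ['ion'].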

dd

Paul McGuire wrote:
 It looks like you are trying to de-hyphenate words that have been
 broken across line breaks.

 Well, this isn't a regexp solution, it uses pyparsing instead.  But
 I've added a number of other test cases which may be problematic for an
 re.

 -- Paul
   

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regexp Neg. set of chars HowTo?

2006-12-22 Thread Marc 'BlackJack' Rintsch
In [EMAIL PROTECTED], durumdara
wrote:

 I found a regexp solution to my question too:
 import re
 testtext = "minion battalion nation dion sion wion alion"
 m = re.compile("[^t^l]ion")
 print(m.findall(testtext))
 
 I am searching for all text that is not lion or tion.

It also skips ^ion. The first ^ in that character group negates the group;
the second is a literal ^, so I guess you meant [^tl]ion.
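A small sketch comparing both classes on the thread's test string. The extra "x^ion" input is made up for illustration. Note that findall returns the one matched character plus "ion", not whole words:

```python
import re

testtext = "minion battalion nation dion sion wion alion"

# The first ^ negates the class; the second ^ is a literal caret,
# so [^t^l] excludes "t", "^" and "l".
buggy = re.compile(r"[^t^l]ion")
# [^tl] excludes only "t" and "l", which is what was intended.
fixed = re.compile(r"[^tl]ion")

print(buggy.findall(testtext))  # ['nion', 'dion', 'sion', 'wion']
print(fixed.findall(testtext))  # ['nion', 'dion', 'sion', 'wion']

# The two only differ when a caret comes right before "ion".
print(buggy.findall("x^ion"))   # []
print(fixed.findall("x^ion"))   # ['^ion']
```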

Ciao,
Marc 'BlackJack' Rintsch


Re: Regexp Neg. set of chars HowTo?

2006-12-21 Thread Paul McGuire

On Dec 20, 7:40 am, durumdara [EMAIL PROTECTED] wrote:
 Hi!

 I want to replace some sequences in an HTML page.
 For example:
 a-<br />
 b
 => ab

 but:
 xxx -<br />
 b
 must be left unchanged, because it is not a split word.

 I want to search and replace with re, but I don't know how to negate this
 set: ['\ \n\t'].

 For now I use the full set without these chars, but a negated set would be
 better and shorter.

 OK, I can use [^\s], but I want to know how to negate a set of chars.
 sNorm1= '([^[\ \t\n]]{1})\-<br\ \/>\\n' - this is not working.

 Thanks for the help:
 dd

 import re
 import sys

 sNorm1 = r'([%s]{1})\-<br\ \/>\n'
 c = list(range(0, 256))
 c.remove(32)  # space
 c.remove(13)  # \r
 c.remove(10)  # \n
 c.remove(9)   # \t
 s = ["\\x%02x" % v for v in c]  # one \xNN escape per remaining byte value
 sNorm1 = sNorm1 % ("".join(s))
 print(sNorm1)

 def Normalize(Text):
     rx = re.compile(sNorm1)
     def replacer(match):
         return match.group(1)
     return rx.sub(replacer, Text)

 print(Normalize('a -<br />\nb'))
 print(Normalize('a-<br />\nb'))
 sys.exit()

It looks like you are trying to de-hyphenate words that have been
broken across line breaks.

Well, this isn't a regexp solution, it uses pyparsing instead.  But
I've added a number of other test cases which may be problematic for an
re.

-- Paul

from pyparsing import makeHTMLTags,Literal,Word,alphas,Suppress

brTag,brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace() # don't skip whitespace before matching this

collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
    + Word(alphas)
# define action to replace expression with the word before hyphen
# concatenated with the word after the BR tag
collapse.setParseAction(lambda toks: toks[0]+toks[1])

print(collapse.transformString('a -<br />\nb'))
print(collapse.transformString('a-<br />\nb'))
print(collapse.transformString('a-<br/>\nb'))
print(collapse.transformString('a-<br>\nb'))
print(collapse.transformString('a- <BR clear=all>\nb'))
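For comparison only, here is a regex sketch (not from the thread) that handles the same five test strings. It matches letters, then a hyphen with no whitespace before it, optional whitespace, a br tag in any case with optional attributes or "/", optional whitespace, and finally letters. It is a sketch, not an HTML parser, and the names BR_SPLIT and collapse_text are hypothetical.

```python
import re

# Hypothetical regex counterpart to the pyparsing grammar above.
# [^>]* is itself a negated set: any run of characters up to the tag's ">".
BR_SPLIT = re.compile(r"([a-z]+)-\s*<br\b[^>]*>\s*([a-z]+)", re.IGNORECASE)

def collapse_text(text):
    # Keep the word before the hyphen and the word after the <br> tag.
    return BR_SPLIT.sub(r"\1\2", text)

samples = ['a -<br />\nb', 'a-<br />\nb', 'a-<br/>\nb',
           'a-<br>\nb', 'a- <BR clear=all>\nb']
for sample in samples:
    print(repr(collapse_text(sample)))
```

Only the first sample, where a space sits before the hyphen, is left unchanged. The other four collapse to 'ab'.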



Regexp Neg. set of chars HowTo?

2006-12-20 Thread durumdara
Hi!

I want to replace some sequences in an HTML page.
For example:
a-<br />
b
=> ab

but:
xxx -<br />
b
must be left unchanged, because it is not a split word.

I want to search and replace with re, but I don't know how to negate this 
set: ['\ \n\t'].

For now I use the full set without these chars, but a negated set would be 
better and shorter.

OK, I can use [^\s], but I want to know how to negate a set of chars.
sNorm1= '([^[\ \t\n]]{1})\-<br\ \/>\\n' - this is not working.

Thanks for the help:
dd

import re
import sys

sNorm1 = r'([%s]{1})\-<br\ \/>\n'
c = list(range(0, 256))
c.remove(32)  # space
c.remove(13)  # \r
c.remove(10)  # \n
c.remove(9)   # \t
s = ["\\x%02x" % v for v in c]  # one \xNN escape per remaining byte value
sNorm1 = sNorm1 % ("".join(s))
print(sNorm1)

def Normalize(Text):
    rx = re.compile(sNorm1)
    def replacer(match):
        return match.group(1)
    return rx.sub(replacer, Text)

print(Normalize('a -<br />\nb'))
print(Normalize('a-<br />\nb'))
sys.exit()
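In Python's re, you negate a set by putting ^ as the first character inside the brackets, as in [^ \t\r\n]. Nesting brackets, as in [^[\ \t\n]], does not do this: the inner "[" is just a literal, and the first "]" ends the set. Below is a minimal sketch of the same replacement using a negated set instead of the explicit 0-255 list. It assumes the break is always written exactly as "<br />" followed by a newline, and the names SPLIT_WORD and normalize are hypothetical.

```python
import re

# [^ \t\r\n]: ^ as the first character inside [...] negates the set, so
# this matches any single character except space, tab, CR and newline.
SPLIT_WORD = re.compile(r"([^ \t\r\n])-<br />\n")

def normalize(text):
    # Keep only the captured character, dropping "-<br />\n" after it.
    return SPLIT_WORD.sub(r"\1", text)

print(repr(normalize("a-<br />\nb")))   # 'ab'
print(repr(normalize("a -<br />\nb")))  # unchanged
```

[^\s] would also work, but it additionally excludes other whitespace such as form feed and vertical tab (and, for str patterns in Python 3, Unicode whitespace).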
