[EMAIL PROTECTED] wrote:

> I have a regular expression that is approximately 100k bytes. (It is
> basically a list of all known norwegian postal numbers and the
> corresponding place with | in between. I know this is not the intended
> use for regular expressions, but it should nonetheless work.
>
> the pattern is
> ur'(N-|NO-)?(5259 HJELLESTAD|4026 STAVANGER|4027 STAVANGER........|8305
> SVOLVÆR)'
>
> The error message I get is:
> RuntimeError: internal error in regular expression engine
you're most likely exceeding the allowed code size (usually 64k).

however, putting all postal numbers in a single RE is a horrid abuse of
the RE engine. why not just scan for "(N-|NO-)?(\d+)" and use a dictionary
to check if you have a valid match?

    import re

    postcodes = {
        "5259": "HJELLESTAD",
        ...
        "9999": "ØSTRE FJORDVIDDA",
    }

    for m in re.finditer(r"(N-|NO-)?(\d+) ", text):
        prefix, number = m.groups()
        try:
            place = postcodes[number]
        except KeyError:
            continue  # not a known postal number
        if not text.startswith(place, m.end()):
            continue  # the place name does not follow the number
        # got a match!
        print prefix, number, place

</F>
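For readers on a current interpreter, the dictionary approach above might be packaged roughly as follows. This is only a sketch in Python 3 syntax: the function name `find_postal_places`, the constant names, and the four-entry sample table are hypothetical stand-ins for the full list of postal numbers, and the sample text is made up for illustration.

```python
import re

# Hypothetical sample table; a real one would hold every known postal number.
POSTCODES = {
    "5259": "HJELLESTAD",
    "4026": "STAVANGER",
    "4027": "STAVANGER",
    "8305": "SVOLVÆR",
}

# Optional "N-" or "NO-" prefix, a run of digits, then a single space.
PATTERN = re.compile(r"(N-|NO-)?(\d+) ")


def find_postal_places(text, postcodes=POSTCODES):
    """Return a list of (prefix, number, place) tuples found in text."""
    found = []
    for m in PATTERN.finditer(text):
        prefix, number = m.groups()  # prefix is None when absent
        place = postcodes.get(number)
        if place is None:
            continue  # not a known postal number
        if not text.startswith(place, m.end()):
            continue  # known number, but the place name does not follow it
        found.append((prefix, number, place))
    return found


sample = "Send to NO-5259 HJELLESTAD or 4026 STAVANGER, not 1234 NOWHERE or 8305 OSLO."
print(find_postal_places(sample))
# [('NO-', '5259', 'HJELLESTAD'), (None, '4026', 'STAVANGER')]
```

The compiled pattern stays a few bytes long no matter how many postal numbers the table holds, so the code-size limit never comes into play, and each candidate costs one dictionary lookup plus one `str.startswith` check.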
-- http://mail.python.org/mailman/listinfo/python-list