On Aug 10, 7:56 am, Paul Hankin <[EMAIL PROTECTED]> wrote:
> On Aug 10, 2:30 pm, [EMAIL PROTECTED] wrote:
>
> > I'm trying to use regular expressions to help me quickly extract the
> > contents of messages that my application will receive.
>
> Don't use regexps for parsing complex data; they're limited,
> completely unreadable, and hugely difficult to debug. Your code is
> well written, and you've already reached the limits of the power of
> regexps, and it's difficult to read.
>
> Have a look at pyparsing for a simple solution to your
> problem. http://pyparsing.wikispaces.com/
>
> --
> Paul Hankin

Well, predictably, the pyparsing solution is simple UNTIL we get to the
"multidict" options field. Pyparsing has a Dict construct that has the
same limitation as Python's dict: only the last value for a repeated key
would be retained. So I had to write a parse action to manually stitch
the key-value groups into the parsed tokens' internal key-value dict.

With the basic grammar implemented in pyparsing, it would now be very
easy to make some of these internal expressions optional (using Optional
wrappers), or parseable in any order (using the '&' operator instead of
'+'; '&' enforces the presence of all values, but in any order).

-- Paul


from pyparsing import Suppress, Literal, Combine, oneOf, Word, alphanums, \
    restOfLine, ZeroOrMore, Group, ParseResults

# The GROUP_* results names and msgdata are not defined here; they are
# assumed to be defined as in the original poster's code.

LBRACE,RBRACE,EQ = map(Suppress,"{}=")
keylabel = lambda s : Literal(s) + EQ

grp_msg_type = Combine("xpl-" + oneOf("cmnd stat trig"))(GROUP_MESSAGE_TYPE)
grp_hop = keylabel("hop") + Word("123456789",exact=1)(GROUP_HOP)
grp_source = keylabel("source") + Combine(
    Word(alphanums,max=8)(GROUP_SRC_VENDOR_ID) + '-' +
    Word(alphanums,max=8)(GROUP_SRC_DEVICE_ID) + '.' +
    Word(alphanums,max=16)(GROUP_SRC_INSTANCE_ID)
    )(GROUP_SOURCE)
grp_target = keylabel("target") + Combine(
    '*' | Word(alphanums,max=8)(GROUP_TGT_VENDOR_ID) + '-' +
    Word(alphanums,max=8)(GROUP_TGT_DEVICE_ID) + '.' +
    Word(alphanums,max=16)(GROUP_TGT_INSTANCE_ID)
    )(GROUP_TARGET)
grp_schema = Combine(
    Word(alphanums,max=8)(GROUP_SCHEMA_CLASS) + '.' +
    Word(alphanums,max=8)(GROUP_SCHEMA_TYPE)
    )(GROUP_SCHEMA)

option_key = Word(alphanums+'-',max=16)
#~ option_val = Word(printables+' ',max=64)
option_val = restOfLine
options = (LBRACE +
    ZeroOrMore(Group(option_key("key") + EQ + option_val("value"))) +
    RBRACE)("options")

# this parse action will take the raw key=value groups and add them to
# the current results' named tokens
def make_options_dict(tokens):
    for k,v in tokens.asList():
        if k not in tokens:
            tokens[k] = ParseResults([])
        tokens[k] += ParseResults(v)
    # delete redundant key-value created by pyparsing
    del tokens["options"]
    return tokens
options.setParseAction(make_options_dict)

msgFormat = (grp_msg_type +
    LBRACE + grp_hop + grp_source + grp_target + RBRACE +
    grp_schema + options)

# parse each message
for msgstr in msgdata:
    msg = msgFormat.parseString(msgstr)
    #~ print msg.dump()
    print "Message type:", msg.message_type
    print "Hop:", msg.hop
    print "Options:"
    print msg.options.dump()
    print

Prints:

Message type: xpl-stat
Hop: 1
Options:
[['interval', '10']]
- interval: ['10']

Message type: xpl-stat
Hop: 1
Options:
[['reconf', 'newconf'], ['option', 'interval '], ['option', 'group[16]'], ['option', 'filter[16]']]
- option: ['interval ', 'group[16]', 'filter[16]']
- reconf: ['newconf']
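
For anyone wondering why the options field needs the extra stitching step, here is a minimal sketch of the "multidict" problem in plain Python, without pyparsing. The pairs are copied from the second message's printed options; stitch_pairs is a hypothetical helper name used only for illustration, and it is not part of pyparsing.

```python
# Key-value pairs as they appear in the second parsed message's options.
pairs = [
    ["reconf", "newconf"],
    ["option", "interval "],
    ["option", "group[16]"],
    ["option", "filter[16]"],
]

# A plain dict keeps only the last value seen for each repeated key.
last_only = dict(pairs)


def stitch_pairs(pairs):
    """Collect every value for each key into a list (hypothetical helper)."""
    stitched = {}
    for key, value in pairs:
        stitched.setdefault(key, []).append(value)
    return stitched


multi = stitch_pairs(pairs)

print(last_only["option"])  # only the last "option" value survives
print(multi["option"])      # all "option" values are kept, in order
```

The parse action in the post does roughly the same accumulation, except that it stores the per-key lists as named results on the parsed tokens rather than in a separate dict.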