On Mar 27, 6:51 am, "R. David Murray" <rdmur...@bitdance.com> wrote:
> OK, I've got a little problem that I'd like to ask the assembled minds
> for help with. I can write code to parse this, but I'm thinking it may
> be possible to do it with regexes. My regex foo isn't that good, so if
> anyone is willing to help (or offer an alternate parsing suggestion)
> I would be greatful. (This has to be stdlib only, by the way, I
> can't introduce any new modules into the application so pyparsing is
> not an option.)
>
> The challenge is to turn a string like this:
>
> a=1,b="0234,)#($)@", k="7"
>
> into this:
>
> [("a", "1"), ("b", "0234,)#($)#"), ("k", "7")]

The challenge is for you to explain unambiguously what you want.

1. a=1 => "1" and k="7" => "7" ... is this a mistake or are the quotes
   optional in the original string when not required to protect a comma?

2. What is the rule that explains the transmogrification of @ to # in
   your example?

3. Is the input guaranteed to be syntactically correct?

The following should do close enough to what you want; adjust as
appropriate.

    >>> import re
    >>> s = """a=1,b="0234,)#($)@", k="7" """
    >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
    >>> rx.findall(s)
    [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
    >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
    [('a', '1'), ('b', '2')]
    >>>

HTH,
John
--
http://mail.python.org/mailman/listinfo/python-list
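
P.S. If the answers to questions 1 and 3 turn out to be "the quotes are optional" and "no, the input is not guaranteed to be correct", something along these lines might be a starting point. It is only a sketch under those assumptions: parse_pairs is a made-up name, the pattern is the same one as in the session, and anchoring each match at the end of the previous one (instead of using findall) is my guess at what "strict" should mean here.

```python
import re

# Same pattern as in the interactive session: key, '=', then either an
# unquoted value or a double-quoted one, then a comma or end of string.
PAIR = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')


def parse_pairs(text):
    """Hypothetical helper: return [(key, value), ...] or raise ValueError.

    Assumes quotes are optional decoration and should be dropped, and
    that unmatched junk between pairs is an error rather than something
    to skip silently (which is what findall does).
    """
    text = text.strip()
    pairs = []
    pos = 0
    while pos < len(text):
        # match() anchors at pos, so junk between pairs cannot be skipped.
        m = PAIR.match(text, pos)
        if m is None:
            raise ValueError("bad syntax at offset %d: %r" % (pos, text[pos:]))
        key, value = m.groups()
        if value.startswith('"'):
            value = value[1:-1]      # drop the surrounding double quotes
        else:
            value = value.strip()    # unquoted: trim stray spaces
        pairs.append((key, value))
        pos = m.end()
    return pairs
```

With this, the string from the original question comes back with the quotes stripped, and the *DODGY*SYNTAX* input raises ValueError instead of quietly yielding two pairs. It still inherits the pattern's leniency inside unquoted values (anything up to the next comma or double quote is accepted), so tighten [^",]+ if that matters.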