On Nov 28, 11:32 am, "Ryan Krauss" <[EMAIL PROTECTED]> wrote:
> I need to parse the following string:
>
> $$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=\pmatrix{\left({{{\it m_2}\,s^2
> }\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it m_2}\,s^2\,F
> }\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it m_2}\,s^2}\over{k}}+1
> \right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr }$$
>
> The first thing I need to do is extract the arguments to \pmatrix{ }
> on both the left and right hand sides of the equal sign, so that the
> first argument is extracted as
>
> {\it x_2}\cr 0\cr 1\cr
>
> and the second is
>
> \left({{{\it m_2}\,s^2
> }\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it m_2}\,s^2\,F
> }\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it m_2}\,s^2}\over{k}}+1
> \right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr
>
> The trick is that there are extra curly braces inside the \pmatrix{ }
> strings and I don't know how to write a regexp that would count the
> number of open and close curly braces and make sure they match, so
> that it can find the correct ending curly brace.
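For comparison with the pyparsing approach described below, here is the brace counting the question asks about, done by hand. This is a minimal stdlib-only sketch, not the pyparsing method. The helper names matching_brace and pmatrix_args are hypothetical. It assumes each '{' immediately follows \pmatrix and that the text contains no escaped braces such as \{.

```python
# Sketch only (not pyparsing): locate each \pmatrix{...} argument by tracking
# brace depth, which a plain regular expression cannot do for arbitrary nesting.
# Assumes '{' directly follows the command and there are no escaped braces (\{).


def matching_brace(text, open_pos):
    """Return the index of the '}' that closes the '{' at open_pos."""
    depth = 0
    for i in range(open_pos, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces starting at index %d" % open_pos)


def pmatrix_args(text, command=r"\pmatrix"):
    """Return the text inside each command{...}, without the outer braces."""
    args = []
    start = text.find(command)
    while start != -1:
        open_pos = start + len(command)
        if open_pos >= len(text) or text[open_pos] != "{":
            raise ValueError("expected '{' after %s at index %d" % (command, start))
        close_pos = matching_brace(text, open_pos)
        args.append(text[open_pos + 1:close_pos])  # strip the outer braces
        start = text.find(command, close_pos + 1)
    return args


# A shortened version of the equation from the question.
eqn = r"$$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=\pmatrix{{{F}\over{k}}\cr 1\cr }$$"
lhs, rhs = pmatrix_args(eqn)
print(repr(lhs))
print(repr(rhs))
```

The depth counter only returns when it gets back to zero. That is why the inner {\it x_2} and {{F}\over{k}} groups do not end the match early.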
As Tim Grove points out, writing a grammar for this expression is really pretty simple, especially using the latest version of pyparsing, which includes a new helper method, nestedExpr. Here is the whole program to parse your example:

    from pyparsing import *

    data = r"""$$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=
    \pmatrix{\left({{{\it m_2}\,s^2
     }\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it m_2}\,s^2\,F
     }\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it m_2}\,s^2}\over{k}}+1
     \right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr }$$"""

    PMATRIX = Literal(r"\pmatrix")
    nestedBraces = nestedExpr("{", "}")
    grammar = "$$" + PMATRIX + nestedBraces + "=" + \
              PMATRIX + nestedBraces + \
              "$$"
    res = grammar.parseString(data)
    print(res)

This prints the following:

    ['$$', '\\pmatrix', [['\\it', 'x_2'], '\\cr', '0\\cr', '1\\cr'], '=',
    '\\pmatrix', ['\\left(', [[['\\it', 'm_2'], '\\,s^2'], '\\over',
    ['k']], '+1\\right)\\,', ['\\it', 'x_1'], '-', [['F'], '\\over',
    ['k']], '\\cr', '-', [[['\\it', 'm_2'], '\\,s^2\\,F'], '\\over',
    ['k']], '-F+\\left(', ['\\it', 'm_2'], '\\,s^2\\,\\left(', [[['\\it',
    'm_2'], '\\,s^2'], '\\over', ['k']], '+1', '\\right)+', ['\\it',
    'm_2'], '\\,s^2\\right)\\,', ['\\it', 'x_1'], '\\cr', '1\\cr'], '$$']

Okay, maybe this looks a bit messy. But believe it or not, the returned results give you access to each grammar element as:

    ['$$', '\\pmatrix', [nested arg list], '=', '\\pmatrix', [nested arg list], '$$']

Not only has the parser handled the {} nesting levels, but it has also structured the returned tokens according to that nesting. (The '{}'s are gone now, since their delimiting function has been replaced by the nesting hierarchy in the results.)

You could use tuple assignment to get at the individual fields:

    dummy, dummy, lhs_args, dummy, dummy, rhs_args, dummy = res

Or you could access the fields in res using list indexing:

    lhs_args, rhs_args = res[2], res[5]

But both of these methods will break if you decide to extend the grammar with additional or optional fields.
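To make the nesting behavior concrete, here is a rough stdlib-only imitation of the shape of what nestedExpr("{", "}") returns. Each run of non-whitespace characters becomes a string, and each {...} group becomes a sublist. The function nest_braces is hypothetical and is not part of pyparsing. The real nestedExpr has more options than this sketch, and the sketch assumes balanced, unescaped braces.

```python
# Rough imitation (not pyparsing) of nestedExpr's nested-list output:
# whitespace-separated words become strings, {...} groups become sublists.


def nest_braces(text):
    """Parse one '{...}' group into nested lists of whitespace-separated words."""

    def parse_group(pos):
        # pos is just past a '{'; returns (items, index just past the matching '}')
        items, word = [], ""
        while pos < len(text):
            ch = text[pos]
            if ch == "{":
                if word:  # a word ending right before a nested group
                    items.append(word)
                    word = ""
                sub, pos = parse_group(pos + 1)
                items.append(sub)
                continue
            if ch == "}":
                if word:
                    items.append(word)
                return items, pos + 1
            if ch.isspace():
                if word:
                    items.append(word)
                    word = ""
            else:
                word += ch
            pos += 1
        raise ValueError("missing closing '}'")

    if not text.startswith("{"):
        raise ValueError("expected text to start with '{'")
    result, _ = parse_group(1)
    return result


print(nest_braces(r"{{\it x_2}\cr 0\cr 1\cr }"))
```

On the left-hand argument this yields the same [['\\it', 'x_2'], '\\cr', '0\\cr', '1\\cr'] structure shown in the pyparsing output above. The braces vanish, and the list hierarchy records where they were.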
A safer approach is to give the grammar elements results names, as in this slightly modified version of grammar:

    grammar = "$$" + PMATRIX + nestedBraces("lhs_args") + "=" + \
              PMATRIX + nestedBraces("rhs_args") + \
              "$$"

Now you can access the parsed fields as if the results were a dict with keys "lhs_args" and "rhs_args", or as an object with attributes named lhs_args and rhs_args:

    res = grammar.parseString(data)
    print(res["lhs_args"])
    print(res["rhs_args"])
    print(res.lhs_args)
    print(res.rhs_args)

Note that by default nestedExpr returns a nested list of the elements, mirroring how the original text was nested within braces. If you just want the original text, add a parse action to nestedBraces to do this for you (keepOriginalText is another pyparsing builtin). The parse action runs at parse time, so no post-processing is needed after the parsed results are returned:

    nestedBraces.setParseAction(keepOriginalText)
    grammar = "$$" + PMATRIX + nestedBraces("lhs_args") + "=" + \
              PMATRIX + nestedBraces("rhs_args") + \
              "$$"
    res = grammar.parseString(data)
    print(res)
    print(res.lhs_args)
    print(res.rhs_args)

Now this program returns the original text for the nested brace expressions:

    ['$$', '\\pmatrix', '{{\\it x_2}\\cr 0\\cr 1\\cr }', '=', '\\pmatrix', '{\\left({{{\\it m_2}\\,s^2 \n }\\over{k}}+1\\right)\\,{\\it x_1}-{{F}\\over{k}}\\cr -{{{\\it m_2}\\,s^2\\,F \n }\\over{k}}-F+\\left({\\it m_2}\\,s^2\\,\\left({{{\\it m_2}\\,s^2}\\over{k}}+1 \n \\right)+{\\it m_2}\\,s^2\\right)\\,{\\it x_1}\\cr 1\\cr }', '$$']
    ['{{\\it x_2}\\cr 0\\cr 1\\cr }']
    ['{\\left({{{\\it m_2}\\,s^2 \n }\\over{k}}+1\\right)\\,{\\it x_1}-{{F}\\over{k}}\\cr -{{{\\it m_2}\\,s^2\\,F \n }\\over{k}}-F+\\left({\\it m_2}\\,s^2\\,\\left({{{\\it m_2}\\,s^2}\\over{k}}+1 \n \\right)+{\\it m_2}\\,s^2\\right)\\,{\\it x_1}\\cr 1\\cr }']

You can find more info on pyparsing at http://pyparsing.wikispaces.com.

Cheers!
-- Paul