On Jun 12, 6:06 am, David C. Ullrich <[EMAIL PROTECTED]> wrote: > There's no regex that detects balanced parentheses, > or is there? > > That is, search('and so ((x+y)+z) = (x+(y+z))') > should return '((x+y)+z)'. > > Not just a theoretical question, I'm cleaning up > a large body of TeX code and a regex that did that > would be very convenient. (Yes, I know it's not > hard to detect balanced parentheses by hand...) > > I don't know that stuff, but I seen to recall reading > that there's a theoretical notion of "regular expression" > in CS, and a regular expression in that sense cannot > do this. But in the same place I read that the actual > regexes in programming languages include things which > are not regular expressions in that theoretical sense. > > David C. Ullrich
Pyparsing includes several helper methods for building common expression patterns, such as delimitedList, oneOf, operatorPrecedence, countedArray - and a fairly recent addition, nestedExpr. nestedExpr creates an expression for matching nested text within opening and closing delimiters, such as ()'s, []'s, {}'s, etc. The default delimiters are ()'s. You can also specify a content expression, so that pyparsing will look for and construct meaningful results. The default is to return any text nested within the delimiters, broken up by whitespace. Here is your sample string parsed using the default nestedExpr: >>> from pyparsing import nestedExpr >>> for e in nestedExpr().searchString('and so ((x+y)+z) = (x+(y+z))'): ... print e[0] ... [['x+y'], '+z'] ['x+', ['y+z']] Pyparsing found 2 matches in your input string. Note that the parens are gone from the results - nestedExpr returns a nested list structure, with nesting corresponding to the ()'s found in the original string. Pyparsing supports parse-time callbacks, called 'parse actions', and it comes with several commonly used methods, such as removeQuotes, upcaseTokens, and keepOriginalText. The purpose of keepOriginalText is to revert any structuring or parsing an expression or other parse actions might do, and just return the originally matched text. Here is how keepOriginalText gives you back just the nested parenthetical expressions, without any additional processing or grouping: >>> from pyparsing import keepOriginalText >>> matchedParens = nestedExpr().setParseAction(keepOriginalText) >>> for e in matchedParens.searchString('and so ((x+y)+z) = (x+(y+z))'): ... print e[0] ... ((x+y)+z) (x+(y+z)) -- Paul -- http://mail.python.org/mailman/listinfo/python-list