Re: Breaking up Strings correctly:
On Tue, 10 Apr 2007 08:12:53 -0300, Michael Yanowitz <[EMAIL PROTECTED]> wrote:

> I guess what I was looking for was something simpler than parsing.
> I may actually use some of what you posted. But I am hoping that
> if given a string such as:
> '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'
> something like split(), where I can pass it something like
> [' AND ', ' OR ', ' XOR ']
> will split the string by AND, OR, or XOR.
> BUT split it up in such a way to preserve the parentheses order, so that
> it will split on the outermost parenthesis.
> So that the above string becomes:
> ['OR', '(($IP = "127.1.2.3") AND ($AX < 15))', '(($IP = "127.1.2.4") AND ($AY != 0))']
> No need to do this recursively, I can repeat the process, however if I
> wish on each string in the list and get:
> ['OR', ['AND', '($IP = "127.1.2.3")', '($AX < 15)'], ['AND', '($IP = "127.1.2.4")', '($AY != 0)']]
>
> Can this be done without parsers?

This is exactly what parsers do. Sure, it can be done without using a preexisting general parser, but you'll be writing your own specialized one by hand.

> Perhaps with some variation of re or split.

Regular expressions cannot represent arbitrarily nested expressions like yours (simply because such expressions are not regular). If you know beforehand that all input has some fixed form, like "condition AND condition OR condition AND condition", or at least a finite set of fixed forms, it could be done with many re's. But I think it's much more work than using PyParsing or similar tools.

If you have some bizarre constraints (parserphobia?) or for whatever reason don't want to use such tools, the infix evaluator posted yesterday by Gerard Flanagan could be an alternative (it only uses standard modules).

> Has something like this already been written?

Yes, hundreds of times since programmable computers have existed: they're known as "lexers" and "parsers" :)

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
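A minimal sketch of the distinction drawn in this reply, assuming Python 3 and only the standard `re` module. A single regular expression copes with the fixed, parenthesis-free form, but as soon as parentheses nest it cuts straight through the groups. The names `FLAT_SPLIT` and `split_flat` are made up for this illustration and do not come from any post in the thread.

```python
import re

# Hypothetical helper: split a flat chain on AND / OR / XOR.
# The capturing group makes re.split keep the operators in the result.
FLAT_SPLIT = re.compile(r'\s+(AND|OR|XOR)\s+')

def split_flat(expr):
    """Split a parenthesis-free chain into conditions and operators."""
    return FLAT_SPLIT.split(expr)

# Fixed, flat form: works as hoped.
print(split_flat('$IP = "127.1.2.3" AND $AX < 15 OR $AY != 0'))
# ['$IP = "127.1.2.3"', 'AND', '$AX < 15', 'OR', '$AY != 0']

# Nested form: the pieces come back with unbalanced parentheses,
# because the pattern has no notion of nesting depth.
print(split_flat('(($A = 1) AND ($B = 2)) OR ($C = 3)'))
# ['(($A = 1)', 'AND', '($B = 2))', 'OR', '($C = 3)']
```

The second result is the reason a depth counter or a real parser is needed for nested input.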
RE: Breaking up Strings correctly:
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Adam Atlas
Sent: Monday, April 09, 2007 11:28 PM
To: python-list@python.org
Subject: Re: Breaking up Strings correctly:

Thanks for your and Gerard's and Gabriel's responses.
I guess what I was looking for was something simpler than parsing.
I may actually use some of what you posted. But I am hoping that, given a string such as:

    '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'

something like split(), where I can pass it something like [' AND ', ' OR ', ' XOR '], will split the string by AND, OR, or XOR, BUT split it up in such a way as to preserve the parentheses order, so that it splits on the outermost parentheses. So that the above string becomes:

    ['OR', '(($IP = "127.1.2.3") AND ($AX < 15))', '(($IP = "127.1.2.4") AND ($AY != 0))']

No need to do this recursively; I can, however, repeat the process if I wish on each string in the list and get:

    ['OR', ['AND', '($IP = "127.1.2.3")', '($AX < 15)'], ['AND', '($IP = "127.1.2.4")', '($AY != 0)']]

Can this be done without parsers? Perhaps with some variation of re or split.
Has something like this already been written?

Thanks in advance:
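A rough sketch of the split() variant described in this message, assuming Python 3. It counts parenthesis depth by hand and splits on the first operator found at depth zero, after removing one enclosing pair of parentheses. The function names (`strip_outer_parens`, `split_outer`, `split_all`) are hypothetical. The sketch also assumes that quoted values never contain parentheses or the operator words, and it applies no precedence: in an unparenthesized chain it simply splits at the leftmost operator.

```python
def strip_outer_parens(text):
    """Remove one pair of parentheses if it encloses the whole string."""
    if not (text.startswith("(") and text.endswith(")")):
        return text
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(text) - 1:
                return text  # the first "(" closes before the end
    return text[1:-1]


def split_outer(expr, operators=("AND", "OR", "XOR")):
    """Split on the first operator at parenthesis depth 0.

    Returns [operator, left, right], or the stripped input string
    unchanged when no top-level operator is found.
    """
    text = strip_outer_parens(expr.strip())
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0:
            for op in operators:
                token = " " + op + " "
                if text.startswith(token, i):
                    left = text[:i].strip()
                    right = text[i + len(token):].strip()
                    return [op, left, right]
    return expr.strip()


def split_all(expr, operators=("AND", "OR", "XOR")):
    """Repeat split_outer on each piece until only leaves remain."""
    parts = split_outer(expr, operators)
    if isinstance(parts, str):
        return parts
    op, left, right = parts
    return [op, split_all(left, operators), split_all(right, operators)]


test = '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'
print(split_outer(test))
# ['OR', '(($IP = "127.1.2.3") AND ($AX < 15))', '(($IP = "127.1.2.4") AND ($AY != 0))']
print(split_all(test))
# ['OR', ['AND', '($IP = "127.1.2.3")', '($AX < 15)'], ['AND', '($IP = "127.1.2.4")', '($AY != 0)']]
```

Note that this is already a small hand-written parser: the depth counter is doing the job a regular expression cannot.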
Re: Breaking up Strings correctly:
On Apr 9, 8:19 am, "Michael Yanowitz" <[EMAIL PROTECTED]> wrote:
> Hello:
>
>    I have been searching for an easy solution, and hopefully one
> has already been written, so I don't want to reinvent the wheel:

Pyparsing is indeed a fine package, but if Paul gets to plug his module, then so do I! :)

I have a package called ZestyParser... a lot of it is inspired by Pyparsing, actually, but I'm going in a different direction in many areas. (One major goal is to be crazily dynamic and flexible on the inside. And it hasn't failed me thus far; I've used it to easily parse grammars that would make lex and yacc scream in horror.)

Here's how I'd do it...

    from ZestyParser import *
    from ZestyParser.Helpers import *

    varName = Token(r'\$(\w+)', group=1)
    varVal = QuoteHelper() | Int
    sp = Skip(Token(r'\s*'))
    comparison = sp.pad(varName + CompositeToken([RawToken(sym) for sym in ('=','<','>','>=','<=','!=')]) + varVal)
    #Maybe I should "borrow" PyParsing's OneOf idea :)

    expr = ExpressionHelper((
        comparison,
        (RawToken('(') + Only(_top_) + RawToken(')')),
        oper('NOT', ops=UNARY),
        oper('AND'),
        oper('OR'),
    ))

Now you can scan for `expr` and get a return value like

    [[['IP', '=', '127.1.2.3'], ['AX', '<', 15]], [['IP', '=', '127.1.2.4'], ['AY', '!=', 0]]]

(for the example you gave).

Note that this example uses several features that won't be available until the next release, but it's coming soon. So Michael, though you'd still be able to parse this with the current version, the code wouldn't look as nice as this or the Pyparsing version. Maybe just add it to your watchlist. :)

- Adam
Re: Breaking up Strings correctly:
On Mon, 09 Apr 2007 12:39:44 -0300, Paul McGuire <[EMAIL PROTECTED]> wrote:

> On Apr 9, 7:19 am, "Michael Yanowitz" <[EMAIL PROTECTED]> wrote:
>>
>> Suppose I have a string of expressions such as:
>> "((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))
>> I would like to split up into something like:
>>
>> [ "OR",
>>   ["AND", "($IP = "127.1.2.3")", "($AX < 15)"],
>>   ["AND", "(($IP = "127.1.2.4")", ($AY != 0))"] ]
>>
>
> This problem is right down the pyparsing fairway! Pyparsing is a
> module for defining recursive-descent parsers, and it has some
> built-in help just for applications such as this.

Sometimes I've seen you proposing the usage of PyParsing on problems that, in my opinion, were better solved using some other standard tools, but this time you're absolutely right: this is perfectly suited for PyParsing! :)

--
Gabriel Genellina
Re: Breaking up Strings correctly:
On Apr 9, 1:19 pm, "Michael Yanowitz" <[EMAIL PROTECTED]> wrote:
> Hello:
>
>    I have been searching for an easy solution, and hopefully one
> has already been written, so I don't want to reinvent the wheel:
>
>    Suppose I have a string of expressions such as:
> "((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))
> I would like to split up into something like:
> [ "OR",
>   "(($IP = "127.1.2.3") AND ($AX < 15))",
>   "(($IP = "127.1.2.4") AND ($AY != 0))" ]
>
> which I may then decide to or not to further split into:
> [ "OR",
>   ["AND", "($IP = "127.1.2.3")", "($AX < 15)"],
>   ["AND", "(($IP = "127.1.2.4")", ($AY != 0))"] ]
>
> Is there an easy way to do this?

If you look into infix to prefix conversion algorithms it might help you. The following seems to work with the example you give, but not tested further:

    data = '''
    ((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))
    '''

    import tokenize
    from cStringIO import StringIO

    opstack = []
    valstack = []
    s = ''
    g = tokenize.generate_tokens(StringIO(data).readline)   # tokenize the string
    for _, tokval, _, _, _ in g:
        if tokval in ['(', ')', 'AND', 'OR']:
            if tokval != ')':
                opstack.append(tokval)
            else:
                if s:
                    valstack.append(s)
                    s = ''
                while opstack[-1] != '(':
                    op = opstack.pop()
                    rhs = valstack.pop()
                    lhs = valstack.pop()
                    valstack.append([op, lhs, rhs])
                opstack.pop()
        else:
            s += tokval.strip()

    print valstack

    [['OR', ['AND', '$IP="127.1.2.3"', '$AX<15'], ['AND', '$IP="127.1.2.4"', '$AY!=0']]]

Gerard
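For readers on Python 3, where `cStringIO` and the `print` statement are gone, here is a hedged adaptation of the same two-stack idea. It is not Gerard's code: the `tokenize` module is swapped for a small regular-expression tokenizer (an assumption made to avoid depending on how `tokenize` treats `$`), and the function name `to_prefix` is invented for this sketch. Like the original, it expects every sub-expression to be fully parenthesized.

```python
import re

# Assumed token shapes: parentheses, the words AND / OR,
# double-quoted strings, and any other run of non-space characters.
TOKEN = re.compile(r'\(|\)|\b(?:AND|OR)\b|"[^"]*"|[^\s()"]+')

def to_prefix(text):
    """Convert a fully parenthesized infix expression to prefix lists."""
    opstack = []    # pending "(" markers and operators
    valstack = []   # operands and already-built [op, lhs, rhs] lists
    current = ''    # text of the comparison being accumulated
    for tok in TOKEN.findall(text):
        if tok in ('(', 'AND', 'OR'):
            opstack.append(tok)
        elif tok == ')':
            if current:
                valstack.append(current)
                current = ''
            # Reduce every operator back to the matching "(".
            while opstack[-1] != '(':
                op = opstack.pop()
                rhs = valstack.pop()
                lhs = valstack.pop()
                valstack.append([op, lhs, rhs])
            opstack.pop()  # discard the "("
        else:
            current += tok
    return valstack

data = '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'
print(to_prefix(data))
# [['OR', ['AND', '$IP="127.1.2.3"', '$AX<15'], ['AND', '$IP="127.1.2.4"', '$AY!=0']]]
```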
Re: Breaking up Strings correctly:
On Apr 9, 7:19 am, "Michael Yanowitz" <[EMAIL PROTECTED]> wrote:
> Hello:
>
>    I have been searching for an easy solution, and hopefully one
> has already been written, so I don't want to reinvent the wheel:
>
>    Suppose I have a string of expressions such as:
> "((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))
> I would like to split up into something like:
> [ "OR",
>   "(($IP = "127.1.2.3") AND ($AX < 15))",
>   "(($IP = "127.1.2.4") AND ($AY != 0))" ]
>
> which I may then decide to or not to further split into:
> [ "OR",
>   ["AND", "($IP = "127.1.2.3")", "($AX < 15)"],
>   ["AND", "(($IP = "127.1.2.4")", ($AY != 0))"] ]
>
> Is there an easy way to do this?
> I tried using regular expressions, re, but I don't think it is
> recursive enough. I really want to break it up from:
> (E1 AND_or_OR E2) and make that into [AND_or_OR, E1, E2]
> and apply the same to E1 and E2 recursively until E1[0] != '('
>
>    But the main problem I am running into is, how do I split this up
> by outer parentheses, so that I get the proper '(' and ')' to split
> this up correctly?
>
> Thanks in advance:
> Michael Yanowitz

This problem is right down the pyparsing fairway! Pyparsing is a module for defining recursive-descent parsers, and it has some built-in help just for applications such as this.

You start by defining the basic elements of the text to be parsed. In your sample text, you are combining a number of relational comparisons, made up of variable names and literal integers and quoted strings. Using pyparsing classes, we define these:

    varName = Word("$",alphas, min=2)
    integer = Word("0123456789").setParseAction( lambda t : int(t[0]) )
    varVal = dblQuotedString | integer

varName is a "word" starting with a $, followed by 1 or more alphas. integer is a "word" made up of 1 or more digits, and we add a parsing action to convert these to Python ints. varVal shows that a value can be an integer or a dblQuotedString (a common expression included with pyparsing).
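To make concrete what these three definitions accept, here is a rough standard-library analogue, assuming Python 3 and the `re` module rather than pyparsing. The patterns and the helper `parse_value` are illustrative only; in particular the quoted-string pattern is a simplification that does not handle escaped quotes, which pyparsing's dblQuotedString is more careful about.

```python
import re

# Approximate regular-expression counterparts (names are hypothetical):
VAR_NAME = re.compile(r'\$[A-Za-z]+')   # "$" followed by one or more letters
INTEGER = re.compile(r'[0-9]+')         # one or more digits
DBL_QUOTED = re.compile(r'"[^"]*"')     # simplified: no escaped quotes

def parse_value(text):
    """Return a quoted string unchanged, or a digit run converted to int."""
    if DBL_QUOTED.fullmatch(text):
        return text
    if INTEGER.fullmatch(text):
        return int(text)  # plays the role of the parse action
    raise ValueError("not a value: %r" % text)

print(VAR_NAME.fullmatch('$IP') is not None)   # True
print(VAR_NAME.fullmatch('$') is not None)     # False: needs at least one letter
print(parse_value('15'))                       # 15
print(parse_value('"127.1.2.3"'))              # "127.1.2.3"
```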
Next we define the set of relational operators, and the comparison expression:

    relationalOp = oneOf("= < > >= <= !=")
    comparison = Group(varName + relationalOp + varVal)

The comparison expression is grouped so as to keep tokens separate from surrounding expressions.

Now the most complicated part: using the operatorPrecedence method from pyparsing. It is possible to create the recursive grammar explicitly, but this is another application that is very common, so pyparsing includes a helper for it too. Here is your set of operations defined using operatorPrecedence:

    boolExpr = operatorPrecedence( comparison,
        [
        ( "AND", 2, opAssoc.LEFT ),
        ( "OR", 2, opAssoc.LEFT ),
        ])

operatorPrecedence takes 2 arguments: the base-level or atom expression (in your case, the comparison expression), and a list of tuples listing the operators in descending priority. Each tuple gives the operator, the number of operands (1 or 2), and whether it is right or left associative.

Now the only thing left to do is use boolExpr to parse your test string:

    results = boolExpr.parseString('((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))')

pyparsing returns parsed tokens as a rich object of type ParseResults. This object can be accessed as a list, dict, or object instance with named attributes. For this example, we'll actually create a nested list using ParseResults' asList method.
Passing this list to the pprint module we get:

    pprint.pprint( results.asList() )

prints

    [[[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]],
      'OR',
      [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]]

Here is the whole program in one chunk (I also added support for NOT - higher priority than AND, and right-associative):

    test = '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'

    from pyparsing import oneOf, Word, alphas, dblQuotedString, nums, \
        Literal, Group, operatorPrecedence, opAssoc

    varName = Word("$",alphas)
    integer = Word(nums).setParseAction( lambda t : int(t[0]) )
    varVal = dblQuotedString | integer

    relationalOp = oneOf("= < > >= <= !=")
    comparison = Group(varName + relationalOp + varVal)

    boolExpr = operatorPrecedence( comparison,
        [
        ( "NOT", 1, opAssoc.RIGHT ),
        ( "AND", 2, opAssoc.LEFT ),
        ( "OR", 2, opAssoc.LEFT ),
        ])

    import pprint
    pprint.pprint( boolExpr.parseString(test).asList() )

The pyparsing wiki includes some related examples, SimpleBool.py and SimpleArith.py - go to http://pyparsing.wikispaces.com/Examples.

-- Paul
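The post mentions that the recursive grammar can also be written out explicitly. As a hedged illustration of what that involves, here is a small hand-written recursive-descent parser in Python 3 using only the standard library. It is not pyparsing's implementation; the `Parser` class, its method names, and the token pattern are all assumptions made for this sketch. It uses the same priorities as the whole program above (NOT binds tighter than AND, AND tighter than OR) and returns nested lists in infix order. Characters the token pattern does not recognize are silently skipped, which a real parser should report instead.

```python
import re

# Assumed token shapes for this sketch.
TOKEN = re.compile(r'\(|\)|\$[A-Za-z]+|"[^"]*"|\d+|>=|<=|!=|[=<>]|[A-Za-z]+')

class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return tok

    def binary(self, op, operand):
        """Parse operand (op operand)*, grouping only if op occurs."""
        items = [operand()]
        while self.peek() == op:
            items.append(self.take())
            items.append(operand())
        return items[0] if len(items) == 1 else items

    def or_expr(self):       # lowest priority
        return self.binary('OR', self.and_expr)

    def and_expr(self):
        return self.binary('AND', self.not_expr)

    def not_expr(self):      # unary, right-associative
        if self.peek() == 'NOT':
            return [self.take(), self.not_expr()]
        return self.atom()

    def atom(self):
        if self.peek() == '(':
            self.take()
            inner = self.or_expr()
            if self.take() != ')':
                raise ValueError("expected ')'")
            return inner
        # A comparison: name, relational operator, value.
        name, op, value = self.take(), self.take(), self.take()
        return [name, op, int(value) if value.isdigit() else value]

def parse(text):
    parser = Parser(text)
    tree = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError("unexpected token %r" % parser.peek())
    return tree

test = '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'
print(parse(test))
# [[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]], 'OR', [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]
```

Each priority level gets its own method, and each method calls the next-tighter level for its operands; that chain of calls is the "recursive grammar" that a helper such as operatorPrecedence builds for you.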