On Jan 13, 7:15 pm, Paul McGuire <pt...@austin.rr.com> wrote: > On Jan 5, 1:49 pm, Tim Chase <python.l...@tim.thechases.com> wrote: > > > > > vsoler wrote: > > > Hence, I need toparseExcel formulas. Can I do it by means only of re > > > (regular expressions)? > > > > I know that for simple formulas such as "=3*A7+5" it is indeed > > > possible. What about complex for formulas that include functions, > > > sheet names and possibly other *.xls files? > > > Where things start getting ugly is when you have nested function > > calls, such as > > > =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, > > (Min(C1:C25)+3)*18,Max(B1:B25))) > > > Regular expressions don't do well with nested parens (especially > > arbitrarily-nesting-depth such as are possible), so I'd suggest > > going for a full-blown parsing solution like pyparsing. > > > If you have fair control over what can be contained in the > > formulas and you know they won't contain nested parens/functions, > > you might be able to formulate some sort of "kinda, sorta, maybe > > parses some forms of formulas" regexp. > > > -tkc > > This might give the OP a running start:
Unfortunately "this" will blow up after only a few paces; see below ... > > from pyparsing import (CaselessKeyword, Suppress, Word, alphas, > alphanums, nums, Optional, Group, oneOf, Forward, Regex, > operatorPrecedence, opAssoc, dblQuotedString) > > test1 = "=3*A7+5" > test2 = "=3*Sheet1!$A$7+5" test2a ="=3*'Sheet 1'!$A$7+5" test2b ="=3*'O''Reilly''s sheet'!$A$7+5" > test3 = "=if(Sum(A1:A25)>42,Min(B1:B25), " \ > "if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))" Many functions can take a variable number of args and they are not restricted to cell references e.g. test3a = "=sum(a1:a25,10,min(b1,c2,d3))" The arg separator is comma or semicolon depending on the locale ... a parser should accept either. > EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$') > sheetRef = Word(alphas, alphanums) > colRef = Optional(DOLLAR) + Word(alphas,max=2) > rowRef = Optional(DOLLAR) + Word(nums) > cellRef = Group(Optional(sheetRef + EXCL)("sheet") + colRef("col") + > rowRef("row")) > > cellRange = (Group(cellRef("start") + COLON + cellRef("end")) > ("range") > | cellRef ) > > expr = Forward() > > COMPARISON_OP = oneOf("< = > >= <= != <>") > condExpr = expr + COMPARISON_OP + expr > > ifFunc = (CaselessKeyword("if") + > LPAR + > Group(condExpr)("condition") + that should be any expression; at run-time it expects a boolean (TRUE or FALSE) or a number (0 means false, non-0 means true). Text causes a #VALUE! error. Trying to subdivide expressions into conditional / numeric /text just won't work. > COMMA + expr("if_true") + > COMMA + expr("if_false") + RPAR) > statFunc = lambda name : CaselessKeyword(name) + LPAR + cellRange + > RPAR > sumFunc = statFunc("sum") > minFunc = statFunc("min") > maxFunc = statFunc("max") > aveFunc = statFunc("ave") > funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc > > multOp = oneOf("* /") > addOp = oneOf("+ -") needs power op "^" > numericLiteral = Regex(r"\-?\d+(\.\d+)?") Sorry, that "-" in there is a unary minus operator. What about 1e23 ? > operand = numericLiteral | funcCall | cellRange | cellRef > arithExpr = operatorPrecedence(operand, > [ > (multOp, 2, opAssoc.LEFT), > (addOp, 2, opAssoc.LEFT), > ]) > > textOperand = dblQuotedString | cellRef > textExpr = operatorPrecedence(textOperand, > [ > ('&', 2, opAssoc.LEFT), > ]) Excel evaluates excessively permissively, and the punters are definitely not known for self-restraint. The above just won't work: 2.3 & 4.5 produces text "2.34.5", while "2.3" + "4.5" produces number 6.8. -- http://mail.python.org/mailman/listinfo/python-list