On Apr 16, 3:27 am, "7stud" <[EMAIL PROTECTED]> wrote:
> <sample problem snipped>
> Any tips?
7stud -

Here is the modified code, followed by my comments.

Oh, one general comment - you mention that you are quite facile with
regexp's.  pyparsing has a slightly different philosophy from that of
regular expressions, especially in the areas of whitespace skipping and
backtracking.  pyparsing will automatically skip whitespace between
parsing expressions, whereas regexp's require an explicit '\s*' (unless
you specify the magic "whitespace between elements allowed" mode - its
flag character escapes me at the moment, and I rarely see regexp
examples use it anyway).  And pyparsing is purely a left-to-right
recursive descent parser generator.  It won't look ahead past a
repetition operation to the next element to see when to stop repeating.
There's an FAQ on this on the wiki.

------------------
from pyparsing import Word, alphas, commaSeparatedList, delimitedList, sglQuotedString, removeQuotes

name = Word(alphas)
lookFor = name + "=" + "[" + commaSeparatedList + "]"

# comment #0
my_file = """\
mara = [ 'U2FsdGVkX185IX5PnFbzUYSKg+wMyYg9',
'U2FsdGVkX1+BCxltXVTQ2+mo83Si9oAV0sasmIGHVyk=',
'U2FsdGVkX18iUS8hYBXgyWctqpWPypVz6Fj49KYsB8s=' ]"""
my_file = "".join(my_file.splitlines())

# uncomment next line once debugging of grammar is finished
# my_file = open("aaa.txt").read()

# comment #1
#~ my_file = open("aaa.txt")
#~ for line in my_file:
for line in [my_file,]:
    alist = lookFor.parseString(line)
    globals()[alist[0]] = [ alist[3].strip("'"),
                            alist[4].strip("'"),
                            alist[5].strip("'") ]

# comment #2
def stripSingleQuotes(s):
    return s.strip("'")

globals()[alist[0]] = map(stripSingleQuotes, alist[3:-1])
print mara[2]
mara = None

# comment #3
lookFor = name.setResultsName("var") + "=" + "[" + \
    commaSeparatedList.setResultsName("listValues") + "]"
alist = lookFor.parseString(my_file)
# evaluate parsed assignment
globals()[ alist.var ] = map(stripSingleQuotes, alist.listValues)
print len(mara), mara[1]

# comment #4
lookFor = name.setResultsName("var") + "=" + "[" + \
    delimitedList( sglQuotedString.setParseAction(removeQuotes) )\
        .setResultsName("listValues") + "]"
alist = lookFor.parseString(my_file)
globals()[ alist.var ] = list( alist.listValues )
print len(mara), mara[1]
------------------

Comment #0:
When I am debugging a pyparsing application, I find it easier to embed
the input text, or a subset of it, into the program itself using a
triple-quoted string.  Then later, I'll go back and change to reading
data from an input file.  Purely a matter of taste, but it simplifies
posting to mailing lists and newsgroups.

Comment #1:
Since you are going line by line in reading the input file, be *sure*
you have the complete assignment expression on each line.  Since
pyparsing will read past line breaks for you, and since your input file
contains only this one assignment, you might be better off calling
parseString with:

    alist = lookFor.parseString( my_file.read() )

Comment #2:
Your assignment of the "mara" global is a bit clunky on two fronts:
- the explicit accessing of elements 3, 4, and 5
- the repeated calls to strip("'")

You can access the pyparsing returned tokens (passed as a ParseResults
object) using slices.  In your case, you want elements 3 up to, but not
including, the last, so alist[3:-1] will give you this.  It's nice to
avoid hard-coding things like list lengths and numbers of list
elements.  Note that you can also use len to find out the length of the
list.

As for calling strip("'") on each of these elements, have you learned
to use Python's map built-in yet?  Define a function or lambda that
takes a single element and returns what you want done with that
element, then call map with that function and the list you want to
process.  This modified version of your call is more resilient than the
original.

Comment #3:
Personally, I am not keen on too much explicit indexing into the
returned results.  This is another area where pyparsing goes beyond
typical lexing and tokenizing.
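(A quick aside back on Comment #2: the slice-plus-map combination can be
seen in isolation with a hand-built token list standing in for the
ParseResults object - plain Python, no pyparsing required, and the token
values here are just illustrative:)

```python
def stripSingleQuotes(s):
    # same helper as above: peel off the surrounding "'" characters
    return s.strip("'")

# stand-in for what lookFor.parseString() returns:
# name, "=", "[", the list values, "]"
alist = ["mara", "=", "[", "'abc'", "'def'", "'ghi'", "]"]

# elements 3 up to (but not including) the last - no hard-coded length
values = list(map(stripSingleQuotes, alist[3:-1]))
print(values)   # -> ['abc', 'def', 'ghi']
```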
Just as you can assign names to fields in regexp's, pyparsing allows
you to give names to elements within the parsed results.  You can then
access these by name, using either dict or object attribute syntax.
This gets rid of most, if not all, of the magic numbers in your code,
and makes it yet again more resilient to changes in the future.  (Say,
for example, you decided to suppress the "=", "[", and "]" punctuation
from the parsed results.  The parsing logic would remain the same, but
the returned tokens would contain only the significant content: the
variable name and the list contents.  Using explicit list indexing
would force you to renumber the list elements you are extracting, but
with results names, no change would be required.)

Comment #4:
I thought I'd show you an alternative to commaSeparatedList, called
delimitedList.  delimitedList is a helper that gives you more control
over the elements you expect to find within the list, and what to do
with them when you find them: it takes a pyparsing expression 'expr'
and expands it to 'expr + ZeroOrMore(Suppress(",") + expr)'.  You can
also change the delimiter from ',' to some other character, or even to
a pyparsing expression.

Pyparsing includes predefined expressions for some common text
patterns, such as single- and double-quoted strings, and comments of
various forms.  Look for a directory called htmldoc in your pyparsing
directory tree, and open the index.html file there to look through the
classes and methods defined for you in pyparsing.  Or just type
"help(pyparsing)" in the Python interpreter (after typing "import
pyparsing" first, of course).

Now that we have access to the expression defined to be matched within
the list, we can attach a parse action.  A parse action gets run
against the matched tokens at parse time, and can be used to modify the
matched data before continuing.  In this example, we'd like to remove
those annoying opening and closing quotation marks.
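If you were to roll that parse action yourself, it might look like the
sketch below - the function body is all pyparsing would run against the
matched tokens, and the name remove_quotes is just my own label here
(shown standalone, so it runs without pyparsing installed):

```python
def remove_quotes(tokens):
    # tokens[0] is the matched quoted string, e.g. "'abc'";
    # slicing [1:-1] drops the opening and closing quote characters
    return tokens[0][1:-1]

# calling it by hand with a one-element token list, as pyparsing would
print(remove_quotes(["'U2FsdGVkX185IX5PnFbzUYSKg+wMyYg9'"]))
```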
Again, this is such a common task that pyparsing includes a built-in
for it, called removeQuotes.  It is equivalent to the following:

    removeQuotes = lambda tokens : tokens[0][1:-1]

What?!  No verifying that the first and last characters are in fact
quotation marks?  Nope.  Another part of the pyparsing philosophy is
that parse actions *know* they will only be called with text that
matches their associated input pattern.  removeQuotes is a parse action
that *knows* the string passed to it will have opening and closing "'"
or '"' characters.

You'll also see this quite often when parsing integers:

    integer = Word(nums).setParseAction(lambda toks: int(toks[0]))

No testing for "are the characters all numeric?", and no trapping of
ValueError.  We *know* that the only time this lambda will be invoked
is after having matched a word group composed only of numeric digits.

Anyway, to wrap up this comment: now that we have attached a parse
action to remove the "'" characters as we parse, the listValues field
is ready to use as-is from the parseString method, without having to
clutter our code up with maps, or lambdas, or other post-processing
junk.

Enjoy!
-- Paul
--
http://mail.python.org/mailman/listinfo/python-list