Thanks to everyone for the help and feedback. It's amazing to me that I've been dealing with odd log files and other outputs for quite a while, and never really stumbled onto a parser as a solution.
I got this far, with Paul's help, which manages my current set of files:

    from pyparsing import nestedExpr, Word, alphanums, QuotedString
    from pprint import pprint
    import re
    import glob

    files = glob.glob('wsout/*')

    for file in files:
        text = open(file).read()
        text = re.sub(r'"\[', ' [', text)     # These 2 lines just drop double quotes
        text = re.sub(r'\]"', '] ', text)     # that aren't related to a string
        text = re.sub(r'\[\]', 'None', text)  # this drops the empty []
        text = '[ ' + text + ' ]'             # Needs an outer layer
        content = Word(alphanums+"-_./()*=#\\${}| :,;[EMAIL PROTECTED]&%%") | QuotedString('"',multiline=True)
        structure = nestedExpr("[", "]", content).parseString(text)
        pprint(structure[0].asList())

I'm sure there are cooler ways to do some of that. I spent most of my
time expanding the set of characters that constitute content. I'm
concerned that over time things will break as other characters show up.
Specifically, a few of the nodes use a German locale, so I could get
some odd international characters. It looks like pyparsing has a
constant for printable characters. I'm not sure whether I can just use
that without worrying about it.

At any rate, thumbs up on the parser! I'm definitely going to add it to
my toolbox.

On Thu, Apr 24, 2008 at 8:19 AM, Mark Wooding <[EMAIL PROTECTED]> wrote:
>
> Eric Wertman <[EMAIL PROTECTED]> wrote:
>
> > I have a set of files with this kind of content (it's dumped from
> > WebSphere):
> >
> > [propertySet "[[resourceProperties "[[[description "This is a required
> > property. This is an actual database name, and its not the locally
> > catalogued database name. The Universal JDBC Driver does not rely on
> > information catalogued in the DB2 database directory."]
> > [name databaseName]
> > [required true]
> > [type java.lang.String]
> > [value DB2Foo]] ...
>
> Looks to me like S-expressions with square brackets instead of the
> normal round ones.
> I'll bet that the correct lexical analysis is
> approximately
>
>   [                     open-list
>   propertySet           symbol
>   "                     open-string
>   [                     open-list
>   [                     open-list
>   resourceProperties    symbol
>   "                     open-string (not close-string!)
>   ...
>
> so it also looks as if strings aren't properly escaped.
>
> This is definitely not a pretty syntax.  I'd suggest an initial
> tokenization pass for the lexical syntax
>
>   [                     open-list
>   ]                     close-list
>   "[                    open-qlist
>   ]"                    close-qlist
>   "..."                 string
>   whitespace            ignore
>   anything-else         symbol
>
> Correct nesting should give you two kinds of lists -- which I've shown
> as `list' and `qlist' (for quoted-list), though given the nastiness of
> the dump you showed, there's no guarantee of correctness.
>
> Turn the input string (or file) into a list (generator?) of lexical
> objects above; then scan that recursively.  The lists (or qlists) seem
> to have two basic forms:
>
>   * properties, that is a list of the form [SYMBOL VALUE ...] which can
>     be thought of as a declaration that some property, named by the
>     SYMBOL, has a particular VALUE (or maybe VALUEs); and
>
>   * property lists, which are just lists of properties.
>
> Property lists can be usefully turned into Python dictionaries, indexed
> by their SYMBOLs, assuming that they don't try to declare the same
> property twice.
>
> There are, alas, other kinds of lists too -- one of the property lists
> contains a property `[value []]' which simply contains an empty list.
>
> The right first-cut rule for disambiguation is probably that a property
> list is a non-empty list, all of whose items look like properties, and a
> property is an entry in a property list, and (initially at least)
> restrict properties to the simple form [SYMBOL VALUE] rather than
> allowing multiple values.
>
> Does any of this help?
>
> (In fact, this syntax looks so much like a demented kind of S-expression
> that I'd probably try to parse it, initially at least, by using a Common
> Lisp system's reader and a custom readtable, but that may not be useful
> to you.)
>
> -- [mdw]

--
http://mail.python.org/mailman/listinfo/python-list
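
For comparison, here is a minimal standard-library sketch of the tokenize-then-nest approach Mark describes in the quoted message. It is not code from the thread. The names `TOKEN_RE`, `tokenize`, `parse`, `is_property` and `simplify` are made up for illustration. It assumes that `[` and `"[` can be treated as the same kind of list, that strings contain no escaped quotes, and that the truncated sample closes its brackets the way `SAMPLE` does. It also uses an explicit stack rather than recursion to build the nesting.

```python
import re
from pprint import pprint

# Lexical syntax from the quoted message; list and qlist are merged (assumption).
TOKEN_RE = re.compile(r'''
    (?P<open>"?\[)            # [ or "[   open-list / open-qlist
  | (?P<close>\]"?)           # ] or ]"   close-list / close-qlist
  | "(?P<string>[^"]*)"       # "..."     string, no escape handling
  | (?P<symbol>[^\s\[\]"]+)   # anything else that is not whitespace
''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, value) pairs; whitespace between tokens is skipped."""
    for match in TOKEN_RE.finditer(text):
        yield match.lastgroup, match.group(match.lastgroup)


def parse(text):
    """Return the top-level items of text as nested Python lists."""
    stack = [[]]
    for kind, value in tokenize(text):
        if kind == 'open':
            stack.append([])
        elif kind == 'close':
            if len(stack) == 1:
                raise ValueError('unbalanced closing bracket')
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            # Strings and symbols are both kept as plain str values.
            stack[-1].append(value)
    if len(stack) != 1:
        raise ValueError('unclosed list')
    return stack[0]


def is_property(item):
    """A property is the simple form [SYMBOL VALUE]."""
    return isinstance(item, list) and len(item) == 2 and isinstance(item[0], str)


def simplify(item):
    """Turn property lists into dicts, leaving every other list as a list."""
    if not isinstance(item, list):
        return item
    names = [p[0] for p in item if is_property(p)]
    # Non-empty, every entry a property, and no property declared twice.
    if item and len(names) == len(item) and len(set(names)) == len(names):
        return {name: simplify(value) for name, value in item}
    return [simplify(entry) for entry in item]


# Shortened version of the dump in the thread; the closing brackets are assumed.
SAMPLE = '''[propertySet "[[resourceProperties "[[[description "An actual database name."]
[name databaseName]
[required true]
[type java.lang.String]
[value DB2Foo]]]"]]"]'''

if __name__ == '__main__':
    pprint(simplify(parse(SAMPLE)))
```

Because the symbol rule is defined by exclusion (anything that is not whitespace, a bracket or a double quote), characters outside ASCII pass through without maintaining an explicit character list. The cost is that a stray double quote is silently skipped by `finditer`, so real input would probably want a stricter check.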