Eric Wertman <[EMAIL PROTECTED]> wrote:

> I have a set of files with this kind of content (it's dumped from
> WebSphere):
>
> [propertySet "[[resourceProperties "[[[description "This is a required
> property. This is an actual database name, and its not the locally
> catalogued database name. The Universal JDBC Driver does not rely on
> information catalogued in the DB2 database directory."]
> [name databaseName]
> [required true]
> [type java.lang.String]
> [value DB2Foo]] ...

Looks to me like S-expressions with square brackets instead of the
normal round ones.  I'll bet that the correct lexical analysis is
approximately

  [                     open-list
  propertySet           symbol
  "                     open-string
  [                     open-list
  [                     open-list
  resourceProperties    symbol
  "                     open-string (not close-string!)
  ...

so it also looks as if strings aren't properly escaped.

This is definitely not a pretty syntax.  I'd suggest an initial
tokenization pass for the lexical syntax

  [                     open-list
  ]                     close-list
  "[                    open-qlist
  ]"                    close-qlist
  "..."                 string
  whitespace            ignore
  anything-else         symbol

Correct nesting should give you two kinds of lists -- which I've shown
as `list' and `qlist' (for quoted-list), though given the nastiness of
the dump you showed, there's no guarantee of correctness.  Turn the
input string (or file) into a list (generator?) of the lexical objects
above; then scan that recursively.  The lists (or qlists) seem to have
two basic forms:

  * properties, that is a list of the form [SYMBOL VALUE ...] which can
    be thought of as a declaration that some property, named by the
    SYMBOL, has a particular VALUE (or maybe VALUEs); and

  * property lists, which are just lists of properties.

Property lists can be usefully turned into Python dictionaries, indexed
by their SYMBOLs, assuming that they don't try to declare the same
property twice.

There are, alas, other kinds of lists too -- one of the property lists
contains a property `[value []]' which simply contains an empty list.

The right first-cut rule for disambiguation is probably that a property
list is a non-empty list, all of whose items look like properties, and
a property is an entry in a property list, and (initially at least)
restrict properties to the simple form [SYMBOL VALUE] rather than
allowing multiple values.

Does any of this help?

(In fact, this syntax looks so much like a demented kind of
S-expression that I'd probably try to parse it, initially at least, by
using a Common Lisp system's reader and a custom readtable, but that
may not be useful to you.)

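If it's useful, here is a rough Python sketch of the tokenize-then-recurse approach described above.  Treat it as a starting point rather than a finished parser: the names (Symbol, tokenize, parse, simplify) are my own inventions, the sample input is made up in the style of your dump rather than copied from it, and I'm assuming that strings never contain an embedded double quote and that a `"' immediately followed by `[' always opens a qlist.  It also throws away the list/qlist distinction once nesting has been checked, and a repeated property name silently overwrites the earlier one.

```python
import re

# One alternative per lexical class; order matters, since "[ and ]" must
# be tried before plain strings and plain brackets.
TOKEN_RE = re.compile(r'''
    (?P<open_qlist>"\[)
  | (?P<close_qlist>\]")
  | (?P<string>"[^"]*")
  | (?P<open_list>\[)
  | (?P<close_list>\])
  | (?P<space>\s+)
  | (?P<symbol>[^\s\[\]"]+)
''', re.VERBOSE)

# Which closing token matches each opening token.
CLOSER = {"open_list": "close_list", "open_qlist": "close_qlist"}


class Symbol(str):
    """A bare word, kept distinct from a quoted string."""


def tokenize(text):
    """Yield (kind, text) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        pos = m.end()
        if m.lastgroup != "space":
            yield m.lastgroup, m.group()


def parse(tokens):
    """Turn a token stream into nested Python lists (one per top-level item)."""
    tokens = iter(tokens)

    def read(kind, text):
        if kind in CLOSER:
            items = []
            for k, t in tokens:          # shares the iterator with the caller
                if k == CLOSER[kind]:
                    return items
                items.append(read(k, t))
            raise ValueError("input ended inside a list")
        if kind == "string":
            return text[1:-1]            # strip the surrounding quotes
        if kind == "symbol":
            return Symbol(text)
        raise ValueError("unexpected %r" % text)   # stray or mismatched closer

    return [read(k, t) for k, t in tokens]


def is_property(node):
    """First-cut rule: a property is exactly [SYMBOL VALUE]."""
    return (isinstance(node, list) and len(node) == 2
            and isinstance(node[0], Symbol))


def simplify(node):
    """Convert property lists to dicts; leave other lists as lists."""
    if not isinstance(node, list):
        return str(node)
    if node and all(is_property(item) for item in node):
        # Assumes no property is declared twice; later ones would win.
        return {str(name): simplify(value) for name, value in node}
    return [simplify(item) for item in node]


# Made-up input in the style of the dump, not the real file contents.
sample = ('[propertySet "[[resourceProperties "[[[description "A required property."] '
          '[name databaseName] [required true] [type java.lang.String] [value DB2Foo]]]"]]"]')
result = simplify(parse(tokenize(sample)))
```

With that made-up input, result should come out as a dictionary with the single key 'propertySet', whose value maps 'resourceProperties' to a one-element list holding the dictionary of description, name, required, type and value.  An empty list such as the one in `[value []]' stays an empty list, because a property list has to be non-empty.
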
-- [mdw]