On Jul 26, 3:08 am, Stargaming <[EMAIL PROTECTED]> wrote:
> On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:
> > On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote:
> >> Hello,
> >>
> >> I have a situation where I have a file that contains text similar to:
> >>
> >> myValue1 = contents of value1
> >> myValue2 = contents of value2 but
> >> with a new line here
> >> myValue3 = contents of value3
> >>
> >> My first approach was to open the file, use readlines to split the
> >> lines on the "=" delimiter into a key/value pair (to be stored in a
> >> dict).
> >>
> >> After processing a couple files I noticed it's possible that a newline
> >> can be present in the value as shown in myValue2.
> >>
> >> In this case it's not an option to say remove the newlines if it's a
> >> "multi line" value as the value data needs to stay intact.
> >>
> >> I'm a bit confused as to how to go about getting this to work.
> >>
> >> Any suggestions on an approach would be greatly appreciated!
> >
> > I'm confused. You don't want the newline to be present, but you can't
> > remove it because the data has to stay intact? If you don't want to
> > change it, then what's the problem?
> >
> > Mike
>
> It's obvious that simple line-by-line filtering won't handle multi-line
> statements.
>
> You could solve that by saving the last item you added something to and,
> if the line currently handled doesn't look like an assignment, append it
> to this item. You might run into problems with such data:
>
> foo = modern maths
> proved that 1 = 1
> bar = single
>
> If your dataset always has indentation on subsequent lines, you might use
> this. Or if the key's name is always just one word.
>
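For what it's worth, the "append to the last item" idea quoted above might look roughly like the sketch below. It is only an illustration, not anybody's actual code: the names ASSIGNMENT and parse are made up, and it assumes the "key is always just one word" case, so that a line like "proved that 1 = 1" is treated as a continuation rather than a new key.

```python
import re

# Hypothetical pattern (an assumption, not from the original files):
# a key is a single word at the very start of the line, followed by "=".
# Any line that does not match is treated as a continuation line.
ASSIGNMENT = re.compile(r"^(\w+)\s*=\s?(.*)$")


def parse(lines):
    """Collect key/value pairs, folding continuation lines into the last value."""
    result = {}
    last_key = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ASSIGNMENT.match(line)
        if match:
            # Looks like an assignment: start a new item.
            last_key, value = match.groups()
            result[last_key] = value
        elif last_key is not None:
            # Not an assignment: append to the last item, keeping the
            # newline so the value data stays intact.
            result[last_key] += "\n" + line
        # Lines that appear before the first key are ignored.
    return result


if __name__ == "__main__":
    sample = [
        "myValue1 = contents of value1\n",
        "myValue2 = contents of value2 but\n",
        "with a new line here\n",
        "myValue3 = contents of value3\n",
    ]
    for key, value in parse(sample).items():
        print(repr(key), "->", repr(value))
```

With a real file you would pass the open file object (or the result of readlines) to parse. If keys can contain spaces, or continuation lines can themselves start with "word =", this pattern will guess wrong and needs tightening.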
My take: all of the above, plus: Given that you want to extract stuff
of the form <LHS> = <RHS>, I'd suggest developing a fairly precise
regular expression for LHS, maybe even for RHS, and trying this on as
many of these files as you can.

Why an RE for RHS? Consider:

foo = somebody said "I think that REs = trouble
maybe_better = pyparsing"

:-)

-- 
http://mail.python.org/mailman/listinfo/python-list