> > I'm in the process of rewriting a bash/awk/sed script -- that grew too
> > big -- in python. I can rewrite it in a simple line-by-line way but
> > that results in ugly python code and I'm sure there is a simple
> > pythonic way.
> >
> > The bash script processed text files of the form:
> >
> > ###############################
> > key1 value1
> > key2 value2
> > key3 value3
> >
> > key4 value4
> > spec11 spec12 spec13 spec14
> > spec21 spec22 spec23 spec24
> > spec31 spec32 spec33 spec34
> >
> > key5 value5
> > key6 value6
> >
> > key7 value7
> > more11 more12 more13
> > more21 more22 more23
> >
> > key8 value8
> > ###################################
> >
> > I guess you get the point. If a line has two entries it is a key/value
> > pair which should end up in a dictionary. If a key/value pair is
> > followed by consecutive lines with more than two entries, it is a
> > matrix that should end up in a list of lists (matrix) that can be
> > identified by the key preceding it. The empty line after the last
> > line of a matrix signifies that the matrix is finished and we are back
> > to a key/value situation. Note that a matrix is always preceded by a
> > key/value pair so that it can really be identified by the key.
> >
> > Any elegant solution for this?
> >
> My solution expects correctly formatted input and parses it into
> separate key/value and matrix-holding dicts:
>
>
> from StringIO import StringIO
>
> fileText = '''\
> key1 value1
> key2 value2
> key3 value3
>
> key4 value4
> spec11 spec12 spec13 spec14
> spec21 spec22 spec23 spec24
> spec31 spec32 spec33 spec34
>
> key5 value5
> key6 value6
>
> key7 value7
> more11 more12 more13
> more21 more22 more23
>
> key8 value8
> '''
> infile = StringIO(fileText)
>
> keyvalues = {}
> matrices = {}
> for line in infile:
>     fields = line.strip().split()
>     if len(fields) == 2:
>         keyvalues[fields[0]] = fields[1]
>         lastkey = fields[0]
>     elif fields:
>         matrices.setdefault(lastkey, []).append(fields)
>
> ==============
> Here is the sample output:
>
> >>> from pprint import pprint as pp
> >>> pp(keyvalues)
> {'key1': 'value1',
>  'key2': 'value2',
>  'key3': 'value3',
>  'key4': 'value4',
>  'key5': 'value5',
>  'key6': 'value6',
>  'key7': 'value7',
>  'key8': 'value8'}
> >>> pp(matrices)
> {'key4': [['spec11', 'spec12', 'spec13', 'spec14'],
>           ['spec21', 'spec22', 'spec23', 'spec24'],
>           ['spec31', 'spec32', 'spec33', 'spec34']],
>  'key7': [['more11', 'more12', 'more13'], ['more21', 'more22', 'more23']]}
> >>>

Paddy, thanks, this looks even better.

Paul, pyparsing looks like overkill; even the config parser module is
too complex for me for such a simple task. The text files are actually
input files to a program and will never be longer than 20-30 lines, so
Paddy's solution is perfectly fine. In any case, it's good to know that
a module called pyparsing exists :)
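
For anyone reading this thread later: the quoted loop imports StringIO
the Python 2 way. Below is a minimal sketch of the same approach wrapped
in a function, assuming Python 3 (where StringIO lives in the io
module). The names parse_input and SAMPLE are made up for illustration
and are not from Paddy's post. The only behavioural addition is an
explicit error when a matrix row shows up before any key, where the
original loop would hit an undefined lastkey.

```python
from io import StringIO  # assumption: Python 3; Python 2 used "from StringIO import StringIO"


def parse_input(lines):
    """Split lines into a key/value dict and a dict of matrices.

    parse_input is a hypothetical name; the logic follows the quoted loop.
    Assumes well-formed input: matrix rows have more than two fields and
    always follow a key/value line.
    """
    keyvalues = {}
    matrices = {}
    lastkey = None
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            # Exactly two fields: a key/value pair. Remember the key in
            # case a matrix follows it.
            keyvalues[fields[0]] = fields[1]
            lastkey = fields[0]
        elif fields:
            # Any other non-blank line is a row of the matrix that
            # belongs to the most recent key.
            if lastkey is None:
                raise ValueError("matrix row before any key: %r" % line)
            matrices.setdefault(lastkey, []).append(fields)
        # Blank lines fall through and are skipped.
    return keyvalues, matrices


# Shortened sample in the format described in the original question.
SAMPLE = """\
key1 value1

key4 value4
spec11 spec12 spec13
spec21 spec22 spec23

key5 value5
"""

keyvalues, matrices = parse_input(StringIO(SAMPLE))
print(keyvalues)
print(matrices)
```

Since the function only iterates over lines, an open file object can be
passed in place of the StringIO. As in the original, a key that heads a
matrix also stays in keyvalues, and a matrix with exactly two columns
would be misread as key/value pairs, so this only suits input that
follows the stated format.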