> on a second read ... I see that you mean the case that should only
> join consecutive lines with the same key
Yes...there are actually three cases that occur to me:

1) don't care about order, but want one row for each key (1st value)

2) do care about order, and don't want disjoint runs of duplicate
   keys to be smashed together

3) do care about order, and do want disjoint runs to be smashed
   together (presumably outputting in the key-order in which the keys
   were encountered in the file...if not, you'd have to clarify)

My original post addresses #1 and #2, but not #3. Some tweaks to my
solution for #1 should address #3:

  results = {}
  order = []          # keys in the order they were first seen
  for line in file('in.txt'):
      k, v = line.rstrip('\n').split('\t')
      if k not in results:
          order.append(k)
      results.setdefault(k, []).append(v)
  for k in order:
      print k, '|'.join(results[k])

#2 does have the advantage that it can process large (multi-gig)
streams of data without bogging down, since it behaves like the sed
version, processing only a window at a time and retaining only the
data for consecutively matching lines (a minimal sketch of that
consecutive-run approach follows below).

-tkc
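For reference, a minimal sketch of that #2 behaviour in plain Python
(Python 2, matching the code above, and assuming the same
tab-separated in.txt layout of key/value columns): it only joins
adjacent lines that share a key, flushing each run as soon as the key
changes, so it never holds more than one run in memory at a time:

  prev_key = None
  values = []
  for line in open('in.txt'):
      k, v = line.rstrip('\n').split('\t')
      if k != prev_key:
          # key changed: flush the previous run before starting a new one
          if prev_key is not None:
              print prev_key, '|'.join(values)
          prev_key = k
          values = []
      values.append(v)
  # flush the final run, if the file wasn't empty
  if prev_key is not None:
      print prev_key, '|'.join(values)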