On Jul 9, 7:38 am, Beliavsky <[EMAIL PROTECTED]> wrote: > How can I replace multiple consecutive spaces in a file with a single > character (usually a space, but maybe a comma if converting to a CSV > file)? Ideally, the Python program would not compress consecutive > spaces inside single or double quotes. An inelegant method is to > repeatedly replace two consecutive spaces with one.
One can try mx.TextTools. E.g., from mx.TextTools import * import re string_inside_quotes=re.compile(r'(?P<quote>["\']).*?(?<!\\)(? P=quote)', re.MULTILINE) def advance_position(text, position, len_text, sre): mobj = sre.match(text[position:]) if mobj: incr = len(mobj.group(0)) else: incr = 0 return position + incr table = ('try_again', ('quoted_string', CallArg, (advance_position, string_inside_quotes), +1, 'try_again'), ('nonspace', AllNotIn, ' ', +1, 'try_again'), ('space', AllIn, ' ', +1, 'try_again'), (None, EOF, Here, +1, MatchOk), (None, Fail, Here),) for target_string in ( " Try using mx.TextTools 'for parsing strings'", "'It might be' just what you needed", 'I find "it worthwhile"', ): print "BEFORE:%s" % target_string _, taglist, _ = tag(target_string, table) if taglist: tokens = [] for t in taglist: tagobj, left_index, right_index = t[0:3] if tagobj == 'space': tokens.append(' ') else: tokens.append(target_string[left_index:right_index]) print "AFTER:%s" % ''.join(tokens) else: print "Something went horribly wrong" -- Hope this helps, Steven -- http://mail.python.org/mailman/listinfo/python-list