     trailingPattern = r'(\S*)\ +?\n'
     line = re.sub(trailingPattern, '\\1\n', line)

What happens with this?

     trailingPattern = r'\s+$'
     line = re.sub(trailingPattern, '', line)

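One thing to check first: \s matches \n too, so \s+$ snarfs the trailing newline along with the spaces, where the original pattern puts the \n back. If the \n has to stay, [ \t]+$ is the safer spelling. Beyond that, I'm guessing that the lack of a \1 replacement will help the sub work faster, with less internal string shuffling.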
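Here is a small sketch for checking how each pattern treats the newline at the interpreter. The sample line is made up, and the [ \t]+$ variant is my addition, not something from the original code.

```python
import re

sample = "foo  \n"  # made-up line with two trailing spaces

# Original pattern: the \n is matched, then put back by the replacement.
original = re.sub(r'(\S*)\ +?\n', '\\1\n', sample)

# Suggested pattern: \s+ also consumes the \n before $ matches.
greedy = re.sub(r'\s+$', '', sample)

# Restricting the class to spaces and tabs leaves the \n in place.
narrow = re.sub(r'[ \t]+$', '', sample)

print(repr(original))  # 'foo\n'
print(repr(greedy))    # 'foo'
print(repr(narrow))    # 'foo\n'
```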

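Or just:

     line = line.rstrip()

That is probably faster still, but there might be a technical reason to avoid it: with no argument, rstrip() removes every trailing whitespace character, the \n included.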
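The speed guesses are easy to measure rather than argue about. A rough timeit sketch follows. The sample line and the iteration count are arbitrary choices of mine, and the numbers will vary by machine and Python version, so none are quoted here. Note that the first candidate keeps the \n while the other two drop it, so this compares cost, not equivalence.

```python
import re
import timeit

line = "some text with trailing spaces     \n"  # made-up sample

# Precompile so that pattern lookup does not dominate the timing.
with_group = re.compile(r'(\S*)\ +?\n')
no_group = re.compile(r'\s+$')

candidates = {
    "sub with \\1": lambda: with_group.sub('\\1\n', line),
    "sub, no group": lambda: no_group.sub('', line),
    "rstrip": lambda: line.rstrip(),
}

# Time each candidate over the same number of calls.
timings = {name: timeit.timeit(func, number=20_000)
           for name, func in candidates.items()}

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:15s} {seconds:.4f} s")
```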

But these uncertainties are why I write unit tests, including tests for the edge cases. (What if it's a \r\n? What if the \n is missing? etc.) That way I don't need to memorize re's exact behavior, and if I find a reason to swap in a .rstrip(), I can pass all the tests and make sure the substitution works the same.
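A minimal sketch of such tests, run against two interchangeable versions. The function names, the test cases, and the contract itself (strip trailing spaces and tabs, keep whatever line ending was there) are my assumptions, not anything from the original code.

```python
import re
import unittest


def strip_with_re(line):
    # Drop trailing spaces/tabs; the lookahead leaves \n or \r\n in place.
    return re.sub(r'[ \t]+(?=\r?\n?$)', '', line)


def strip_with_rstrip(line):
    # Same contract without re: split off the ending, strip, reattach.
    body = line.rstrip('\r\n')
    return body.rstrip(' \t') + line[len(body):]


CASES = [
    ("foo  \n", "foo\n"),      # ordinary line
    ("foo  \r\n", "foo\r\n"),  # Windows line ending
    ("foo  ", "foo"),          # newline missing
    ("foo\n", "foo\n"),        # nothing to strip
    ("a  b \t\n", "a  b\n"),   # inner spaces stay
    ("", ""),                  # empty input
]


class TrailingWhitespaceTest(unittest.TestCase):
    def check(self, func):
        # Run every case against one implementation.
        for given, expected in CASES:
            with self.subTest(func=func.__name__, given=given):
                self.assertEqual(func(given), expected)

    def test_re_version(self):
        self.check(strip_with_re)

    def test_rstrip_version(self):
        self.check(strip_with_rstrip)
```

Saved to a file, this runs with python -m unittest. Swapping implementations then comes down to whether the second test method stays green.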

--
  Phlip
  http://c2.com/cgi/wiki?ZeekLand
--
http://mail.python.org/mailman/listinfo/python-list
