dasacc22 <dasacc22 <at> gmail.com> writes:
>
> U presume entirely to much. I have a preprocessor that normalizes
> documents while performing other more complex operations. Theres
> nothing buggy about what im doing

Are you sure? Your "solution" calculates (the number of leading whitespace characters) + (the number of TRAILING whitespace characters).

Problem 1: including TRAILING whitespace.
Example: "content" + 3 * " " + "\n" has 4 leading spaces according to your reckoning; should be 0.
Fix: use lstrip() instead of strip()

Problem 2: assuming all whitespace characters have *effective* width the same as " ".
Examples: TAB has width 4 or 8 or whatever you want it to be. There are quite a number of whitespace characters, even when you stick to ASCII. When you look at Unicode, there are heaps more.

Here's a list of BMP characters such that character.isspace() is True, showing the Unicode codepoint, the Python repr(), and the name of the character (other than for control characters):

U+0009  u'\t'      ?
U+000A  u'\n'      ?
U+000B  u'\x0b'    ?
U+000C  u'\x0c'    ?
U+000D  u'\r'      ?
U+001C  u'\x1c'    ?
U+001D  u'\x1d'    ?
U+001E  u'\x1e'    ?
U+001F  u'\x1f'    ?
U+0020  u' '       SPACE
U+0085  u'\x85'    ?
U+00A0  u'\xa0'    NO-BREAK SPACE
U+1680  u'\u1680'  OGHAM SPACE MARK
U+2000  u'\u2000'  EN QUAD
U+2001  u'\u2001'  EM QUAD
U+2002  u'\u2002'  EN SPACE
U+2003  u'\u2003'  EM SPACE
U+2004  u'\u2004'  THREE-PER-EM SPACE
U+2005  u'\u2005'  FOUR-PER-EM SPACE
U+2006  u'\u2006'  SIX-PER-EM SPACE
U+2007  u'\u2007'  FIGURE SPACE
U+2008  u'\u2008'  PUNCTUATION SPACE
U+2009  u'\u2009'  THIN SPACE
U+200A  u'\u200a'  HAIR SPACE
U+200B  u'\u200b'  ZERO WIDTH SPACE
U+2028  u'\u2028'  LINE SEPARATOR
U+2029  u'\u2029'  PARAGRAPH SEPARATOR
U+202F  u'\u202f'  NARROW NO-BREAK SPACE
U+205F  u'\u205f'  MEDIUM MATHEMATICAL SPACE
U+3000  u'\u3000'  IDEOGRAPHIC SPACE

Hmmm, looks like all kinds of widths, from zero upwards.
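
For anyone who wants to try both points, here is a rough sketch in Python 3 syntax. The function names are made up for illustration, and the width calculation rests on a loud assumption: TAB advances to the next tab stop and every other whitespace character counts as one column, which is exactly the simplification Problem 2 warns about. The list it prints depends on the Unicode database bundled with your interpreter, so it may not match the list above entry for entry.

```python
import unicodedata


def leading_whitespace_count(line):
    """Count leading whitespace characters only (lstrip, not strip)."""
    return len(line) - len(line.lstrip())


def leading_indent_width(line, tabsize=8):
    """Approximate column width of the leading whitespace.

    Assumption: TAB advances to the next multiple of tabsize and every
    other whitespace character is one column wide. A whitespace-only
    line counts its line ending as well.
    """
    prefix = line[:leading_whitespace_count(line)]
    return len(prefix.expandtabs(tabsize))


def bmp_whitespace():
    """Yield (codepoint, name) for each BMP character where isspace() is true.

    Characters without a Unicode name (the control characters) get "?".
    """
    for cp in range(0x10000):
        ch = chr(cp)
        if ch.isspace():
            yield cp, unicodedata.name(ch, "?")


if __name__ == "__main__":
    line = "content" + 3 * " " + "\n"
    print("strip() reckoning: ", len(line) - len(line.strip()))
    print("lstrip() reckoning:", leading_whitespace_count(line))
    for cp, name in bmp_whitespace():
        print("U+%04X %r %s" % (cp, chr(cp), name))
```

With the example line from Problem 1, the strip() reckoning gives 4 and the lstrip() reckoning gives 0.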