Scott David Daniels wrote:
> Frank Niessink wrote:
>
>>- What is the easiest/most pythonic (preferably built-in) way of
>>checking a unicode string for control characters and weeding those
>>characters out?
>
>
> drop_controls = [None] * 0x20
> for c in '\t\r\n':
>     drop_controls[c] = unichr(c)
> ...
> some_unicode_string = some_unicode_string.translate(drop_controls)
Hi Scott,

Your code gave me a "TypeError: an integer is required". Anyway, it was
sufficient to push me in the right direction. This is my version:

UNICODE_CONTROL_CHARACTERS_TO_WEED = {}
for ordinal in range(0x20):
    if chr(ordinal) not in '\t\r\n':
        UNICODE_CONTROL_CHARACTERS_TO_WEED[ordinal] = None

This lets you do:

>>> u'T\x04est\x09'.translate(UNICODE_CONTROL_CHARACTERS_TO_WEED)
u'Test\t'

Thanks, Frank

--
http://mail.python.org/mailman/listinfo/python-list
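The snippets in this thread are Python 2 (`unichr`, `u''` literals). Below is a minimal Python 3 sketch of the same `translate()` technique, offered as an illustration rather than as either poster's code. It shows that the quoted list-based table works once each character is converted with `ord()` (both `unichr()` and the list index need an integer code point, and the right-hand `unichr(c)` is evaluated first, which plausibly explains the "an integer is required" message). It also shows the dict-based table from the reply. The `ALL_CC_TO_WEED` variant and the `weed_controls` helper are hypothetical additions; the variant assumes you also want to drop DEL (0x7F) and the C1 controls (0x80-0x9F), i.e. everything in Unicode general category "Cc".

```python
import unicodedata

KEEP = '\t\r\n'  # whitespace controls both posters want to keep

# Quoted list-based table with the bug fixed: index by ord(c), not c.
# str.translate() treats the IndexError raised for code points >= 0x20
# (a LookupError) as "leave this character unchanged".
drop_controls = [None] * 0x20
for c in KEEP:
    drop_controls[ord(c)] = c

# Python 3 form of UNICODE_CONTROL_CHARACTERS_TO_WEED: a value of None
# tells str.translate() to delete that code point.
C0_CONTROLS_TO_WEED = {
    ordinal: None for ordinal in range(0x20) if chr(ordinal) not in KEEP
}

# Hypothetical variant: general category "Cc" is exactly U+0000-U+001F
# and U+007F-U+009F, so scanning the first 256 code points covers it.
ALL_CC_TO_WEED = {
    ordinal: None
    for ordinal in range(0x100)
    if unicodedata.category(chr(ordinal)) == 'Cc' and chr(ordinal) not in KEEP
}


def weed_controls(text, table=C0_CONTROLS_TO_WEED):
    """Return text with every code point mapped to None in table removed."""
    return text.translate(table)


print(repr('T\x04est\x09'.translate(drop_controls)))      # 'Test\t'
print(repr(weed_controls('T\x04est\x09')))                 # 'Test\t'
print(repr(weed_controls('a\x7fb\x85c', ALL_CC_TO_WEED)))  # 'abc'
```

A dict keyed by ordinal is usually the clearer choice, since it states exactly which code points are deleted; the list form relies on `translate()` swallowing the `IndexError` for everything past the end of the list.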