Fillmore wrote:

> On 03/11/2016 07:13 AM, Wolfgang Maier wrote:
>> One lesson for Perl regex users is that in Python many things can be
>> solved without regexes. How about defining:
>>
>> printable = {chr(n) for n in range(32, 127)}
>>
>> then using:
>>
>> if (set(my_string) - set(printable)):
>>     break
>
> seems computationally heavy. I have a file with about 70k lines, of which
> only 20 contain "funny" chars.
>
> Any idea on how I can create a script that compares Perl speed vs. Python
> speed in performing the cleaning operation?

Try

    for line in ...:
        if has_nonprint(line):
            continue
        ...

with the has_nonprint() function as defined below:

$ cat isprint.py
import unicodedata

class Lookup(dict):
    # Translation table for str.translate(), filled in lazily:
    # each code point is classified the first time it is seen.
    def __missing__(self, n):
        c = chr(n)
        cat = unicodedata.category(c)
        if cat in {'Cs', 'Cn', 'Zl', 'Cc', 'Zp'}:
            # non-printable: keep the character
            self[n] = c
            return c
        else:
            # printable: None tells translate() to delete it
            self[n] = None
            return None

lookup = Lookup()
lookup[10] = None # allow newline

def has_nonprint(s):
    # Only non-printable characters survive the translation,
    # so a non-empty result means the string contains at least one.
    return bool(s.translate(lookup))

$ python3 -i isprint.py
>>> has_nonprint("foo")
False
>>> has_nonprint("foo\n")
False
>>> has_nonprint("foo\t")
True
>>> has_nonprint("\0foo")
True
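
As for measuring speed, here is a rough sketch of how the Python side could be timed with the standard timeit module. It repeats the definitions so it runs on its own. The names has_nonprint_set(), make_lines() and count_flagged() are made up for this illustration, and the input is synthetic (70k short lines, 20 of them containing a NUL) rather than your real file, so treat the numbers only as a relative indication.

```python
import timeit
import unicodedata

# Categories treated as non-printable, as in the has_nonprint() approach.
NONPRINT_CATEGORIES = {'Cs', 'Cn', 'Zl', 'Cc', 'Zp'}


class Lookup(dict):
    """Translation table that is filled in lazily, one code point at a time."""

    def __missing__(self, n):
        c = chr(n)
        # Keep non-printable characters, delete (map to None) everything else.
        result = c if unicodedata.category(c) in NONPRINT_CATEGORIES else None
        self[n] = result
        return result


lookup = Lookup()
lookup[10] = None  # allow newline


def has_nonprint(s):
    return bool(s.translate(lookup))


# Set-based check from the quoted message (printable ASCII only).
printable = {chr(n) for n in range(32, 127)}


def has_nonprint_set(s):  # hypothetical name, for comparison only
    return bool(set(s) - printable)


def make_lines(total=70_000, funny=20):
    """Synthetic stand-in for the real file: `funny` lines contain a NUL."""
    lines = ["an ordinary line of text"] * total
    step = total // funny
    for i in range(funny):
        lines[i * step] = "a line with a \0 in it"
    return lines


def count_flagged(check, lines):
    return sum(1 for line in lines if check(line))


if __name__ == "__main__":
    lines = make_lines()
    for check in (has_nonprint, has_nonprint_set):
        # Best of five single passes over all lines.
        seconds = min(timeit.repeat(lambda: count_flagged(check, lines),
                                    number=1, repeat=5))
        flagged = count_flagged(check, lines)
        print(f"{check.__name__}: {flagged} flagged, best of 5: {seconds:.3f}s")
```

Note that the two checks are not equivalent: the set-based one flags anything outside ASCII 32..126, including the newline, while the translate-based one accepts newlines and printable non-ASCII characters. The synthetic lines have no trailing newline so that both flag the same 20 lines. The Perl script would have to be timed separately on the same input.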