On Mon, Oct 17, 2016 at 2:20 PM, Adam Funk <a24...@ducksburg.com> wrote:
> I'm using IDLE 3 (with python 3.5.2) to work interactively with
> Twitter data, which of course contains emojis.  Whenever the running
> program tries to print the text of a tweet with an emoji, it barfs
> this & stops running:
>
>   UnicodeEncodeError: 'UCS-2' codec can't encode characters in
>   position 102-102: Non-BMP character not supported in Tk
>
> Is there any way to set IDLE to ignore these characters (either drop
> them or replace them with something else) instead of throwing the
> exception?
>
> If not, what's the best way to strip them out of the string before
> printing?
You can patch print() to transcode non-BMP characters as surrogate
pairs. For example:

    import builtins

    def print_ucs2(*args, print=builtins.print, **kwds):
        args2 = []
        for a in args:
            a = str(a)
            # Guard against '' because max() raises ValueError on an
            # empty string.
            if a and max(a) > '\uffff':
                # Encode to UTF-16, then decode each 2-byte code unit
                # on its own so that every non-BMP character becomes a
                # surrogate pair.
                b = a.encode('utf-16le', 'surrogatepass')
                chars = [b[i:i+2].decode('utf-16le', 'surrogatepass')
                         for i in range(0, len(b), 2)]
                a = ''.join(chars)
            args2.append(a)
        print(*args2, **kwds)

    # Keep the original print() reachable as builtins._print.
    builtins._print = builtins.print
    builtins.print = print_ucs2

On Windows this should allow printing non-BMP characters such as
emojis (e.g. U+0001F44C). On Linux it prints a non-BMP character as a
pair of empty boxes. If you're not using Windows you can modify this
to print something else for non-BMP characters, such as a replacement
character or \U literals.
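
For the non-Windows case, and for the "strip them out" part of the question, something along these lines might do. This is only a sketch: the names _NON_BMP, replace_non_bmp and escape_non_bmp are made up for illustration, and it assumes a regular-expression substitution over the range U+10000 to U+10FFFF is acceptable for your data.

```python
import re

# Matches any single code point outside the Basic Multilingual Plane.
_NON_BMP = re.compile('[\U00010000-\U0010ffff]')

def replace_non_bmp(text, replacement='\ufffd'):
    """Replace each non-BMP character; pass '' to drop them instead."""
    return _NON_BMP.sub(replacement, text)

def escape_non_bmp(text):
    """Rewrite each non-BMP character as a backslash-U escape literal."""
    return _NON_BMP.sub(lambda m: '\\U%08x' % ord(m.group()), text)

# The escaped form is plain ASCII, so it is safe to print anywhere.
print(escape_non_bmp('ok \U0001F44C'))
```

Either helper could be applied to each argument inside a patched print(), in place of the surrogate-pair transcoding, or called directly on the tweet text before printing.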