New submission from Nick Coghlan:
Prompted by issue 18713 and
http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/, here are some
possible utilities we could add to the codecs module to help deal with/debug
issues related to surrogate escaped strings:
    def has_escaped_bytes(s):
        """Returns true if string contains surrogate escaped bytes"""
        ...

    def replace_escaped_bytes(s):
        """Replaces each surrogate escaped byte with a valid code point"""
        ...

    def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
        """Reinterprets incorrectly decoded text using a new encoding"""
        return s.encode(nominal_encoding,
                        'surrogateescape').decode(actual_encoding)
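
For illustration only, here is a minimal sketch of how the three helpers could behave. It relies on the fact that the 'surrogateescape' error handler maps each undecodable byte 0x80-0xFF to a lone surrogate in the range U+DC80-U+DCFF. The `replacement` parameter and its U+FFFD default are assumptions made for this sketch; the proposal does not say which valid code point should be substituted, and none of this is an existing codecs API.

```python
import re

# Lone surrogates produced by the 'surrogateescape' error handler:
# undecodable bytes 0x80-0xFF become U+DC80-U+DCFF.
_ESCAPED_BYTE = re.compile('[\udc80-\udcff]')


def has_escaped_bytes(s):
    """Return True if the string contains surrogate escaped bytes."""
    return _ESCAPED_BYTE.search(s) is not None


def replace_escaped_bytes(s, replacement='\ufffd'):
    """Replace each surrogate escaped byte with a valid code point.

    The 'replacement' argument (default U+FFFD REPLACEMENT CHARACTER)
    is an assumption of this sketch, not part of the proposal.
    """
    return _ESCAPED_BYTE.sub(replacement, s)


def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
    """Reinterpret incorrectly decoded text using a new encoding."""
    # Re-encoding with 'surrogateescape' recovers the original bytes,
    # which can then be decoded with the encoding they were really in.
    raw = s.encode(nominal_encoding, 'surrogateescape')
    return raw.decode(actual_encoding)


# Latin-1 bytes wrongly decoded as ASCII, with the bad byte escaped.
broken = b'caf\xe9'.decode('ascii', 'surrogateescape')
fixed = decode_escaped_bytes(broken, 'ascii', 'latin-1')
```

Note that decode_escaped_bytes only round-trips cleanly when the nominal encoding is the one that was originally (mis)applied with 'surrogateescape'; otherwise the recovered bytes may differ from the original input.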
----------
messages: 195937
nosy: ncoghlan
priority: normal
severity: normal
stage: needs patch
status: open
title: Add tools for "cleaning" surrogate escaped strings
type: enhancement
versions: Python 3.4
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18814>
_______________________________________