Christian Tanzer added the comment:

IMNSHO, the problem lies in Python 3's pickle.py, and it is **not** restricted
to datetime instances
(for a detailed rambling, see
http://c-tanzer.at/en/breaking_py2_pickles_in_py3.html).

In Python 2, 8-bit strings are used for both text and binary data. Well-designed
applications will use unicode for all text, but Python 2 itself forces some
text to be 8-bit strings, e.g., the names of attributes, classes, and functions.
In other words, **any 8-bit strings explicitly created by such an application
will contain binary data.**

In Python 2, pickle.dump uses BINSTRING (and SHORT_BINSTRING) for 8-bit 
strings; Python 3 uses BINBYTES (and SHORT_BINBYTES) instead.
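To see the opcode difference concretely, here is a small sketch using the standard `pickletools.genops` iterator. Since a Python 2 interpreter may not be at hand, the Python 2 side is a hand-assembled protocol-2 pickle holding one SHORT_BINSTRING (the byte layout is my reconstruction of what Python 2 writes for the 8-bit string `'abc'`, not a captured Python 2 dump); `opcode_names` is a hypothetical helper.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes in a pickle, in order (hypothetical helper)."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


# Python 3 (protocol 3+) serialises a bytes object with SHORT_BINBYTES.
py3_pickle = pickle.dumps(b"abc", protocol=3)

# Assumed, hand-assembled Python 2 style pickle of the 8-bit string 'abc':
# PROTO 2, SHORT_BINSTRING (length 3, b"abc"), BINPUT 0, STOP.
py2_pickle = b"\x80\x02U\x03abcq\x00."

print(opcode_names(py3_pickle))
print(opcode_names(py2_pickle))
```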

In Python 3, pickle.load should handle BINSTRING (and SHORT_BINSTRING) like 
this:

* convert ASCII values to `str`

* convert non-ASCII values to `bytes`

`bytes` is Python 3's equivalent to Python 2's 8-bit string! 

The mapping to `str` is needed only because Python 2 uses 8-bit strings for
names, and all such names are guaranteed to be ASCII!
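For comparison, this is roughly how the stock Python 3 unpickler treats such strings today, depending on the `encoding` argument of `pickle.loads`. The two inputs are hand-assembled protocol-2 pickles (illustrative, not real application data): one holds an ASCII value such as a name, the other a non-ASCII binary value.

```python
import pickle

# Hand-assembled protocol-2 pickles, each holding one SHORT_BINSTRING
# the way Python 2 would write an 8-bit string (illustrative inputs).
ASCII_NAME = b"\x80\x02U\x03abcq\x00."        # ASCII payload, like an attribute name
BINARY_DATA = b"\x80\x02U\x02\xff\xfeq\x00."  # non-ASCII payload, i.e. binary data

# Default encoding="ASCII": ASCII payloads come back as str ...
print(repr(pickle.loads(ASCII_NAME)))

# ... but a non-ASCII payload makes loading fail outright.
try:
    pickle.loads(BINARY_DATA)
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc)

# encoding="bytes" keeps the binary data intact, but now every 8-bit
# string, including ones that were names in Python 2, comes back as bytes.
print(repr(pickle.loads(BINARY_DATA, encoding="bytes")))
print(repr(pickle.loads(ASCII_NAME, encoding="bytes")))
```

Neither setting gives the split described above: the default fails on binary data, and `encoding="bytes"` also turns ASCII names into `bytes`.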

I propose changing `load_binstring` and `load_short_binstring` to call a
function like::

    def _decode_binstring(self, value):
        # Used to allow strings from Python 2 to be decoded either as
        # bytes or Unicode strings.  This should be used only with the
        # BINSTRING and SHORT_BINSTRING opcodes.
        if self.encoding != "bytes":
            try:
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        return value

instead of the currently called `_decode_string`.
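To try the proposed rule without patching the standard library, one option is a subclass of the pure-Python unpickler. This sketch assumes the `pickle._Unpickler` class and its `_decode_string` hook; both are private CPython implementation details that may change, and `Py2FriendlyUnpickler` / `py2_loads` are hypothetical names. Unlike the proposal, overriding `_decode_string` also affects the protocol-0 STRING opcode, not just BINSTRING and SHORT_BINSTRING.

```python
import io
import pickle


class Py2FriendlyUnpickler(pickle._Unpickler):
    """Hypothetical unpickler applying the proposed BINSTRING rule.

    Assumes the private pure-Python pickle._Unpickler and its
    _decode_string hook (CPython internals, not a public API).
    """

    def _decode_string(self, value):
        if self.encoding != "bytes":
            try:
                # ASCII-only 8-bit strings (e.g. Python 2 names) -> str
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        # Anything else is treated as binary data and kept as bytes
        return value


def py2_loads(data):
    """Load a Python 2 pickle using the proposed decoding rule."""
    return Py2FriendlyUnpickler(io.BytesIO(data)).load()


print(repr(py2_loads(b"\x80\x02U\x03abcq\x00.")))
print(repr(py2_loads(b"\x80\x02U\x02\xff\xfeq\x00.")))
```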

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22005>
_______________________________________
_______________________________________________
Python-bugs-list mailing list