[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-27 Thread Martin v. Löwis

Martin v. Löwis  added the comment:

> how about a way to declare one wants exact 1:1 mapping between py2<>py3,
> so str<>bytes and unicode<>str will work for sure

In a sense, that's already possible. Inherit from _Pickler/_Unpickler,
and replace the dispatch dict with a different mapping.

I wouldn't object to supporting this with an option, though, assuming it
was properly documented and implemented for both pickle and _pickle
(probably along with pickletools).
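
The subclassing route mentioned above can be sketched roughly as follows. This is only an illustration, not the tracker's eventual patch: the class name Py2StrAsBytesUnpickler is hypothetical, pickle._Unpickler and its dispatch table are private CPython details of the pure-Python implementation, and the input is a hand-written protocol 2 stream rather than a real Python 2 pickle. Only the two binary string opcodes are overridden; the text-mode STRING opcode of protocol 0 is left alone.

```python
import io
import pickle
import struct


class Py2StrAsBytesUnpickler(pickle._Unpickler):
    """Hypothetical unpickler that returns Python 2 str payloads as bytes."""

    # Copy the table so the stock pure-Python unpickler is left untouched.
    dispatch = dict(pickle._Unpickler.dispatch)

    def load_binstring(self):
        # BINSTRING: 4-byte little-endian signed length, then the raw payload.
        (length,) = struct.unpack("<i", self.read(4))
        if length < 0:
            raise pickle.UnpicklingError("BINSTRING with negative byte count")
        self.append(self.read(length))

    dispatch[pickle.BINSTRING[0]] = load_binstring

    def load_short_binstring(self):
        # SHORT_BINSTRING: 1-byte length, then the raw payload.
        length = self.read(1)[0]
        self.append(self.read(length))

    dispatch[pickle.SHORT_BINSTRING[0]] = load_short_binstring


# Hand-written protocol 2 streams (assumed stand-ins for Python 2 output):
# PROTO 2, SHORT_BINSTRING '\xff', STOP
short_stream = b"\x80\x02U\x01\xff."
# PROTO 2, BINSTRING '\xff\xfe', STOP
long_stream = b"\x80\x02T\x02\x00\x00\x00\xff\xfe."

short_result = Py2StrAsBytesUnpickler(io.BytesIO(short_stream)).load()
long_result = Py2StrAsBytesUnpickler(io.BytesIO(long_stream)).load()
print(short_result, long_result)
```

Because the override lives in a subclass with its own copy of the table, ordinary pickle.load calls keep their default behavior.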

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-27 Thread RonnyPfannschmidt

RonnyPfannschmidt  added the comment:

in case the actual behavior is not supposed to change

how about a way to declare one wants exact 1:1 mapping between py2<>py3,
so str<>bytes and unicode<>str will work for sure

something like load/dump(..., encoding=bytes) just crossed my mind
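
For comparison, current Python 3 releases accept an encoding keyword on pickle.load and pickle.loads, and the documented special value "bytes" returns 8-bit Python 2 string payloads as bytes objects. A small sketch, assuming a hand-written protocol 2 stream in place of a real Python 2 pickle:

```python
import pickle

# Hand-written stream standing in for a Python 2 pickle of
# {'key': '\xff\x00'}: PROTO 2, EMPTY_DICT, two SHORT_BINSTRINGs,
# SETITEM, STOP.
py2_stream = b"\x80\x02}U\x03keyU\x02\xff\x00s."

# encoding="bytes" leaves every Python 2 str payload undecoded.
as_bytes = pickle.loads(py2_stream, encoding="bytes")
print(as_bytes)
```

Note that the setting applies to every Python 2 str in the stream, so dictionary keys that were meant as text also come back as bytes.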

--




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-27 Thread RonnyPfannschmidt

RonnyPfannschmidt  added the comment:

unpickle of any non-ascii string from python2 will break
the only way out would be to ensure text strings and a single defined
encoding (at that point storing unicode strings in any case seems more
practical)

also byte-strings stored as python2 str would break

and since i pass around binary strings as parts of objects, it's just
completely broken for me
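
The failure described above can be reproduced on Python 3 with a hand-written protocol 2 stream (an assumed stand-in for a Python 2 pickle of '\xff'). The sketch also shows one possible workaround: decoding as latin-1, which maps every byte to exactly one code point, and encoding back to recover the original bytes.

```python
import pickle

# PROTO 2, SHORT_BINSTRING '\xff', STOP
py2_stream = b"\x80\x02U\x01\xff."

# The default encoding is ASCII, so a non-ASCII Python 2 str cannot be loaded.
try:
    pickle.loads(py2_stream)
    default_load_failed = False
except UnicodeDecodeError:
    default_load_failed = True

# latin-1 never fails to decode; re-encoding gives the original bytes back.
as_text = pickle.loads(py2_stream, encoding="latin-1")
recovered = as_text.encode("latin-1")
print(default_load_failed, repr(as_text), recovered)
```

The latin-1 round trip only helps when the caller knows which values were binary; values that were text in some other encoding still need to be decoded with that encoding.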

--




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-27 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

The problem with trying to solve the following issue:
   "a bytes instance from python3 is pickled as custom class in
protocols <3"
is that if we pickle bytes from Python 3 as a 2.x str in protocol <= 2,
unpickling it using Python 3 will yield a str (unicode), not a bytes
object. Therefore the whole chain (pickling then unpickling) will not be
idempotent.
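
A short sketch of the asymmetry described above, run on Python 3. The first stream is hypothetical: it is hand-written to show what a protocol 2 pickle of b'abc' would look like if bytes were stored with the Python 2 str opcode (SHORT_BINSTRING). The second part uses what Python 3 actually emits today.

```python
import pickle

# Hypothetical encoding of b'abc' as a 2.x str: PROTO 2, SHORT_BINSTRING, STOP.
hypothetical_stream = b"\x80\x02U\x03abc."
loaded = pickle.loads(hypothetical_stream)

# Actual Python 3 behavior: bytes survive a protocol 2 round trip because
# they are pickled as a call that rebuilds a bytes object.
round_tripped = pickle.loads(pickle.dumps(b"abc", protocol=2))

print(type(loaded).__name__, type(round_tripped).__name__)
```

With the hypothetical stream the value comes back as str, so pickling bytes and unpickling on the same Python 3 interpreter would no longer return the original type.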

--
components: +Library (Lib) -None
nosy: +alexandre.vassalotti, gvanrossum, pitrou
versions: +Python 2.7, Python 3.2




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-27 Thread RonnyPfannschmidt

RonnyPfannschmidt  added the comment:

it's even worse

python3:
>>> import pickle
>>> pickle.dumps(b'', protocol=2)
b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.'

python2.6:
>>> import pickle
>>> pickle.loads('\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.')
'[]'
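
The result above follows from what the stream contains. A sketch using pickletools on Python 3 to list the opcodes: the pickle records a call to __builtin__.bytes with an empty list as its argument. In Python 2.6 bytes is an alias for str, so the same call becomes str([]).

```python
import pickle
import pickletools

stream = b"\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03."

# Names of the opcodes in the order they appear in the stream.
opcode_names = [op.name for op, arg, pos in pickletools.genops(stream)]

# Python 3 maps __builtin__ to builtins and evaluates bytes([]).
py3_value = pickle.loads(stream)

# Python 2.6 evaluates the same call with bytes being str, i.e. str([]).
py2_equivalent = str([])

print(opcode_names)
print(repr(py3_value), repr(py2_equivalent))
```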

--




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-26 Thread RonnyPfannschmidt

RonnyPfannschmidt  added the comment:

Since it breaks for anything non-ascii, it's not that helpful after all,
and since python2 strings are encoding-unaware there is no way to fix
it.

It might be preferable to supply unpicklers that are capable of
coercing if the user really wants coercing.

yup
> 
> > 3. python 3 string map to python 2 unicode 
> 
> That's also the case, AFAICT.
yup
> 
> > 4. python 3 bytestring maps to python 2 string
> 
> Hmm. This may be indeed a mistake. Until r61467, bytes were saved
> with the (BIN)STRING code; not sure why this was changed.
Python 3 is indeed evil there.

b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.'

I'm convinced that a 1:1 mapping of python2 string from/to python3
bytestrings is the least surprising behaviour and will keep surprising
errors away when needing to communicate between different python
versions.

It has just bitten me, and I suspect it will get others, too.
Unpickle that completely fails in the face of encodings is not desirable
at all.

--




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-26 Thread Martin v. Löwis

Martin v. Löwis  added the comment:

> the basic behavior i want to see for all protocols <= 2
> 
> 1. python 2 string maps to python3 byte-string

That would not be good. Many people create pickles in 2.x where the
string type really represents characters, more often so than they want
it to represent bytes. Giving them bytes on unpickling will likely
cause more problems than the current approach.

> 2. python 2 unicode maps to python3 string

That's the case, right?

> 3. python 3 string map to python 2 unicode 

That's also the case, AFAICT.

> 4. python 3 bytestring maps to python 2 string

Hmm. This may be indeed a mistake. Until r61467, bytes were saved
with the (BIN)STRING code; not sure why this was changed.
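
To see which opcodes Python 3 emits for each type, one can inspect the output with pickletools. This is a sketch for a current Python 3 interpreter; opcode_names is a helper introduced here for illustration, and the exact opcode sequence may differ between releases.

```python
import pickle
import pickletools


def opcode_names(obj, protocol):
    """Return the opcode names used when pickling obj with the given protocol."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]


text_ops = opcode_names("abc", 2)       # Python 3 str
bytes_ops_p2 = opcode_names(b"abc", 2)  # Python 3 bytes, protocol 2
bytes_ops_p3 = opcode_names(b"abc", 3)  # Python 3 bytes, protocol 3

print(text_ops)
print(bytes_ops_p2)
print(bytes_ops_p3)
```

In protocol 2 the bytes value is written as a GLOBAL plus REDUCE call rather than with a (BIN)STRING opcode, while protocol 3 has a dedicated bytes opcode.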

--
title: byte/unicode pickle incompatibilities between python2 and and python3 -> 
byte/unicode pickle incompatibilities between python2 and   and python3




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-26 Thread RonnyPfannschmidt

RonnyPfannschmidt  added the comment:

the basic behavior i want to see for all protocols <= 2

1. python 2 string maps to python3 byte-string
2. python 2 unicode maps to python3 string
3. python 3 string map to python 2 unicode 
4. python 3 bytestring maps to python 2 string

anything else is confusing and may break
for example one can't unpickle '\xFF' in python3 if it was pickled in
python2

note that these changes seem irrelevant for protocol 3 as python2.x
doesn't support it

--




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-26 Thread Martin v. Löwis

Martin v. Löwis  added the comment:

Why are you reporting this here? If you think there is a bug, can you
propose an alternative behavior that you would consider correct?

The changes you mentioned are all deliberate.

--
nosy: +loewis




[issue6784] byte/unicode pickle incompatibilities between python2 and and python3

2009-08-26 Thread RonnyPfannschmidt

Changes by RonnyPfannschmidt :


--
title: bytw/unicode string incompatibilities between python2 and and python3 -> 
byte/unicode pickle incompatibilities between python2 and and python3
