[issue2389] Array pickling exposes internal memory representation of elements

2009-07-15 Thread Alexandre Vassalotti

Alexandre Vassalotti alexan...@peadrop.com added the comment:

Committed fix for 3.x in r74013 and for 2.x in r74014.

--
resolution:  - fixed
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2009-07-06 Thread Alexandre Vassalotti

Alexandre Vassalotti alexan...@peadrop.com added the comment:

I know believe that arrays should be pickled as a list of values on
Python 2.x. Doing otherwise makes it impossible to unpickle arrays
coming from Python 2.x using Python 3.x, since pickle on Python 3
decodes all the strings from 2.x to Unicode.

However, we still can use the compact memory representation on Python 3.x.

So, I propose that we change the array module on Python 2.x to emit a
list instead of memory string and implement the portable array pickling
mechanism only on Python 3.x.

--
versions: +Python 3.2 -Python 3.1
Added file: http://bugs.python.org/file14461/portable_array_pickling-2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2009-07-05 Thread Alexandre Vassalotti

Alexandre Vassalotti alexan...@peadrop.com added the comment:

I would like to commit my patch later this week. So if you see any issue
with the patch, please speak up.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2009-07-05 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti alexan...@peadrop.com:


Removed file: http://bugs.python.org/file11096/fix_array_pickling.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2009-06-26 Thread Alexandre Vassalotti

Alexandre Vassalotti alexan...@peadrop.com added the comment:

Here's a patch that implements the solution I described in msg85298.

Please give it a good review:
http://codereview.appspot.com/87072

--
Added file: http://bugs.python.org/file14369/portable_array_pickling.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2009-04-03 Thread Alexandre Vassalotti

Alexandre Vassalotti alexan...@peadrop.com added the comment:

Ah, I just remembered the smart way I had devised some time ago to
handle this issue without changing the constructor of array.array. The
trick would be to add a __reduce__ method to array.array. This method
would return a special constructor function, the binary data of the
array and a string representing the format of the array.  Upon
unpickling, the special constructor function would be called with the
binary data and its format and then it would recreate the array.

Now, the only thing I am not sure about is whether this would work well
with subclasses of array.array. I guess we make __reduce__ also return
the instance's type which could be used by special constructor to
recreate the instance from the proper subclass.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2009-04-03 Thread Collin Winter

Changes by Collin Winter coll...@gmail.com:


--
nosy: +collinwinter

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-14 Thread Alexandre Vassalotti

Alexandre Vassalotti [EMAIL PROTECTED] added the comment:

I'm all in for a standardized representation of array's pickles (with
width and endianness preserved). However to happen, we will either need
to change array's constructor to support at least the byte-order
specification (like struct) or add built-in support for array in the
pickle module (which could be done without modifying the pickle protocol).

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-14 Thread Martin v. Löwis

Martin v. Löwis [EMAIL PROTECTED] added the comment:

I think changing the array constructor is fairly easy: just pick a set
of codes that are defined to be platform-neutral (i.e. for each size two
codes, one for each endianness). For example, the control characters
(\0..\x1F) could be used in the following way:

char, signed-byte, unsigned byte: c, b, B
(big/little)
sint16: 1,2 uint16: 3,4
sint32: 5,6 uint32: 7,8
sint64: 9,10 uint64: 11,12
float:  13,14 double: 15,16
UCS-2:  17,18 UCS-4: 19,20

In above scheme, even codes are little-endian, odd codes are big endian.
Converting the codes to native codes could be table-driven.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-12 Thread Hrvoje Nikšić

Hrvoje Nikšić [EMAIL PROTECTED] added the comment:

Unfortunately dumping the internal representation of non-long arrays
won't work, for several reasons.  First, it breaks when porting pickles
between platforms of different endianness such as Intel and SPARC. 
Then, it ignores the considerable work put into correctly pickling
floats, including the support for IEEE 754 special values.  Finally, it
will break when unpickling Unicode character arrays pickled on different
Python versions -- wchar_t is 2 bytes wide on Windows, 4 bytes on Unix.

I believe pickling arrays to compact strings is the right approach on
the grounds of efficiency and I wouldn't change it.  We must only be
careful to pickle to a string with a portable representation of values.
 The straightforward way to do this is to pick a standard size for
types (much like the struct module does) and endianness and use it in
the pickled array.  Ints are simple, and the code for handling floats is
already there, for example _PyFloat_Pack8 used by cPickle.

Pickling arrays as lists is probably a decent workaround for the pending
release because it's backward and forward compatible (old pickles will
work as well as before and new pickles will be correctly read by old
Python versions), but for the next release I would prefer to handle this
the right way.  If there is agreement on this, I can start work on a
patch in the following weeks.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-12 Thread Martin v. Löwis

Martin v. Löwis [EMAIL PROTECTED] added the comment:

I like to challenge the view what correct behavior is here. If I
pickle an array of 32-bit integer values on one system, and unpickle it
as an array of 64-bit integer values on a different system, is that
correct, or incorrect?

IMO, correct behavior would preserve the width as much as possible. For
integers, this should be straight-forward, as it should be for floats
and doubles (failing to unpickle them if the target system doesn't
support a certain format). For Unicode, I think the array module should
grow platform-independent width, for both 2-byte and 4-byte Unicode.

When pickling, the pickle should always use network byte order;
alternatively, the pickle should contain a byte order marker (presence
of which could also be used as an indication that the new array pickle
format is used). IOW, i would indicate little-endian four byte
integers, and so on.

--
nosy: +loewis

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-12 Thread Hrvoje Nikšić

Hrvoje Nikšić [EMAIL PROTECTED] added the comment:

I think preserving integer width is a good idea because it saves us from
having to throw overflow errors when unpickling to machines with
different width of C types.  The cost is that pickling/unpickling the
array might change the array's typecode, which can be a problem for C
code that processes the array's buffer and expects the C type to remain
invariant.

Instead of sticking to network byte order, I propose to include byte
order information in the pickle (for example as '' or '' like struct
does), so that pickling/unpickling between the same-endianness
architectures doesn't have to convert at all.  Floats are always pickled
as IEEE754, but the same optimization (not having to convert anything)
would apply when unpickling a float array on an IEEE754 architecture.

Preserving widths and including endianness information would allow
pickling to be as fast as it is now (with the exception of unicode chars
and floats on non-IEEE754 platforms).  It would also allow unpickling to
be as fast between architecture with equal endianness, and correct
between others.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-12 Thread Guido van Rossum

Guido van Rossum [EMAIL PROTECTED] added the comment:

 Instead of sticking to network byte order, I propose to include byte
 order information in the pickle (for example as '' or '' like struct
 does), so that pickling/unpickling between the same-endianness
 architectures doesn't have to convert at all.  Floats are always pickled
 as IEEE754, but the same optimization (not having to convert anything)
 would apply when unpickling a float array on an IEEE754 architecture.

 Preserving widths and including endianness information would allow
 pickling to be as fast as it is now (with the exception of unicode chars
 and floats on non-IEEE754 platforms).  It would also allow unpickling to
 be as fast between architecture with equal endianness, and correct
 between others.

This sounds like the best approach yet -- it can be made backwards
compatible (so 2.6 can read 2.5 pickles at least on the same platform)
and can be just as fast when unpickling on the same platform, and only
slightly slower on a different platform.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-11 Thread Guido van Rossum

Guido van Rossum [EMAIL PROTECTED] added the comment:

Wouldn't that be lots and lots slower?  I believe speed is one of the
reasons why the binary representation is currently dumped.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-11 Thread Alexandre Vassalotti

Alexandre Vassalotti [EMAIL PROTECTED] added the comment:

The slowdown depends of the array type. The patch makes array unpickling
a few orders of magnitude slower (i.e. between 4 and 15 times slower
depending of the array type). In general, pickling is about as fast as
with the binary representation (or faster!). 

Although since most 64-bit compilers uses the LP64 model, I think we
could make a compromise and only pickle as a list arrays of long
integers. This would fix the problem without any visible speed penalties.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-10 Thread Alexandre Vassalotti

Alexandre Vassalotti [EMAIL PROTECTED] added the comment:

I don't see why this cannot be fixed easily. All we need to do is fix
the __reduce__ method of array objects to emit a list--i.e. with
array.tolist()--instead of a memory string. Since the reduce protocol is
just a fancy way to store the constructor arguments, this won't break
unpickling of array objects pickled by previous Python versions.

And here is a patch against the trunk.

--
keywords: +patch
nosy: +alexandre.vassalotti
Added file: http://bugs.python.org/file11096/fix_array_pickling.patch

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-06 Thread Hrvoje Nikšić

Hrvoje Nikšić [EMAIL PROTECTED] added the comment:

I guess it went unnoticed due to prevalence of little-endian 32-bit
machines.  With 64-bit architectures becoming more and more popular,
this might become a bigger issue.

Raymond, why do you think fixing this bug would complicate porting to
2.6/3.0?

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-08-05 Thread Guido van Rossum

Guido van Rossum [EMAIL PROTECTED] added the comment:

Agreed, this has been broken for a long time, and few people have
noticed or complained.  Let's wait.

--
assignee: gvanrossum - 
versions: +Python 2.7, Python 3.1 -Python 2.6, Python 3.0

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-07-31 Thread Raymond Hettinger

Raymond Hettinger [EMAIL PROTECTED] added the comment:

At this point, I think it better to wait until Py2.7/3.1.  Changing it 
now would just complicate efforts to port from 2.5 to 2.6 to 3.0.  

Guido, do you agree?

--
assignee:  - gvanrossum
nosy: +rhettinger

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-07-30 Thread Benjamin Peterson

Benjamin Peterson [EMAIL PROTECTED] added the comment:

Ping.

--
nosy: +benjamin.peterson

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-04-25 Thread Jesús Cea Avión

Changes by Jesús Cea Avión [EMAIL PROTECTED]:


--
nosy: +jcea

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-04-14 Thread Guido van Rossum

Guido van Rossum [EMAIL PROTECTED] added the comment:

This looks indeed wrong. Unfortunately it also looks hard to fix in a
way that won't break unpickling arrays pickled by a previous Python
version.  We won't be able to fix this in 2.5 (it'll be a new feature)
but we should try to fix this in 2.6 and 3.0.

--
nosy: +gvanrossum
priority:  - critical
versions: +Python 3.0 -Python 2.5

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-03-21 Thread Hrvoje Nikšić

Hrvoje Nikšić [EMAIL PROTECTED] added the comment:

Here is an example that directly demonstrates the bug.  Pickling on x86_64:

Python 2.5.1 (r251:54863, Mar 21 2008, 13:06:31) 
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type help, copyright, credits or license for more information.
 import array, cPickle as pickle
 pickle.dumps(array.array('l', [1, 2, 3]))
carray\narray\np1\n(S'l'\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\ntRp2\n.

Unpickling on ia32:
Python 2.5.1 (r251:54863, Oct  5 2007, 13:36:32) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type help, copyright, credits or license for more information.
 import cPickle as pickle

pickle.loads(carray\narray\np1\n(S'l'\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\ntRp2\n.)
array('l', [1, 0, 2, 0, 3, 0])

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2389] Array pickling exposes internal memory representation of elements

2008-03-18 Thread Hrvoje Nikšić

New submission from Hrvoje Nikšić [EMAIL PROTECTED]:

It would seem that pickling arrays directly exposes the underlying
machine words, making the pickle non-portable to platforms with
different layout of array elements.  The guts of array.__reduce__ look
like this:

if (array-ob_size  0) {
result = Py_BuildValue(O(cs#)O, 
array-ob_type, 
array-ob_descr-typecode,
array-ob_item,
array-ob_size * array-ob_descr-itemsize,
dict);
}

The byte string that is pickled is directly created from the array's
contents.  Unpickling calls array_new which in turn calls
array_fromstring, which ends up memcpying the string data to the new array.

As far as I can tell, array pickles created on one platform cannot be
unpickled on a platform with different endianness (in case of integer
arrays), wchar_t size (in case of unicode arrays) or floating-point
representation (rare in practice, but possible).  If pickles are
supposed to be platform-independent, this should be fixed.

Maybe the typecode field when used with the constructor could be
augmented to include information about the elements, such as endianness
and floating-point format.  Or we should simply punt and pickle the
array as a list of Python objects that comprise it...?

--
components: Extension Modules
messages: 63915
nosy: hniksic
severity: normal
status: open
title: Array pickling exposes internal memory representation of elements
type: behavior
versions: Python 2.5, Python 2.6

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2389
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com