Bugs item #1724366, was opened at 2007-05-23 18:42
Message generated for change (Comment added) made by ronaldoussoren
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1724366&group_id=5470

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Geoffrey Bache (gjb1002)
Assigned to: Jack Jansen (jackjansen)
Summary: cPickle module doesn't work with universal line endings

Initial Comment:
On UNIX, I cannot read pickle files created on Windows using the cPickle 
module, even if I open the file with universal line endings.

It works fine with the pickle module, but that is of course slower (and I have to
read lots of them).

I attach a test case that pickles and unpickles an smtplib.SMTP object, 
converting the file to DOS format in between. There is nothing special about 
SMTP; any object defined in a different module will do.

On my system (RHEL4 with Python 2.4.3) I get the following output:

portmoller : pickletest.py cPickle
unix2dos: converting file dump to DOS format ...
Traceback (most recent call last):
  File "pickletest.py", line 14, in ?
    print load(readFile)
ImportError: No module named smtplib
portmoller : pickletest.py pickle
unix2dos: converting file dump to DOS format ...
<smtplib.SMTP instance at 0xb7ea350c>
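The attached script is not reproduced in this thread. As a hedged illustration only, the following Python 3 sketch mimics it in memory: the hypothetical unix2dos() helper stands in for the unix2dos tool, and a reference to the smtplib.SMTP class stands in for the pickled instance. Protocol 0 writes the module name on its own line ending in "\n", and the unpickler strips only that final "\n", so after the conversion it appears to look for a module called "smtplib\r" and the import fails, much like the ImportError above.

```python
import pickle
import smtplib


def unix2dos(data: bytes) -> bytes:
    # Hypothetical stand-in for the unix2dos tool: every LF becomes CRLF.
    return data.replace(b"\n", b"\r\n")


# Protocol 0 stores a class reference (GLOBAL opcode) as the module name
# and the class name, each on its own line terminated by "\n".
original = pickle.dumps(smtplib.SMTP, protocol=0)
converted = unix2dos(original)

print(pickle.loads(original))  # the unconverted pickle loads fine

try:
    pickle.loads(converted)
except ImportError as exc:
    # Only the trailing "\n" is stripped, so the module name keeps its "\r".
    print("load failed:", exc)
```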


----------------------------------------------------------------------

>Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2007-07-12 18:24

Message:
Logged In: YES 
user_id=580910
Originator: NO

I can confirm that this problem is present in Python 2.5 (current svn)
running on OS X 10.4.10. Given the code of cPickle, it is rather amazing that
this script does work correctly on a Linux system: as gagenellina noted,
cPickle shortcuts reads from real file objects and completely ignores
universal newlines while doing so.

IMHO, fixing this requires replicating the universal newline code in
cPickle.
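A rough sketch of the line-splitting logic such a fix would need, written in Python 3 rather than the C of cPickle.c. The function name universal_readline is hypothetical; it treats "\n", "\r\n" and a lone "\r" as line terminators and normalizes each to "\n", as universal-newline reading does. Note that treating a lone "\r" as a terminator is exactly what gagenellina's comment in this thread says breaks protocol-0 Unicode data containing a raw "\r".

```python
from typing import Tuple


def universal_readline(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Return (line, new_pos) for the line starting at pos.

    "\\n", "\\r\\n" and a lone "\\r" all end a line; the terminator is
    normalized to b"\\n". Text after the last terminator is returned as-is.
    """
    end = len(data)
    i = pos
    while i < end:
        byte = data[i]  # indexing bytes yields an int in Python 3
        if byte == 0x0A:  # "\n"
            return data[pos:i] + b"\n", i + 1
        if byte == 0x0D:  # "\r", possibly followed by "\n"
            nxt = i + 2 if i + 1 < end and data[i + 1] == 0x0A else i + 1
            return data[pos:i] + b"\n", nxt
        i += 1
    # No terminator found: return the remainder unchanged.
    return data[pos:end], end


# Usage: split a DOS-converted protocol-0 pickle into normalized lines.
sample = b"csmtplib\r\nSMTP\r\np0\r\n."
pos, lines = 0, []
while pos < len(sample):
    line, pos = universal_readline(sample, pos)
    lines.append(line)
print(lines)
```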

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2007-05-29 07:14

Message:
Logged In: YES 
user_id=21627
Originator: NO

Jack, can you take a look? If not, please unassign.

----------------------------------------------------------------------

Comment By: Geoffrey Bache (gjb1002)
Date: 2007-05-25 19:24

Message:
Logged In: YES 
user_id=769182
Originator: YES

Yes, I'm sure Python is trying to import "smtplib\r".

For various reasons I need to use protocol 0: not least because I use the
pickle files as test data and it's much easier to administer a load of text
files than a load of binary files.

I will experiment with reading the files in binary mode on Monday and get
back to you. My current workaround is to do loads(file.read()) instead of
load(file), which I guess carries a performance penalty. Any idea whether this
is likely to be slower than just using the pickle module? (I haven't tested
this.)
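In Python 3 a pickle is bytes and has to be read from a binary-mode file, so the closest equivalent of loads(file.read()) on a universal-newline file is to turn CRLF back into LF explicitly before unpickling. A minimal sketch, assuming the pickled data itself contains no carriage-return bytes; load_normalized is a hypothetical helper name. Reading everything into memory first mirrors the loads(file.read()) approach described above.

```python
import io
import pickle


def load_normalized(fileobj):
    """Hypothetical helper: unpickle a protocol-0 stream whose line
    endings may have been converted to CRLF.

    Reads the whole stream, maps CRLF back to LF, then calls pickle.loads.
    Assumes the original pickle contained no carriage-return bytes.
    """
    data = fileobj.read()
    return pickle.loads(data.replace(b"\r\n", b"\n"))


# Usage: simulate a pickle that went through unix2dos.
payload = {"module": "smtplib", "port": 25}
dos_bytes = pickle.dumps(payload, protocol=0).replace(b"\n", b"\r\n")
print(load_normalized(io.BytesIO(dos_bytes)))
```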


----------------------------------------------------------------------

Comment By: Gabriel Genellina (gagenellina)
Date: 2007-05-25 12:29

Message:
Logged In: YES 
user_id=479790
Originator: NO

The culprit is cPickle.c; it takes certain shortcuts for read() and
readline() depending on which type of file you pass in.
For a true file object, it uses its own implementation for those two
methods, ignoring the file mode.

But it appears that there is NO WAY universal line endings could work if
the pickle contains any unicode object. The pickle format for Unicode
quotes any \n but *not* \r, so when the unpickler sees a "\r" it cannot
determine whether it is a Mac end-of-line or an embedded "\r".
So the only safe end-of-line character for a pickle using protocol 0 is
"\n", and that means the file must be written in binary mode.
(This may also mean that you cannot read unicode objects with embedded
\r on a Mac using protocol 0, but I don't have a Mac to test it.)
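The ambiguity can be shown with a small Python 3 sketch. The escaping function is a hypothetical re-creation of the rule described in this comment (backslash and "\n" escaped, "\r" written raw), not the actual cPickle code, and io.TextIOWrapper with newline=None stands in for a file opened with universal newlines. A second, hypothetical variant that also escapes "\r" keeps the record on one line.

```python
import io


def escape_like_old_protocol0(text):
    # Hypothetical re-creation of the rule described above: backslash and
    # "\n" become \uXXXX escapes, but "\r" is written through unchanged.
    return text.replace("\\", "\\u005c").replace("\n", "\\u000a")


def escape_cr_too(text):
    # Hypothetical variant that also escapes "\r", so no raw CR is written.
    return escape_like_old_protocol0(text).replace("\r", "\\u000d")


def lines_seen_by_universal_reader(record):
    # newline=None enables universal newlines: "\n", "\r\n" and a lone
    # "\r" all end a line and are translated to "\n".
    raw = io.BytesIO(record.encode("latin-1"))
    return io.TextIOWrapper(raw, encoding="latin-1", newline=None).readlines()


value = "a\rb"  # a Unicode value with an embedded carriage return
# "V" is the protocol-0 opcode for a Unicode string; the record ends in "\n".
print(lines_seen_by_universal_reader("V" + escape_like_old_protocol0(value) + "\n"))
print(lines_seen_by_universal_reader("V" + escape_cr_too(value) + "\n"))
```

With the first encoding the single record is split in two, so the reader cannot tell the embedded "\r" from a line ending.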

So, until this is fixed (either in the module or in the documentation), one
should forget about universal line endings and write all pickle files in
binary mode. (This way ALL lines end in \n, and it should work fine on all
platforms.)


----------------------------------------------------------------------

Comment By: Gabriel Genellina (gagenellina)
Date: 2007-05-25 11:04

Message:
Logged In: YES 
user_id=479790
Originator: NO

I don't see any "Attach" button...
Just add these lines near the top of the test script:

import __builtin__
original__import = __builtin__.__import__
def myimport(name, *args, **kwargs):
    # Show every module name requested, e.g. 'smtplib\r'
    print "import", repr(name)
    return original__import(name, *args, **kwargs)
__builtin__.__import__ = myimport


----------------------------------------------------------------------

Comment By: Gabriel Genellina (gagenellina)
Date: 2007-05-25 11:00

Message:
Logged In: YES 
user_id=479790
Originator: NO

Please try again with this modified version. I think you will see that
Python is trying to import "smtplib\r".
On Windows, trying to read a pickle file with Mac line endings gives a
different error:
cPickle.UnpicklingError: pickle data was truncated

It seems that cPickle support for protocol 0 is broken. If you can, use
one of the higher (binary) protocols; they don't have this problem. Even if
you must use protocol 0, always opening the file in binary mode should avoid
this problem.
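A hedged Python 3 sketch of the two options suggested here. In Python 3 the choice is largely forced, since pickle works on bytes and needs binary-mode files, but the sketch shows that a protocol-0 pickle written through a binary-mode file contains only "\n" line endings (no newline translation takes place), and that a binary protocol round-trips the same way. The object and file name are arbitrary examples.

```python
import os
import pickle
import tempfile

obj = {"module": "smtplib", "values": [1, 2.5, "text"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dump")

    # Protocol 0 through a binary-mode file: lines end in b"\n" on every
    # platform because binary mode performs no newline translation.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=0)
    with open(path, "rb") as f:
        data = f.read()
    has_cr = b"\r" in data
    restored0 = pickle.loads(data)

    # A binary protocol, likewise written and read in binary mode.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(has_cr, restored0 == obj, restored == obj)
```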

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 