[issue12291] file written using marshal in 3.2 can be read by 2.7, but not 3.2 or 3.3

Vinay Sajip Thu, 30 Jun 2011 12:14:10 -0700

Vinay Sajip <vinay_sa...@yahoo.co.uk> added the comment:

> Antoine Pitrou <pit...@free.fr> added the comment:
> 
> I  think the question is: will the slowdown apply to module import, or only 
> to  
>explicit uses of the marshal module? If the latter, then I think we can live  
>with it - we discourage using marshal as a general-purpose serialization 
>scheme  
>anyway.


Thanks for reviewing the patch.

As to your point  - agreed, and as the marshal code is completely broken now, 
something that works is an improvement, even if it's slower than optimal. If 
that turns out to be a problem in practice, it can be fixed.

> As for the patch:
> - why do the tests have to carry a large  chunk of binary data? if some data 
> is 
>needed, at least it would be nice if it  were kept small

Agreed, I just used the initial file where I had the problem - didn't know 
where 
the problem was, initially. Will look at reducing this.

> - not sure either why it needs zlib and base64; just use the  repr() of the 
>bytestring

The original data was 53K, so just the repr of the bytestring of the raw data 
is 
around 150K. If I zip the data, it's about 4K, but the repr of that data is 
around 12K. The base64/zlibbed version is 5K.

This will be less of an issue when the test data size is reduced.

> - "assertEqual = self.assertEqual" is more of a  nuisance than anything else; 
>you're making the code more complicated to read  just to save a few 
>keystrokes; 
>same for "assertIsInstance =  self.assertIsInstance"

I'm not sure how it's more complicated or harder to understand. I didn't do it 
just to save keystrokes when writing, it's also to avoid noise when reading. 
IMO 
this is a question of personal taste, or is this approach proscribed somewhere? 
I've certainly seen it in other Python code.

> - can you add a comment next to the fields you're  adding to the marshal C 
>struct?

Yes, will do.

> - if the "r_byte" macro isn't used anymore,  it should be removed, not 
>commented out

The commenting out is a temporary measure, the comment will be removed before 
commit.

> - in r_string(), what happens if  PyBytes_Check(data) is false? there should 
>probably be a TypeError of some  sort

There is a PyBytes_Check in marshal_load which raises a TypeError if it fails. 
The PyBytes_Check in r_string is only called in the path from marshal_load 
(because p->readable is only non-NULL in that path) - so for the failure to 
occur one would need an initial read to deliver bytes, and a subsequent read to 
deliver non-bytes. Still, I have no problem adding a TypeError raising in 
r_string if PyBytes_Check fails.

> - in r_string() again, if data is NULL after the read() call, the  exception 
>shouldn't be shadowed by an EOFError

Good point, I aim to fix this by changing the lower condition to

if (!PyErr_Occurred() && (read < n)) { ... }

which should be sufficient - do you agree?

> - why do you call read(1) at  the beginning? it seems to make the code more 
>complicated for no useful  purpose

I gave the reasoning in the comment just above the read(1). I agree it makes it 
a little more complicated, but not onerously so - and the approach fails fast 
as 
well as allowing operation with any stream, not just random-seekable ones.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12291>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue12291] file written using marshal in 3.2 can be read by 2.7, but not 3.2 or 3.3

Reply via email to