Ok, how about merging the two sub-threads :-)

On Mon, 18 Nov 2013 16:44:59 -0600
Tim Peters <tim.pet...@gmail.com> wrote:
> [Antoine]
> > You can't know how much space the pickle will take until the pickling
> > ends, though, which makes it difficult to decide whether you want to
> > emit a PREFETCH opcode or not.
> 
> Ah, of course.  Presumably the outgoing pickle stream is first stored
> in some memory buffer, right?  If pickling completes before the buffer
> is first flushed, then you know exactly how large the entire pickle
> is.  If "it's small" (say, < 100 bytes), don't write out the PREFETCH
> part.  Else do.

That's true. We could also have a SMALLPREFETCH opcode with a one-byte
length to still get the benefits of prefetching.
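
A minimal sketch of that idea, assuming the whole pickle fits in the
memory buffer before the first flush. The opcode bytes, the 255-byte
cutoff and the helper names (frame_pickle, read_framed) are made up for
illustration; they are not what the current patch does.

```python
import io
import pickle
import struct

# Hypothetical opcode bytes, for illustration only; they are not part of
# any real pickle protocol.
PREFETCH = b"\xfe"        # followed by an 8-byte little-endian length
SMALLPREFETCH = b"\xfd"   # followed by a 1-byte length


def frame_pickle(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Pickle obj fully into memory, then pick the length prefix."""
    payload = pickle.dumps(obj, protocol)
    size = len(payload)
    if size <= 0xFF:
        # Small pickle: the length fits in one byte.
        return SMALLPREFETCH + struct.pack("<B", size) + payload
    # Larger pickle: pay for the full 8-byte length.
    return PREFETCH + struct.pack("<Q", size) + payload


def read_framed(stream):
    """Read one framed pickle, fetching the payload in a single read()."""
    opcode = stream.read(1)
    if opcode == SMALLPREFETCH:
        (size,) = struct.unpack("<B", stream.read(1))
    elif opcode == PREFETCH:
        (size,) = struct.unpack("<Q", stream.read(8))
    else:
        raise ValueError("unknown framing opcode")
    return pickle.loads(stream.read(size))
```

With this layout a tiny pickle pays 2 bytes of overhead instead of 9,
and the reader still knows how much to prefetch.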

> > Well, yes: much better memory usage for large pickles.
> > Some people use pickles to store huge data, which was the motivation to
> > add the 8-byte-size opcodes after all.
> 
> We'd have the same advantage _if_ it were feasible to know the entire
> size up front.  I understand now that it's not feasible.

AFAICT, it would only be possible by doing two-pass pickling, which
would also slow it down massively.
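
To make the cost concrete, here is a rough sketch of what two-pass
pickling could look like, assuming a plain 8-byte size header. The
ByteCounter class and dump_with_size_two_pass function are hypothetical
names used only for this illustration.

```python
import io
import pickle
import struct


class ByteCounter:
    """Write-only file-like object that discards data and counts bytes."""

    def __init__(self):
        self.count = 0

    def write(self, data):
        self.count += len(data)
        return len(data)


def dump_with_size_two_pass(obj, out, protocol=pickle.DEFAULT_PROTOCOL):
    """Write an 8-byte size header, then the pickle, without buffering it."""
    # Pass 1: pickle into a counter just to learn the total size.
    counter = ByteCounter()
    pickle.Pickler(counter, protocol).dump(obj)
    # Pass 2: now that the size is known, emit the header and pickle again.
    out.write(struct.pack("<Q", counter.count))
    pickle.Pickler(out, protocol).dump(obj)
    return counter.count
```

The object graph is traversed twice, which is where the slowdown comes
from; the sketch also assumes both passes produce the same bytes.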

> A long-running process can legitimately put billions of items on work
> queues, far more than could ever fit in RAM simultaneously.  Comparing
> this to PyObject overhead makes no sense to me.  Neither does the line
> of argument "there are several kinds of overheads, so making this
> overhead worse too doesn't matter".

Well, it's a question of cost / benefit: does it make sense to optimize
something that will be dwarfed by other factors in real-world
situations?

> When possible, we should strive not to add overheads that don't repay
> their costs.  For small pickles, an 8-byte size field doesn't appear
> to buy anything.  But I appreciate that it costs implementation effort
> to avoid producing it in these cases.

I share the concern, although I still don't think the "ocean of tiny
pickles" is a reasonable use case :-)

That said, assuming you think this is important (do you?), we're left
with the following constraints:
- it would be nice to have this PEP in 3.4
- 3.4 beta1 (and with it the feature freeze) is approximately one week away
- switching to the PREFETCH scheme requires some non-trivial work on the
  current patch, work done by either Alexandre or me (but I already
  have pathlib (PEP 428) on my plate, so it'll have to be Alexandre) -
  unless you want to do it, of course?

What do you think?

Regards

Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev