New submission from David Wilson:

This is a follow-up to the thread at
https://mail.python.org/pipermail/python-dev/2014-July/135543.html, discussing
the existing behaviour of BytesIO copying its source object, and how this
regresses compared to cStringIO.StringI.

The goal of posting the patch on the list was to stimulate discussion of the
approach. The patch itself obviously isn't ready for review, and I'm not in a
position to dedicate time to it just now (although in a few weeks I'd love to
give it full attention!).

Ignoring this quick implementation, are there any general comments around the 
approach?

My only concern is that it might keep large objects alive in a non-intuitive 
way in certain circumstances, though I can't think of any obvious ones 
immediately.
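
For illustration only, the copy-on-write idea can be sketched in pure Python.
This is not the code in cow.patch, which targets the C implementation; the
class name CowReader and its attributes are hypothetical, and seeking is left
out to keep the sketch short. It shows the source bytes being shared until the
first write, and only then copied.

```python
class CowReader:
    """Toy copy-on-write buffer (hypothetical; not the code in cow.patch)."""

    def __init__(self, initial=b""):
        self._shared = initial   # reference to the caller's bytes; no copy yet
        self._private = None     # private bytearray, created on first write
        self._pos = 0

    def _buf(self):
        # Whichever buffer currently holds the contents.
        return self._shared if self._private is None else self._private

    def read(self, n=-1):
        buf = self._buf()
        end = len(buf) if n < 0 else min(len(buf), self._pos + n)
        data = bytes(buf[self._pos:end])
        self._pos = end
        return data

    def write(self, data):
        if self._private is None:
            # First mutation: take a private copy and drop the shared reference.
            self._private = bytearray(self._shared)
            self._shared = None
        end = self._pos + len(data)
        self._private[self._pos:end] = data
        self._pos = end
        return len(data)

    def getvalue(self):
        return bytes(self._buf())
```

In this sketch, a reader that never writes holds on to the original object
through _shared for its whole lifetime, which is the "keeps large objects
alive" situation described above.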

Also interested in comments on the second half of that thread: "a natural 
extension of this is to do something very similar on the write side: instead of 
generating a temporary private heap allocation, generate (and freely resize) a 
private PyBytes object until it is exposed to the user, at which point, 
_getvalue() returns it, and converts it into an IO_SHARED buffer."

There are quite a few interactions with making that work correctly, in 
particular:

* How BytesIO would implement the buffer interface without causing the
under-construction bytes object to become read-only

* Avoiding redundant copies and resizes. We can't simply tack 25% slack on
the end of the bytes object and then truncate it during getvalue() without
likely triggering a copy and move. However, with careful measurement of
allocator behavior there are various tradeoffs that could be made, e.g.
obmalloc won't move a <500 byte allocation if it shrinks by <25%. glibc
malloc's rules are a bit more complex, though.

We could also add a private _PyBytes_SetSize() API to allow truncation to the
final size during getvalue() without informing the allocator. Then we'd simply
overallocate by up to 10% or 1-2 KB, and write off the loss of the slack space.
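
As a rough sketch of that sizing policy, the helper below reads "up to 10% or
1-2kb" as the smaller of 10% of the requested size and a fixed 2048-byte cap.
That reading, the helper name padded_capacity, and the cap value are
assumptions made for illustration only; _PyBytes_SetSize() is merely proposed
above and is not modelled here.

```python
def padded_capacity(needed, slack_divisor=10, cap=2048):
    """Return an allocation size with bounded slack (hypothetical helper).

    Slack is 1/slack_divisor of the requested size, but never more than
    `cap` bytes, so the space written off at getvalue() time stays small.
    """
    slack = min(needed // slack_divisor, cap)
    return needed + slack


# Slack that would be written off for a few final sizes.
for size in (100, 10_000, 1_000_000):
    wasted = padded_capacity(size) - size
    print(size, wasted)
```

Under these assumptions the written-off space grows with small buffers and is
capped at 2048 bytes for large ones.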

Notably, this approach completely differs from the one documented in
http://bugs.python.org/issue15381. It's not clear to me which is better.

----------
components: Library (Lib)
files: cow.patch
keywords: patch
messages: 223383
nosy: dw
priority: normal
severity: normal
status: open
title: BytesIO copy-on-write
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file35988/cow.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22003>
_______________________________________