I don't think it makes sense to add any more ideas to PEP 467. That needed to be a PEP because it proposed breaking backwards compatibility in a couple of areas, and because of the complex history of Python 3's "bytes-as-tuple-of-ints" and Python 2's "bytes-as-str" semantics.
Other enhancements to the binary data handling APIs in Python 3 can be considered on their own merits.

On 12 October 2016 at 14:08, INADA Naoki <songofaca...@gmail.com> wrote:
> Memoryview problem
> =================
>
> To avoid redundant copy of `line = bytes(buf)[:n]`, current solution
> is using memoryview.
>
> First code I wrote is: `line = bytes(memoryview(buf)[:n])`.
>
> On CPython, it works fine. But `del buf[:n+2]` in next line may fail
> on other Python implementations. Changing bytearray size is inhibited
> while memoryview is alive.
>
> So right code is:
>
>     with memoryview(buf) as m:
>         line = bytes(m[:n])
>
> The problem of memoryview approach is:
>
> * Overhead: creating temporary memoryview, __enter__, and __exit__.
>   (see below)
>
> * It isn't "one obvious way": Developers including me may forget to
>   use context manager.
>   And since it works on CPython, it's hard to point it out.

To add to the confusion, there's also https://docs.python.org/3/library/stdtypes.html#memoryview.tobytes giving:

    line = memoryview(buf)[:n].tobytes()

However, folks *do* need to learn that many mutable data types will lock themselves against modification while you have a live memory view on them, so it's important to release views promptly and reliably when we don't need them any more.

> Quick benchmark:
>
> (temporary bytes)
> $ python3 -m perf timeit \
>     -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' \
>     -- 'bytes(buf)[:3]'
> ....................
> Median +- std dev: 652 ns +- 19 ns
>
> (temporary memoryview without "with")
> $ python3 -m perf timeit \
>     -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' \
>     -- 'bytes(memoryview(buf)[:3])'
> ....................
> Median +- std dev: 886 ns +- 26 ns
>
> (temporary memoryview with "with")
> $ python3 -m perf timeit \
>     -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- '
> with memoryview(buf) as m:
>     bytes(m[:3])
> '
> ....................
> Median +- std dev: 1.11 us +- 0.03 us

This is normal though, as memory views trade lower O(N) costs (reduced data copying) for higher O(1) setup costs (creating and managing the view, indirection for data access).

> Proposed solution
> ===============
>
> Adding one more constructor to bytes:
>
>     # when length=-1 (default), use until end of *byteslike*.
>     bytes.frombuffer(byteslike, length=-1, offset=0)
>
> With this API
>
>     with memoryview(buf) as m:
>         line = bytes(m[:n])
>
> becomes
>
>     line = bytes.frombuffer(buf, n)

Does that need to be a method on the builtin rather than a separate helper function, though? Once you define:

    def snapshot(buf, length=None, offset=0):
        with memoryview(buf) as m:
            return m[offset:length].tobytes()

then that can be replaced by a more optimised C implementation without users needing to care about the internal details.

That is, getting back to a variant on one of Serhiy's suggestions in the last PEP 467 discussion, it may make sense for us to offer a "buffertools" library that's specifically aimed at supporting efficient buffer manipulation operations that minimise data copying. The pure Python implementations would work entirely through memoryview, but we could also have selected C accelerated operations if that showed a noticeable improvement on asyncio's benchmarks.

Regards,
Nick.

P.S. The length/offset API design is also problematic due to the way it differs from range() & slice(), but I don't think it makes sense to get into that kind of detail before discussing the larger question of adding a new helper module for working efficiently with memory buffers vs further widening the method API for the builtin bytes type.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
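
For anyone wanting to experiment with the helper-function approach discussed above, here is a minimal sketch of a snapshot() variant. It is only an illustration, not a proposed API: in this version `length` is assumed to be a count of bytes rather than a stop index (one possible answer to the length/offset question raised in the P.S.), and the buffer is assumed to be byte-oriented, such as a bytearray. The `resize_blocked` flag is a hypothetical name used only for the demonstration.

```python
def snapshot(buf, length=None, offset=0):
    """Copy part of a bytes-like object into a new bytes object.

    Assumption for this sketch: `length` is a count of bytes starting
    at `offset`, not a stop index. None means "until the end".
    """
    with memoryview(buf) as m:
        stop = None if length is None else offset + length
        # tobytes() copies the data, so nothing refers to the view
        # once the with block has released it.
        return m[offset:stop].tobytes()


buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")

# A live view locks the bytearray against resizing.
m = memoryview(buf)
try:
    del buf[:1]
    resize_blocked = False
except BufferError:
    resize_blocked = True
finally:
    m.release()

# With the view released inside snapshot(), the resize is allowed.
n = buf.index(b"\r\n")
line = snapshot(buf, n)
del buf[:n + 2]
```

Because the view is always released inside the helper, the caller cannot forget the context manager, which is the "one obvious way" concern from the quoted message.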