I am working on a toolbox for computer-archaeology where old data media are 
"excavated" and presented on a web-page. 
(https://github.com/Datamuseum-DK/AutoArchaeologist for anybody who cares).

Since these data-media can easily sum to tens of gigabytes, mmap and virtual 
memory are my weapons of choice, and that has brought me into an obscure corner 
of python where few people seem to venture:  I want to access the 
buffer-protocol from "userland".

The fundamental problem is that if I have an image of a disk and it has 2 
partitions, I end up with the "mmap.mmap" object that mapped the raw disk 
image, and two "bytes" or "bytearray" objects, each containing one partition, 
for a total memory footprint of twice the size of the disk.
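
To make the copying concrete, here is a minimal sketch. The 4 KiB "partitions" and the temporary file are made up for illustration. Slicing the mmap object itself produces a new "bytes" object, whereas slicing a memoryview of it does not copy. That only helps for contiguous ranges, though, not for files scattered across a filesystem.

```python
import mmap
import tempfile

# Stand-in "disk image" with two 4 KiB "partitions" (sizes are made up).
with tempfile.TemporaryFile() as f:
    f.write(b"A" * 4096 + b"B" * 4096)
    f.flush()
    image = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Slicing the mmap object copies: this is a brand new bytes object.
    copied = image[4096:]

    # Slicing a memoryview does not copy: both parts share the mapping.
    whole = memoryview(image)
    part1 = whole[:4096]
    part2 = whole[4096:]

    part2_len = len(part2)
    part2_first = bytes(part2[:1])   # copy one byte out, just to inspect it

    # Views must be released before the mapping can be closed.
    part1.release()
    part2.release()
    whole.release()
    image.close()
```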

As the tool dives into the filesystems in the partitions and creates objects 
for the individual files in the filesystem, that grows to three times the size 
of the disk etc.

To avoid this, I am writing a "bytes-like" scatter-gather class (not yet 
committed), and that is fine as far as it goes.
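
For readers who want something concrete, a rough sketch of what such a class might look like follows. The name ScatterGather and all of its methods are hypothetical, made up for illustration; this is not the uncommitted code.

```python
class ScatterGather:
    """Hypothetical bytes-like object built from non-contiguous pieces."""

    def __init__(self, *pieces):
        # Wrapping each piece in a memoryview keeps a reference, not a copy.
        self._views = [memoryview(p) for p in pieces]

    def __len__(self):
        return sum(len(v) for v in self._views)

    def __getitem__(self, index):
        # Integer indexing only, to keep the sketch short.
        if index < 0:
            index += len(self)
        if index < 0:
            raise IndexError("index out of range")
        for v in self._views:
            if index < len(v):
                return v[index]
            index -= len(v)
        raise IndexError("index out of range")

    def chunks(self):
        # Hand out the underlying views one at a time, without copying.
        yield from self._views

    def __bytes__(self):
        # This is the expensive path: it materialises one contiguous copy.
        return b"".join(self._views)


backing = bytearray(b"0123456789")
whole = memoryview(backing)
sg = ScatterGather(whole[0:2], whole[6:9])
```

Here sg refers to bytes 0-1 and 6-8 of the backing buffer; only bytes(sg) allocates a new contiguous object.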

If I want to write one of my scatter-gather objects to disk, I have to:

    fd.write(bytes(myobj))

As a preliminary point, I think that is just wrong:  A class with a __bytes__ 
method should satisfy any needs the buffer-protocol might have, so this should 
work:

    fd.write(myobj)
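
For reference, a small sketch of the behaviour today, as far as I can tell: a binary file's write() wants an object that supports the buffer protocol, and __bytes__ alone does not count. The class name OnlyBytes is made up for the example.

```python
import tempfile


class OnlyBytes:
    """Hypothetical class that offers __bytes__ but no buffer protocol."""

    def __bytes__(self):
        return b"payload"


obj = OnlyBytes()
with tempfile.TemporaryFile() as fd:
    try:
        fd.write(obj)            # rejected: not a bytes-like object
        direct_write_ok = True
    except TypeError:
        direct_write_ok = False

    fd.write(bytes(obj))         # works, at the cost of a full copy
    fd.seek(0)
    written = fd.read()
```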

But taking this a little bit further, I think __bytes__ should be allowed to be 
an iterator, provided the object also offers __len__, so that this would work:

    class bar():

        def __len__(self):
            return 3

        def __bytes__(self):
            yield b'0'
            yield b'1'
            yield b'2'

    open("/tmp/_", "wb").write(bar())

This example is of course trivial, but have the yield statements hand out 
hundreds of megabytes, and the savings in time and malloc-space become very 
tangible.
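
For comparison, a minimal sketch of what can be done today without any change to the language: write the pieces one at a time from a helper. The names Chunked, chunks() and write_chunks() are hypothetical, and the caller has to know about them, which is exactly the inconvenience described above.

```python
import tempfile


class Chunked:
    """Hypothetical stand-in for a scatter-gather object."""

    def __init__(self, *pieces):
        self._pieces = pieces

    def __len__(self):
        return sum(len(p) for p in self._pieces)

    def chunks(self):
        # Yield each piece in turn; nothing is joined or copied here.
        yield from self._pieces


def write_chunks(fd, obj):
    """Write obj piece by piece and return the number of bytes written."""
    total = 0
    for chunk in obj.chunks():
        total += fd.write(chunk)
    return total


obj = Chunked(b"0", b"1", b"2")
with tempfile.TemporaryFile() as fd:
    total = write_chunks(fd, obj)
    fd.seek(0)
    content = fd.read()
```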

Poul-Henning
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LPXGGCU2UG7Q7P4EYDKCR2XKH7HVYPB7/
Code of Conduct: http://python.org/psf/codeofconduct/