News123 wrote:
Dave Angel wrote:
News123 wrote:
Hi.

I started playing with PIL.

I'm performing operations on multiple images and would like a compromise
between speed and memory requirements.
. . .

The question that I have is whether there is any way to tell Python
that certain objects could be garbage collected if needed, and to ask Python
at a later time whether the object has been collected (image has
to be reloaded) or not (image would not have to be reloaded).



You don't say what implementation of Python, nor on what OS platform. Yet you're asking how to influence that implementation.

Sorry, my fault. I'm using CPython under Windows and under Linux.
In CPython, version 2.6 (and probably most other versions, but somebody
else would have to chime in) an object is freed as soon as its reference
count goes to zero.  So the garbage collector is only there to catch
cycles, and it runs relatively infrequently.

If CPython frees objects as early as possible (as soon as the refcount is
0), then weakref will not really help me.
In this case I'd have to build a cache-like structure.
So, if you keep a reference to an object, it'll not be freed. Theoretically, you can use the weakref module to keep a reference
without inhibiting the garbage collection, but I don't have any
experience with the module.  You could start by studying its
documentation. But probably you want a weakref.WeakValueDictionary. Use that in your third approach to store the cache.
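
A minimal sketch of what a weakref.WeakValueDictionary cache does, for illustration only. FakeImage and get_image are hypothetical names standing in for a real PIL image and Image.open, and the results noted in the comments are what CPython's reference counting produces; other implementations may free the object later.

```python
import weakref


class FakeImage:
    """Hypothetical stand-in for a loaded image (not a PIL class)."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()
load_count = 0


def get_image(name):
    """Return the cached image, or "load" it again if it has been freed."""
    global load_count
    img = cache.get(name)
    if img is None:
        load_count += 1          # stands in for Image.open(name)
        img = FakeImage(name)
        cache[name] = img        # the dictionary holds only a weak reference
    return img


a = get_image("a.png")           # first load
b = get_image("a.png")           # cache hit: `a` still holds a strong reference
same_object = a is b
del a, b                         # the last strong references are gone
still_cached = "a.png" in cache  # False on CPython: the entry vanished at once
c = get_image("a.png")           # has to be loaded again
```

So the dictionary only saves a reload while some other part of the program still holds the image; it does not keep an otherwise unused image around.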

If you're using Cython or Jython, or one of many other implementations,
the rules will be different.

The real key to efficiency is usually managing locality of reference. If a given image is going to be used for many output files, you might
try to do all the work with it before going on to the next image.  In
that case, it might mean searching all_creation_rules for rules which
reference the file you've currently loaded. Measurement is key.

Changing the order of the images to be calculated is key and I'm working
on that.

As a first step I can reorder the image creation such that all output
images that depend on only one input image will be calculated one after
the other.

So for this case I can transform:
# Slowest approach:
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        src_img = Image.open(img_file)
        img = do_somethingwith(img,src_img) # wrong indentation in OP
    img.save()


into
src_img = Image.open(img_file)
for creation_rule in all_creation_rules_with_on_src_img():
    img = Image.new(...)
    img = do_somethingwith(img,src_img)
    img.save()
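
One possible way to do that reordering, as a sketch only: group the rules by their single input file so that each source image is opened once. The Rule class and group_single_input_rules are hypothetical; the only thing assumed from the code above is that a rule exposes input_files().

```python
from collections import defaultdict


def group_single_input_rules(rules):
    """Return ({input_file: [rules]}, [rules needing several inputs])."""
    by_input = defaultdict(list)
    multi_input = []
    for rule in rules:
        files = list(rule.input_files())
        if len(files) == 1:
            by_input[files[0]].append(rule)
        else:
            multi_input.append(rule)
    return dict(by_input), multi_input


class Rule:
    """Hypothetical creation rule, used only for this illustration."""

    def __init__(self, name, files):
        self.name = name
        self._files = files

    def input_files(self):
        return self._files


rules = [Rule("r1", ["a.png"]), Rule("r2", ["b.png"]),
         Rule("r3", ["a.png"]), Rule("r4", ["a.png", "b.png"])]
by_input, multi_input = group_single_input_rules(rules)
# by_input["a.png"] now holds r1 and r3, so a.png is opened only once for both;
# r4 needs two inputs and is left for separate handling.
```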


What I was more concerned about is a group of output images depending on TWO
or more input images.

Depending on the platform (and the images) I might not be able to
preload both (or all) of the input images.

So, as CPython's garbage collection always takes place immediately,
I'd like to pursue something else.
I can create a cache which caches input files as long as Python leaves
at least n MB available for the rest of the system.

For this I have to know how much RAM is still available on a system.

I'll start looking into this.
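
A rough sketch of such a cache, with one deliberate simplification: instead of asking the OS how much RAM is free (which is platform specific), it keeps the cached objects under a fixed byte budget and evicts the least recently used ones. ByteBudgetCache, fake_loader and the size estimate are all hypothetical; for real images the loader would wrap Image.open and size_of would have to estimate the memory an image takes.

```python
from collections import OrderedDict


class ByteBudgetCache:
    """LRU cache that evicts old entries once a byte budget is exceeded."""

    def __init__(self, max_bytes, loader, size_of):
        self.max_bytes = max_bytes
        self.loader = loader          # callable: key -> object
        self.size_of = size_of        # callable: object -> estimated bytes
        self._items = OrderedDict()   # key -> (object, size), oldest first
        self.used = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)       # mark as most recently used
            return self._items[key][0]
        obj = self.loader(key)
        size = self.size_of(obj)
        self._items[key] = (obj, size)
        self.used += size
        # Evict the oldest entries, but always keep the one just loaded.
        while self.used > self.max_bytes and len(self._items) > 1:
            _, (_, old_size) = self._items.popitem(last=False)
            self.used -= old_size
        return obj

    def __contains__(self, key):
        return key in self._items


loads = []


def fake_loader(name):
    loads.append(name)
    return bytearray(400)             # pretend every image takes 400 bytes


cache = ByteBudgetCache(max_bytes=1000, loader=fake_loader, size_of=len)
cache.get("a")
cache.get("b")
cache.get("a")                        # hit; "a" becomes most recently used
cache.get("c")                        # 1200 bytes > budget, so "b" is evicted
```

Unlike a weak reference, the cache itself holds strong references, so an image stays loaded until the budget forces it out.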

thanks again



N


As I said earlier, I think weakref is probably what you need. A weakref lets you refer to an object without adding to its reference count, so by itself it does not keep the object alive. Have you read the help on the weakref module? In particular, did you read PEP 205? http://www.python.org/dev/peps/pep-0205/

Object cache is one of the two reasons for the weakref module.

--
http://mail.python.org/mailman/listinfo/python-list
