On 12/20/05, The Rasterman Carsten Haitzler <[EMAIL PROTECTED]> wrote:

just one thing - with efm it shouldn't be forking 1 process per image all at
once. it will only be keeping 1 forked child at a time - running along
generating thumbs for images that don't have one yet. the parent just gets the
child exit event then forks off another. so it's 1 fork per image - and only
per image that needs a thumb AND only once per generation. i think you can
safely assume even on the worst of posix systems 1 fork is nothing compared to
the workload of loading, scaling then writing an image file :)
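
A minimal sketch of that one-child-at-a-time pattern, for illustration only. This is not the efm code: generate_thumb() and thumb_run_serial() are hypothetical names, the stub does no real image work, and a blocking waitpid() stands in for the child exit event that the real main loop would deliver.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real work (load, scale, write).
 * Returns 0 on success, non-zero on failure. */
static int generate_thumb(const char *path)
{
    return (path && path[0] != '\0') ? 0 : 1;
}

/* Fork one child per image, strictly one at a time.
 * Returns the number of children that exited successfully. */
int thumb_run_serial(const char *const *paths, size_t count)
{
    int ok = 0;

    for (size_t i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: do the work and report the result as exit status. */
            _exit(generate_thumb(paths[i]));
        }
        /* Parent: wait for this child before forking the next one. */
        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Because the parent never has more than one child outstanding, the cost is exactly one fork() per image that needs a thumb.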

Ahh, I should have read the code more closely. I saw the fork() at the top of _e_thumb_generate and assumed the worst. Thanks for clarifying that.

anyway. i personally still favor the fork model as it requires no pthreads,
likely has no overhead compared to threads, has no concurrency and cache issues,
and is simple.

Agreed.

what it does need is an ability to tune how many forked image
generators to allow at a time (efm allows only 1 so dual cpu systems will be
happy, more cpus won't benefit - ok maybe 3 as x is probably involved, and you
might say 4 if you let the kernel run IO cpu instructions on a 4th cpu).

I think single CPU systems could even benefit from spawning a few worker processes. Each process will reach a state where it's waiting on a read from the image file, so two or three thumbnailing processes could potentially interleave their resource usage pretty well, with one sitting in a blocking read while another is processing image data.
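
A rough sketch of what a tunable limit on forked generators could look like. Everything here is an assumption for illustration: thumb_run_pool(), its max_children parameter, and the generate_thumb() stub are hypothetical names, and a blocking wait() stands in for main-loop child exit events.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real work; returns 0 on success. */
static int generate_thumb(const char *path)
{
    return (path && path[0] != '\0') ? 0 : 1;
}

/* Keep at most max_children generators running at once.
 * Returns the number of children that exited successfully. */
int thumb_run_pool(const char *const *paths, size_t count, int max_children)
{
    int running = 0, ok = 0;
    size_t next = 0;

    if (max_children < 1)
        max_children = 1;

    while (next < count || running > 0) {
        /* Top up the pool until the limit is reached. */
        while (next < count && running < max_children) {
            pid_t pid = fork();
            if (pid < 0) {
                perror("fork");
                count = next; /* stop queueing, drain what is running */
                break;
            }
            if (pid == 0)
                _exit(generate_thumb(paths[next]));
            running++;
            next++;
        }
        if (running == 0)
            break;

        /* Block until any child finishes, then free its slot. */
        int status;
        pid_t done = wait(&status);
        if (done < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        running--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

With max_children set to 1 this degenerates to the current efm behaviour; 2 or 3 would let one child sit in a blocking read while another scales.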

anyway
- you DO have a very valid point for when 2 apps start thumbnailing the same
dir. we should definitely put in a locking mechanism for that so either they
share the workload, or the first guy in gets "lock ownership" and drives the
thumbnailing until he's done and the other process sits and waits (maybe
polling the lock file if we use that mechanism - the owner could update the
timestamp on the lockfile whenever it generates something. or the lock file
could contain info as to the queue of ungenerated images to go... or maybe the
simplest case: all processes not owning the thumbnailing for that dir hold off
until the owner releases (hopefully not too long from now) and then do a full
update).
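
One way the lock file idea could be sketched, under stated assumptions: the thumb_lock_* names are hypothetical, ownership is taken with open() and O_CREAT | O_EXCL, the owner's heartbeat is the lock file's mtime, and waiters treat a lock with no recent heartbeat as stale. Where the lock file lives and what it contains are left open.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/* Try to become the thumbnailing owner for a directory.
 * Returns 1 if we now own the lock, 0 if someone else holds it,
 * -1 on any other error. */
int thumb_lock_acquire(const char *lock_path)
{
    int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return (errno == EEXIST) ? 0 : -1;
}

/* Owner heartbeat: bump the lock file's timestamp to "now" each time
 * a thumb is generated. Returns 0 on success. */
int thumb_lock_touch(const char *lock_path)
{
    return utime(lock_path, NULL);
}

/* Waiter side: returns 1 if the lock is missing or has had no
 * heartbeat for more than max_age seconds, 0 if it looks alive. */
int thumb_lock_is_stale(const char *lock_path, time_t max_age)
{
    struct stat st;
    if (stat(lock_path, &st) != 0)
        return 1;
    return (time(NULL) - st.st_mtime) > max_age;
}

/* Owner is done: drop the lock so waiters can do their full update. */
int thumb_lock_release(const char *lock_path)
{
    return unlink(lock_path);
}
```

A waiter would poll thumb_lock_is_stale() or retry thumb_lock_acquire() on a timer, then rescan the directory once the lock is gone.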

I'm actually pretty surprised the fd.o spec didn't address locking at all, at least not that I saw the last time I read it.

anyway - i do agree that there is need to unify. i do also think there are 3
levels here. 1. just generate thumbs and let calling process know (either via a
blocking api or a fork/event), 2. be able to ask for the thumb path for any
given file path, and 3. load thumb into a canvas object (another level
entirely). i do think you want to support blocking and async - both. really
async can just be a wrapper on top of the blocking api.
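
To illustrate the "async is a wrapper on the blocking api" point, here is a small sketch. The names thumb_generate_blocking(), thumb_generate_async() and thumb_async_wait() are hypothetical, and the blocking call is a stub rather than a real thumbnailer.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical blocking API: returns 0 on success. Stubbed out here. */
static int thumb_generate_blocking(const char *path)
{
    return (path && path[0] != '\0') ? 0 : 1;
}

/* Async wrapper: run the blocking call in a forked child.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t thumb_generate_async(const char *path)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(thumb_generate_blocking(path) == 0 ? 0 : 1);
    return pid;
}

/* Collect the result of an async request.
 * Returns 1 if the thumb was generated, 0 if generation failed,
 * -1 if the child could not be waited on. */
int thumb_async_wait(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

In an event-driven app the wait would normally be replaced by a child exit event from the main loop, but the layering is the same: the child only ever calls the blocking function.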

I think 1 and 2 are what I'm most concerned with atm. Following the fd.o spec (to a point, since its lack of jpeg support is just dumb) 3 is not a large issue since it's just loading a png or jpeg.

imho it could do with:

1. add a file path to the thumb gen queue
2. delete a file path from the queue
3. begin queue processing
4. pause/unpause queue processing
5. end queue processing
6. ask for thumb path from file path
7. brute-force blocking-api generate thumb
8. get "new thumb available" events
9. set parallelism count (how many threads or forked children to allow at a
time)
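
As a rough illustration of how part of that list might map onto code, here is a single-process sketch covering add (1), delete (2), pause/unpause (4), processing (3, 7) and a completion callback standing in for events (8). All type and function names are hypothetical, the generator is a stub, and the parallelism count (9) is only stored, not acted on.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Thumb_Item {
    char *path;
    struct Thumb_Item *next;
} Thumb_Item;

typedef struct {
    Thumb_Item *head, *tail;
    int paused;        /* item 4 */
    int max_children;  /* item 9: stored only in this sketch */
} Thumb_Queue;

typedef void (*Thumb_Done_Cb)(const char *path, int ok, void *data);

/* Hypothetical blocking generator (item 7); returns 0 on success. */
static int thumb_generate_blocking(const char *path)
{
    return (path && path[0] != '\0') ? 0 : 1;
}

void thumb_queue_init(Thumb_Queue *q)
{
    memset(q, 0, sizeof *q);
    q->max_children = 1;
}

/* Item 1. Returns 1 if added, 0 if already queued, -1 on allocation failure. */
int thumb_queue_add(Thumb_Queue *q, const char *path)
{
    for (Thumb_Item *it = q->head; it; it = it->next)
        if (strcmp(it->path, path) == 0)
            return 0;

    Thumb_Item *item = malloc(sizeof *item);
    if (!item)
        return -1;
    size_t len = strlen(path) + 1;
    item->path = malloc(len);
    if (!item->path) {
        free(item);
        return -1;
    }
    memcpy(item->path, path, len);
    item->next = NULL;
    if (q->tail)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
    return 1;
}

/* Item 2. Returns 1 if the path was queued and has been removed, else 0. */
int thumb_queue_del(Thumb_Queue *q, const char *path)
{
    Thumb_Item *prev = NULL;
    for (Thumb_Item *it = q->head; it; prev = it, it = it->next) {
        if (strcmp(it->path, path) != 0)
            continue;
        if (prev)
            prev->next = it->next;
        else
            q->head = it->next;
        if (q->tail == it)
            q->tail = prev;
        free(it->path);
        free(it);
        return 1;
    }
    return 0;
}

/* Item 4. */
void thumb_queue_pause(Thumb_Queue *q, int paused)
{
    q->paused = paused;
}

/* Items 3, 7, 8: process one queued path, then notify the caller.
 * Returns 1 if an item was processed, 0 if paused or empty. */
int thumb_queue_step(Thumb_Queue *q, Thumb_Done_Cb done, void *data)
{
    if (q->paused || !q->head)
        return 0;
    Thumb_Item *it = q->head;
    q->head = it->next;
    if (!q->head)
        q->tail = NULL;
    int ok = (thumb_generate_blocking(it->path) == 0);
    if (done)
        done(it->path, ok, data);
    free(it->path);
    free(it);
    return 1;
}
```

Calling thumb_queue_step() from an idle handler or timer until it returns 0 would drain the queue; ending processing (5) then amounts to deleting whatever is left.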

I think we're on the same page here as far as features. So here's the idea I had in mind for implementation. First off, we have a lib that provides the blocking API with Epsilon. I spoke to atmos about all of this and he expressed a desire to keep Epsilon simple and not expand the functionality much at that level. So that could provide the lowest level blocking thumbnail generation based on MIME type with plugins (as mentioned in your next paragraph).

To address the async aspect, we'd wrap Epsilon with the queue processing and event API with the features you mention above. Then to provide the async behavior (and address the locking problem), the lib could actually set up an IPC channel and fork off a small daemon. That daemon would then be available to all processes owned by that user and fork off processes responsible for the actual generation of the thumbnails. Since the daemon provides the only route to thumbnail generation, we don't have to deal with locking or potential deadlocks, and the race condition is eliminated. If the daemon is auto-started on demand, it could also exit after a configurable amount of inactivity. It also knows what's currently being processed, so it can shortcut the requeueing of duplicate items.

The only real downside I see to this is the IPC overhead, though entropy actually does this to communicate between threads and appears to do so w/o any significant performance impact. Any other concerns that I'm missing?

locking can be implemented under the bonnet of such an api. also one thing i
think might be good here is that the lib actually dynamically adapts to
whatever libs it can find at RUNTIME - not compile-time. so if it finds imlib2, it
will use it. if it finds evas, it will use that, if it finds epeg, it will use
that - it can dlopen the libs just like the runtime linker, and thus adapt to
whatever is on the system at runtime without compile time dependencies (installing
more libs just gets faster thumbnailing or more format support etc.). this
allows us to add other things in future under the hood (thumbnailing pdf's,
html files, text files, svg, etc.)
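
A sketch of the runtime probing idea using dlopen() and dlsym(). The Thumb_Backend type and thumb_backend_probe() are hypothetical, and the sonames in the table are assumptions that would need checking against what is actually installed; the symbols are entry points those libraries export. Older glibc needs -ldl at link time.

```c
#include <dlfcn.h>
#include <stddef.h>

typedef struct {
    const char *name;    /* human-readable label */
    const char *soname;  /* ASSUMED soname; varies by distro and version */
    const char *symbol;  /* symbol used to confirm the lib is usable */
} Thumb_Backend;

/* Probe order is a preference order: first match wins. */
const Thumb_Backend thumb_default_backends[] = {
    { "epeg",   "libepeg.so.0",   "epeg_file_open"   },
    { "imlib2", "libImlib2.so.1", "imlib_load_image" },
    { "evas",   "libevas.so.1",   "evas_new"         },
};

/* Return the first backend whose library loads and exports its symbol.
 * On success *handle_out holds the dlopen() handle (the caller should
 * dlclose() it when done); on failure it is NULL and NULL is returned. */
const Thumb_Backend *thumb_backend_probe(const Thumb_Backend *list,
                                         size_t count, void **handle_out)
{
    *handle_out = NULL;

    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(list[i].soname, RTLD_LAZY | RTLD_LOCAL);
        if (!handle)
            continue; /* not installed: try the next candidate */
        if (dlsym(handle, list[i].symbol)) {
            *handle_out = handle;
            return &list[i];
        }
        dlclose(handle); /* loaded, but not the API we expected */
    }
    return NULL;
}
```

Installing another library then only changes which table entry matches at startup; nothing needs to be rebuilt.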

Yeah, I think this all belongs at the lower level blocking API.

Nathan
