Re: Per-repository thread pool in Jackrabbit

Felix Meschberger Mon, 13 Jul 2009 05:58:14 -0700

Hi,

Marcel Reutegger wrote:
> Hi,
> 
> 2009/7/12 Jukka Zitting <jukka.zitt...@gmail.com>:
>> Hi,
>>
>> 2009/7/8 Marcel Reutegger <marcel.reuteg...@gmx.net>:
>>> - parallel execution of some work. this is primarily to make use of
>>> multi-core processors. execution should be distributed over and
>>> executed by N threads, where N is a factor of the available processors.
>> If I recall correctly we debated this already earlier. My point was
>> that limiting the number of tasks to the number of available
>> processors may not be a good approach as the tasks may be IO-bound or
>> block for other reasons, in which case having more task threads would
>> give you better throughput. But I recall being proven wrong, did we
>> have some benchmark for that? Do you remember where this discussion
>> was?
> 
> I don't remember either... But let's just start a new one.
> 
> I think this very much depends on the work that needs to be distributed. there
> is no proof that one way is better than the other. for CPU intensive work we'd
> probably want to limit the number of concurrent tasks. for I/O intensive work
> the concurrency should be higher.
> 
> my above point was rather related to CPU intensive work. e.g. creating a
> posting list while content is indexed. but of course there might be other
> work that may be parallelized more aggressively.
> 
> I guess the actual pool shouldn't care about that. some utility on top of
> the pool should provide that functionality. i.e. execute a number of tasks
> with a given level of concurrency. the utility would then dispatch the tasks
> to the pool accordingly.
> 
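To make the "utility on top of the pool" idea concrete, here is a minimal sketch that uses only java.util.concurrent. The class name BoundedDispatcher and the method runAll are hypothetical and do not exist in Jackrabbit. It assumes the shared pool is an ExecutorService and uses a Semaphore to cap how many tasks of one batch are in flight at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Jackrabbit: runs a batch of tasks on a
// shared pool while keeping at most `concurrency` of them in flight.
final class BoundedDispatcher {

    static <T> List<T> runAll(ExecutorService pool, int concurrency,
                              List<Callable<T>> tasks)
            throws InterruptedException, ExecutionException {
        Semaphore permits = new Semaphore(concurrency);
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> task : tasks) {
            permits.acquire();                 // wait until a slot is free
            try {
                futures.add(pool.submit(() -> {
                    try {
                        return task.call();
                    } finally {
                        permits.release();     // free the slot when the task ends
                    }
                }));
            } catch (RuntimeException e) {
                permits.release();             // the pool rejected the task
                throw e;
            }
        }
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            results.add(future.get());         // results in submission order
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // The shared pool itself is unbounded; the limit is applied per batch.
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int n = i;
            tasks.add(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                Thread.sleep(20);              // stand-in for real work
                running.decrementAndGet();
                return n;
            });
        }
        // CPU-bound work would typically use the processor count as the limit.
        int limit = Math.min(2, Runtime.getRuntime().availableProcessors());
        List<Integer> results = runAll(pool, limit, tasks);
        pool.shutdown();
        System.out.println(results.size() + " tasks done, peak concurrency " + peak.get());
    }
}
```

An I/O-bound caller would pass a higher limit to the same method, so the pool itself never needs to know what kind of work it is running.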
>>> - Timers used in TransactionContext and MultiIndex. This could be
>>> turned into a scheduling mechanism that could also be used by the
>>> ClusterNode sync. Other classes that use periodic checks in a
>>> background thread: DatabaseJournal (ClusterRevisionJanitor),
>>> CooperativeFileLock (watch dog).
>> Yep. Perhaps we could also reuse some of the scheduling functionality
>> in Sling.
> 
> I'm not sure this is needed. the java rt library already comes with Timer
> and TimerTask classes. our needs are very simple and I'm not sure that
> justifies a new dependency.


Yes, AFAICT Java also has ThreadPool implementations. If not, I would still
urge us _not_ to reinvent the wheel and to take something existing, even if
it adds a single dependency.
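
For illustration: since Java 5, java.util.concurrent ships ThreadPoolExecutor and ScheduledThreadPoolExecutor, so a single scheduled pool can serve both one-off tasks and periodic checks without a new dependency. The following is only a sketch. The class SharedSchedulerSketch and the method runPeriodically are made-up names, and the printed "periodic check" merely stands in for something like a janitor or watch dog.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one scheduler shared by periodic and one-off work.
final class SharedSchedulerSketch {

    // Runs `check` with a fixed delay until it has run at least `runs` times,
    // then cancels it. Returns false if that did not happen within the timeout.
    static boolean runPeriodically(ScheduledExecutorService scheduler,
                                   Runnable check, int runs, long periodMillis)
            throws InterruptedException {
        CountDownLatch remaining = new CountDownLatch(runs);
        ScheduledFuture<?> handle = scheduler.scheduleWithFixedDelay(() -> {
            check.run();
            remaining.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return remaining.await(periodMillis * runs + 5000, TimeUnit.MILLISECONDS);
        } finally {
            handle.cancel(false);   // stop further runs, let a running one finish
        }
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        try {
            // The same pool also accepts ordinary one-off tasks.
            Future<String> oneOff = scheduler.submit(() -> "one-off task done");
            boolean completed = runPeriodically(
                    scheduler, () -> System.out.println("periodic check"), 3, 50);
            System.out.println(oneOff.get() + ", periodic runs completed: " + completed);
        } finally {
            scheduler.shutdown();
        }
    }
}
```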

Regards
Felix

> 
>>> the more I think about it, the more I like your idea. but we should be
>>> careful with a maximum size for a repository wide pool. extensive use
>>> of the pool by a module should not lock up another module just because
>>> there are no more idle threads. maybe that global pool shouldn't have
>>> a maximum size...
>> That might make sense. Perhaps we should have some concept of
>> sub-pools (that borrow from the main pool) with fixed limits for tasks
>> that need them (see above).
> 
> hmm, that doesn't sound flexible and generic. I just thought again how cool
> it would be if we could deploy jackrabbit into Google App Engine. that however
> requires that all background threads are removed. if we have that generic
> pool and client code adjusted accordingly it could be as easy as turning
> the pool into a direct executor variant ;) well, that's very optimistic but
> sounds promising to me...
> 
> regards
>  marcel
>
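
A rough sketch of the "direct executor variant" mentioned above, assuming client code depends only on the java.util.concurrent.Executor interface. The class DirectExecutorSketch and the method threadUsed are hypothetical names for illustration. A direct executor runs each task on the calling thread, so no background threads are created.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: the same client code runs on a pool or inline.
final class DirectExecutorSketch {

    // Runs every task synchronously on the caller's thread.
    static final Executor DIRECT = Runnable::run;

    // Stand-in for client code: submits one task and reports which thread ran it.
    static String threadUsed(Executor executor) throws InterruptedException {
        AtomicReference<String> name = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            name.set(Thread.currentThread().getName());
            done.countDown();
        });
        done.await();               // returns immediately for the direct executor
        return name.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println("pooled: ran on " + threadUsed(pool));
            System.out.println("direct: ran on " + threadUsed(DIRECT));
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is that with the direct variant a task blocks its caller, so periodic or long-running work would need a different trigger in such an environment.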
