This is very micro-managed; it should also be possible to use a single
instance from multiple threads. For example, I'd like to use Scala's
parallel collections to process several documents with a single model
instance: if my documents are in a List mydocuments, I should be able to
call mydocuments.par and process each document in parallel, but currently
I can't. Better encapsulation would do the trick here. Is there a good
reason not to?
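
Concretely, the sort of thing I have in mind is below (just a sketch; the
model file name and the tiny two-document list are placeholders):

  import java.io.FileInputStream
  import opennlp.tools.sentdetect.{SentenceDetectorME, SentenceModel}

  // Load the model once and build a single detector from it.
  val model = new SentenceModel(new FileInputStream("en-sent.bin"))
  val detector = new SentenceDetectorME(model)

  // What I'd like to write: share that one detector across a parallel
  // collection. This is exactly the usage that isn't safe today.
  val mydocuments = List("First doc. Two sentences.", "Second doc.")
  val sentencesPerDoc = mydocuments.par.map(doc => detector.sentDetect(doc))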

Jason

On Wed, Jul 6, 2011 at 9:38 AM, Jörn Kottmann <[email protected]> wrote:

> On 7/6/11 4:32 PM, Jason Baldridge wrote:
>
>> So the core components (e.g. sentence detector, tokenizer) are not
>> thread-safe due to poor encapsulation. I seem to recall this being
>> discussed before, but can't remember. Regardless, I think this would be a
>> Very Good Thing to fix, both to have better designed code and to allow
>> OpenNLP to be more easily used when exploiting multiple cores.
>>
>
> Our thread-safety strategy is to create one instance per thread and
> share the model. So you create one SentenceDetectorME per thread, but
> all instances share the model.
>
> Having multiple threads share one instance can slow down performance
> when the instance is not lock-free.
>
> Jörn
>
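
For reference, the per-thread workaround in .par terms would look roughly
like this (again only a sketch, same placeholder model file as above,
using a ThreadLocal to get one detector per worker thread):

  import java.io.FileInputStream
  import opennlp.tools.sentdetect.{SentenceDetectorME, SentenceModel}

  // The model itself is shared; only the detector instances are per-thread.
  val model = new SentenceModel(new FileInputStream("en-sent.bin"))

  // One SentenceDetectorME per worker thread, all backed by the same model.
  val detectors = new ThreadLocal[SentenceDetectorME] {
    override def initialValue(): SentenceDetectorME = new SentenceDetectorME(model)
  }

  val mydocuments = List("First doc. Two sentences.", "Second doc.")
  val sentencesPerDoc = mydocuments.par.map(doc => detectors.get().sentDetect(doc))

That works, but it's exactly the kind of per-thread bookkeeping that better
encapsulation would make unnecessary.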



-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
