On Mar 5, 2011, at 8:24 AM, Jörn Kottmann wrote:

> On 3/5/11 1:49 PM, Grant Ingersoll wrote:
>> On Feb 22, 2011, at 11:25 AM, Jörn Kottmann wrote:
>> 
>>> On 2/22/11 4:55 PM, Grant Ingersoll wrote:
>>>> Hi,
>>>> 
>>>> I'm using 1.4.3, but it looks like trunk has the same issue.  That is, it 
>>>> doesn't appear that the POSTaggerME class is thread safe, but perhaps I am 
>>>> misreading it.  I ask because bestSequence is captured in a member variable 
>>>> and the tag and probs methods both access that field.  The reason I ask is 
>>>> that I want to use this inside of Solr, which is multithreaded and could be 
>>>> serving up a lot of requests, and I certainly can't afford to load the model 
>>>> for each request.  The fix for this particular class seems relatively 
>>>> straightforward, at the cost of breaking back compatibility of the API 
>>>> (which is a whole other topic).
>>>> 
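>>>> To make the hazard concrete, here is roughly the interleaving I'm worried 
>>>> about (just a sketch; tag and probs are the real methods, the shared tagger 
>>>> and the interleaving are hypothetical):
>>>> 
>>>>     // one POSTaggerME instance shared by all of Solr's request threads
>>>>     // thread A:
>>>>     String[] tagsA = tagger.tag(sentenceA);  // stores bestSequence for A
>>>>     // ... thread B now calls tagger.tag(sentenceB), overwriting the
>>>>     // shared bestSequence field ...
>>>>     double[] probsA = tagger.probs();        // thread A may get B's probabilities
>>>> 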
>>>> I haven't looked deeper, but are there any other classes with thread-safety 
>>>> issues that I should be aware of?
>>> The components are not thread safe; they must only be called from one 
>>> thread. So how do you run OpenNLP in multiple threads?
>>> 
>>> The models are effectively immutable and therefore thread-safe, so one model 
>>> can be shared between multiple instances of the same component (they are not 
>>> strictly immutable, so make sure to publish them safely). Just create one 
>>> component instance per thread and share the model instance.  In your case I 
>>> guess you can use a ThreadLocal with lazy initialization to maintain one 
>>> instance per thread.
>>> 
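>>> Something along these lines should do it (an untested sketch, assuming the 
>>> POSModel/POSTaggerME API from trunk; the model path is just an example):
>>> 
>>>     // classes are from opennlp.tools.postag; load the model once and share it
>>>     final POSModel model = new POSModel(new FileInputStream("en-pos-maxent.bin"));
>>> 
>>>     // one tagger instance per thread, created lazily from the shared model
>>>     final ThreadLocal<POSTaggerME> tagger = new ThreadLocal<POSTaggerME>() {
>>>         @Override
>>>         protected POSTaggerME initialValue() {
>>>             return new POSTaggerME(model);
>>>         }
>>>     };
>>> 
>>>     // in a request thread, tokens is the tokenized sentence:
>>>     String[] tags = tagger.get().tag(tokens);
>>> 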
>>> This way we stay lock free and avoid multi-threading code that is difficult 
>>> to understand and test. Making sure that our models are immutable is easy, 
>>> and even if we make a mistake there, it is unlikely that a user would modify 
>>> the model in an application like yours. In the end I believe we have found a 
>>> really simple, solid and nice solution to this problem.
>> 
>> In this particular case, we could still be lock free and thread safe.  All 
>> we would need to do is return the best sequence instead of storing it in the 
>> object.
>> 
>> ThreadLocals are not a great way of handling this stuff, IMO.  I also wonder 
>> how lightweight it is to create the objects that wrap the models.
>> 
> 
> Actually I do not understand how that would work, can you please elaborate a 
> little?


My understanding is that ThreadLocals and GC seem to have a hard time working 
together in high-performance situations (see 
http://www.lucidimagination.com/search/?q=threadlocal for some lengthy 
discussions).  There also seem to be questions about how they behave with 
classloaders in webapps.  I can't say I fully understand the implications, and 
I won't say they are harmful, but I don't think they are necessarily the way 
you want to go unless you have to.  In the case of POSTaggerME, it would be 
dead simple to make it lock free and thread safe, AFAICT, at the cost of an 
API change.  I suspect the same is true in other places, too.
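
What I mean is something like this (tagSentence and TagResult are made-up 
names, just to sketch the shape, not a patch against trunk):

    // no shared mutable state on the tagger, so one instance can serve any
    // number of threads; the result object carries both tags and probabilities
    TagResult result = tagger.tagSentence(tokens);
    String[] tags  = result.getTags();
    double[] probs = result.getProbs();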

My bigger question is what is the cost of creating the objects that wrap the 
models?

-Grant

