Re: Thread Safety of POSTaggerME

Grant Ingersoll Wed, 09 Mar 2011 06:34:22 -0800

On Mar 7, 2011, at 7:16 AM, Jörn Kottmann wrote:

> On 3/6/11 1:37 PM, Grant Ingersoll wrote:
>> On Mar 5, 2011, at 2:13 PM, Jörn Kottmann wrote:
>>> I actually tried to ask how you would do that. I don't think it is super
>>> simple. Can you please briefly explain what you have in mind?
>> From the looks of it, we'd just need to return the bestSequence object (or
>> some larger containing object) to the user and not keep it (or other
>> pieces that may change) in a member variable.  Granted, I'm still learning
>> the code, so I may well be misreading some things.  Still, simply changing
>> the tag method to return the bestSequence would let the user make the
>> appropriate calls to get the best outcome and the probabilities (or the
>> probs() method could take the bestSequence object as a parameter, if you
>> wanted to keep that convenience).
>> 
>> I suppose I should just work up a patch; it would be a lot easier than
>> discussing it in the abstract.
>> 
> There is also a cache, which would then have to be created per call; we need
> to measure how expensive that is compared to the current solution.
> 
> The POS Tagger should also use the new feature generation code we wrote
> for the name finder, but that code is not thread safe by design, because it
> is stateful. The state is necessary to support per-document features like
> the ones we have in the name finder.
> 
> Do you think making the name finder and other components thread safe in the
> same way is also possible?


Not sure.  I only noticed it in the POS tagger.
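The suggestion quoted above (return the bestSequence, or a containing object, from tag instead of keeping it in a member variable) could look roughly like the minimal sketch below. It rests on stated assumptions: TagSequence, TagModel and StatelessTagger are hypothetical names, not OpenNLP classes, and a trivial dictionary lookup stands in for the real beam search. It also shows where the cache Jörn mentions ends up: as a local variable allocated on every call, which is the cost that would need measuring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All class and method names below are hypothetical, not OpenNLP's API.

/** Immutable result of one tagging call: the tags plus their probabilities. */
final class TagSequence {
    private final List<String> outcomes;
    private final double[] probs;

    TagSequence(List<String> outcomes, double[] probs) {
        this.outcomes = Collections.unmodifiableList(new ArrayList<>(outcomes));
        this.probs = probs.clone();
    }

    List<String> getOutcomes() { return outcomes; }

    double[] getProbs() { return probs.clone(); }
}

/** Toy stand-in for the model: read-only after construction, so it can be shared. */
final class TagModel {
    private final Map<String, String> lexicon;

    TagModel(Map<String, String> lexicon) { this.lexicon = new HashMap<>(lexicon); }

    String bestTag(String word) { return lexicon.getOrDefault(word.toLowerCase(), "NN"); }

    double probability(String word) { return lexicon.containsKey(word.toLowerCase()) ? 0.9 : 0.5; }
}

/** Tagger with no mutable fields: each call owns its cache and returns its own result. */
final class StatelessTagger {
    private final TagModel model;

    StatelessTagger(TagModel model) { this.model = model; }

    TagSequence tag(String[] sentence) {
        // The cache is a local variable, so it is created per call and never shared.
        Map<String, String> cache = new HashMap<>();
        List<String> tags = new ArrayList<>(sentence.length);
        double[] probs = new double[sentence.length];
        for (int i = 0; i < sentence.length; i++) {
            tags.add(cache.computeIfAbsent(sentence[i], model::bestTag));
            probs[i] = model.probability(sentence[i]);
        }
        return new TagSequence(tags, probs);
    }

    /** Optional convenience: probs() takes the result instead of reading a field. */
    double[] probs(TagSequence sequence) { return sequence.getProbs(); }
}
```

Because nothing in StatelessTagger changes after construction, a single instance can be called from many threads at once; the price is one cache allocation per call instead of one per tagger.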

> Right now we have the same thread-safety convention
> for all components, which I like because it is easy for someone new to learn.
> If it is mixed, e.g. the POS Tagger is thread safe and the name finder is not,
> people will get confused.

It is no doubt a hard problem.  There always seems to be a tradeoff between
easy to learn and fast.  In my experience, most programmers aren't good at
concurrent programming (and I certainly don't claim to be either), so it is
hard to get it right.  I think one of the big wins for us could be to make
OpenNLP really fast, which would increase its viability and attract others.
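For contrast with a stateless tagger, here is a minimal sketch of the one-instance-per-thread pattern, assuming the shared convention mentioned above is that a model may be shared while each tagger instance is confined to a single thread. SharedModel, StatefulTagger and PerThreadTagging are hypothetical names; the toy tagger keeps its last result in a field, much as bestSequence is kept in a member variable today, which is exactly why one instance must not be shared between threads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical classes illustrating the one-instance-per-thread convention.

/** Read-only model: safe to share between threads once constructed. */
final class SharedModel {
    private final Map<String, String> lexicon;

    SharedModel(Map<String, String> lexicon) { this.lexicon = new HashMap<>(lexicon); }

    String bestTag(String word) { return lexicon.getOrDefault(word.toLowerCase(), "NN"); }
}

/** Stateful tagger: the last result lives in a field, so an instance is NOT thread safe. */
final class StatefulTagger {
    private final SharedModel model;
    private String[] lastTags = new String[0]; // per-call state kept between calls

    StatefulTagger(SharedModel model) { this.model = model; }

    String[] tag(String[] sentence) {
        lastTags = new String[sentence.length];
        for (int i = 0; i < sentence.length; i++) {
            lastTags[i] = model.bestTag(sentence[i]);
        }
        return lastTags.clone();
    }

    /** Only meaningful when called from the thread that made the last tag() call. */
    String[] lastTags() { return lastTags.clone(); }
}

/** Gives every thread its own StatefulTagger, all backed by the same model. */
final class PerThreadTagging {
    private final ThreadLocal<StatefulTagger> tagger;

    PerThreadTagging(SharedModel model) {
        this.tagger = ThreadLocal.withInitial(() -> new StatefulTagger(model));
    }

    String[] tag(String[] sentence) { return tagger.get().tag(sentence); }
}
```

This keeps the component code simple and the convention uniform, at the cost of one tagger (and whatever cache it holds) per thread. ThreadLocal values stay alive as long as their threads do, which is worth keeping in mind with long-lived thread pools.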

-Grant
