On Mar 7, 2011, at 7:16 AM, Jörn Kottmann wrote:

> On 3/6/11 1:37 PM, Grant Ingersoll wrote:
>> On Mar 5, 2011, at 2:13 PM, Jörn Kottmann wrote:
>>> I actually tried to ask how you would do that. I don't think it is super
>>> simple. Can you please shortly explain what you have in mind?
>>
>> From the looks of it, we'd just need to return the bestSequence object (or
>> some larger containing object) out to the user and not use it (or other
>> pieces that may change) as a member variable. Granted, I'm still learning
>> the code, so I likely am misreading some things. From the looks of it,
>> though, simply changing the tag method to return the bestSequence would let
>> the user make the appropriate calls to best outcome and to get the
>> probabilities (or the probs() method could take in the bestSequence object
>> if you wanted to keep that convenience)
>>
>> I suppose I should just work up a patch, it would be a lot easier than
>> discussing it in the abstract.
>>
> There is also a cache which must be created then per call, we need to do some
> measuring how expensive that is compared to the current solution.
>
> The POS Tagger should also use the new feature generation stuff we made
> for the name finder, but that is not thread safe by design, because it has a
> state. The state is necessary to support per document features like we have
> it in the name finder.
>
> Do you think making the name finder and other components thread safe in the
> same way is also possible?

Not sure. I only noticed it in the POS tagger.

> Right now we have the same thread-safety convention
> for all components, which I like because it is easy for some one new to learn.
> When it is mixed, e.g. POS Tagger thread safe and name finder not, then people
> will get confused.

It is, no doubt, a hard problem. There always seems to be a tradeoff between being easy to learn and being fast. In my experience, most programmers aren't good at concurrent programming (and I certainly don't claim to be either), so it is hard to get right. I think one of the big wins for us could be making OpenNLP really fast, which would increase its viability and attract others.

-Grant
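For concreteness, here is a minimal sketch of the proposal quoted above: the tagger returns a per-call result object (playing the role of bestSequence) instead of keeping it as a member variable, and the cache Jörn mentions becomes a local created on every call. `TagResult`, `StatelessTagger` and the dictionary "model" are hypothetical stand-ins for illustration, not OpenNLP's actual API; the only point is that the model is read-only after construction and everything mutable lives inside `tag()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical result type standing in for "bestSequence": one instance per call,
// owned by the caller, so no per-call state survives inside the tagger.
final class TagResult {
    private final String[] outcomes;
    private final double[] probs;

    TagResult(String[] outcomes, double[] probs) {
        this.outcomes = outcomes;
        this.probs = probs;
    }

    String[] outcomes() { return outcomes.clone(); }
    double[] probs() { return probs.clone(); }
}

// Hypothetical tagger: the "model" is immutable after construction and every
// mutable structure, including the cache, is a local variable of tag().
final class StatelessTagger {
    private final Map<String, String> tagOf;   // toy stand-in for a trained model
    private final Map<String, Double> probOf;

    StatelessTagger(Map<String, String> tagOf, Map<String, Double> probOf) {
        this.tagOf = Map.copyOf(tagOf);
        this.probOf = Map.copyOf(probOf);
    }

    TagResult tag(String[] tokens) {
        // Per-call cache: token -> index of its first occurrence in this sentence.
        // Allocating it on every call is the cost that would need measuring.
        Map<String, Integer> cache = new HashMap<>();
        String[] outcomes = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            Integer seen = cache.get(tokens[i]);
            if (seen != null) {               // reuse the earlier decision for a repeated token
                outcomes[i] = outcomes[seen];
                probs[i] = probs[seen];
            } else {                          // unknown words fall back to NN with prob 0.5
                outcomes[i] = tagOf.getOrDefault(tokens[i], "NN");
                probs[i] = probOf.getOrDefault(tokens[i], 0.5);
                cache.put(tokens[i], i);
            }
        }
        return new TagResult(outcomes, probs);
    }
}

class Demo {
    public static void main(String[] args) throws Exception {
        StatelessTagger tagger = new StatelessTagger(
                Map.of("the", "DT", "dog", "NN", "runs", "VBZ"),
                Map.of("the", 0.99, "dog", 0.9, "runs", 0.8));
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Two threads share one tagger instance without any locking.
        Future<TagResult> a = pool.submit(() -> tagger.tag(new String[] {"the", "dog", "runs"}));
        Future<TagResult> b = pool.submit(() -> tagger.tag(new String[] {"runs", "the"}));
        System.out.println(String.join(" ", a.get().outcomes())); // DT NN VBZ
        System.out.println(String.join(" ", b.get().outcomes())); // VBZ DT
        pool.shutdown();
    }
}
```

The thread safety here comes purely from confinement: nothing a call writes is reachable from another call, so no synchronization is needed. The tradeoff is an allocation (result plus cache) per call, which is the per-call cost Jörn suggests measuring against the current member-variable design.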
