On 11/01/2017 22:51, Joern Kottmann wrote:
On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote:
In a recent project, I was using SentenceDetectorME, TokenizerME and POSTaggerME. It turns out that none of them is thread safe. This is because the classification probabilities for the last tag() call (for example) are stored in a member variable and can be retrieved by a separate API call.
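To make the race concrete, here is a minimal sketch of the pattern described above. StatefulTagger, its toy capitalization "classifier" and the demo class are hypothetical stand-ins, not OpenNLP code. The interleaving is simulated sequentially so the outcome is deterministic: a second tag() call that lands between another caller's tag() and probs() overwrites the stored probabilities.

```java
// Hypothetical, simplified stand-in for a stateful tagger of the kind
// described above. This is NOT the OpenNLP source code.
class StatefulTagger {
    // Probabilities of the most recent tag() call, kept in a member field.
    private double[] lastProbs = new double[0];

    String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Toy "classifier": capitalized tokens become proper nouns.
            boolean capitalized = !tokens[i].isEmpty()
                    && Character.isUpperCase(tokens[i].charAt(0));
            tags[i] = capitalized ? "NNP" : "NN";
            probs[i] = capitalized ? 0.9 : 0.6;
        }
        // Shared mutable state: every call on this instance overwrites it.
        lastProbs = probs;
        return tags;
    }

    // Separate call: returns whatever the most recent tag() call stored,
    // which may have come from a different thread.
    double[] probs() {
        return lastProbs;
    }
}

class StatefulTaggerDemo {
    public static void main(String[] args) {
        StatefulTagger tagger = new StatefulTagger();
        // "Thread A" tags a three-token sentence...
        String[] tagsA = tagger.tag(new String[] {"Berlin", "is", "big"});
        // ...then "thread B" tags a one-token sentence before A asks for probs...
        tagger.tag(new String[] {"hello"});
        // ...so A receives B's probabilities.
        double[] probsA = tagger.probs();
        System.out.println(tagsA.length + " tags, " + probsA.length + " probs"); // prints "3 tags, 1 probs"
    }
}
```

With real threads the same mismatch happens nondeterministically, which is what makes it hard to spot.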
The POSTagger already has the Sequence object to return the result with probabilities. If we introduced a new method, we could probably just deprecate the method that retrieves the probs.

It should be a minor change to provide an interface that can be thread safe.
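A rough sketch of what such a thread-safe interface could look like, assuming the result and its probabilities are returned together in one immutable object (in the spirit of the Sequence object). TagResult, StatelessTagger and tagWithProbs() are hypothetical names for illustration only; the point is that no per-call state lives in member fields.

```java
// Hypothetical immutable result holder: tags and their probabilities are
// returned together, so no separate probs() call is needed.
final class TagResult {
    private final String[] tags;
    private final double[] probs;

    TagResult(String[] tags, double[] probs) {
        // Defensive copies keep the result immutable.
        this.tags = tags.clone();
        this.probs = probs.clone();
    }

    String[] tags() { return tags.clone(); }

    double[] probs() { return probs.clone(); }
}

// Hypothetical stateless tagger: everything a call needs lives in local
// variables, so one instance can be shared between threads as long as the
// model it reads from is never mutated.
final class StatelessTagger {
    TagResult tagWithProbs(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Toy "classifier": capitalized tokens become proper nouns.
            boolean capitalized = !tokens[i].isEmpty()
                    && Character.isUpperCase(tokens[i].charAt(0));
            tags[i] = capitalized ? "NNP" : "NN";
            probs[i] = capitalized ? 0.9 : 0.6;
        }
        return new TagResult(tags, probs);
    }
}
```

Because each caller keeps its own TagResult, a later call from another thread cannot change probabilities that were already returned; a probs()-style getter would then only be kept for backward compatibility and could be deprecated.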

[...]
I don't want to muddy the waters, but I had another idea: we could also add a getThreadLocal() method to the tools that need it. You would create a POSTaggerME (for example) as usual, and if you needed a per-thread version, you could then call getThreadLocal(), which would give you another POSTaggerME with the same model, one per thread. The advantage, as I see it, is that the API extension would be conservative (just one method added), and getting the probabilities would continue to work as before because you have one instance per thread.
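One way the getThreadLocal() proposal could be sketched is with java.lang.ThreadLocal. PosModel and ThreadLocalTagger are hypothetical stand-ins (not OpenNLP classes), and the assumption is that the model is read-only and safe to share, while the per-call probability state stays in each per-thread tagger.

```java
// Hypothetical read-only model shared by every tagger instance.
final class PosModel {
    private final String name;

    PosModel(String name) { this.name = name; }

    String name() { return name; }
}

// Hypothetical stateful tagger extended with the proposed getThreadLocal().
class ThreadLocalTagger {
    private final PosModel model;
    // Per-call state, as in the current design; safe only within one thread.
    private double[] lastProbs = new double[0];
    // Lazily creates one extra tagger per thread, all sharing the same model.
    private final ThreadLocal<ThreadLocalTagger> perThread;

    ThreadLocalTagger(PosModel model) {
        this.model = model;
        this.perThread = ThreadLocal.withInitial(() -> new ThreadLocalTagger(model));
    }

    PosModel getModel() { return model; }

    // Meant to be called on the shared instance: returns the calling
    // thread's own tagger, creating it on first use.
    ThreadLocalTagger getThreadLocal() { return perThread.get(); }

    String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Toy "classifier": capitalized tokens become proper nouns.
            boolean capitalized = !tokens[i].isEmpty()
                    && Character.isUpperCase(tokens[i].charAt(0));
            tags[i] = capitalized ? "NNP" : "NN";
            probs[i] = capitalized ? 0.9 : 0.6;
        }
        lastProbs = probs;
        return tags;
    }

    // Unchanged API: correct as long as each thread uses its own instance.
    double[] probs() { return lastProbs; }
}
```

Each thread would call shared.getThreadLocal().tag(...) followed by probs() on the same per-thread instance, so the existing probs() method keeps working. One trade-off of this design is that a ThreadLocal value lives as long as its thread, so in long-lived thread pools each worker keeps its own tagger instance around.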

Does that make sense? I'm not sure I'm explaining this in the best possible manner...

--Thilo
