That feels very micro-managed, and it should be possible to use a single instance from multiple threads as well. For example, I'd like to use Scala's parallel collections to distribute a single instance over several documents: if my documents are in a List mydocuments, I should be able to call mydocuments.par and process each document in parallel, but currently I can't (sketch below). Better encapsulation would do the trick here. Is there a good reason not to?
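Roughly what I have in mind, as a sketch -- the model path and the documents are just placeholders, and I'm going from memory of the 1.5 sentdetect API:

  import java.io.FileInputStream
  import opennlp.tools.sentdetect.{SentenceDetectorME, SentenceModel}

  // Placeholder documents; in practice these come from wherever.
  val mydocuments: List[String] = List("First doc. Two sentences.", "Second doc.")

  // The model is immutable, so loading it once and sharing it is fine.
  val model = new SentenceModel(new FileInputStream("en-sent.bin"))

  // What I'd like to write: one detector shared across the parallel workers.
  // It compiles, but it isn't safe today because SentenceDetectorME keeps
  // mutable state between calls.
  val detector = new SentenceDetectorME(model)
  val sentences = mydocuments.par.map(doc => detector.sentDetect(doc))

  // What I have to write instead: a fresh detector per task (or fiddle with
  // thread-locals), following the one-instance-per-thread strategy.
  val sentencesNow = mydocuments.par.map { doc =>
    new SentenceDetectorME(model).sentDetect(doc)
  }

Having to manage the per-thread instances myself in the parallel code is exactly the micro-management I mean.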
Jason

On Wed, Jul 6, 2011 at 9:38 AM, Jörn Kottmann <[email protected]> wrote:
> On 7/6/11 4:32 PM, Jason Baldridge wrote:
>
>> So the core components (e.g. sentence detector, tokenizer) are not
>> thread-safe due to poor encapsulation. I seem to recall this being discussed
>> before, but can't remember. Regardless, I think this would be a Very Good
>> Thing to fix, both to have better designed code, and to allow OpenNLP to be
>> more easily used when exploiting multiple cores.
>
> Our thread-safety strategy is to create one instance per thread and share the
> model. So you create one SentenceDetectorME per thread, but all instances share
> the model.
>
> Having multiple threads sharing one instance can slow down performance when
> it is not lock-free.
>
> Jörn

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
