Re: UIMACPP and multi-threading
Benjamin,

Initial testing with the latest AMQ broker indicates an incompatibility with the existing UIMACPP release. Along with the problems you have exposed, there is good motivation to get another UIMACPP release out relatively soon. Thanks for exposing the GC/threading issue with the JNI and the potential fixes.

Eddie

On Tue, Apr 26, 2016 at 3:47 AM, Benjamin De Boe <benjamin.de...@intersystems.com> wrote:
> [...]
RE: UIMACPP and multi-threading
Hi Eddie,

From my observations, destructorJNI() appears to be meant for JNI instances that were constructed (by calling constructJNI()) but never used, for example when you allocate a pool of them and do not use them all. destroyJNI() frees all memory allocated for analysis processing.

It is safe to call destroyJNI() first and then destructorJNI(), even on unused instances, but it is not safe to reverse the order: the result is memory leaks (on used instances) and, even worse, access violations because the same instance pointer is deleted twice. This is the problem we are facing.

Why both methods (destructor and destroy) coexist is a little unclear to me. I made destructorJNI() do nothing, and that solved our problem, because destructorJNI() and destroyJNI() are always called in pairs. If there are cases where this is not true, we will of course have memory leaks, so a more robust solution is very welcome.

Regards,
Jos Denys.

From: Eddie Epstein [mailto:eaepst...@gmail.com]
Sent: Tuesday, April 26, 2016 04:58
To: user@uima.apache.org
Cc: Jos Denys; Chen-Chieh Hsu
Subject: Re: UIMACPP and multi-threading

[...]
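Jos's observation that destroyJNI() followed by destructorJNI() is safe, while the reverse order deletes the instance pointer twice, suggests making teardown order-independent instead of turning the destructor into a no-op. The Java sketch below models that idea under stated assumptions: FakeNativeInstance and TwoPhaseHandle are hypothetical names, and the two "native" steps only record themselves in a list rather than calling into UIMACPP. The destructor performs the analysis cleanup itself when destroy() never ran, so neither call order leaks or frees twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the native JNIInstance: it only records which
// "native" cleanup steps ran, in order, so the sequencing can be checked.
final class FakeNativeInstance {
    final List<String> frees = new ArrayList<>();

    void freeAnalysisResources() { frees.add("analysis"); } // what destroyJNI() frees
    void deleteInstance() { frees.add("instance"); }        // what destructorJNI() deletes
}

// Order-independent two-phase teardown: each step runs at most once, and the
// destructor performs the analysis cleanup itself if destroy() never ran.
final class TwoPhaseHandle {
    private FakeNativeInstance instance;   // null once the instance is deleted
    private boolean analysisFreed = false;

    TwoPhaseHandle(FakeNativeInstance instance) { this.instance = instance; }

    synchronized void destroy() {
        if (instance == null || analysisFreed) return; // already cleaned up
        instance.freeAnalysisResources();
        analysisFreed = true;
    }

    synchronized void destructor() {
        if (instance == null) return;      // never delete the instance twice
        if (!analysisFreed) destroy();     // reentrant lock; avoids the leak case
        instance.deleteInstance();
        instance = null;
    }
}
```

In the real wrapper this bookkeeping would presumably sit next to the pInstance pointer on the C++ side; the sketch only illustrates one possible design, not the original UIMACPP team's intent.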
RE: UIMACPP and multi-threading
Hi Eddie,

I'm not familiar with the serializeJNI issue.

Few sources still recommend implementing finalize(), because the order in which the GC process eventually invokes finalizers is undetermined. We also thought it was counterintuitive to see the UimacppEngine being finalized before the UimacppAnalysisComponent that wraps it, but that is what our extra logs quite consistently indicated, so that is probably just what "non-deterministic" means.

This article suggests a few alternatives that may be considered for this UIMACPP / JNI issue in the long run:
http://www.oracle.com/technetwork/java/javamail/finalization-137655.html

Thanks,
benjamin

--
Benjamin De Boe | Product Manager
M: +32 495 19 19 27 | T: +32 2 464 97 33
InterSystems Corporation | http://www.intersystems.com

-----Original Message-----
From: Eddie Epstein [mailto:eaepst...@gmail.com]
Sent: Tuesday, April 26, 2016 4:58 AM
To: user@uima.apache.org
Cc: Jos Denys; Chen-Chieh Hsu
Subject: Re: UIMACPP and multi-threading

[...]
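In the spirit of the alternatives Benjamin mentions, one common approach is to stop depending on finalizer order and give the wrapper an explicit, idempotent close() that the owner calls deterministically, for example via try-with-resources. The sketch below is hypothetical: SketchComponent and SketchEngine merely stand in for UimacppAnalysisComponent and UimacppEngine, and the log entries mark where the native destroy calls would go. With the real API, the comparable step would be calling AnalysisEngine.destroy() explicitly once an engine is no longer needed, rather than leaving cleanup to the GC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for UimacppEngine with an explicit, idempotent close().
final class SketchEngine implements AutoCloseable {
    private final List<String> log;
    private boolean closed = false;

    SketchEngine(List<String> log) { this.log = log; }

    @Override
    public synchronized void close() {
        if (closed) return;           // a second call is a no-op
        closed = true;
        log.add("engine.destroy");    // where the native destroy call would go
    }
}

// Hypothetical stand-in for UimacppAnalysisComponent: the owner tears down
// what it owns, in a fixed order, instead of relying on finalize().
final class SketchComponent implements AutoCloseable {
    private final List<String> log;
    private SketchEngine engine;

    SketchComponent(List<String> log) {
        this.log = log;
        this.engine = new SketchEngine(log);
    }

    @Override
    public synchronized void close() {
        if (engine == null) return;   // already closed
        log.add("component.destroy");
        engine.close();
        engine = null;
    }
}

final class LifecycleDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (SketchComponent component = new SketchComponent(log)) {
            // ... process CASes with the component here ...
        }
        return log;                   // cleanup already happened, in a known order
    }
}
```

The design choice here is that cleanup order follows ownership, which the program controls, rather than finalization order, which the JVM does not guarantee.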
Re: UIMACPP and multi-threading
Hi,

I'm not the author of the JNI, but does it make sense that UimacppEngine.finalize() could be called while UimacppAnalysisComponent maintains a valid engine pointer to UimacppEngine? And once the engine pointer has been set to null, UimacppAnalysisComponent.destroy() will not call UimacppEngine.destroy(). It leaves me confused as to how this could happen.

At any rate, do you think finalize is related to the serializeJNI problem?

Eddie

On Mon, Apr 25, 2016 at 8:27 AM, Benjamin De Boe <benjamin.de...@intersystems.com> wrote:
> [...]
RE: UIMACPP and multi-threading
After some more debugging, it seems this is probably a garbage collection issue rather than a multi-threading issue, although multiple threads may well increase the likelihood of it happening.

We've found that there are two methods on the C++ side for cleaning up the memory used by the C++ engine: destroyJNI() and destructorJNI(). destructorJNI() is called from the UimacppEngine.finalize() method and only deletes the pInstance pointer, whereas destroyJNI() does much more work cleaning up what lies beyond it; it is called through UimacppEngine.destroy(), which in turn is invoked from UimacppAnalysisComponent.finalize().

Now, the arcane magic in the GC process seems to first finish off the UimacppEngine helper object (calling destructorJNI()) and only then the UimacppAnalysisComponent instance that contained it. Its destroyJNI() call then runs into trouble because pInstance was already deleted in destructorJNI(), causing the access violation we've been struggling with.

[logged as https://issues.apache.org/jira/browse/UIMA-4899]

There are a number of ways we could work around this (such as calling destroyJNI() in both cases and exiting early if everything is already cleaned up), but of course we'd hope someone from the original UIMACPP team could weigh in and share the reasoning behind those two separate methods and anything we're overlooking in our assessment. Can anybody recommend what we should do in the short run, and how this might translate into a fixed UIMA / UIMACPP release at some point? An out-of-the-box 64-bit UIMACPP release would probably benefit more than just us (cf. https://issues.apache.org/jira/browse/UIMA-4900).
Thanks,
benjamin

-----Original Message-----
From: Eddie Epstein [mailto:eaepst...@gmail.com]
Sent: Thursday, April 7, 2016 1:58 PM
To: user@uima.apache.org
Subject: Re: UIMACPP and multi-threading

[...]
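Benjamin's suggested workaround, calling destroyJNI() from both cleanup paths and exiting early once cleanup has happened, can be sketched in Java by atomically claiming the native handle. This is a minimal sketch under assumptions: NativeHandleGuard is a hypothetical class, the long value merely stands in for the engine pointer, and freeNative() only counts calls where the real code would invoke destroyJNI(). Because the finalizer thread and an application thread may race, getAndSet(0L) ensures exactly one caller ever sees the live pointer.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical guard that makes native teardown idempotent and race-free.
final class NativeHandleGuard {
    private final AtomicLong handle;                  // 0 means "already destroyed"
    private final AtomicInteger freeCount = new AtomicInteger();

    NativeHandleGuard(long nativePointer) { this.handle = new AtomicLong(nativePointer); }

    // Safe to call from finalize() paths and from an explicit destroy():
    // only the first caller observes a non-zero handle and frees it.
    boolean destroy() {
        long ptr = handle.getAndSet(0L);
        if (ptr == 0L) return false;                  // already cleaned up: exit early
        freeNative(ptr);
        return true;
    }

    // Stand-in for the real destroyJNI() call on the given pointer.
    private void freeNative(long ptr) { freeCount.incrementAndGet(); }

    int freeCount() { return freeCount.get(); }
}
```

A plain boolean flag would suffice if all calls came from one thread; the atomic swap is there because finalizers run on their own thread.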
Re: UIMACPP and multi-threading
Standalone.java certainly does show threading issues with UIMACPP's JNI. Multithreaded testing through the JNI, like the test I did a few days ago, was clearly not sufficient to declare it thread-safe.

Our local UIMACPP development with regard to thread safety focused on multithreaded testing for the development of UIMACPP's native AMQ service wrapper.

If you do fix the JNI threading issues, please consider contributing the fixes back to the ASF!

Eddie

On Tue, Apr 5, 2016 at 8:54 AM, Jos Denys wrote:
> [...]
RE: UIMACPP and multi-threading
Hi Eddie,

I worked on the C++ side, and what I noticed was that the JNI interface always passes an instance pointer:

  JNIEXPORT void JNICALL JAVA_PREFIX(resetJNI) (JNIEnv* jeEnv, jobject joJTaf) {
    try {
      UIMA_TPRINT("entering resetDocument()");

      uima::JNIInstance* pInstance = JNIUtils::getCppInstance(jeEnv, joJTaf);

Now the strange thing, and ultimately what caused the access violation error, was that the pInstance pointer was the same for the 3 threads that (simultaneously) did the UIMA processing, so it looks like the same CAS was passed to 3 different analysis worker threads.

Any idea why and how this can happen?

Thanks for your feedback,
Jos Denys,
InterSystems Benelux.

From: Benjamin De Boe
Sent: Tuesday, April 5, 2016 09:33
To: user@uima.apache.org
Cc: Jos Denys; Chen-Chieh Hsu
Subject: RE: UIMACPP and multi-threading

Hi Eddie,

Thanks for your prompt response.

In our experiment, one initial thread instantiates a CasPool and then passes it on to newly spawned threads, each of which has its own DaveDetector instance and fetches a new CAS from the shared pool. The cppEnginePointer variable of the UimacppEngine objects differs per thread, but on the C++ side it looks like all threads point to the same memory address for the CAS they operate on. Given the actions UimacppEngine.process() performs, and given that its CAS is registered as a protected field rather than a local variable, it's no wonder this causes trouble.

I can imagine UIMA-AS follows a slightly different path (and an apparently safe one, given your test case), but I'm wondering what we're doing wrong such that we need to fiddle with synchronized keywords on the framework classes to avoid the crash.

Here's our test program. When the CAS pool is small enough (e.g. 5), things work fine. When it is larger than the number of documents we want to process (23), it also works. When it is somewhere in between (e.g. 20), we get the crash.
package com.intersys.uima.test;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.CasPool;
import org.apache.uima.util.XMLInputSource;

/**
 *
 * @author bdeboe
 */
public class Standalone implements Runnable {

    private String text;
    private AnalysisEngine ae;
    private CasPool pool;

    public Standalone(String txt, AnalysisEngine ae, CasPool pool) {
        this.text = txt;
        this.ae = ae;
        this.pool = pool;
    }

    public static void main(String[] args) throws Exception {

        String descPath = ((args != null) && (args.length > 0)) ? args[0]
                : "C:\\InterSystems\\UIMA\\bin\\DaveDetector.xml";
        int casPoolSize = ((args != null) && (args.length > 1)) ? Integer.parseInt(args[1]) : 20;

        XMLInputSource in = new XMLInputSource(descPath);
        ResourceSpecifier specifier
                = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
        AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);

        String[] text = new String[23];
        // populating the array…
        text[22] = "…";

        // One CasPool, created from the first AE, shared by all worker threads
        CasPool pool = (casPoolSize > 0) ? new CasPool(casPoolSize, ae) : null;
        for (int i = 0; i < text.length; i++) {
            // Each worker thread gets its own AnalysisEngine instance
            Standalone task = new Standalone(text[i],
                    UIMAFramework.produceAnalysisEngine(specifier),
                    (casPoolSize > 0) ? pool : null);
            Thread t = new Thread(task);
            t.start();
        }
    }

    @Override
    public void run() {
        CAS cas = null;
        try {
            if (pool != null) {
                cas = pool.getCas();
            } else {
                cas = CasCreationUtils.createCas(ae.getAnalysisEngineMetaData());
            }
            cas.setDocumentText(text);
            ae.process(cas);
            System.out.println("Done processing text");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Only return a CAS that was actually obtained from the pool
            if (pool != null && cas != null) pool.releaseCas(cas);
        }
    }
}

Probably also of note: we sometimes get a simple exception on destroyJNI() (pasted below) rather than the outright process crash described earlier. We assume this is just “luck”, in that the different threads happen to be invoking a less critical section.

Apr 05, 2016 9:25:25 AM org.apache.uima.uimacpp.UimacppAnalysisComponent logJTafException
SEVERE: The following internal exception was caught: 5,002 (UIMA_ERR_ENGINE_UNEXPECTED_EXCEPTION)
Apr 05, 2016 9:25:25 AM org.apache.uima.uimacpp.UimacppAnalysisComponent logJTafExcep
RE: UIMACPP and multi-threading
Many thanks for your feedback, benjamin -- Benjamin De Boe | Product Manager M: +32 495 19 19 27 | T: +32 2 464 97 33 InterSystems Corporation | http://www.intersystems.com -Original Message- From: Eddie Epstein [mailto:eaepst...@gmail.com] Sent: Tuesday, April 5, 2016 12:47 AM To: user@uima.apache.org Subject: Re: UIMACPP and multi-threading Hi Benjamin, UIMACPP is thread safe, as is the JNI interface. To confirm, I just created a UIMA-AS service with 10 instances of DaveDetector, and fed the service 800 CASes with up to 10 concurrent CASes at any time. It is not the case with DaveDetector, but at annotator initialization some analytics will store info in thread local storage, and expect the same thread be used to call the annotator process method. UIMA-AS and DUCC guarantee that an instantiated AE is always called on the same thread. Eddie On Mon, Apr 4, 2016 at 10:56 AM, Benjamin De Boe < benjamin.de...@intersystems.com> wrote: > Hi, > > We're working with a UIMACPP annotator (wrapping our existing NLP > library) and are running in what appears to be thread safety issues, > which we can reproduce with the DaveDetector demo AE. > When separate threads are accessing separate instances of the > org.apache.uima.uimacpp.UimacppAnalysisComponent wrapper class on the > Java side, it appears they are invoking the same object on the C++ > side, which results in quite a mess (access violations and process > crashes) when different threads concurrently invoke resetJNI() and > fillCASJNI() on the org.apache.uima.uimacpp.UimacppAnalysisComponent > object. When using a small CAS pool on the Java side, the problem does > not seem to occur, but it resurfaces if the CAS pool grows bigger and > memory settings are not increased accordingly. However, if this were a > pure memory issue, we had hoped to see more telling errors and just > guessing how big memory should be for larger deployments isn't very appealing > an option either. 
> Adding the synchronized keyword to the relevant method of the wrapper > class on the Java side also avoids the issue, at the obvious cost of > performance. Moving to UIMA-AS is currently not an option for us. > > Given that the documentation is not explicit about it, we're hoping to > get an unambiguous answer from this list: is UIMACPP actually supposed > to be thread-safe? We saw old, resolved JIRA tickets that addressed > thread-safety issues for UIMACPP, so we assumed it was, but > reality seems to point in the opposite direction. > > > Thanks in advance for your feedback, > > benjamin > > > -- > Benjamin De Boe | Product Manager > M: +32 495 19 19 27 | T: +32 2 464 97 33 InterSystems Corporation | > http://www.intersystems.com > >
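A minimal sketch of the "synchronized wrapper" workaround mentioned above, under one stated assumption: since separate Java wrapper instances appear to reach the same C++ object, a per-instance `synchronized` method would not serialize them, so the sketch uses one static lock shared by all wrapper instances. We don't know exactly which method was synchronized in the original setup. `NativeEngine` and `SerializedWrapper` are hypothetical stand-ins, not UIMACPP classes; the engine only counts how many callers are inside it at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedWrapperDemo {

    // Hypothetical stand-in for a native engine that is NOT safe for concurrent use.
    // It records the highest number of callers observed inside process() at once.
    static final class NativeEngine {
        static final AtomicInteger active = new AtomicInteger();
        static final AtomicInteger maxActive = new AtomicInteger();

        static void process(String text) {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(1); // simulate native work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.decrementAndGet();
            }
        }
    }

    // Hypothetical Java-side wrapper; many instances may exist at the same time.
    static final class SerializedWrapper {
        // One lock shared by ALL wrapper instances, because they may reach the same native object.
        private static final Object NATIVE_LOCK = new Object();

        void process(String text) {
            synchronized (NATIVE_LOCK) {
                NativeEngine.process(text);
            }
        }
    }

    // Starts one thread per wrapper instance and returns the peak native concurrency seen.
    public static int runDemo(int threads, int callsPerThread) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            SerializedWrapper wrapper = new SerializedWrapper(); // separate instance per thread
            Thread t = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    wrapper.process("some text");
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return NativeEngine.maxActive.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int max = runDemo(8, 20);
        System.out.println("max concurrent native calls: " + max); // prints 1 with the shared lock
    }
}
```

The shared lock is what costs throughput: every thread waits for every other thread's native call, which matches the "obvious cost of performance" noted in the thread.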
Re: UIMACPP and multi-threading
Hi Benjamin, UIMACPP is thread-safe, as is the JNI interface. To confirm, I just created a UIMA-AS service with 10 instances of DaveDetector and fed the service 800 CASes, with up to 10 concurrent CASes at any time. This is not the case for DaveDetector, but some analytics store information in thread-local storage at annotator initialization and expect the same thread to be used to call the annotator's process method. UIMA-AS and DUCC guarantee that an instantiated AE is always called on the same thread. Eddie On Mon, Apr 4, 2016 at 10:56 AM, Benjamin De Boe < benjamin.de...@intersystems.com> wrote: > Hi, > > We're working with a UIMACPP annotator (wrapping our existing NLP library) > and are running into what appear to be thread-safety issues, which we can > reproduce with the DaveDetector demo AE. > When separate threads access separate instances of the > org.apache.uima.uimacpp.UimacppAnalysisComponent wrapper class on the Java > side, they appear to invoke the same object on the C++ side, which > results in quite a mess (access violations and process crashes) when > different threads concurrently invoke resetJNI() and fillCASJNI() on the > org.apache.uima.uimacpp.UimacppAnalysisComponent object. When using a small > CAS pool on the Java side, the problem does not seem to occur, but it > resurfaces if the CAS pool grows bigger and memory settings are not > increased accordingly. However, if this were a pure memory issue, we would have > expected more telling errors, and just guessing how much memory larger > deployments need isn't a very appealing option either. > Adding the synchronized keyword to the relevant method of the wrapper > class on the Java side also avoids the issue, at the obvious cost of > performance. Moving to UIMA-AS is currently not an option for us. > > Given that the documentation is not explicit about it, we're hoping to get > an unambiguous answer from this list: is UIMACPP actually supposed to be > thread-safe?
We saw old, resolved JIRA tickets that addressed thread-safety > issues for UIMACPP, so we assumed it was, but reality seems to > point in the opposite direction. > > > Thanks in advance for your feedback, > > benjamin > > > -- > Benjamin De Boe | Product Manager > M: +32 495 19 19 27 | T: +32 2 464 97 33 > InterSystems Corporation | http://www.intersystems.com > >
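Eddie's point about analytics that keep state in thread-local storage can be illustrated with a minimal sketch. Each engine instance is bound to its own single-thread executor, so its initialize and process calls always run on the same thread. That is the guarantee he describes for UIMA-AS and DUCC, reproduced here by hand for a plain-Java deployment. `ThreadLocalAnnotator` and `ThreadBoundEngine` are hypothetical names, not UIMA APIs. In a real setup the submitted tasks would call the AnalysisEngine, and whether a given UIMACPP annotator actually relies on thread-local state is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadAffinityDemo {

    // Hypothetical annotator that caches state in thread-local storage at init time.
    // process() only works when it runs on the same thread that ran initialize().
    static final class ThreadLocalAnnotator {
        private final ThreadLocal<String> initState = new ThreadLocal<>();

        void initialize() {
            initState.set("initialized on " + Thread.currentThread().getName());
        }

        String process(String text) {
            String state = initState.get();
            if (state == null) {
                throw new IllegalStateException("process() called on a different thread than initialize()");
            }
            return text.length() + " chars, " + state;
        }
    }

    // Hypothetical wrapper that binds one annotator instance to one dedicated thread.
    static final class ThreadBoundEngine implements AutoCloseable {
        private final ExecutorService boundThread = Executors.newSingleThreadExecutor();
        private final ThreadLocalAnnotator annotator = new ThreadLocalAnnotator();

        ThreadBoundEngine() throws Exception {
            boundThread.submit(annotator::initialize).get(); // initialize on the bound thread
        }

        Future<String> process(String text) {
            return boundThread.submit(() -> annotator.process(text)); // runs on that same thread
        }

        @Override
        public void close() {
            boundThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<ThreadBoundEngine> engines = new ArrayList<>();
        try {
            for (int i = 0; i < 3; i++) {
                engines.add(new ThreadBoundEngine());
            }
            // Callers may submit from any thread; execution stays on each engine's own thread.
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 9; i++) {
                results.add(engines.get(i % engines.size()).process("document " + i));
            }
            for (Future<String> f : results) {
                System.out.println(f.get());
            }
        } finally {
            for (ThreadBoundEngine e : engines) {
                e.close();
            }
        }
    }
}
```

The trade-off is that work for one engine queues behind its single thread, but any state the annotator set up during initialization is guaranteed to be visible when process runs.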