Hi, I am following up on a discussion from earlier this month in the "re: ctakes web service" thread. Apologies if I misrepresent anyone's comments. Sean commented that it is not advisable to create a pool of pipelines and dispatch one per thread within the same process, because the individual AEs have static variables and resources that are shared across instances. Anecdotally, we have not seen crashes when doing this (although we have seen crashes when sharing a single pipeline across more than one thread). Nevertheless, I cannot guarantee that the annotations are always correct, or that we will not occasionally get unlucky and hit a race condition. From Peter's comment in the previous thread, https://lists.apache.org/thread.html/93da8248b03b1c59135fb9b4030b0546a4631ec32d6f5c779d2821cc@%3Cdev.ctakes.apache.org%3E, it also sounds like a pool of pipelines used across multiple threads has been stable for his work. I have a few questions:
1) Does anyone else have experience with this? Sean, based on your earlier comments, do you think this setup might run without crashing yet still produce unreliable results when using the components in the DefaultClinicalPipeline?
2) Sean, you commented before:
> That being said, supposedly you can configure Spark to handle this by keeping everything contained in a unique copy per thread. Sort of like ThreadLocal (I think), but more effective on a full-pipeline level.
Do you have any more information about this? We are currently looking into it, and it looks like it should be possible to limit each executor (JVM) to a single thread, but I was wondering whether you had any references on the ThreadLocal-style setup or knew of anyone else who had tried it.
3) In the TS pipelines, what does the "threads" keyword in the piper file actually enforce? Is it the number of threads that are allowed to share the pipeline, or does it automatically create a threaded pipeline for you?
Thanks!
Jeff
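To make question 2 concrete, here is a minimal Java sketch of what a ThreadLocal-style, one-pipeline-per-thread setup would look like. It assumes a hypothetical `Pipeline` class standing in for a fully built pipeline; it is not a cTAKES or UIMA API, and the names `ThreadLocalPipelineDemo`, `annotate`, and `run` are illustrative only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalPipelineDemo {

    // Hypothetical stand-in for a fully initialized pipeline; NOT a cTAKES or UIMA class.
    static final class Pipeline {
        private static final AtomicInteger CREATED = new AtomicInteger();
        final int id = CREATED.incrementAndGet();

        String process(String note) {
            // A real pipeline would run its analysis engines here.
            return "pipeline-" + id + ":" + note.length();
        }
    }

    // Each worker thread lazily builds its own Pipeline on first use and reuses it afterwards.
    private static final ThreadLocal<Pipeline> PER_THREAD =
            ThreadLocal.withInitial(Pipeline::new);

    static String annotate(String note) {
        return PER_THREAD.get().process(note);
    }

    // Runs the given number of notes on a fixed thread pool and returns
    // the set of pipeline ids that actually did the work.
    static Set<Integer> run(int threads, int notes) throws InterruptedException {
        Set<Integer> used = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < notes; i++) {
            final String note = "note " + i;
            pool.submit(() -> {
                annotate(note);
                used.add(PER_THREAD.get().id);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return used;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> used = run(4, 100);
        // At most one pipeline per pool thread is ever created.
        System.out.println("distinct pipelines used: " + used.size());
    }
}
```

This only guarantees that each thread has its own pipeline instance. Any static fields inside the AEs are still shared by every instance in the JVM, which is exactly the concern Sean raised, so limiting each executor JVM to a single thread would be the stronger form of isolation.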