Hi,

I am following up on a discussion from earlier this month in the "re: ctakes
web service" thread. Apologies if I summarize anyone's comments
incorrectly. Sean had commented that it would not be advisable to create a
pool of pipelines and dispatch 1 per thread in the same process because the
individual AEs have static variables and resources that would be shared
across instances. Anecdotally, we have not seen crashes when doing this
(although we have seen crashes when sharing one pipeline across more than
one thread). Nevertheless, I cannot guarantee that the annotations are
always produced correctly, or that we will never get unlucky and hit a race
condition. It also sounds, from Peter's comment in the previous thread,
https://lists.apache.org/thread.html/93da8248b03b1c59135fb9b4030b0546a4631ec32d6f5c779d2821cc@%3Cdev.ctakes.apache.org%3E
that a pipeline pool across multiple threads has been stable for his work.
I have a few questions:

1) Does anyone else have experience with this? Sean, from your earlier
comments, do you think it could run without crashing yet still produce
unreliable results when using the components in the DefaultClinicalPipeline?

2) Sean, you commented before:

> That being said, supposedly you can configure Spark to handle this by
> keeping everything contained in a unique copy per thread.  Sort of like
> ThreadLocal (I think), but more effective on a full-pipeline level.

Do you have any more information about this? We are currently looking into
it, and it looks like it should be possible to limit each executor (JVM) to
a single thread, but I was wondering whether you had any references for the
ThreadLocal-style setup or knew anyone else who had tried it.

3) In the TS pipelines, what does the "threads" keyword in the piper file
actually enforce? Is it the number of threads that are allowed to share the
pipeline, or does it automatically create a threaded pipeline for you?
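
In case it helps frame question 1, here is a minimal sketch of the
pool-of-pipelines pattern described above. It is an illustration only, under
stated assumptions: PipelinePool and withPipeline are hypothetical names, and
the type parameter P stands in for whatever object wraps one pipeline
instance. No cTAKES or UIMA classes are used.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-size pool. Each pipeline instance is handed to at most
 * one thread at a time, so no single instance is shared concurrently.
 */
final class PipelinePool<P> {
    private final BlockingQueue<P> idle;

    PipelinePool(int size, Supplier<P> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // one independent instance per slot
        }
    }

    /** Borrows a pipeline, runs the task on it, and always returns it. */
    <R> R withPipeline(Function<P, R> task) {
        P pipeline;
        try {
            pipeline = idle.take(); // blocks until some pipeline is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        }
        try {
            return task.apply(pipeline);
        } finally {
            idle.add(pipeline); // cannot overflow: we just took one out
        }
    }
}
```

A pool like this only keeps per-instance state separate. Any static
variables inside the AEs would still be shared by every pipeline in the JVM,
which is the concern Sean raised.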

Thanks!
Jeff
