Hi,
I have working code that deserializes XCAS files and then processes them in a
read-only way. It is based on CASConsumerTestDriver.java. An example:
// inputs to the CAS file and the AE from cTAKES, templates here
String xCasLocation = <location-of-CAS-file>;
String taeDescriptionLocation =
    <location-of-AggregatePlaintextFastUMLSProcessor.xml>;
// initialize the ae
InputStream xCasStream = new FileInputStream(xCasLocation);
AnalysisEngineDescription taeDescription =
    UIMAFramework.getXMLParser().parseAnalysisEngineDescription(
        new XMLInputSource(new File(taeDescriptionLocation)));
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(taeDescription);
// read CAS
CAS cas = ae.newCAS();
XCASDeserializer.deserialize(xCasStream, cas);
// print out the Sofa
System.out.println(cas.getSofaDataString());
// create jCAS and print out some UmlsConcepts
JFSIndexRepository indexes = cas.getJCas().getJFSIndexRepository();
Iterator iterator =
    indexes.getAnnotationIndex(SignSymptomMention.type).iterator();
while (iterator.hasNext()) {
    SignSymptomMention annot = (SignSymptomMention) iterator.next();
    System.out.println(annot.getCoveredText());
    // further read the annotation
    FSArray ocArr = annot.getOntologyConceptArr();
    // ...
}
The code above runs fine, but runs sequentially. I have a lot of CAS files and
would like to process them in parallel (for instance to extract some values and
store them in another DB).
My question:
Can I pass a reference to the AnalysisEngine ae created above to code that
runs in parallel (java.util.concurrent.Callable or Java 8 parallel streams, it
does not matter), provided that I only perform read operations (such as
annot.getCoveredText() or other calls to get the CUI) and no two threads
work on the same CAS?
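To make the question concrete, here is a rough sketch of the parallel structure
I have in mind. It uses only java.util.concurrent and contains no UIMA calls.
The class name ParallelCasReadSketch, the method readOneCasFile and the file
names are hypothetical placeholders. In the real code, readOneCasFile would
deserialize one XCAS file into a CAS of its own and read the annotations as in
the loop above; whether that is safe when the AnalysisEngine is shared is
exactly what I am asking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCasReadSketch {

    // Hypothetical placeholder for the per-file work. The real version would
    // deserialize the XCAS file into a CAS used by this task only and read
    // the annotations; here it just tags the file name.
    static String readOneCasFile(String xCasLocation) {
        return xCasLocation + ":read";
    }

    // Submits one Callable per file to a fixed-size thread pool and collects
    // the results. Each task only touches its own file name.
    static List<String> processAll(List<String> xCasLocations, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String location : xCasLocations) {
                tasks.add(() -> readOneCasFile(location));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns the
            // futures in the same order as the submitted tasks.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.xcas", "b.xcas", "c.xcas");
        for (String result : processAll(files, 2)) {
            System.out.println(result);
        }
    }
}
```

In this sketch every task gets only its own file location, so the only thing
that would be shared between threads in the real code is the AnalysisEngine
used to call newCAS().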
I read in
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Component+Use+Guide
that "cTAKES is not designed to be thread safe", but here I would be doing
read-only operations to extract concepts and CUIs from JCas objects. No new
annotations would be created, no annotators called.
If this is not recommended, what would be the best course of action to
deserialize and read-only process these CAS files?
Thanks for any help, I would really appreciate it.
Tomasz