Hi Sean,
I'm building the jar within IntelliJ IDEA, via Project Structure > Artifacts >
New artifact > JAR > From modules with dependencies. I've made one of my
pipeline classes the main class, chosen the "extract to the target JAR" option,
and put the manifest file in my resources directory.
Here's what my pipeline looks like, to give you a sense of my goals:
    CollectionReaderDescription collectionReader =
        CollectionReaderFactory.createReaderDescriptionFromPath(
            "C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/FilesInDirectoryCollectionReader.xml",
            ConfigParameterConstants.PARAM_INPUTDIR,
            inputDir);
    AggregateBuilder aggregateBuilder = new AggregateBuilder();
    aggregateBuilder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
    aggregateBuilder.add(SentenceDetector.createAnnotatorDescription());
    aggregateBuilder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
    // aggregateBuilder.add(LvgAnnotator.createAnnotatorDescription()); // URI not hierarchical error
    aggregateBuilder.add(ContextDependentTokenizerAnnotator.createAnnotatorDescription());
    aggregateBuilder.add(POSTagger.createAnnotatorDescription());
    aggregateBuilder.add(Chunker.createAnnotatorDescription(
        "C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/model/chunker-model.zip"));
    aggregateBuilder.add(ChunkAdjuster.createAnnotatorDescription(new String[] {"NP", "NP"}, 1));
    aggregateBuilder.add(ChunkAdjuster.createAnnotatorDescription(new String[] {"NP", "PP", "NP"}, 2));
    aggregateBuilder.add(DefaultJCasTermAnnotator.createAnnotatorDescription(
        "C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/BsvDictionaryAD.xml"));
    aggregateBuilder.add(ClearNLPDependencyParserAE.createAnnotatorDescription());
    aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(ContextAnnotator.class)); // negation
    aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(ContextAnnotator.class, // status
        ContextAnnotator.MAX_LEFT_SCOPE_SIZE_PARAM, 10,
        ContextAnnotator.MAX_RIGHT_SCOPE_SIZE_PARAM, 10,
        "ContextAnalyzerClass",
        "org.apache.ctakes.necontexts.status.StatusContextAnalyzer",
        "ContextHitConsumerClass",
        "org.apache.ctakes.necontexts.status.StatusContextHitConsumer"));
    aggregateBuilder.add(SubjectCleartkAnalysisEngine.createAnnotatorDescription());
I'm using Gradle, with the following dependencies included:
    compile 'org.apache.ctakes:ctakes-type-system:4.0.0'
    compile 'org.apache.ctakes:ctakes-clinical-pipeline:4.0.0'
    compile 'org.apache.ctakes:ctakes-core:4.0.0'
    compile 'org.apache.ctakes:ctakes:4.0.0'
For the pipeline I'm worried about making portable right now, I do not need
relations, temporal information, or coreferences. But there are other pipelines
within the project for doing location relation and temporal relation
extraction. I'm still learning Java and how to use tools like Gradle/Maven, so
it's definitely possible that I don't need all of those dependencies listed
above. I was just erring on the side of getting it to work!
Erin
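A note on the hard-coded paths in the pipeline above: the absolute C:/Users/... locations will not exist on a collaborator's machine, and "URI is not hierarchical" is the message java.io.File typically gives when it is handed a jar: URI for a resource packed inside a jar. A minimal sketch of one workaround, assuming the descriptor and model files are packaged on the classpath: copy the resource to a temporary file and pass that file's path to the factory method. The class name ResourceExtractor, the method name, and the temp-file prefix are hypothetical, not part of cTAKES or UIMA.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of cTAKES or UIMA).
public class ResourceExtractor {

    /**
     * Copies a classpath resource (for example a model or descriptor packed
     * inside the jar) to a temporary file and returns the path of that file.
     *
     * @param resourceName classpath-relative name, without a leading slash
     * @param suffix       file suffix for the temporary file, e.g. ".zip"
     */
    public static Path extractToTempFile(String resourceName, String suffix) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceExtractor.class.getClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            Path tmp = Files.createTempFile("ctakes-resource-", suffix);
            tmp.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }
}
```

With this in place, a call such as Chunker.createAnnotatorDescription(ResourceExtractor.extractToTempFile("model/chunker-model.zip", ".zip").toString()) would no longer depend on the local directory layout. The resource name "model/chunker-model.zip" is an assumption about how src/resources ends up in the jar. Descriptors that refer to other files by relative path may need those files extracted as well.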
-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Thursday, June 08, 2017 10:39 AM
To: [email protected]
Subject: RE: Pipeline executable
Before I dig into the error and all the enigmas of UIMA, how are you building the jar?
Also, what do you need from cTAKES? If you do not need the higher functions
for relations, temporal information, coreferences ... or the sideline items
like smoking status, drug-ner, ytex (a big one) ... then you can probably
create a jar that is about half that size just by getting rid of their
libraries and dependencies.
Sean
-----Original Message-----
From: Erin Gustafson [mailto:[email protected]]
Sent: Thursday, June 08, 2017 11:33 AM
To: [email protected]
Subject: RE: Pipeline executable
Within org.apache.ctakes.typesystem itself there are no classes, but in
org.apache.ctakes.typesystem.type.textspan I do see Segment.class.
The jar is indeed huge (1.14 GB). Open to any suggestions for the most
efficient way to go about this!
Erin
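Seeing Segment.class inside the jar confirms that the JCas class was packaged, but not which location the JVM actually loads it from at run time. A small sketch for checking that, using only standard Java reflection; the class name ClassLocator and the returned marker strings are hypothetical. Since the error above concerns a type missing from the UIMA type system definition rather than a missing class, this check can rule out a classpath problem but may not explain the exception on its own.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of cTAKES or UIMA).
public class ClassLocator {

    /**
     * Returns the location (jar or directory URL) a class is loaded from,
     * or a marker string if the class is missing or has no code source.
     */
    public static String locate(String className) {
        try {
            // 'false' avoids running static initializers just to locate the class.
            Class<?> type = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            return source == null
                    ? "(bootstrap or unknown location)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not found on classpath)";
        }
    }
}
```

Run with the built jar on the classpath, ClassLocator.locate("org.apache.ctakes.typesystem.type.textspan.Segment") should report the jar itself if the class was packaged as expected.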
-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Thursday, June 08, 2017 10:24 AM
To: [email protected]
Subject: RE: Pipeline executable
Hi Erin,
Do you have any classes in ctakes-type-system (org.apache.ctakes.typesystem)?
It could be that jcasgen needs to be run.
Just out of curiosity, how huge is your jar file? You may be able to decrease
the size ...
Sean
-----Original Message-----
From: Erin Gustafson [mailto:[email protected]]
Sent: Thursday, June 08, 2017 11:16 AM
To: [email protected]
Subject: Pipeline executable
Hi all,
I have a project that contains a series of classes to build cTAKES pipelines.
I've been successfully running the pipelines myself within an IDE, but would
like to be able to provide collaborators with an executable jar file to run our
pipeline.
So far, I've managed to build a jar that will start running the pipeline from
the command line. It successfully initializes the annotators but throws an
exception when processing begins:
Exception in thread "main" java.lang.IllegalStateException:
org.apache.uima.resource.ResourceInitializationException: Undefined type
"org.apache.ctakes.typesystem.type.textspan.Segment" in type priority list.
(Descriptor: <unknown>)
Any thoughts about how to resolve this error? Let me know if I can provide any
more information.
Thanks,
Erin
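One possible cause worth checking, offered as an assumption rather than a diagnosis: uimaFIT discovers type systems through files named META-INF/org.apache.uima.fit/types.txt, and each dependency jar can ship its own copy. When all dependencies are extracted into a single jar, only one file can keep that name, so type definitions listed in the other copies may no longer be found. The sketch below counts how many copies are visible on the classpath. The class and method names are hypothetical; the resource path is the uimaFIT 2.x convention as I understand it.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper (not part of cTAKES or uimaFIT).
public class ClasspathResourceLister {

    // Assumed uimaFIT 2.x location of the type system import list.
    public static final String UIMAFIT_TYPES = "META-INF/org.apache.uima.fit/types.txt";

    /** Returns every classpath location that provides the named resource. */
    public static List<URL> findAll(String resourceName) throws IOException {
        ClassLoader loader = ClasspathResourceLister.class.getClassLoader();
        return Collections.list(loader.getResources(resourceName));
    }

    /** Builds a short report: the number of copies, then one URL per line. */
    public static String describe(String resourceName) throws IOException {
        List<URL> hits = findAll(resourceName);
        StringBuilder report = new StringBuilder();
        report.append(hits.size()).append(" copies of ").append(resourceName)
              .append(" on the classpath");
        for (URL url : hits) {
            report.append(System.lineSeparator()).append("  ").append(url);
        }
        return report.toString();
    }
}
```

Printing ClasspathResourceLister.describe(ClasspathResourceLister.UIMAFIT_TYPES) once from the IDE and once from the built jar allows a direct comparison. If the IDE run reports several copies and the jar reports one, the files were probably overwritten during packaging, and concatenating them into a single types.txt at build time would be the thing to try.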