FWIW, the way we handle large batches is to build a pipeline
description with uimaFIT, create a reader such as
FilesInDirectoryCollectionReader or UriCollectionReader, and then loop
over the documents with a JCasIterable. That lets you collect
statistics or data structures from every document, say, and then do
something with them at the end.

    // create engine and reader
    ...
    // loop over documents:
    for (JCas jcas : new JCasIterable(readerDescription,
            aggregate.createAggregateDescription())) {
        // handle jcas for one document
    }
    // any other code to finish up
    ...
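
To make the "collect something from every document, then use it at the
end" idea concrete, here is a rough sketch of the accumulation step. It
is deliberately written against a plain Iterable so that it compiles
without uimaFIT; the class DocumentStats, the method countTokens and the
whitespace tokenization are all hypothetical and are not part of
uimaFIT or cTAKES.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration only; not a uimaFIT or cTAKES class.
final class DocumentStats {

    /**
     * Visits every document once, counts lower-cased whitespace-separated
     * tokens across all of them, and returns the totals after the loop.
     */
    static <D> Map<String, Integer> countTokens(Iterable<D> documents,
                                                 Function<D, String> textOf) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (D document : documents) {
            String text = textOf.apply(document);
            if (text == null || text.trim().isEmpty()) {
                continue; // nothing to count in an empty document
            }
            for (String token : text.trim().split("\\s+")) {
                counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Plain strings stand in for documents in this demo.
        List<String> notes = Arrays.asList("Patient denies pain",
                                           "patient reports pain");
        System.out.println(countTokens(notes, note -> note));
    }
}
```

With uimaFIT on the classpath, the same method should accept the
iterable from the loop above, since JCasIterable implements
Iterable<JCas>: something like
countTokens(new JCasIterable(readerDescription,
aggregate.createAggregateDescription()), JCas::getDocumentText).
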
Alternatively, if you can easily handle everything within the pipeline,
you can just use SimplePipeline:

    SimplePipeline.runPipeline(
            collectionReader,
            aggregateBuilder.createAggregate(),
            xWriter);
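
On the request in the quoted message for some control over the output
filenames: one possible approach is to derive each output name from the
input file name. The sketch below is plain Java and does not call
uimaFIT or cTAKES; the class XmiOutputNames, the method outputPathFor
and the choice of swapping the extension for ".xmi" are assumptions made
for illustration, and how the result gets handed to a writer is left
open.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not a uimaFIT or cTAKES class.
final class XmiOutputNames {

    /**
     * Maps an input file to an output path in outputDir that keeps the
     * input's base name and replaces its extension (if any) with ".xmi".
     */
    static Path outputPathFor(Path inputFile, Path outputDir) {
        String name = inputFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // dot > 0 leaves dot-files such as ".hidden" untouched
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".xmi");
    }

    public static void main(String[] args) {
        Path in = Paths.get("notes", "report01.txt");
        System.out.println(outputPathFor(in, Paths.get("out")));
    }
}
```
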
Tim
On Sun, 2017-01-22 at 14:36 +0000, Arron Lacey wrote:
> Thanks very much Sean. Didn't work unfortunately - but I am curious:
> if you don't personally use the CPE, how do you batch process
> documents?
>
> I would like to just run the AggregatePlaintextFastUMLSProcessor.xml
> on all files in a given directory - perhaps with *some* control over
> the output filenames.
>
> Thanks,
>
> Arron.
>
> On Fri, 20 Jan, 2017 at 4:38 PM, Finan, Sean
> <[email protected]> wrote:
> >
> > Hi Arron Lacey,
> >
> > That particular cas consumer java class is a uimafit-paradigm
> > implementation, and from my memory the CPE gui does not play well
> > with Uimafit. I could be wrong - I never use the cpe anymore.
> >
> > You might be able to get things working by changing line #23 in
> > the .xml file from
> >
> > <implementationName>org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes</implementationName>
> >
> > To
> >
> > <implementationName>org.apache.uima.tools.components.XmiWriterCasConsumer</implementationName>
> >
> > As far as I know the ctakes version is the same as the uima version
> > but with better output file naming and a uimafit framing.
> >
> > Again, I'm not certain that the problem is cpe : uimafit
> > incompatibility. If somebody else out there knows better then
> > please speak up.
> >
> > Good luck,
> > Sean
> >
> > -----Original Message-----
> > From: Arron Lacey [mailto:[email protected]]
> > Sent: Friday, January 20, 2017 11:13 AM
> > To: [email protected]
> > Subject: Cannot load XMIWriterCasConsumer.xml with CPE.sh
> >
> > Hi - I am trying to use the CPE to output results using the CAS
> > Consumer: __XmiWriterCasConsumer.xml
> >
> >
> > but here is the error message I am getting:
> >
> > > org.apache.uima.resource.ResourceInitializationException
> > > CausedBy: org.apache.uima.resource.ResourceConfigurationException
> > > CausedBy: java.lang.Exception: The component XMI Writer CAS
> > > Consumer cannot be created (Thread name: Thread-4)
> >
> > My setup is using:
> >
> > Collection Reader
> > > desc/ctakes-core/desc/collection_reader/FilesInDirectoryCollectionReader.xml
> >
> > Analysis Engine
> > > desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> >
> > CAS Consumer
> > > desc/ctakes-core/desc/cas_consumer/__XmiWriterCasConsumer.xml
> >
> > I can get the normal XML writer to work, so I would like to ask
> > what I need to do to my pipeline to use the XMI Writer?
> >
> > Thanks very much,
> >
> > Arron Lacey.