Re: Run an analysis engine after processing document collection?

Jens Grivolla Sat, 23 Dec 2017 11:00:04 -0800

Hi Ben,

If I understand correctly, you want to run a step once the whole
collection has been analyzed. You can have an AnalysisEngine do this
by implementing collectionProcessComplete():
http://uima.apache.org/d/uimaj-2.10.0/apidocs/org/apache/uima/analysis_engine/AnalysisEngine.html#collectionProcessComplete()

You just need to make sure that you gather all the necessary information
while the documents are being processed. If the AE that calculates the
statistics is at the end of the pipeline and you have only one instance of
it, it's easy to gather all the information there. Alternatively, you could
write everything you need to a centralized datastore (e.g. a database) and
use that to calculate the statistics.
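
To make the single-instance variant concrete, here is a minimal sketch of the
accumulate-then-finalize pattern in plain Java, with no UIMA dependency so
that it runs on its own. The class name TfIdfCollector, the naive tokenizer
and the tf * ln(N/df) weighting are illustrative assumptions, not anything
prescribed by UIMA. In a real annotator the two methods would roughly
correspond to overriding process(JCas) and collectionProcessComplete() in a
JCasAnnotator_ImplBase subclass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for a single annotator instance at the end of the pipeline. */
class TfIdfCollector {
    // Term counts per document, in the order the documents were processed.
    private final List<Map<String, Integer>> termCounts = new ArrayList<>();
    // Number of documents in which each term occurs.
    private final Map<String, Integer> docFreq = new HashMap<>();
    // Filled in by collectionProcessComplete().
    private final List<Map<String, Double>> tfIdf = new ArrayList<>();

    /** Per-document step; synchronized in case several threads share this instance. */
    public synchronized void process(String documentText) {
        Map<String, Integer> counts = new HashMap<>();
        // Naive tokenizer (assumption): lowercase, split on non-word characters.
        for (String token : documentText.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        for (String term : counts.keySet()) {
            docFreq.merge(term, 1, Integer::sum);
        }
        termCounts.add(counts);
    }

    /** Collection-level step; call once after the last document has been processed. */
    public synchronized void collectionProcessComplete() {
        tfIdf.clear();
        int n = termCounts.size();
        for (Map<String, Integer> counts : termCounts) {
            int total = counts.values().stream().mapToInt(Integer::intValue).sum();
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double tf = (double) e.getValue() / total;
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                scores.put(e.getKey(), tf * idf);
            }
            tfIdf.add(scores);
        }
    }

    /** One map of term scores per document, in processing order. */
    public synchronized List<Map<String, Double>> getTfIdf() {
        return tfIdf;
    }
}
```

Note that this only works if every document passes through the same instance.
With several instances of the AE, each one would only see part of the
collection, which is where the shared datastore comes in.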

Unless I've misunderstood you, that's really quite a common scenario.

Best,
Jens

On Fri, Dec 22, 2017 at 6:26 PM, Benedict Holland <
benedict.m.holl...@gmail.com> wrote:

> Hello All,
>
> I find myself in a strange situation. I have a content processing engine
> working. I have N threads populating N CAS objects and running my pipeline.
> Each CAS object gets 1 piece of data, like say a row in a database. Each
> process is entirely independent and can run concurrently. I specifically
> did not configure this pipeline as an aggregate process as I don't really
> care when the events trigger since the CPE maintains the order of the
> engines.
>
> Now I want to add an analysis that will run over the aggregate output. For
> example, I processed N texts using the CPE and now I want to run a TF-IDF
> analysis over the entire corpus. The TF-IDF analysis should only run once
> all documents are processed.
>
> How would I go about doing this? Does this have to do with not allowing
> multiple deployments?
>
> Thanks,
> ~Ben
>
