to a Solr index.
I think you could take advantage of the UIMA Collection Processing Engine [1],
particularly by using a UIMA-AS based architecture since it looks like you
are handling huge collections [2].
Apart from the specific algorithms used for clustering / collapsing, which
would define the UIMA
This sounds like you are actually looking for the project next door: Mahout.
UIMA really doesn't have a lot to do with clustering (although you could
do some things). We do use UIMA for information extraction *before*
clustering and sending it to Solr, though, as a sort of preprocessing to
Hi all,
I recently discovered Apache UIMA, and it looks like a very large project! I
was hoping that someone more experienced with it than I am could comment on
whether there are parts of the project that could help with my problem.
I need to go over many millions of objects (Protocol Buffers in