Re: Clustering, Collapsing

2012-06-11 Thread Tommaso Teofili
to a Solr index. I think you could take advantage of the UIMA Collection Processing Engine [1], particularly by using a UIMA-AS-based architecture, since it looks like you are handling huge collections [2]. Apart from the specific algorithms used for clustering / collapsing, which would define the UIMA

Re: Clustering, Collapsing

2012-06-11 Thread Jens Grivolla
This sounds like you are actually looking for the project next door: Mahout. UIMA really doesn't have much to do with clustering (although you could do some things with it). We do use UIMA for information extraction *before* clustering and sending the results to Solr, though, as a sort of preprocessing to

Clustering, Collapsing

2012-06-08 Thread Deejay
Hi all, I recently discovered Apache UIMA, and it looks like a very large project! I was hoping that someone more experienced with it than I am could comment on whether there are parts of the project that could help with my problem. I need to go over many millions of objects (Protocol Buffers in