Re: Mahout & Solr
You're right... It would be nice to be able to see the cluster results coming from Solr, though.

Adam

On Thu, Jun 16, 2011 at 3:21 AM, Andrew Clegg wrote:
> Well, it does have the ability to pull TermVectors from an index:
>
> https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html#CreatingVectorsfromText-FromLucene
>
> Nothing Solr-specific about it though.
>
> On 15 June 2011 15:38, Mark wrote:
> > "Apache Mahout is a new Apache TLP project to create scalable, machine
> > learning algorithms under the Apache license. It is related to other Apache
> > Lucene projects and integrates well with Solr."
> >
> > How does Mahout integrate well with Solr? Can someone explain a brief
> > overview on whats available. I'm guessing one of the features would be the
> > replacing of the Carrot2 clustering algorithm with something a little more
> > sophisticated?
> >
> > Thanks
>
> --
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
Re: Mahout & Solr
Hmm, I suppose I have the same question from the Mahout side (I didn't write that text). I would certainly call this far more related to Hadoop than Lucene, though there are some Lucene touch-points, but no direct connection to Solr that I'm aware of.

If I'm not wildly mistaken then I can edit the wiki.

On Wed, Jun 15, 2011 at 3:38 PM, Mark wrote:
> "Apache Mahout is a new Apache TLP project to create scalable, machine
> learning algorithms under the Apache license. It is related to other Apache
> Lucene projects and integrates well with Solr."
>
> How does Mahout integrate well with Solr? Can someone explain a brief
> overview on whats available. I'm guessing one of the features would be the
> replacing of the Carrot2 clustering algorithm with something a little more
> sophisticated?
>
> Thanks
Re: Mahout & Solr
> Is it possible to use the clustering component to use predefined clusters
> generated by Mahout?

Actually, the existing Solr ClusteringComponent's API has been designed to deal with both search results clustering (implemented by Carrot2) and off-line clustering of the whole index. The latter has not yet been implemented, so the API is very likely to change depending on the specific design decisions (should clustering be triggered through Solr or externally? should the clusters be stored in Solr? how to handle new documents? how to use the clusters at search time?).

I can also imagine a simpler approach based on a search results clustering "algorithm" that would simply fetch Mahout's predefined clusters for each document being returned in the search result. Getting this to work is a matter of implementing a dedicated SearchClusteringEngine (http://lucene.apache.org/solr/api/org/apache/solr/handler/clustering/SearchClusteringEngine.html) and should be fairly straightforward, at least in terms of interaction with Solr.

Staszek
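To make the "simpler approach" above concrete, here is a minimal sketch of the grouping step only: given the document ids on one result page and a lookup of precomputed cluster labels, bucket the ids by label. It is written in Python for brevity; a real engine would be a Java subclass of SearchClusteringEngine. The function name, the `cluster_of` table, the "Other" fallback label, and the labels/docs output shape are all assumptions for illustration, not part of Solr's or Mahout's API.

```python
def group_by_predefined_cluster(doc_ids, cluster_of, unassigned_label="Other"):
    """Bucket the doc ids of one result page by a precomputed cluster label.

    doc_ids    -- ids in the order the search returned them
    cluster_of -- hypothetical dict of doc id -> cluster label, e.g. exported
                  from an off-line clustering run
    """
    groups = {}  # dicts keep insertion order, so clusters appear in first-hit order
    for doc_id in doc_ids:
        # Documents with no precomputed cluster fall into a catch-all bucket.
        label = cluster_of.get(doc_id, unassigned_label)
        groups.setdefault(label, []).append(doc_id)
    # One entry per cluster: its label(s) and the member doc ids.
    return [{"labels": [label], "docs": docs} for label, docs in groups.items()]


# Hypothetical lookup table and result page.
cluster_of = {"doc1": "cluster-0", "doc2": "cluster-1", "doc3": "cluster-0"}
clusters = group_by_predefined_cluster(["doc3", "doc2", "doc9", "doc1"], cluster_of)
for cluster in clusters:
    print(cluster["labels"][0], cluster["docs"])
```

The open design questions above (where the clusters are stored, how new documents are handled) decide where `cluster_of` would actually come from; the sketch deliberately leaves that out.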
Re: Mahout & Solr
I was hoping this wasn't the case :(

Is it possible to use the clustering component to use predefined clusters generated by Mahout?

On 6/15/11 9:14 AM, Sean Owen wrote:
> Hmm, I suppose I have the same question from the Mahout side (I didn't
> write that text). I would certainly call this far more related to Hadoop
> than Lucene, though there are some Lucene touch-points, but no direct
> connection to Solr that I'm aware of.
>
> If I'm not wildly mistaken then I can edit the wiki.
>
> On Wed, Jun 15, 2011 at 3:38 PM, Mark wrote:
> > "Apache Mahout is a new Apache TLP project to create scalable, machine
> > learning algorithms under the Apache license. It is related to other Apache
> > Lucene projects and integrates well with Solr."
> >
> > How does Mahout integrate well with Solr? Can someone explain a brief
> > overview on whats available. I'm guessing one of the features would be the
> > replacing of the Carrot2 clustering algorithm with something a little more
> > sophisticated?
> >
> > Thanks
Re: Mahout & Solr
The only integration at this point (as far as I can tell) is that Mahout can read the Lucene index created by Solr. I agree that it would be nice to swap out the Carrot2 clustering engine with Mahout's set of algorithms, but that has not been done yet.

Grant has pointed out that you can use Solr's callback system to fire off another task, such as a Mahout job:

http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/

Adam

On Wed, Jun 15, 2011 at 10:38 AM, Mark wrote:
> "Apache Mahout is a new Apache TLP project to create scalable, machine
> learning algorithms under the Apache license. It is related to other Apache
> Lucene projects and integrates well with Solr."
>
> How does Mahout integrate well with Solr? Can someone explain a brief
> overview on whats available. I'm guessing one of the features would be the
> replacing of the Carrot2 clustering algorithm with something a little more
> sophisticated?
>
> Thanks
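As an illustration of the point above that Mahout can read the Lucene index Solr creates, here is a rough sketch of assembling the command for Mahout's `lucene.vector` driver, which is the kind of task a Solr callback could fire off. The flag names are as I recall them from the Mahout "Creating Vectors from Text" wiki page, so check them against your Mahout version; the helper name and all paths are hypothetical, and the field being vectorized is assumed to have term vectors stored in the index.

```python
import shlex


def lucene_vector_command(index_dir, field, output, dict_out,
                          id_field=None, mahout_bin="bin/mahout"):
    """Build (but do not run) a command line for Mahout's lucene.vector driver.

    Flag names are taken from memory of the Mahout wiki and are an assumption;
    verify them with the driver's --help output before relying on them.
    """
    cmd = [
        mahout_bin, "lucene.vector",
        "--dir", index_dir,     # Lucene index directory written by Solr
        "--field", field,       # field whose term vectors are read
        "--output", output,     # where the vectors are written
        "--dictOut", dict_out,  # where the term dictionary is written
    ]
    if id_field:
        cmd += ["--idField", id_field]  # field holding each document's unique id
    return cmd


# Hypothetical paths, for illustration only.
cmd = lucene_vector_command(
    index_dir="/path/to/solr/data/index",
    field="body",
    output="/tmp/mahout/vectors.out",
    dict_out="/tmp/mahout/dict.txt",
    id_field="id",
)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually launch it, something like subprocess.run(cmd, check=True) would do.
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.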