Re: Mahout & Solr

2011-06-16 Thread Adam Estrada
You're right... It would be nice to be able to see the cluster results coming
from Solr, though.

Adam

On Thu, Jun 16, 2011 at 3:21 AM, Andrew Clegg wrote:

> Well, it does have the ability to pull TermVectors from an index:
>
>
> https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html#CreatingVectorsfromText-FromLucene
>
> Nothing Solr-specific about it though.
>
> On 15 June 2011 15:38, Mark wrote:
> > "Apache Mahout is a new Apache TLP project to create scalable, machine
> > learning algorithms under the Apache license. It is related to other Apache
> > Lucene projects and integrates well with Solr."
> >
> > How does Mahout integrate well with Solr? Can someone give a brief
> > overview of what's available? I'm guessing one of the features would be the
> > replacing of the Carrot2 clustering algorithm with something a little more
> > sophisticated?
> >
> > Thanks
> >
>
>
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>


Re: Mahout & Solr

2011-06-15 Thread Sean Owen
Hmm, I suppose I have the same question from the Mahout side (I didn't
write that text). I would certainly call this far more related to
Hadoop than Lucene, though there are some Lucene touch-points, but no
direct connection to Solr that I'm aware of.

If I'm not wildly mistaken then I can edit the wiki.

On Wed, Jun 15, 2011 at 3:38 PM, Mark wrote:
> "Apache Mahout is a new Apache TLP project to create scalable, machine
> learning algorithms under the Apache license. It is related to other Apache
> Lucene projects and integrates well with Solr."
>
> How does Mahout integrate well with Solr? Can someone give a brief
> overview of what's available? I'm guessing one of the features would be the
> replacing of the Carrot2 clustering algorithm with something a little more
> sophisticated?
>
> Thanks
>


Re: Mahout & Solr

2011-06-15 Thread Stanislaw Osinski
>
> Is it possible to use the clustering component to use predefined clusters
> generated by Mahout?


Actually, the existing Solr ClusteringComponent's API has been designed to
deal with both search results clustering (implemented by Carrot2) and
off-line clustering of the whole index. The latter has not yet been
implemented, so the API is very likely to change depending on the specific
design decisions: Should clustering be triggered through Solr or
externally? Should the clusters be stored in Solr? How should new
documents be handled? How should the clusters be used at search time?

I can also imagine a simpler approach based on a search results clustering
"algorithm" that would simply fetch Mahout's predefined clusters for each
document being returned in the search result. Getting this to work is a
matter of implementing a dedicated SearchClusteringEngine
(http://lucene.apache.org/solr/api/org/apache/solr/handler/clustering/SearchClusteringEngine.html)
and should be fairly straightforward, at least in terms of interaction with
Solr.
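
For illustration only, the lookup idea could be sketched in plain Java roughly
as follows. This is not the Solr SearchClusteringEngine API: the class
PrecomputedClusterLookup, its group method, and the "unclustered" label are
hypothetical names, and the document-to-cluster map is assumed to have been
exported from an off-line Mahout run by some means not shown here. A real
engine would presumably wrap logic like this inside Solr's extension point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups search hits by a cluster label computed off-line. */
final class PrecomputedClusterLookup {

    /** Label for hits with no precomputed cluster (an assumption of this sketch). */
    static final String UNCLUSTERED = "unclustered";

    /**
     * @param hitIds       document ids in the order the search returned them
     * @param clusterByDoc document id to cluster label, assumed to come from an off-line run
     * @return cluster label to document ids, keeping result order within each cluster
     */
    static Map<String, List<String>> group(List<String> hitIds,
                                           Map<String, String> clusterByDoc) {
        // LinkedHashMap keeps clusters in the order their first hit appeared.
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String id : hitIds) {
            String label = clusterByDoc.getOrDefault(id, UNCLUSTERED);
            clusters.computeIfAbsent(label, k -> new ArrayList<>()).add(id);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Made-up assignments standing in for the off-line clustering output.
        Map<String, String> clusterByDoc = new LinkedHashMap<>();
        clusterByDoc.put("doc1", "c0");
        clusterByDoc.put("doc2", "c1");
        clusterByDoc.put("doc3", "c0");

        List<String> hits = Arrays.asList("doc3", "doc2", "doc9", "doc1");
        System.out.println(group(hits, clusterByDoc));
    }
}
```

Hits without a stored assignment (new documents, for instance) fall into a
catch-all group here; whether that is the right behaviour is one of the open
design questions mentioned above.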

Staszek


Re: Mahout & Solr

2011-06-15 Thread Mark

I was hoping this wasn't the case :(

Is it possible to use the clustering component to use predefined 
clusters generated by Mahout?


On 6/15/11 9:14 AM, Sean Owen wrote:
> Hmm, I suppose I have the same question from the Mahout side (I didn't
> write that text). I would certainly call this far more related to
> Hadoop than Lucene, though there are some Lucene touch-points, but no
> direct connection to Solr that I'm aware of.
>
> If I'm not wildly mistaken then I can edit the wiki.
>
> On Wed, Jun 15, 2011 at 3:38 PM, Mark wrote:
> > "Apache Mahout is a new Apache TLP project to create scalable, machine
> > learning algorithms under the Apache license. It is related to other Apache
> > Lucene projects and integrates well with Solr."
> >
> > How does Mahout integrate well with Solr? Can someone give a brief
> > overview of what's available? I'm guessing one of the features would be the
> > replacing of the Carrot2 clustering algorithm with something a little more
> > sophisticated?
> >
> > Thanks



Re: Mahout & Solr

2011-06-15 Thread Adam Estrada
The only integration at this point (as far as I can tell) is that Mahout can
read the Lucene index created by Solr. I agree that it would be nice to swap
out the Carrot2 clustering engine for Mahout's set of algorithms, but that
has not been done yet. Grant has pointed out that you can use Solr's
callback system to fire off another task, such as a Mahout job.

http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/

Adam

On Wed, Jun 15, 2011 at 10:38 AM, Mark wrote:

> "Apache Mahout is a new Apache TLP project to create scalable, machine
> learning algorithms under the Apache license. It is related to other Apache
> Lucene projects and integrates well with Solr."
>
> How does Mahout integrate well with Solr? Can someone give a brief
> overview of what's available? I'm guessing one of the features would be the
> replacing of the Carrot2 clustering algorithm with something a little more
> sophisticated?
>
> Thanks
>