Re: Reg:-Integrating Mahout with Solr

2017-04-06 Thread arun abraham
Hi Pat,

Thanks a lot for the detailed reply; it prompted me to read more about the
recommendation features provided by Mahout.

I have been trying to find a recommendation approach for my application.
Below are the details of the approach I have planned for item
recommendation. Please correct me if I am going wrong anywhere.

I am trying to do an item-based search, which depends on the keyword the
user provides to search within the Solr index. Solr returns the following
for each document: id, file name, content of the document (Tika extracted),
and author.

As a first step, I am thinking of not implementing a complex recommender,
but rather an item-based/user-based one, which I believe requires less
complexity. I have been doing some hands-on work with the sample Mahout API
examples to generate item-based recommendations from a sample data set.

My application (a search tool) helps users find the appropriate documents
on the LAN (the documents are indexed using Solr). Given a search input, the
application returns documents with their respective details. I would like
to have recommendations displayed for each document.

We have a rating feature where users can rate a document (0-5); for this we
have a MySQL table with id, user_id, doc name, rating, and timestamp. We
also have a table storing the document interaction details: id, user_id,
doc_id, number of views, and number of searches for each document.

I would like to combine the data from the rating and document interaction
tables to create item recommendations, which I believe would give better
results. Can I accomplish this without integrating Solr? That is, is it
possible to use only the Mahout APIs and data from MySQL to build an
efficient recommendation feature? It would be helpful to get your comments
on the scope described above and also on the implementation.
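A minimal Python sketch of one way the two tables could be turned into interaction events. It assumes hypothetical row types `Rating` and `Interaction`, a hypothetical `doc_id_by_name` lookup (the rating table stores the doc name while the interaction table stores doc_id), and an arbitrary rule that a rating of 4 or more counts as the primary "like" event. Views and searches become secondary events, and counts are collapsed to one event per user/document pair.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    """Hypothetical row of the ratings table (id and timestamp omitted)."""
    user_id: str
    doc_name: str
    rating: int  # 0-5


@dataclass
class Interaction:
    """Hypothetical row of the interaction table (id omitted)."""
    user_id: str
    doc_id: str
    views: int
    searches: int


def to_events(ratings, interactions, doc_id_by_name, min_rating=4):
    """Return sorted, de-duplicated (user_id, doc_id, event_type) tuples.

    Assumption: rating >= min_rating is the primary "like" event; any
    non-zero view or search count becomes one secondary event.
    """
    events = set()
    for r in ratings:
        doc_id = doc_id_by_name.get(r.doc_name)  # skip unknown doc names
        if doc_id is not None and r.rating >= min_rating:
            events.add((r.user_id, doc_id, "like"))
    for i in interactions:
        if i.views > 0:
            events.add((i.user_id, i.doc_id, "view"))
        if i.searches > 0:
            events.add((i.user_id, i.doc_id, "search"))
    return sorted(events)
```

The threshold and event names are design choices to tune, not Mahout requirements; the point is that ratings and interaction counts end up as separate event types rather than one blended score.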


Please guide me on this.


Thanks and Regards,
Arun

Re: Reg:-Integrating Mahout with Solr

2017-04-02 Thread Pat Ferrel
Ted’s cautions still apply regarding interactions per item and per user. Do not 
ignore this advice. 

Also, doing behavioral boosting in search is very different from item-based
recommendations. Behavioral boosting will give you only a small amount of lift
compared with creating a recommender. Intuitively, you may have many items to
recommend to the user, but the added restriction that they contain the search
terms means you will throw away most of the recommendations you might make,
only to meet this requirement. Item-based recs are the ones you show at the
bottom of a product page, or of an item being read, that are “similar” to the
item the user is looking at in terms of the other interactions users make.
Here there are no restrictions on what terms these recommendations must
contain. Therefore a recommender is better than behavioral boosting as a
general rule, but since the two can be used together, boosting is a good thing
to implement as a second step if you have the right kind of data.

If you or anyone else reading this still needs behavioral search boosting read 
on...

As to integrating “behavioral boosting” with search, you will need to create 
indicators by recording interactions. What are your conversion events? Read, 
Buy? This will be your primary interaction, the one you want to see happen more 
often. Then record secondary interactions, if you have them. For an E-Commerce 
app the primary / conversion interaction is a “buy”, one possible secondary 
would be a “product detail view” but there are several other things you might 
record.
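As a sketch of the recording step, assuming events have already been collected as hypothetical `(user_id, item_id, event_type)` tuples, the Python below groups them into one CSV text per event type, with one `user_id,item_id` line per interaction. The primary (conversion) file would be the driver's main input and a secondary file its `--input2`; check the spark-itemsimilarity page for the exact column and filter options in your Mahout version.

```python
import csv
import io
from collections import defaultdict


def split_by_event(events):
    """Group (user_id, item_id, event_type) rows into one CSV text per type.

    Each CSV line is "user_id,item_id", one interaction per line.
    """
    buffers = defaultdict(io.StringIO)  # one in-memory file per event type
    for user_id, item_id, event_type in events:
        writer = csv.writer(buffers[event_type], lineterminator="\n")
        writer.writerow([user_id, item_id])
    return {event: buf.getvalue() for event, buf in buffers.items()}
```

In practice each value would be written to its own file (for example `buy.csv` and `view.csv`, names hypothetical) rather than kept in memory.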

Do you plan to write Scala code or use the Mahout CLI drivers? The driver is
not the ideal production tool, but it does work. You feed in a CSV for each
interaction type recorded, with the primary CSV recording the conversion
interactions you want to favor. You will get out a series of CSVs with data
you can put into your Solr index: the key of each row will be the item, and
the value will be a list of indicators you should attach to the item in your
index as a new field of type String Array. So we are talking about the index
you already have for items, augmented with these behavioral indicators. For
example, with a “buy” conversion, your index will have item “documents” with
fields for your content (maybe title, body, etc.), and you will add
behavioral indicator fields for “buy”, “detail-view”, etc. The use of the
Mahout CLI driver for “spark-itemsimilarity” is described here:
http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
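A sketch of the indexing side, assuming the driver's text output has one item per line, a tab, then space-separated `otherItem:strength` tokens, and that item IDs contain no whitespace or colons. The field name passed in (for example `buy_indicators`) is hypothetical. The result uses Solr's atomic-update `set` syntax so existing item documents keep their content fields; atomic updates require those fields to be stored (or to have docValues).

```python
def indicator_updates(output_lines, field_name):
    """Build Solr atomic-update documents from item-similarity output lines.

    Assumed line format: "itemID<TAB>other1:strength other2:strength".
    Only the indicator item IDs are kept; strengths are dropped.
    """
    updates = []
    for line in output_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item_id, _, rest = line.partition("\t")
        indicators = [token.rsplit(":", 1)[0] for token in rest.split()]
        # "set" replaces this one field on the existing Solr document.
        updates.append({"id": item_id, field_name: {"set": indicators}})
    return updates
```

The returned list can be serialized with `json.dumps` and posted to the collection's `/update` handler.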


When you query, construct a query that must match some of the search terms, but 
ask Solr to boost any items that also match the user’s history if it can. This 
will cause items that the user is likely to favor to be boosted in ranking. 
This also shows how search terms limit what can be done to “recommend” items. 
Users expect that the words they use in search must be somewhere in the content 
so we are limited to re-ranking term-based search. This is not as strong as a 
recommender but should still be in your bag of tricks as it is with the “big 
guys” like Amazon.

Send me a private email if you are looking for hands-on help with this.


Re: Reg:-Integrating Mahout with Solr

2017-04-02 Thread Ted Dunning
Hundreds of users are going to generate a really, really tiny amount of
data (relative to the normal amounts that recommenders get to see).

The problem is that hundreds of hyper-active users who issue thousands of
queries are only going to generate a tiny amount of data per document. You
will need to have roughly 20 positive interactions per document to get
decent performance. If you have a thousand documents, that means you will
need, at an absolute minimum, an (implausible) 20 thousand engagements.
Because the distribution will be very lopsided, you probably need 10-100x
more than that.

The final result is your hundreds of users would likely need to issue
thousands of queries. Each.

That seems like a lot.

You should get good results for a small minority of documents at smaller
data volumes.
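The arithmetic above as a small Python helper. The 20-per-document and 10-100x figures come from this message; the 500-user count in the usage block is only an illustration of "hundreds of users".

```python
def engagements_needed(num_docs, per_doc=20, skew_factor=1):
    """Rough number of positive interactions needed for num_docs documents.

    per_doc: positive interactions per document for decent performance.
    skew_factor: multiplier for a lopsided (long-tailed) distribution.
    """
    return num_docs * per_doc * skew_factor


if __name__ == "__main__":
    users = 500  # hypothetical "hundreds of users"
    for skew in (1, 10, 100):
        total = engagements_needed(1000, skew_factor=skew)
        print(f"skew x{skew}: {total:,} engagements, {total / users:,.0f} per user")
```

With 1,000 documents this prints 40, 400, and 4,000 engagements per user for the three factors, which is where "thousands of queries. Each." comes from.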




Re: Reg:-Integrating Mahout with Solr

2017-04-02 Thread arun abraham
Hi Ted,

Each document to be indexed by Solr has fairly large content, and there will
be 100+ users searching it (once the Solr search tool goes live).
Please guide me on the integration steps for Mahout with Solr, given the
stats mentioned.

Thanks and Regards,
Arun


Re: Reg:-Integrating Mahout with Solr

2017-04-02 Thread Ted Dunning
Arun,

That's good news.

The second limitation will be how much data you have for each document and
whether you have a good measure of how engaged users are with documents.




Re: Reg:-Integrating Mahout with Solr

2017-04-01 Thread arun abraham
Hi Ted,

Thanks for the reply.

I understand, Ted, that a larger set of documents in the index is required
to get good, effective results.

For all the Solr-related functionality and search, I used ~100 docs (with
the path pointing to my local system) to index and set things up. This is
only for testing and implementation.

Once the configuration and high-level testing are done, the configuration
will be changed so that the document path points to the LAN location, where
we have a large collection of documents for indexing and testing.

It won't be a problem for me to use the LAN path for the configuration and
index; I can use the larger document base.

Thanks and Regards,
Arun


Re: Reg:-Integrating Mahout with Solr

2017-04-01 Thread Ted Dunning
On Sat, Apr 1, 2017 at 6:21 PM, arun abraham 
wrote:

> As  a first step I am trying to recommend min of two documents(As my
> Solr document index is ~100 docs).
>

This is kind of weird.

Can you say why you have so very few documents?

There may be something special going on that will make this work better or
worse.

I have seen people use indicator-based recommendations for ad targeting
where they had several thousand options, but haven't seen anything with
only 100 options.


Re: Reg:-Integrating Mahout with Solr

2017-04-01 Thread arun abraham
Hi,

Thanks Pat for the reply.

I am trying to implement item-based recommendation as the first step. When
the user searches with a keyword (using Solr), it should return not only
keyword-matching results (already implemented, along with other Solr search
features) but also related (recommended) documents.

I believe implementing item-based recommendation will be a good learning
step towards implementing user-based or behavior-based recommendation. As a
first step I am trying to recommend a minimum of two documents (my Solr
document index has ~100 docs).

My understanding is that in this scenario, the first step is to provide the
Solr index to Mahout, which will read it and generate a vector file from it.
It would be helpful to get guidance on the integration steps to follow.

Thanks and Regards,
Arun



Re: Reg:-Integrating Mahout with Solr

2017-04-01 Thread Pat Ferrel
You want to create “Behavioral Search”? That is, among the items that contain
the search terms, you boost the ones the individual user is more likely to
favor?

You want to use the CCO algorithm in Mahout. You need to collect behavioral 
information like conversions, detailed page views, etc. Run each event through 
CCO and you get a collection of “indicators” as item attributes. Augment the 
Solr index with fields (indicators) attached to item documents. Then at query 
time supply the search terms as a “must match” and use user history as the 
query segment against the corresponding indicator fields as a “should match” 
with some boosting factor. 
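To make the “indicators” idea concrete, here is a toy, in-memory Python sketch of self-cooccurrence for a single event type, scored with Dunning's log-likelihood ratio (LLR), the test CCO uses to decide which co-occurrences are significant. It is not Mahout's implementation: it skips cross-occurrence with secondary events, downsampling, and Spark distribution, and `user_items` and `max_per_item` are hypothetical names.

```python
import math
from itertools import combinations


def _x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    """Entropy of the counts scaled by their total (N * H, in nats)."""
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table of co-occurrence counts.

    k11: users with both items, k12: A only, k21: B only, k22: neither.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def indicators(user_items, max_per_item=2):
    """Map each item to the items it co-occurs with most significantly."""
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)
    total_users = len(user_items)
    scored = {item: [] for item in item_users}
    for a, b in combinations(sorted(item_users), 2):
        both = len(item_users[a] & item_users[b])
        if both == 0:
            continue  # never seen together: not an indicator
        only_a = len(item_users[a]) - both
        only_b = len(item_users[b]) - both
        neither = total_users - both - only_a - only_b
        score = llr(both, only_a, only_b, neither)
        scored[a].append((score, b))
        scored[b].append((score, a))
    return {
        item: [other for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:max_per_item]]
        for item, pairs in scored.items()
    }
```

With `{"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"c"}, "u4": {"c", "d"}}`, item `a` gets `b` as its only indicator; that list is what would go into `a`'s indicator field in Solr.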

CCO is here: 
http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html 

and a post on Personalizing Search here: 
http://www.actionml.com/blog/personalized_search 


BTW, do you have a recommender running? If not, a recommender is likely to
generate almost an order of magnitude better results than Behavioral Search.
From industry wisdom and experience: implement a recommender first, then
augment search. On E-Commerce data we have reported results of 10-30%
conversion lift from recommendations and ~3% for Behavioral Search. 3% is
significant, but it requires you to gather the same info that it takes to
build a recommender, so why not build a recommender first?

There is an almost turnkey recommender that uses CCO here:
http://actionml.com/ur
It uses Elasticsearch but is standalone, not integrated into any search tech
you use elsewhere.


On Mar 31, 2017, at 9:30 PM, arun abraham  wrote:

Hi All,

I am trying to integrate Apache Mahout with Solr. I have created a search
application using Solr that has spellcheck and type-ahead suggestion
functionality. I have a new requirement to display recommendations (from an
index of ~100 docs) for a specific keyword-based search. Is it possible to
recommend docs or links from the web together with the indexed data?
Please guide me on the possibilities and on the integration part.

Thanks and Regards,
Arun