Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Gokhan Capan
On Fri, Nov 8, 2013 at 6:24 AM, Ted Dunning wrote:
> On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan wrote:
> > This particular approach is discussed, and proven to increase the accuracy in "Collaborative filtering with Temporal Dynamics" by Yehuda Koren. The decay function is paramete

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Ted Dunning
On Thu, Nov 7, 2013 at 9:45 PM, Andreas Bauer wrote:
> Hi,
>
> Thanks for your comments.
>
> I modified the examples from the Mahout in Action book, therefore I used the hashed approach and that's why I used 100 features. I'll adjust the number.
>
Makes sense. But the book was doing sparse

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Andreas Bauer
Hi, Thanks for your comments. I modified the examples from the Mahout in Action book, so I used the hashed approach, and that's why I used 100 features. I'll adjust the number. You say that I'm using the same CVE for all features, so you mean I should create 12 separate CVE for adding

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Ted Dunning
Why is FEATURE_NUMBER != 13? With 12 features that are already lovely and continuous, just stick them in elements 1..12 of a 13 long vector and put a constant value at the beginning of it. Hashed encoding is good for sparse stuff, but confusing for your case. Also, it looks like you only pass th
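The layout Ted describes above (a constant in the first slot, the 12 continuous features in elements 1..12) can be sketched roughly as follows. This is plain illustrative Python rather than Mahout code; the helper name `to_dense_vector`, the bias value of 1.0, and the sample inputs are assumptions made for the example.

```python
from typing import List, Sequence

NUM_FEATURES = 12               # continuous inputs, as in the thread
VECTOR_SIZE = NUM_FEATURES + 1  # one extra slot for the constant term


def to_dense_vector(features: Sequence[float], bias: float = 1.0) -> List[float]:
    """Put a constant at index 0 and the raw features at indices 1..12.

    Hypothetical helper: no hashing is involved, each feature keeps a fixed slot.
    """
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
    return [bias] + [float(x) for x in features]


# Made-up feature values, only to show the resulting layout.
example = to_dense_vector([0.5 * i for i in range(NUM_FEATURES)])
```

Because every feature keeps the same index, each learned weight can be read back against a known input, which is harder to do once values have been hashed into shared slots.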

Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Ted Dunning
On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan wrote:
> This particular approach is discussed, and proven to increase the accuracy in "Collaborative filtering with Temporal Dynamics" by Yehuda Koren. The decay function is parameterized per user, keeping track of how consistent the user behav

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Ken Krugler
Hi Pat,
On Nov 7, 2013, at 7:30pm, Pat Ferrel wrote:
> Another approach would be to weight the terms in the docs by their Mahout similarity strength. But that will be for another day.
>
> My current question is whether Lucene looks at word proximity. I see the query syntax supports proxi

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
Another approach would be to weight the terms in the docs by their Mahout similarity strength. But that will be for another day. My current question is whether Lucene looks at word proximity. I see the query syntax supports proximity, but I don’t see that it is the default, so that’s good. On Nov 7

OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Andreas Bauer
Hi, I’m trying to use OnlineLogisticRegression for a two-class classification problem, but as my classification results are not very good, I wanted to ask for support to find out if my settings are correct and if I’m using Mahout correctly. Because if I’m doing it correctly then probably my f

RE: Solr-recommender for Mahout 0.9

2013-11-07 Thread Dyer, James
To the best of my knowledge, Lucene does not care about the position of a keyword within a document. You could bucket the ids into several fields. Then use a dismax query to boost the top-tier ids more than the second, etc. A more fine-grained approach would probably involve a custom Similarity clas
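A rough sketch of the bucketing idea from James's message, in illustrative Python. The field names (`ids_tier1`, ...), the tier size, and the boost values are all hypothetical; only the `defType`, `q`, and `qf` request parameters and the `field^boost` syntax are taken from Solr's dismax parser.

```python
from typing import Dict, List


def bucket_ids(ranked_ids: List[str], tier_size: int) -> Dict[str, List[str]]:
    """Split ids (most similar first) into consecutive tiers, one field per tier."""
    tiers: Dict[str, List[str]] = {}
    for start in range(0, len(ranked_ids), tier_size):
        field = f"ids_tier{start // tier_size + 1}"  # hypothetical field name
        tiers[field] = ranked_ids[start:start + tier_size]
    return tiers


def dismax_params(query_ids: List[str], boosts: Dict[str, float]) -> Dict[str, str]:
    """Build request parameters that search every tier field with its own boost."""
    qf = " ".join(f"{field}^{boost:g}" for field, boost in boosts.items())
    return {"defType": "dismax", "q": " ".join(query_ids), "qf": qf}


# Document side: five ids, already sorted by similarity strength.
doc_fields = bucket_ids(["v1", "v2", "v3", "v4", "v5"], tier_size=2)

# Query side: a match in the top tier counts more than one in a lower tier.
params = dismax_params(["v2", "v9"], {"ids_tier1": 4, "ids_tier2": 2, "ids_tier3": 1})
```

The tiers trade precision for simplicity: every id within a tier gets the same boost, so the ordering inside a tier is lost.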

Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Pat Ferrel
Not sure how you are going to decay in Mahout. Once ingested into Mahout there are no timestamps. So you’ll have to do that before ingesting. Last year we set up an ecom-department store type recommender with data from online user purchase, add-to-cart, and view. The data was actual user behavio

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
Interesting to think about ordering and adjacency. The index ids are sorted by Mahout strength, so the first id is the most similar to the row key and so forth. But the query is ordered by recency. In both cases the first id is in some sense the most important. Does Solr/Lucene care about clo

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
Yes, you are correct, but my integration framework treats non-text fields as scalars, so it is easier to neuter text than to implement full-text searching on strings. I would do what you suggest if I were using raw Solr. My understanding was that string also does not get tfidf applied, which is not what

RE: Solr-recommender for Mahout 0.9

2013-11-07 Thread Dyer, James
The multivalued field will obey the "positionIncrementGap" value you specify (default=100). So for querying purposes, those ids will be 100 (or whatever you specified) positions apart. So a phrase search for adjacent ids would not match unless you set the slop to >= positionIncrementGap. O

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Andrew Psaltis
Pat, Perhaps I am missing something here, but why not use a String field if you do not need any of the analysis? It seems from your previous email ("The query is a simple text query made of space delimited video id strings") that you basically have a keyword style query which would seem to fit

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
One difference is that a “text” field has analyzers like Porter stemming applied. I had to take these out of the schema.xml. I think TFIDF is also applied to the terms in “text” but may not be to MV fields. I think TFIDF is good in the application. The idea is that if everyone likes a movie, it i

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Dominik Hübner
Does anyone know what the difference is between keeping the ids in a space delimited string and indexing a multivalued field of ids? I recently tried the latter since ... it felt right, however I am not sure which of the two has which advantages. On 07 Nov 2013, at 18:18, Pat Ferrel wrote: > I h

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
I have dismax (no edismax) but am not using it yet, using the default query, which does use ‘AND’. I had much the same thought as I slept on it. Changing to OR is now working much much better. So obvious it almost bit me, not good in this case... With only a trivially small amount of testing I’d

RE: Solr-recommender for Mahout 0.9

2013-11-07 Thread Dyer, James
Pat, Can you give us the query it generates when you enter "vampire werewolf zombie", q/qt/defType ? My guess is you're using the default query parser with "q.op=AND" , or, you're using dismax/edismax with a high "mm" (min-must-match) value. James Dyer Ingram Content Group (615) 213-4311 ---

Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Gokhan Capan
Cassio, I am not sure if there are direct/indirect ways to do this with existing code. Recall that an item neighborhood based score prediction, in simplest terms, is a weighted average of the active user's ratings on other items, where the weights are item-to-item similarities. Applying a decay f
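The weighted average Gokhan describes can be sketched as below, with a time decay folded into each weight. The message is cut off before it says how the decay is applied, so the exponential half-life form used here is an assumption for illustration only, as are the function name `predict_with_decay` and the 30-day default. This is plain Python, not Mahout code.

```python
import math
from typing import Dict, Tuple


def predict_with_decay(
    user_ratings: Dict[str, Tuple[float, float]],  # item -> (rating, age in days)
    similarities: Dict[str, float],                # item -> similarity to the target item
    half_life_days: float = 30.0,                  # assumed decay parameter
) -> float:
    """Weighted average of the user's ratings; weight = similarity * time decay."""
    numerator = 0.0
    denominator = 0.0
    for item, (rating, age_days) in user_ratings.items():
        sim = similarities.get(item, 0.0)
        # Assumed decay: 1.0 for a fresh rating, 0.5 after one half-life.
        decay = math.exp(-math.log(2) * age_days / half_life_days)
        weight = sim * decay
        numerator += weight * rating
        denominator += abs(weight)
    if denominator == 0.0:
        return float("nan")  # no usable neighbors
    return numerator / denominator


# Made-up data: a fresh 5-star rating and a 30-day-old 1-star rating.
ratings = {"a": (5.0, 0.0), "b": (1.0, 30.0)}
sims = {"a": 1.0, "b": 1.0}
score = predict_with_decay(ratings, sims)
```

With equal similarities, the older rating counts half as much as the fresh one, so the prediction moves toward the recent preference. Since Mahout keeps no timestamps once the data is ingested, a scheme like this would have to run on the raw data beforehand.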