On Fri, Nov 8, 2013 at 6:24 AM, Ted Dunning wrote:
> On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan wrote:
>
> > This particular approach is discussed, and proven to increase the
> accuracy
> > in "Collaborative filtering with Temporal Dynamics" by Yehuda Koren. The
> > decay function is paramete
On Thu, Nov 7, 2013 at 9:45 PM, Andreas Bauer wrote:
> Hi,
>
> Thanks for your comments.
>
> I modified the examples from the Mahout in Action book, so I used
> the hashed approach, and that's why I used 100 features. I'll adjust the
> number.
>
Makes sense. But the book was doing sparse
Hi,
Thanks for your comments.
I modified the examples from the Mahout in Action book, so I used the
hashed approach, and that's why I used 100 features. I'll adjust the number.
You say that I'm using the same CVE for all features, so you mean I should
create 12 separate CVEs for adding
Why is FEATURE_NUMBER != 13?
With 12 features that are already lovely and continuous, just stick them in
elements 1..12 of a 13-element vector and put a constant value at the
beginning of it. Hashed encoding is good for sparse stuff, but confusing
for your case.
Also, it looks like you only pass th
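That dense layout can be sketched with a plain double[] standing in for a Mahout vector (the raw feature values below are invented, purely for illustration):

```java
// Sketch only: a plain double[] standing in for a Mahout DenseVector,
// showing the layout described above. The raw feature values are invented.
public class DenseEncodingSketch {
    static double[] encode(double[] features) {
        double[] v = new double[features.length + 1];
        v[0] = 1.0;                       // constant term at position 0
        System.arraycopy(features, 0, v, 1, features.length);
        return v;                         // elements 1..n hold the features
    }

    public static void main(String[] args) {
        double[] raw = {0.5, 1.2, -0.3, 2.0, 0.0, 0.7, 1.1, -1.4, 0.9, 0.2, 3.3, 0.8};
        double[] v = encode(raw);
        System.out.println(v.length);     // 13
    }
}
```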
On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan wrote:
> This particular approach is discussed, and proven to increase the accuracy
> in "Collaborative filtering with Temporal Dynamics" by Yehuda Koren. The
> decay function is parameterized per user, keeping track of how consistent
> the user behav
Hi Pat,
On Nov 7, 2013, at 7:30pm, Pat Ferrel wrote:
> Another approach would be to weight the terms in the docs by their Mahout
> similarity strength. But that will be for another day.
>
> My current question is whether Lucene looks at word proximity. I see the
> query syntax supports proxi
Another approach would be to weight the terms in the docs by their Mahout
similarity strength. But that will be for another day.
My current question is whether Lucene looks at word proximity. I see the query
syntax supports proximity, but I don’t see that it is the default, so that’s good.
On Nov 7
Hi,
I’m trying to use OnlineLogisticRegression for a two-class classification
problem, but as my classification results are not very good, I wanted to ask
for support to find out if my settings are correct and if I’m using Mahout
correctly. Because if I’m doing it correctly then probably my f
To the best of my knowledge, Lucene does not care about the position of a
keyword within a document.
You could bucket the ids into several fields, then use a dismax query to boost
the top-tier ids more than the second tier, etc.
A more fine-grained approach would probably involve a custom Similarity clas
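A sketch of that tiered setup, with invented field names (ids_tier1 holding the strongest ids), might look like:

```
q=id123 id456 id789
defType=dismax
qf=ids_tier1^3 ids_tier2^1.5 ids_tier3
mm=1
```

The boost factors here are placeholders; the point is only that higher-tier fields get a larger qf boost.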
Not sure how you are going to decay in Mahout. Once ingested into Mahout there
are no timestamps, so you’ll have to apply the decay before ingesting.
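Applying the decay before ingestion could be sketched like this (not a Mahout API; a plain pre-processing step over (user, item, value, timestamp) tuples, with halfLifeDays as an invented tuning knob):

```java
import java.util.concurrent.TimeUnit;

// Sketch, not a Mahout API: apply an exponential time decay to preference
// values before writing the (user,item,value) triples that Mahout ingests.
// halfLifeDays is an invented tuning knob.
public class DecaySketch {
    static double decayed(double value, long eventMillis, long nowMillis, double halfLifeDays) {
        double ageDays = (nowMillis - eventMillis) / (double) TimeUnit.DAYS.toMillis(1);
        return value * Math.pow(0.5, ageDays / halfLifeDays); // halves every halfLifeDays
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long thirtyDaysAgo = now - TimeUnit.DAYS.toMillis(30);
        // a 30-day-old preference of 4.0 with a 30-day half-life keeps half its value
        System.out.println(decayed(4.0, thirtyDaysAgo, now, 30.0)); // 2.0
    }
}
```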
Last year we set up an ecom-department store type recommender with data from
online user purchase, add-to-cart, and view. The data was actual user behavio
Interesting to think about ordering and adjacency. The index ids are sorted
by Mahout strength, so the first id is the most similar to the row key and so
forth. But the query is ordered by recency. In both cases the first id is in
some sense the most important. Does Solr/Lucene care about clo
Yes, you are correct, but my integration framework treats non-text fields as
scalars, so it is easier to neuter text than to implement full-text searching on
strings. I would do what you suggest if I were using raw Solr. My understanding
was that string fields also do not get TF-IDF applied, which is not what
The multivalued field will obey the "positionIncrementGap" value you specify
(default=100). So for querying purposes, those ids will be 100 (or whatever
you specified) positions apart. So a phrase search for adjacent ids would not
match unless you set the slop to >= positionIncrementGap. O
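For example, with the default gap of 100, a phrase query for two ids that sit in adjacent values of the multivalued field would need at least that much slop (the id values are invented):

```
"id123 id456"~100
```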
Pat,
Perhaps I am missing something here, but why not use a String field if you
do not need any of the analysis? Seems like from your previous email "The
query is a simple text query made of space delimited video id strings" - -
that you basically have a keyword style query which would seem to fit
One difference is that a “text” field has analyzers like Porter stemming
applied. I had to take these out of the schema.xml. I think TFIDF is also
applied to the terms in “text” but may not be to MV fields. I think TFIDF is
good in the application. The idea is that if everyone likes a movie, it i
Does anyone know what the difference is between keeping the ids in a space
delimited string and indexing a multivalued field of ids? I recently tried the
latter since ... it felt right; however, I am not sure which of the two has which
advantages.
On 07 Nov 2013, at 18:18, Pat Ferrel wrote:
> I h
I have dismax (not edismax) but am not using it yet, using the default query,
which does use ‘AND’. I had much the same thought as I slept on it. Changing to
OR is now working much, much better. So obvious it almost bit me, not good in
this case...
With only a trivially small amount of testing I’d
Pat,
Can you give us the query it generates when you enter "vampire werewolf
zombie", q/qt/defType ?
My guess is you're using the default query parser with "q.op=AND" , or, you're
using dismax/edismax with a high "mm" (min-must-match) value.
James Dyer
Ingram Content Group
(615) 213-4311
---
Cassio,
I am not sure if there are direct/indirect ways to do this with existing
code.
Recall that an item neighborhood based score prediction, in simplest terms,
is a weighted average of the active user's ratings on other items, where
the weights are item-to-item similarities. Applying a decay f
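That weighted-average prediction with a decay factor folded into the weights can be sketched as follows (plain Java, not existing Mahout code; all the numbers in main are invented):

```java
// Sketch of an item-neighborhood prediction: a weighted average of the
// active user's ratings, weighted by item-item similarity times a
// per-rating time-decay factor. Not existing Mahout code.
public class DecayedNeighborhoodSketch {
    // ratings[i]: the user's rating on neighbor item i
    // similarities[i]: similarity between the target item and neighbor i
    // decays[i]: time-decay factor for rating i (1.0 = fresh, -> 0 with age)
    static double predict(double[] ratings, double[] similarities, double[] decays) {
        double num = 0, den = 0;
        for (int i = 0; i < ratings.length; i++) {
            double w = similarities[i] * decays[i];
            num += w * ratings[i];
            den += Math.abs(w);
        }
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        double[] r = {5.0, 1.0};
        double[] s = {0.8, 0.8};
        double[] d = {1.0, 0.25};             // the low rating is old, so it counts less
        System.out.println(predict(r, s, d)); // about 4.2
    }
}
```

With equal similarities, the fresh high rating dominates the old low one, which is exactly the effect the decay is meant to produce.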