I guess from the description it means they always assume preference 1 for all observed values, treat the rating matrix itself as the confidence matrix, and use the baseline confidence with a 0 preference for everything else. OK -- that's reasonable and faithful to the original paper's description, I suppose.
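To make that reading concrete, here is a minimal sketch (plain Python, illustrative names, not Mahout or MLlib code) of the preference/confidence mapping from the Hu-Koren-Volinsky implicit-feedback paper: every observed interaction gets preference 1, confidence grows linearly with the raw count via a tunable alpha, and unobserved cells implicitly keep preference 0 at the baseline confidence:

```python
def to_preference_confidence(ratings, alpha=40.0):
    """Map raw implicit counts to (preference, confidence) pairs.

    ratings: dict mapping (user, item) -> observed interaction count.
    Returns (P, C) sparse dicts: P holds binary preference 1 for every
    observed cell, C the confidence c = 1 + alpha * r from the paper.
    Unobserved cells are left implicit: p = 0, baseline confidence 1.
    alpha is a tunable scaling (the paper found ~40 to work well).
    """
    P = {k: 1.0 for k in ratings}                         # any activity => preference 1
    C = {k: 1.0 + alpha * r for k, r in ratings.items()}  # confidence from raw count
    return P, C
```

Under this reading, the single sparse input matrix is enough: preferences are recoverable from the support of the matrix, and confidences from its values.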
On Thu, Feb 20, 2014 at 12:09 AM, Dmitriy Lyubimov (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906741#comment-13906741 ]
>
> Dmitriy Lyubimov commented on MAHOUT-1365:
> ------------------------------------------
>
> Yeah, I am not sure what they are doing there. Last time I looked at it,
> MLlib did not have any form of weighted ALS. Now this example seems to
> include "trainImplicit", which works on the rating matrix only. In the
> original formulation of the implicit feedback problem there were two
> values: preference, and confidence in that preference. So I am not sure
> what they do there, since the input is obviously one sparse matrix.
>
> My generalization of the problem includes a formulation where any
> confidence level can be attached to either a 0 or a 1 preference, plus a
> baseline. I also assume the model may have more than one parameter forming
> confidence, which requires fitting as well (simply speaking, what is the
> "level of consumption" if a user clicks on an item vs. adds it to the
> cart, etc.). Similarly, there could be different levels of confidence in
> ignoring items depending on the situation, so 0 preferences do not always
> have to carry the baseline confidence either.
>
> > Weighted ALS-WR iterator for Spark
> > ----------------------------------
> >
> >                 Key: MAHOUT-1365
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1365
> >             Project: Mahout
> >          Issue Type: Task
> >            Reporter: Dmitriy Lyubimov
> >            Assignee: Dmitriy Lyubimov
> >             Fix For: 1.0
> >
> >         Attachments: distributed-als-with-confidence.pdf
> >
> > Given preference P and confidence C distributed sparse matrices, compute
> > the ALS-WR solution for implicit feedback (Spark Bagel version).
> > Following the Hu-Koren-Volinsky method (stripping off any concrete
> > methodology for building the C matrix), with a parameterized test for
> > convergence.
> > The computational scheme follows the ALS-WR method (which should be
> > slightly more efficient for sparser inputs).
> > The best performance will be achieved if non-sparse anomalies are
> > prefiltered (eliminated), such as an anomalously active user who doesn't
> > represent a typical user anyway.
> > The work is going on here:
> > https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala. I am
> > porting away our (A1) implementation, so there are a few issues
> > associated with that.
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
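For reference, the per-user solve behind the Hu-Koren-Volinsky formulation the ticket is discussing is the weighted ridge regression x_u = (Yt Cu Y + lambda*I)^-1 Yt Cu p_u, where Cu is the diagonal confidence matrix for user u. A minimal dense sketch in plain Python (illustrative names only, not the Mahout or Spark implementation, and without the sparsity tricks a real implementation would use):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (tiny helper, fine for the small k x k systems ALS produces)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def user_factor(Y, conf, pref, lam):
    """One user's factor update: x_u = (Yt Cu Y + lam*I)^-1 Yt Cu p_u.

    Y:    item factor matrix as a list of k-dim rows.
    conf: per-item confidences c_ui for this user.
    pref: per-item preferences p_ui (0/1) for this user.
    lam:  L2 regularization weight.
    """
    k = len(Y[0])
    A = [[lam * (i == j) for j in range(k)] for i in range(k)]  # lam*I
    b = [0.0] * k
    for y, c, p in zip(Y, conf, pref):
        for i in range(k):
            b[i] += c * p * y[i]          # accumulate Yt Cu p_u
            for j in range(k):
                A[i][j] += c * y[i] * y[j]  # accumulate Yt Cu Y
    return solve(A, b)
```

Alternating this update over users and items (with the roles of the two factor matrices swapped) is the ALS iteration; the weighting per ALS-WR and the confidence weighting from the implicit-feedback paper both enter through `conf` and `lam`.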