[ 
https://issues.apache.org/jira/browse/FLINK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650882#comment-15650882
 ] 

ASF GitHub Bot commented on FLINK-4613:
---------------------------------------

Github user thvasilo commented on the issue:

    https://github.com/apache/flink/pull/2542
  
    @gaborhermann Yup, the approach taken by the Spark community for testing is 
closer to what we would like to have for non-deterministic algorithms, but what 
you have implemented now should suffice, on the assumption that the ALS 
implementation is correct.
    
    @tillrohrmann initially implemented ALS, so I'm not sure how he arrived at 
the expected results. It would be a good idea in the future to document how we 
generate test data, so that the process is easy to replicate and validate. That 
should be enough for deterministic algorithms; for non-deterministic ones we 
should have proxies, such as measuring the reconstruction error.
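
    As a rough illustration of such a proxy, here is a minimal sketch in plain 
Python (not FlinkML code). The function name `reconstruction_rmse` and the toy 
data are hypothetical; the idea is only that a test asserts on the error of the 
reconstructed ratings instead of on exact factor values.

```python
import math


def reconstruction_rmse(ratings, user_factors, item_factors):
    """RMSE between observed ratings and the dot products of the learned factors.

    ratings: list of (user, item, rating) triples.
    user_factors / item_factors: dict mapping an id to its factor vector.
    """
    squared_error = 0.0
    for user, item, rating in ratings:
        predicted = sum(
            u * v for u, v in zip(user_factors[user], item_factors[item])
        )
        squared_error += (rating - predicted) ** 2
    return math.sqrt(squared_error / len(ratings))


# Hypothetical rank-1 toy data: every rating is exactly the product of factors.
user_factors = {0: [1.0], 1: [2.0]}
item_factors = {0: [3.0], 1: [0.5]}
ratings = [(0, 0, 3.0), (0, 1, 0.5), (1, 0, 6.0), (1, 1, 1.0)]

# A test for a non-deterministic run would check a tolerance, not exact factors.
assert reconstruction_rmse(ratings, user_factors, item_factors) < 1e-9
```

    The tolerance used in a real test would have to be chosen per dataset; the 
value above only works because the toy data is exactly rank 1.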
    
    I'll take a look at the code again now, and will add comments if I find 
something. Otherwise I hope @mbalassi can find some time to review and merge if 
no objections come up.


> Extend ALS to handle implicit feedback datasets
> -----------------------------------------------
>
>                 Key: FLINK-4613
>                 URL: https://issues.apache.org/jira/browse/FLINK-4613
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
>
> The Alternating Least Squares implementation should be extended to handle 
> _implicit feedback_ datasets. These datasets do not contain explicit ratings 
> by users; rather, they are built by collecting user behavior (e.g. user 
> listened to artist X for Y minutes), and they require a slightly different 
> optimization objective. For details, see [Hu et 
> al|http://dx.doi.org/10.1109/ICDM.2008.22].
> We do not need to modify much in the original ALS algorithm. See [Spark ALS 
> implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala],
>  which could be a basis for this extension. Only the factor update step is 
> modified, and most of the changes are in the local parts of the algorithm 
> (i.e. UDFs). In fact, the only modification that is not local is 
> precomputing the matrix product Y^T * Y and broadcasting it to all the nodes, 
> which we can do with broadcast DataSets. 
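
To illustrate the Y^T * Y precomputation described above, here is a minimal 
sketch in plain Python (not Flink code). It assumes the confidence weighting 
c_ui = 1 + alpha * r_ui from Hu et al.; the names `gram` and 
`user_normal_equations` and the parameters `alpha` and `lam` are hypothetical. 
The shared matrix is computed once, and each user only adds a correction for 
the items they actually interacted with.

```python
def gram(item_factors):
    """Y^T * Y for item factors given as a list of rows (one row per item)."""
    k = len(item_factors[0])
    return [
        [sum(y[a] * y[b] for y in item_factors) for b in range(k)]
        for a in range(k)
    ]


def user_normal_equations(yty, item_factors, observed, alpha, lam):
    """Build the system A_u * x_u = b_u for one user.

    observed maps an item index to the raw implicit count r_ui (> 0).
    A_u = Y^T Y + sum_i (c_ui - 1) * y_i y_i^T + lam * I, c_ui = 1 + alpha * r_ui
    b_u = sum_i c_ui * y_i   (preference p_ui is 1 for observed items)
    """
    k = len(yty)
    # Start from the shared (in Flink: broadcast) Gram matrix plus regularization.
    a = [[yty[r][c] + (lam if r == c else 0.0) for c in range(k)] for r in range(k)]
    b = [0.0] * k
    # Only the observed items contribute a per-user correction.
    for item, r_ui in observed.items():
        y = item_factors[item]
        c_ui = 1.0 + alpha * r_ui
        for r in range(k):
            b[r] += c_ui * y[r]
            for c in range(k):
                a[r][c] += (c_ui - 1.0) * y[r] * y[c]
    return a, b


# Hypothetical toy data: three items with two latent factors each.
item_factors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
yty = gram(item_factors)  # computed once, then shared by all users
a_u, b_u = user_normal_equations(yty, item_factors, {0: 2.0}, alpha=1.0, lam=0.5)
```

Solving A_u * x_u = b_u then gives the user's factor vector; the solver is 
left out here to keep the sketch short.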



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
