[ https://issues.apache.org/jira/browse/FLINK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654004#comment-15654004 ]

ASF GitHub Bot commented on FLINK-4613:
---------------------------------------

Github user gaborhermann commented on the issue:

    https://github.com/apache/flink/pull/2542
  
    I agree. I have already fixed the seed in `ImplicitALSITSuite`. In 
    `ALSITSuite` the seed is unset, but it defaults to 0, which is why 
    `ALSITSuite` is deterministic, so I don't think that's a problem. However, 
    I would reconsider the fixed default seed, because with it two trainings 
    with the same parameters yield the same result, and that might not be what 
    the user expects. The user might expect a truly randomized result.
    
    Do you think we should change the default optional seed from `Some(0L)` to 
    `None`, and use the system random generator by default? Again, this default 
    comes from the original (explicit) ALS implementation. Should we do it in 
    this PR, or create another issue for refactoring ALS?
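
    A minimal sketch of what that change could look like, following the Flink ML 
    `Parameter` convention; the surrounding names (`resultingParameters`) are 
    illustrative and not taken from this PR:

        import org.apache.flink.ml.common.Parameter

        // Hypothetical change: drop the fixed default so that an unset seed
        // means "use the system random generator" instead of 0L.
        case object Seed extends Parameter[Long] {
          val defaultValue: Option[Long] = None
        }

        // At fit time, fall back to a system-generated seed when none is given:
        val seed = resultingParameters.get(Seed).getOrElse(scala.util.Random.nextLong())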


> Extend ALS to handle implicit feedback datasets
> -----------------------------------------------
>
>                 Key: FLINK-4613
>                 URL: https://issues.apache.org/jira/browse/FLINK-4613
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
>
> The Alternating Least Squares implementation should be extended to handle 
> _implicit feedback_ datasets. These datasets do not contain explicit ratings 
> given by users; they are instead built by collecting user behavior (e.g. a user 
> listened to artist X for Y minutes), and they require a slightly different 
> optimization objective. See the details in [Hu et 
> al|http://dx.doi.org/10.1109/ICDM.2008.22].
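> For reference, the objective from Hu et al. is
>     \min_{X,Y} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda (\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2),
> where the preference is p_{ui} = 1 if r_{ui} > 0 (and 0 otherwise) and the 
> confidence is c_{ui} = 1 + \alpha r_{ui}.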
> We do not need to modify much in the original ALS algorithm. See the [Spark ALS 
> implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala], 
> which could serve as a basis for this extension. Only the factor-update step is 
> modified, and most of the changes are in the local parts of the algorithm 
> (i.e. UDFs). In fact, the only modification that is not local is precomputing 
> the matrix product Y^T * Y and broadcasting it to all the nodes, which we can 
> do with broadcast DataSets.
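> As a rough illustration only (not the actual implementation; the factor types, 
> helper names and inputs below are assumed), the broadcast step could look like 
> this with Flink's `withBroadcastSet`:
>
> import org.apache.flink.api.scala._
> import org.apache.flink.api.common.functions.RichMapFunction
> import org.apache.flink.configuration.Configuration
>
> // Illustrative placeholder types; the real code works on Flink ML's factor blocks.
> case class Factors(id: Long, factors: Array[Double])
> case class UserBlock(users: Array[Factors])
> val itemFactors: DataSet[Factors] = ???   // assumed input: current item factors Y
> val userBlocks: DataSet[UserBlock] = ???  // assumed input: blocked user factors
>
> // Precompute Y^T * Y once, as a flattened (row-major) k*k array ...
> val yty: DataSet[Array[Double]] = itemFactors
>   .map { f =>
>     val k = f.factors.length
>     Array.tabulate(k * k)(i => f.factors(i / k) * f.factors(i % k))
>   }
>   .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
>
> // ... and broadcast it to every task that updates the user factors.
> val updatedUserFactors = userBlocks
>   .map(new RichMapFunction[UserBlock, UserBlock] {
>     @transient private var YtY: Array[Double] = _
>     override def open(config: Configuration): Unit = {
>       YtY = getRuntimeContext.getBroadcastVariable[Array[Double]]("YtY").get(0)
>     }
>     override def map(block: UserBlock): UserBlock = {
>       // solve the implicit-feedback normal equations for this block using YtY
>       block
>     }
>   })
>   .withBroadcastSet(yty, "YtY")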



