[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821070#comment-15821070 ]
Alan Budd commented on SPARK-13857:
-----------------------------------

I just had a short email conversation with [~mlnick] about this JIRA issue. I'm very interested in this functionality for my project, which is an implicit-feedback ALS recommendation engine for a website, using URLs as the items. Essentially, my pipeline will consist of:
# A DataFrame consisting of:
*# a user id column (userID)
*# a URL column (URLid)
*# an aggregate count for each id/url pair, used as the rating/preference column (count)
# Creating a {{ParamGridBuilder()}} to optimize the regularization parameter, {{regParam}}.
# Training the model using {{ALS()}}, with the following:
{code}
.setMaxIter(5)
.setImplicitPrefs(true)
.setUserCol("userID")
.setItemCol("URLid")
.setRatingCol("count")
{code}
# Optimizing the {{regParam}} hyperparameter using the {{CrossValidator()}} functionality.

Once the ML Pipeline is built using the above steps, resulting in a {{org.apache.spark.ml.PipelineModel}} object, the final step will be to use this pipeline model to generate the top K recommendations for every user in the model (in batches) and export that DataFrame for use in real-time calls.

[~mlnick], I hope this provides some insight into a desired production use case and helps move this issue forward. On a last note, I would definitely encourage plenty of documentation, with examples of how to use this in an ML Pipeline (or with a stand-alone ALS model, i.e. an {{org.apache.spark.ml.recommendation.ALSModel}} object), for people who want to use it in a production environment.

Let me know if you need me to elaborate on any further details!
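For illustration only, the final step described above (top K recommendations for every user, produced in batches) can be sketched in plain Scala. This is not Spark's implementation and not the API this issue proposes: the object {{TopKSketch}}, its method names, and the {{batchSize}} parameter are all hypothetical, and in-memory maps stand in for the factor tables that a fitted {{ALSModel}} exposes as {{userFactors}} and {{itemFactors}}. The only modelling assumption is the standard one for matrix factorization, namely that a predicted preference is the dot product of a user vector and an item vector.

```scala
// Sketch only: plain Scala collections stand in for the factor tables of a fitted ALS model.
object TopKSketch {
  // Hypothetical alias: id -> latent factor vector.
  type Factors = Map[Int, Array[Float]]

  // Predicted preference: dot product of a user vector and an item vector.
  def dot(a: Array[Float], b: Array[Float]): Float = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var sum = 0.0f
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // Top-k (itemId, score) pairs for one user, highest score first.
  def topK(user: Array[Float], items: Factors, k: Int): Seq[(Int, Float)] =
    items.toSeq
      .map { case (itemId, vec) => (itemId, dot(user, vec)) }
      .sortWith { case ((_, s1), (_, s2)) => s1 > s2 }
      .take(k)

  // Walk the users in fixed-size batches (ordered by user id) so that each
  // batch of recommendations can be exported on its own.
  def topKInBatches(
      users: Factors,
      items: Factors,
      k: Int,
      batchSize: Int): Iterator[Map[Int, Seq[(Int, Float)]]] =
    users.toSeq.sortBy(_._1).grouped(batchSize).map { batch =>
      batch.map { case (userId, vec) => userId -> topK(vec, items, k) }.toMap
    }
}
```

Calling {{TopKSketch.topKInBatches(users, items, k = 2, batchSize = 1)}} yields one map per batch, keyed by user id. Scoring every item for every user is quadratic in the number of ids, which is why doing this efficiently at scale is part of what this issue asks to investigate.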
> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
> Key: SPARK-13857
> URL: https://issues.apache.org/jira/browse/SPARK-13857
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Nick Pentreath
> Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods
> {{recommendProducts/recommendUsers}} for recommending top K to a given user /
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML.
> Investigate if efficiency can be improved at the same time (see SPARK-11968).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)