[ 
https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821070#comment-15821070
 ] 

Alan Budd commented on SPARK-13857:
-----------------------------------

I just had a short email conversation with [~mlnick] regarding this JIRA 
issue. I'm very interested in this functionality for my project, which is 
an implicit-feedback ALS recommendation engine for a website that uses URLs 
as the items. Essentially, my pipeline will consist of:

# A DataFrame consisting of:
*# user id column (userID)
*# URL column (URLid)
*# an aggregate count for each id/url pair (for the rating/preference column) 
(count)
# Creating a {{ParamGridBuilder()}} to optimize the regularization parameter, 
{{regParam}}.
# Training the model using {{ALS()}}, with the following:
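{code}
import org.apache.spark.ml.recommendation.ALS

val als = new ALS()
  .setMaxIter(5)
  .setImplicitPrefs(true)  // treat "count" as implicit feedback
  .setUserCol("userID")
  .setItemCol("URLid")
  .setRatingCol("count")
{code}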
# Optimizing the {{regParam}} hyperparameter using the {{CrossValidator()}} 
functionality.

Once the ML Pipeline has been built using the above steps, resulting in an 
{{org.apache.spark.ml.PipelineModel}} object, the final step will be to use 
this pipeline model to generate the top K recommendations for every user in the 
model (in batches) and export that DataFrame for use in real-time calls.
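
To make the "top K for every user" step concrete, here is a minimal plain-Scala sketch of the computation, assuming the trained model exposes one factor vector per user and per item and scores a pair by their dot product (which is how ALS predictions are formed). The names {{TopKSketch}}, {{recommendForAllUsers}}, {{sampleUsers}} and {{sampleItems}} are hypothetical and exist only for this illustration; they are not part of the Spark API, and the brute-force loop is not how Spark would do this at scale.

```scala
// Illustrative only: in-memory, brute-force top-K over made-up factor vectors.
object TopKSketch {
  // Hypothetical stand-in for learned factors: id -> latent vector.
  type Factors = Map[Int, Array[Double]]

  // Predicted preference for one (user, item) pair: dot product of the factors.
  def score(u: Array[Double], v: Array[Double]): Double =
    u.zip(v).map { case (a, b) => a * b }.sum

  // For every user, score every item and keep the k highest-scoring ones.
  def recommendForAllUsers(users: Factors, items: Factors, k: Int): Map[Int, Seq[(Int, Double)]] =
    users.map { case (userId, userVec) =>
      val ranked = items.toSeq
        .map { case (itemId, itemVec) => (itemId, score(userVec, itemVec)) }
        .sortBy { case (_, s) => -s } // highest score first
        .take(k)
      userId -> ranked
    }

  // Made-up sample data (assumption, not taken from any real model).
  val sampleUsers: Factors = Map(1 -> Array(1.0, 0.0), 2 -> Array(0.0, 1.0))
  val sampleItems: Factors = Map(
    10 -> Array(0.9, 0.1),
    20 -> Array(0.2, 0.8),
    30 -> Array(0.5, 0.5)
  )

  def main(args: Array[String]): Unit =
    recommendForAllUsers(sampleUsers, sampleItems, 2).toSeq.sortBy(_._1).foreach {
      case (userId, recs) => println(s"user $userId -> items ${recs.map(_._1).mkString(", ")}")
    }
}
```

With the sample data, user 1 is matched to items 10 and 30, and user 2 to items 20 and 30. The cost is proportional to (number of users) x (number of items), which is why a batched, distributed version of this step is what I am hoping for in the ML API.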

[~mlnick], I hope this provides a little insight into a desired production 
use case and helps drive this issue towards production. On a final note, I 
would strongly encourage plenty of documentation, with examples of how to use 
it in an ML Pipeline (or with a stand-alone ALS model, i.e. an 
{{org.apache.spark.ml.recommendation.ALSModel}} object), for people who want to 
use it in a production environment. Let me know if you need me to elaborate on 
any details!

> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
>                 Key: SPARK-13857
>                 URL: https://issues.apache.org/jira/browse/SPARK-13857
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Nick Pentreath
>            Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods 
> {{recommendProducts/recommendUsers}} for recommending top K to a given user / 
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to 
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do 
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML. 
> Investigate if efficiency can be improved at the same time (see SPARK-11968).



