[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

Nick Pentreath (JIRA) Tue, 15 Mar 2016 23:46:36 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696
 ]


Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:45 AM:
-----------------------------------------------------------------

There are two broad options for adding this, in terms of ML API:

# Extending {{transform}} to work with additional param(s) to specify whether 
to recommend top-k. 
# Adding methods such as {{recommendItems}} and {{recommendUsers}}.

I've seen some examples of #2, e.g. in {{LDAModel.describeTopics}}. However 
this seems to fall more naturally into #1, so that it can be part of a 
Pipeline. Having said that, this is likely to be the final stage of a pipeline 
- use model to batch-predict recommendations, and export the resulting 
predictions DF - so perhaps not that important.

e.g.
{code}
val model = ALS.fit(df)
// model has userCol and itemCol set, so calling transform makes predictions 
for each user, item combination
val predictions = model.transform(df)

// Option 1 - requires 3 extra params
val topKItemsForUsers = model.setK(10).setUserTopKCol("userTopK").transform(df)
val topKUsersForItems = model.setK(10).setItemTopKCol("itemTopK").transform(df)

// Option 2
val topKItemsForUsers = model.recommendItems(df, 10)
val topKUsersForItems = model.recommendUsers(df, 10)
{code}

[~josephkb] [~mengxr] thoughts? I guess I lean toward #1 to fit into the 
{{Transformer}} API, even though it's a little more clunky.


was (Author: mlnick):
There are two broad options for adding this, in terms of ML API:

# Extending {{transform}} to work with additional param(s) to specify whether 
to recommend top-k. 
# Adding methods such as {{recommendItems}} and {{recommendUsers}}.

I've seen some examples of #2, e.g. in {{LDAModel.describeTopics}}. However 
this seems to fall more naturally into #1, so that it can be part of a 
Pipeline. Having said that, this is likely to be the final stage of a pipeline 
- use model to batch-predict recommendations, and export the resulting 
predictions DF - so perhaps not that important.

e.g.
{code}
val model = ALS.fit(df)
// model has userCol and itemCol set, so calling transform makes predictions 
for each user, item combination
val predictions = model.transform(df)

// Option 1 - requires 3 extra params
val topKItemsForUsers = model.setK(10).setUserTopKCol("userTopK").transform(df)
val topKUsersForItems = model.setK(10).setItemTopKCol("itemTopK").transform(df)

// Option 2 - requires to (re)specify the user / item input col in the input DF
val topKItemsForUsers = model.recommendItems(df, "userId", 10)
val topKUsersForItems = model.recommendUsers(df, "itemId", 10)
{code}

[~josephkb] [~mengxr] thoughts? I guess I lean toward #1 to fit into the 
{{Transformer}} API, even though it's a little more clunky.

> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
>                 Key: SPARK-13857
>                 URL: https://issues.apache.org/jira/browse/SPARK-13857
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods 
> {{recommendProducts/recommendUsers}} for recommending top K to a given user / 
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to 
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do 
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML. 
> Investigate if efficiency can be improved at the same time (see SPARK-11968).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

Reply via email to