[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696 ]
Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:45 AM:
-----------------------------------------------------------------

There are two broad options for adding this, in terms of ML API:
# Extending {{transform}} to work with additional param(s) to specify whether to recommend top-k.
# Adding methods such as {{recommendItems}} and {{recommendUsers}}.

I've seen some examples of #2, e.g. in {{LDAModel.describeTopics}}. However this seems to fall more naturally into #1, so that it can be part of a Pipeline. Having said that, this is likely to be the final stage of a pipeline - use model to batch-predict recommendations, and export the resulting predictions DF - so perhaps not that important.

e.g.
{code}
val model = ALS.fit(df)
// model has userCol and itemCol set, so calling transform makes predictions for each user, item combination
val predictions = model.transform(df)

// Option 1 - requires 3 extra params
val topKItemsForUsers = model.setK(10).setUserTopKCol("userTopK").transform(df)
val topKUsersForItems = model.setK(10).setItemTopKCol("itemTopK").transform(df)

// Option 2
val topKItemsForUsers = model.recommendItems(df, 10)
val topKUsersForItems = model.recommendUsers(df, 10)
{code}

[~josephkb] [~mengxr] thoughts? I guess I lean toward #1 to fit into the {{Transformer}} API, even though it's a little more clunky.


was (Author: mlnick):
There are two broad options for adding this, in terms of ML API:
# Extending {{transform}} to work with additional param(s) to specify whether to recommend top-k.
# Adding methods such as {{recommendItems}} and {{recommendUsers}}.

I've seen some examples of #2, e.g. in {{LDAModel.describeTopics}}. However this seems to fall more naturally into #1, so that it can be part of a Pipeline. Having said that, this is likely to be the final stage of a pipeline - use model to batch-predict recommendations, and export the resulting predictions DF - so perhaps not that important.

e.g.
{code}
val model = ALS.fit(df)
// model has userCol and itemCol set, so calling transform makes predictions for each user, item combination
val predictions = model.transform(df)

// Option 1 - requires 3 extra params
val topKItemsForUsers = model.setK(10).setUserTopKCol("userTopK").transform(df)
val topKUsersForItems = model.setK(10).setItemTopKCol("itemTopK").transform(df)

// Option 2 - requires to (re)specify the user / item input col in the input DF
val topKItemsForUsers = model.recommendItems(df, "userId", 10)
val topKUsersForItems = model.recommendUsers(df, "itemId", 10)
{code}

[~josephkb] [~mengxr] thoughts? I guess I lean toward #1 to fit into the {{Transformer}} API, even though it's a little more clunky.

> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
>                 Key: SPARK-13857
>                 URL: https://issues.apache.org/jira/browse/SPARK-13857
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods {{recommendProducts/recommendUsers}} for recommending top K to a given user / item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML.
> Investigate if efficiency can be improved at the same time (see SPARK-11968).
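
For reference, the "recommend top K to a given user" operation that the issue description mentions can be sketched in plain Scala, without Spark. This is only a rough illustration under stated assumptions: {{TopKSketch}}, its {{recommendItems}} helper and the in-memory factor map are hypothetical names made up for this sketch, not the mllib or ML API, and a real implementation would work on distributed factor DataFrames/RDDs rather than a local {{Map}}.

```scala
// Hypothetical, Spark-free sketch (not the mllib / ML API): score every item
// for one user by the dot product of latent factors, then keep the k
// highest-scoring items.
object TopKSketch {
  // Assumed representation: item id -> latent factor vector.
  type Factors = Map[Int, Array[Double]]

  // Dot product of two equal-length factor vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Top-k (itemId, score) pairs for one user, highest score first.
  def recommendItems(
      userFactor: Array[Double],
      itemFactors: Factors,
      k: Int): Seq[(Int, Double)] =
    itemFactors.toSeq
      .map { case (itemId, f) => (itemId, dot(userFactor, f)) }
      .sortBy { case (_, score) => -score }
      .take(k)

  def main(args: Array[String]): Unit = {
    val itemFactors: Factors = Map(
      1 -> Array(1.0, 0.0),
      2 -> Array(0.0, 1.0),
      3 -> Array(1.0, 1.0)
    )
    val user = Array(2.0, 1.0)
    // Scores: item 1 -> 2.0, item 2 -> 1.0, item 3 -> 3.0
    recommendItems(user, itemFactors, 2).foreach(println)
  }
}
```

Swapping the roles of the user and item factors gives the corresponding "top K users for an item" direction. Either option discussed in the comment (extra params on {{transform}}, or dedicated {{recommendItems}} / {{recommendUsers}} methods) would presumably wrap this same scoring step; only the way it is exposed differs.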