[
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689765#comment-15689765
]
ASF GitHub Bot commented on FLINK-4712:
---------------------------------------
Github user thvasilo commented on the issue:
https://github.com/apache/flink/pull/2838
Hello Gabor,
I like the idea of having a RankingScore, it seems like having that
hierarchy with Score, RankingScore and PairWiseScore gives us the flexibility
we need to include ranking and supervised learning evaluation under the same
umbrella.
I would also encourage sharing any other ideas you broached that might
break the API, this is still very much an evolving project and there is no need
to shoehorn everything into an `evaluate(test: TestType): DataSet[Double]`
function if there are better alternatives.
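
For illustration, one possible shape of such a hierarchy is sketched below. This is only a sketch under assumptions: plain Scala `Seq`s stand in for `DataSet`, and the type parameters, element layouts, and the `MeanSquaredError` and `PrecisionAtK` names are hypothetical rather than taken from the PR.

```scala
// Hypothetical sketch: plain Scala collections stand in for Flink's DataSet,
// and none of these signatures are taken from the PR under discussion.
trait Score[Prediction] {
  def evaluate(predictions: Seq[Prediction]): Double
}

// Supervised learning: each element is a (truth, prediction) pair.
trait PairWiseScore extends Score[(Double, Double)]

// Ranking: each element is (user, recommended items in rank order, relevant items).
trait RankingScore extends Score[(Int, Seq[Int], Set[Int])]

object MeanSquaredError extends PairWiseScore {
  def evaluate(pairs: Seq[(Double, Double)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.map { case (t, p) => (t - p) * (t - p) }.sum / pairs.size
}

final class PrecisionAtK(k: Int) extends RankingScore {
  require(k > 0, "k must be positive")

  // Averages, over users, the fraction of the first k recommended items
  // that appear in that user's set of relevant items.
  def evaluate(rankings: Seq[(Int, Seq[Int], Set[Int])]): Double =
    if (rankings.isEmpty) 0.0
    else rankings.map { case (_, recommended, relevant) =>
      recommended.take(k).count(relevant.contains).toDouble / k
    }.sum / rankings.size
}
```

Both kinds of score reduce to a single number here, which is what would let cross-validation or model selection treat them uniformly through the common `Score` parent.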
One thing we need to consider is how this affects cross-validation and
model selection/hyper-parameter tuning. These two aspects of the library are
tightly linked and I think that we'll need to work on them in parallel to find
issues that affect both.
I recommend taking a look at the [cross-validation
PR](https://github.com/apache/flink/pull/891) I had opened way back when, and
making a new WIP PR that uses the current one (#2838) as a basis. Since the
`Score` interface still exists, it shouldn't require many changes, and all
that's added is the CrossValidation class. There are other fundamental issues
with the sampling there we can discuss in due time.
Regarding the RankingPredictor, we should consider the use case of such an
interface. Is it only going to be used for recommendation? If yes, what are the
cases where we could build a Pipeline with current or future pre-processing
steps? Could you give some pipeline examples in a recommendation setting?
> Implementing ranking predictions for ALS
> ----------------------------------------
>
> Key: FLINK-4712
> URL: https://issues.apache.org/jira/browse/FLINK-4712
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Domokos Miklós Kelen
> Assignee: Gábor Hermann
>
> We started working on implementing ranking predictions for recommender
> systems. Ranking prediction means that besides predicting scores for user-item
> pairs, the recommender system is able to recommend a top K list for the users.
> Details:
> In practice, this would mean finding the K items for a particular user with
> the highest predicted rating. It should also be possible to specify whether
> to exclude the already seen items from a particular user's toplist. (See for
> example the 'exclude_known' setting of [Graphlab Create's ranking
> factorization
> recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend]
> ).
> The output of the topK recommendation function could be in the form of
> {{DataSet[(Int,Int,Int)]}}, meaning (user, item, rank), similar to Graphlab
> Create's output. However, this is arguable: follow-up work includes
> implementing ranking recommendation evaluation metrics (such as precision@k,
> recall@k, ndcg@k), similar to [Spark's
> implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems].
> It would be beneficial if we were able to design the API such that it could
> be included in the proposed evaluation framework (see
> [2157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it
> necessary to consider the possible output type {{DataSet[(Int,
> Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}} meaning (user,
> array of items), possibly including the predicted scores as well. See
> [4713|https://issues.apache.org/jira/browse/FLINK-4713] for details.
> Another question arising is whether to provide this function as a member of
> the ALS class, as a switch-kind of parameter to the ALS implementation
> (meaning the model is either a rating or a ranking recommender model) or in
> some other way.
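
To make the top-K behaviour described in the issue above concrete, here is a minimal sketch. It assumes plain Scala collections in place of `DataSet`, and the `TopK` object, its `recommend` method, and the `excludeKnown` parameter are hypothetical names, not part of the ALS implementation.

```scala
// Hypothetical helper, not part of Flink ML: plain Scala collections stand in
// for DataSet so the sketch runs without a Flink dependency.
object TopK {
  /** Returns (user, item, rank) triples, with rank starting at 1. */
  def recommend(
      predicted: Seq[(Int, Int, Double)], // (user, item, predicted rating)
      seen: Set[(Int, Int)],              // (user, item) pairs already rated
      k: Int,
      excludeKnown: Boolean = true): Seq[(Int, Int, Int)] = {
    // Optionally drop items the user has already interacted with.
    val candidates =
      if (excludeKnown) predicted.filterNot { case (u, i, _) => seen((u, i)) }
      else predicted

    candidates
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1) // deterministic user order for the output
      .flatMap { case (user, rows) =>
        rows
          .sortBy { case (_, _, score) => -score } // highest rating first
          .take(k)
          .zipWithIndex
          .map { case ((_, item, _), idx) => (user, item, idx + 1) }
      }
  }
}
```

Grouping the resulting triples by user would give the alternative (user, array of items) layout mentioned in the issue, so the choice between the two output types mostly depends on what the evaluation framework expects.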
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)