Github user gaborhermann commented on the issue: https://github.com/apache/flink/pull/2838

Thanks again for taking a look at our PR! I've just realized from a developer mailing list thread that the FlinkML API is not carved in stone until 2.0, and it's nice to hear that :)

The problem is not with `evaluate(test: TestType): DataSet[Double]` but rather with `evaluate(test: TestType): DataSet[(Prediction,Prediction)]`. It's at least confusing to have both, and it might not be worth exposing the one giving `(Prediction,Prediction)` pairs to the user, as it only *prepares* evaluation. With the introduction of the evaluation framework, we could at least rename it to something like `preparePairwiseEvaluation(test: TestType): DataSet[(Prediction,Prediction)]`. In the ranking case we might generalize it to `prepareEvaluation(test: TestType): PreparedTesting`. We basically did this with the `PrepareDataSetOperation`; we've just left the old `evaluate` as it is for now. I suggest changing this if we can break the API (a rough sketch of what I mean is at the end of this comment).

I'll do a rebase on the cross-validation PR. At first glance, it should not really be a problem to do both cross-validation and hyper-parameter tuning, as the user has to provide a `Scorer` anyway. A minor issue I see is the user falling back to a default `score` (e.g. RMSE in the case of ALS). This might not be a problem for recommendation models that give rating predictions besides ranking predictions, but it is a problem for models that *only* give ranking predictions, because those do not extend the `Predictor` class. This is not an issue for now, but might become one when adding more recommendation models. Should we try to handle this now, or would that be a bit of "overengineering"? I'll see if any other problems come up after rebasing.

The `RankingPredictor` interface is useful *internally* for the `Score`s: it serves as a contract between a `RankingScore` and the model. I'm sure it will be used only for recommendations, but it's no effort to expose it, so the user can write code against a general `RankingPredictor` (although I don't think this is what users would want to do :) ). A better question is whether to use it in a `Pipeline`. We discussed this with some people and could not really find a use case where we need a `Transformer`-like preprocessing step for recommendations. Of course, there could be other preprocessing steps, such as removing/aggregating duplicates, but those do not have to be `fit` to training data. Based on this, it's not worth the effort to integrate `RankingPredictor` with the `Pipeline`, at least for now.
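To make the above a bit more concrete, here is a minimal sketch of how the "prepare, then score" split could look. This is illustrative only: it leaves out the implicit `*Operation` type classes and parameter maps that the actual FlinkML pipeline traits go through, and the names `PairwisePredictor` and `Ranking` are placeholders, not existing types.

```scala
import org.apache.flink.api.scala.DataSet

// Rating-style predictors: prepareEvaluation only *prepares* evaluation by
// producing (trueLabel, prediction) pairs; a Score then reduces them to a Double.
trait PairwisePredictor[TestType, Prediction] {
  def prepareEvaluation(test: DataSet[TestType]): DataSet[(Prediction, Prediction)]
}

// Ranking models prepare something different (e.g. per-user ranked item lists),
// so the prepared form is kept abstract as Ranking.
trait RankingPredictor[TestType, Ranking] {
  def prepareEvaluation(test: DataSet[TestType]): DataSet[Ranking]
}

// A RankingScore only relies on the contract above: it consumes the prepared
// testing data and produces a single metric value.
trait RankingScore[Ranking] {
  def score(prepared: DataSet[Ranking]): DataSet[Double]
}
```

With something along these lines, a `Scorer` used for cross-validation or hyper-parameter tuning would only depend on the `prepareEvaluation` contract, regardless of whether the model gives rating or ranking predictions.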