Github user gaborhermann commented on the issue:

    https://github.com/apache/flink/pull/2838
  
    Thanks again for taking a look at our PR!
    
    I've just realized from a developer mailing list thread that the FlinkML API is not set in stone, at least not until 2.0, and it's nice to hear that :)
    
    The problem is not with `evaluate(test: TestType): DataSet[Double]` but rather with `evaluate(test: TestType): DataSet[(Prediction,Prediction)]`. It's at least confusing to have both, and it might not be worth exposing the one giving `(Prediction,Prediction)` pairs to the user, as it only *prepares* evaluation. When introducing the evaluation framework, we could at least rename it to something like `preparePairwiseEvaluation(test: TestType): DataSet[(Prediction,Prediction)]`. In the ranking case we might generalize it to `prepareEvaluation(test: TestType): PreparedTesting`. We basically did this with the `PrepareDataSetOperation`; we've just left the old `evaluate` as it is for now. I suggest changing this if we can break the API.
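
    To make the renaming concrete, here is a rough sketch of the signatures I have in mind (the trait names are made up for illustration; this is not the current FlinkML API):

    ```scala
    import org.apache.flink.api.scala.DataSet

    // Illustration only: proposed shape of the evaluation-related methods.
    trait PairwiseEvaluation[TestType, Prediction] {
      // User-facing evaluation that produces the final score.
      def evaluate(test: TestType): DataSet[Double]

      // Only *prepares* evaluation by emitting (truth, prediction) pairs;
      // the name makes it clear that this is not the final score.
      def preparePairwiseEvaluation(test: TestType): DataSet[(Prediction, Prediction)]
    }

    // Possible generalization for the ranking case.
    trait GeneralEvaluation[TestType, PreparedTesting] {
      def prepareEvaluation(test: TestType): PreparedTesting
    }
    ```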
    
    I'll do a rebase on the cross-validation PR. At first glance, it should not really be a problem to do both cross-validation and hyper-parameter tuning, as the user has to provide a `Scorer` anyway. A minor issue I see is the user falling back to a default `score` (e.g. RMSE in the case of ALS). This might not be a problem for recommendation models that give rating predictions besides ranking predictions, but it is a problem for models that *only* give ranking predictions, because those do not extend the `Predictor` class. This is not an issue for now, but it might become one when we add more recommendation models. Should we try to handle this now, or would that be a bit of "overengineering"? I'll see if any other problems come up after rebasing.
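
    To illustrate the fallback issue (all names below are made up and only meant to show the shape of the problem):

    ```scala
    import org.apache.flink.api.scala.DataSet

    // A rating-predicting recommender such as ALS can fall back to a default
    // score computed from its rating predictions, e.g. RMSE.
    trait RatingRecommender {
      def predictRatings(test: DataSet[(Int, Int)]): DataSet[(Int, Int, Double)]
      def defaultScore(test: DataSet[(Int, Int, Double)]): DataSet[Double]
    }

    // A ranking-only recommender yields top-k item lists per user, so there is
    // no rating prediction to compute an RMSE from, and it does not extend
    // Predictor, hence there is no default score to fall back to.
    trait RankingOnlyRecommender {
      def recommendRankings(k: Int, users: DataSet[Int]): DataSet[(Int, Int, Int)]
    }
    ```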
    
    The `RankingPredictor` interface is useful *internally* for the `Score`s. It serves as a contract between a `RankingScore` and the model. I'm sure it will be used only for recommendations, but it's no effort to expose it, so the user can write code against a general `RankingPredictor` (although I don't think this is what users would want to do :) ). A better question is whether to use it in a `Pipeline`. We discussed this with some people and could not really find a use-case where we need `Transformer`-like preprocessing for recommendations. Of course, there could be other preprocessing steps, such as removing/aggregating duplicates, but those do not have to be `fit` to training data. Based on this, it's not worth the effort to integrate `RankingPredictor` with the `Pipeline`, at least for now.
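
    For clarity, this is roughly how I picture that contract (signatures are illustrative only, not the exact ones in the PR):

    ```scala
    import org.apache.flink.api.scala.DataSet

    // A RankingScore depends only on the RankingPredictor interface,
    // not on a concrete model such as ALS.
    trait RankingPredictor {
      // (user, item, rank) triples: the top-k recommendations per user.
      def predictRankings(k: Int, users: DataSet[Int]): DataSet[(Int, Int, Int)]
    }

    trait RankingScore {
      // Compares the predicted rankings against the held-out relevant items.
      def evaluate(rankings: DataSet[(Int, Int, Int)],
                   groundTruth: DataSet[(Int, Int)]): DataSet[Double]
    }
    ```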

