[ https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622406#comment-14622406 ]
ASF GitHub Bot commented on FLINK-2108:
---------------------------------------

GitHub user thvasilo opened a pull request:

    https://github.com/apache/flink/pull/902

    [FLINK-2108] [ml] [WIP] Add score function for Predictors

    This PR builds upon the evaluation PR currently under review (#871) and
    adds two new operations to the Predictor class: one that takes a scorer
    and a test set and produces a score as an evaluation of the Predictor's
    performance using the provided scorer, and one that takes only a test
    set and uses a default score.

    This PR includes implementations of custom scores and simple scores for
    all Predictor implementations, either through the Classifier and
    Regressor base classes or through specific ones, like the one provided
    for ALS.

    The provided custom score operation currently expects
    DataSet[LabeledVector] as the type of the test set and Double as the
    type of the prediction.

    TODO: Docs and code cleanup

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink score-operation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #902

----
commit 305b43a451af3d8bc859671476c215308fbfc7fc
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-22T15:04:42Z

    Adding some first loss functions for the evaluation framework

commit bdb1a6912d2bcec29446ca4a9fbc550f2ecb8f4a
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-23T14:07:48Z

    Scorer for evaluation

commit 4a7593ade68f43d444a6b289191f053a4ea8b031
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-25T09:41:10Z

    Adds accuracy score and R^2 score. Also trying out Scores as classes instead of functions.
    Not too happy with the extra boilerplate of Scores as classes; will
    probably revert and have objects like RegressionsScores,
    ClassificationScores that contain the definitions of the relevant scores.

commit 5c89c478bd00f168bfe48954d06367b28f948571
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:30:56Z

    Adds an evaluate operation for LabeledVector input

commit e7bb4b42424641d640df370cd6ace71f7f42ee8d
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:32:13Z

    Adds Regressor interface, and a score function for regression algorithms.

commit 3d8a6928b02b30c732f282df61613561dbf8d4fc
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-30T14:04:58Z

    Added Classifier intermediate class, and default score function for classifiers.

commit e1a26ed30bb784633685703892f67d51136f6060
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-01T08:20:41Z

    Going back to having scores defined in objects instead of their own classes.

commit 0dd251a5a59cd610c4df3e9a1ea3921b1a9cc2e0
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-01T13:00:37Z

    Removed ParameterMap from predict function of PredictOperation

commit 492e9a383af6285f0fdca5031d2bd7bdfe3cd511
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-02T10:21:28Z

    Reworked score functionality to allow chained Predictors.

    All predictors must now implement a calculateScore function.
    We are for now assuming that predictors are supervised learning
    algorithms; once unsupervised learning algorithms are added this will
    need to be reworked.

    Also added an evaluate dataset operation to ALS, to allow for scoring
    of the algorithm. Default performance measure for ALS is RMSE.

commit d9715ed3a6faba78e0b34368425768e826d5a736
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-06T08:50:59Z

    Made calculateScore only take DataSet[(Double, Double)]

commit 7f1a6da52dfcd47d39c39cee2141112e5c10ddad
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-07T08:15:58Z

    Added test for DataSet.mean()

commit edbe3dd9ea48d168f67a9ff231f8373a6aaee38d
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-07T12:11:45Z

    Switched from cross to mapWithBcVariable

commit e840c14032f5fea3b476e1a99122eb9125ba5a4f
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T16:48:27Z

    Addressed multiple PR comments.

commit 6e48b612f0a367e798e40590f6921d4dc242f2aa
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T17:06:42Z

    Add approximatelyEquals for Doubles, used for score calculation.

commit 57d0ef2c4bc268d1d870c7aab537dd611f464fcf
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T17:23:14Z

    Improved docstrings for Score

commit eb66de590947ca8f887a8e52f8c66ec860b82af3
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T17:27:32Z

    Removed score function from Predictor.

commit 13053ef358091427fa89c66305d602a84a819c87
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-09T13:52:18Z

    Added score operation.

    This operation is similar to the predict and evaluate operations,
    allowing type-dependent implementations of scoring functions for
    predictors.

commit 859dd13432554a24a7ba9fcf356f4038048a1b27
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-10T12:52:10Z

    Adds simple score operation for regressor and classifier.

    This allows us to score classifiers and regressors without needing to
    provide a Scorer object, by using default scores instead.

commit 15fbd9c0ee8d7eecc4085400a8f0b52709b6c4fe
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-10T14:50:07Z

    Simple score operation for ALS

----

> Add score function for Predictors
> ---------------------------------
>
>                 Key: FLINK-2108
>                 URL: https://issues.apache.org/jira/browse/FLINK-2108
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Theodore Vasiloudis
>            Priority: Minor
>              Labels: ML
>
> A score function for Predictor implementations should take a DataSet[(I, O)]
> and an (optional) scoring measure and return a score.
> The DataSet[(I, O)] would probably be the output of the predict function.
> For example in MultipleLinearRegression, we can call predict on a labeled
> dataset, get back predictions for each item in the data, and then call score
> with the resulting dataset as an argument and we should get back a score for
> the prediction quality, such as the R^2 score.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
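
For readers unfamiliar with the idea, the score function described in the issue (a set of prediction pairs plus an optional scoring measure, returning a single number) could look roughly like the sketch below. This is not the code from the pull request: it uses plain Scala collections instead of a Flink DataSet so that it runs on its own, and the names ScoreSketch, Scorer, r2Score, accuracyScore, rmse and score are hypothetical. It also assumes each pair is ordered as (true value, prediction).

```scala
// Illustrative sketch only; not the FlinkML implementation from PR #902.
// Plain Seq stands in for DataSet, and all names here are hypothetical.
object ScoreSketch {

  // Assumption: each pair is (true value, predicted value).
  type Scorer = Seq[(Double, Double)] => Double

  // R^2 (coefficient of determination): 1 - SS_res / SS_tot.
  // Not meaningful when all true values are identical (SS_tot == 0).
  val r2Score: Scorer = pairs => {
    val meanTruth = pairs.map(_._1).sum / pairs.size
    val ssRes = pairs.map { case (t, p) => math.pow(t - p, 2) }.sum
    val ssTot = pairs.map { case (t, _) => math.pow(t - meanTruth, 2) }.sum
    1.0 - ssRes / ssTot
  }

  // Fraction of predictions that exactly match the true label.
  val accuracyScore: Scorer = pairs =>
    pairs.count { case (t, p) => t == p }.toDouble / pairs.size

  // Root mean squared error, which the PR mentions as the ALS default.
  val rmse: Scorer = pairs =>
    math.sqrt(pairs.map { case (t, p) => math.pow(t - p, 2) }.sum / pairs.size)

  // The scoring measure is optional; R^2 is used here as the default,
  // as one might choose for a regressor.
  def score(pairs: Seq[(Double, Double)], scorer: Scorer = r2Score): Double =
    scorer(pairs)

  def main(args: Array[String]): Unit = {
    val predictions = Seq((1.0, 1.1), (2.0, 1.9), (3.0, 3.2))
    println(s"R^2 (default scorer): ${score(predictions)}")
    println(s"RMSE (explicit scorer): ${score(predictions, rmse)}")
  }
}
```

Modelling a scorer as a plain function value mirrors the direction taken in the commit log above, where scores defined in objects were preferred over one class per score.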