[ 
https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622406#comment-14622406
 ] 

ASF GitHub Bot commented on FLINK-2108:
---------------------------------------

GitHub user thvasilo opened a pull request:

    https://github.com/apache/flink/pull/902

    [FLINK-2108] [ml] [WIP]  Add score function for Predictors

    This PR builds upon the evaluation PR currently under review (#871) and
    adds two new operations to the Predictor class: one that takes a scorer
    and a test set and produces a score evaluating the Predictor's performance
    with the provided scorer, and one that takes only a test set and uses a
    default score.
    
    This PR includes implementations of custom scores and simple scores for
    all Predictor implementations, either through the Classifier and Regressor
    base classes or through specific ones, such as the one provided for ALS.
    The provided custom score operation currently expects
    DataSet[LabeledVector] as the type of the test set and Double as the type
    of the prediction.
    
    TODO: Docs and code cleanup

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink score-operation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #902
    
----
commit 305b43a451af3d8bc859671476c215308fbfc7fc
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-22T15:04:42Z

    Adding some first loss functions for the evaluation framework

commit bdb1a6912d2bcec29446ca4a9fbc550f2ecb8f4a
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-23T14:07:48Z

    Scorer for evaluation

commit 4a7593ade68f43d444a6b289191f053a4ea8b031
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-25T09:41:10Z

    Adds accuracy score and R^2 score. Also trying out Scores as classes
    instead of functions.
    
    Not too happy with the extra boilerplate of Scores as classes; will
    probably revert and have objects like RegressionsScores and
    ClassificationScores that contain the definitions of the relevant scores.

commit 5c89c478bd00f168bfe48954d06367b28f948571
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:30:56Z

    Adds an evaluate operation for LabeledVector input

commit e7bb4b42424641d640df370cd6ace71f7f42ee8d
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:32:13Z

    Adds Regressor interface, and a score function for regression algorithms.

commit 3d8a6928b02b30c732f282df61613561dbf8d4fc
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-30T14:04:58Z

    Added Classifier intermediate class, and default score function for
    classifiers.

commit e1a26ed30bb784633685703892f67d51136f6060
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-01T08:20:41Z

    Going back to having scores defined in objects instead of their own classes.

commit 0dd251a5a59cd610c4df3e9a1ea3921b1a9cc2e0
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-01T13:00:37Z

    Removed ParameterMap from predict function of PredictOperation

commit 492e9a383af6285f0fdca5031d2bd7bdfe3cd511
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-02T10:21:28Z

    Reworked score functionality to allow chained Predictors.
    
    All predictors must now implement a calculateScore function.
    For now we assume that predictors are supervised learning algorithms;
    once unsupervised learning algorithms are added, this will need to be
    reworked.
    
    Also added an evaluate dataset operation to ALS, to allow for scoring of the
    algorithm. Default performance measure for ALS is RMSE.

commit d9715ed3a6faba78e0b34368425768e826d5a736
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-06T08:50:59Z

    Made calculateScore only take DataSet[(Double, Double)]

commit 7f1a6da52dfcd47d39c39cee2141112e5c10ddad
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-07T08:15:58Z

    Added test for DataSet.mean()

commit edbe3dd9ea48d168f67a9ff231f8373a6aaee38d
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-07T12:11:45Z

    Switched from cross to mapWithBcVariable

commit e840c14032f5fea3b476e1a99122eb9125ba5a4f
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T16:48:27Z

    Addressed multiple PR comments.

commit 6e48b612f0a367e798e40590f6921d4dc242f2aa
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T17:06:42Z

    Add approximatelyEquals for Doubles, used for score calculation.

commit 57d0ef2c4bc268d1d870c7aab537dd611f464fcf
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T17:23:14Z

    Improved docstrings for Score

commit eb66de590947ca8f887a8e52f8c66ec860b82af3
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-08T17:27:32Z

    Removed score function from Predictor.

commit 13053ef358091427fa89c66305d602a84a819c87
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-09T13:52:18Z

    Added score operation.
    
    This operation is similar to the predict and evaluate operations,
    allowing type-dependent implementations of scoring functions for
    predictors.

commit 859dd13432554a24a7ba9fcf356f4038048a1b27
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-10T12:52:10Z

    Adds simple score operation for regressor and classifier.
    
    This allows us to score classifiers and regressors without needing
    to provide a Scorer object, by using default scores instead.

commit 15fbd9c0ee8d7eecc4085400a8f0b52709b6c4fe
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-07-10T14:50:07Z

    Simple score operation for ALS

----


> Add score function for Predictors
> ---------------------------------
>
>                 Key: FLINK-2108
>                 URL: https://issues.apache.org/jira/browse/FLINK-2108
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Theodore Vasiloudis
>            Priority: Minor
>              Labels: ML
>
> A score function for Predictor implementations should take a DataSet[(I, O)] 
> and an (optional) scoring measure and return a score.
> The DataSet[(I, O)] would probably be the output of the predict function.
> For example in MultipleLinearRegression, we can call predict on a labeled 
> dataset, get back predictions for each item in the data, and then call score 
> with the resulting dataset as an argument and we should get back a score for 
> the prediction quality, such as the R^2 score.
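
The scoring step described above can be illustrated with a minimal,
library-free sketch. It is not the Flink ML API: the names score, r2_score
and rmse are hypothetical, a plain Python list stands in for the DataSet,
and each pair is assumed to be (true value, predicted value). The optional
measure argument mirrors the "(optional) scoring measure" in the issue
text, with R^2 chosen here as the default purely for illustration.

```python
from math import sqrt
from typing import Callable, Iterable, List, Tuple

# Assumption: each pair is (true value, predicted value).
Pair = Tuple[float, float]


def r2_score(pairs: Iterable[Pair]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    data: List[Pair] = list(pairs)
    mean_truth = sum(t for t, _ in data) / len(data)
    ss_res = sum((t - p) ** 2 for t, p in data)
    ss_tot = sum((t - mean_truth) ** 2 for t, _ in data)
    # Undefined (division by zero) when all true values are identical.
    return 1.0 - ss_res / ss_tot


def rmse(pairs: Iterable[Pair]) -> float:
    """Root mean squared error between true values and predictions."""
    data: List[Pair] = list(pairs)
    return sqrt(sum((t - p) ** 2 for t, p in data) / len(data))


def score(pairs: Iterable[Pair],
          measure: Callable[[Iterable[Pair]], float] = r2_score) -> float:
    """Hypothetical score operation: apply the given measure, or a default."""
    return measure(pairs)


if __name__ == "__main__":
    predictions = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]
    print("R^2: ", score(predictions))
    print("RMSE:", score(predictions, measure=rmse))
```

Passing the measure as a plain function keeps the default and the custom
case on the same code path, which is one way to read the "scores defined
in objects" direction mentioned in the commit log.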



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
