[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943528#comment-15943528 ]
ASF GitHub Bot commented on FLINK-5785:
---------------------------------------
GitHub user p4nna opened a pull request:
https://github.com/apache/flink/pull/3625
[FLINK-5785] Add an Imputer for preparing data
Provides an Imputer for sparse DataSets of Vectors.
Fills in missing values with the mean, median, or most frequent value, computed
either per vector or per dimension.
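
To illustrate the difference between imputing per vector and per dimension, here is a minimal sketch in plain Python. It is not the code from this pull request: the function name `impute_mean`, the `axis` labels, and the use of `None` as the missing-value marker are all assumptions made for the example, and it works on dense lists rather than on Flink DataSets.

```python
from statistics import mean

MISSING = None  # assumption: missing entries are encoded as None


def impute_mean(rows, axis):
    """Return a copy of rows with MISSING entries replaced by a mean.

    axis="dimension": mean of the same dimension across all vectors.
    axis="vector":    mean of the observed entries of the same vector.
    Raises statistics.StatisticsError if a group has no observed value.
    """
    if axis == "dimension":
        groups = list(zip(*rows))  # one group per dimension (column)
    elif axis == "vector":
        groups = rows              # one group per vector (row)
    else:
        raise ValueError("axis must be 'dimension' or 'vector'")

    # One fill value per group, computed from observed entries only.
    fills = [mean(v for v in group if v is not MISSING) for group in groups]

    return [
        [
            (fills[j] if axis == "dimension" else fills[i]) if v is MISSING else v
            for j, v in enumerate(row)
        ]
        for i, row in enumerate(rows)
    ]


data = [
    [1.0, None, 3.0],
    [4.0, 6.0, None],
    [7.0, 2.0, 9.0],
]
per_dimension = impute_mean(data, "dimension")
per_vector = impute_mean(data, "vector")
```

With this input, the gap in the first vector is filled with 4.0 (mean of 6.0 and 2.0 in that dimension) when imputing per dimension, and with 2.0 (mean of 1.0 and 3.0 in that vector) when imputing per vector.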
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/p4nna/flink ml-Imputer-edits
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3625.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3625
----
commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna <[email protected]>
Date: 2017-03-27T09:47:39Z
Add files via upload
commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna <[email protected]>
Date: 2017-03-27T09:49:59Z
Test for Imputer class
Two test classes that test the functions implemented in the new Imputer
class: one for row-wise imputing over all vectors and one for vector-wise
imputing.
commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna <[email protected]>
Date: 2017-03-27T09:56:51Z
removed unused imports and methods
commit 9136607e84a0297bb4fb24a53bad9950b86bf116
Author: p4nna <[email protected]>
Date: 2017-03-27T15:58:37Z
Imputer was added
Fills in missing values in sparse DataSets of Vectors
----
> Add an Imputer for preparing data
> ---------------------------------
>
> Key: FLINK-5785
> URL: https://issues.apache.org/jira/browse/FLINK-5785
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Stavros Kontopoulos
> Assignee: Stavros Kontopoulos
>
> We need to add an Imputer as described in [1].
> "The Imputer class provides basic strategies for imputing missing values,
> either using the mean, the median or the most frequent value of the row or
> column in which the missing values are located. This class also allows for
> different missing values encodings."
> References
> 1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
> 2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py
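
The three strategies named in the issue description, together with a configurable missing-value encoding, can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the Flink or scikit-learn implementation: `impute_column`, `is_missing`, and the `strategy` and `missing_value` parameter names are hypothetical, and the function handles a single column held in a list.

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(value, missing_value):
    """True if value matches the chosen missing-value encoding."""
    # NaN never compares equal to itself, so it needs a dedicated check.
    if isinstance(missing_value, float) and math.isnan(missing_value):
        return isinstance(value, float) and math.isnan(value)
    return value == missing_value


def most_frequent(values):
    """Most common value; on a tie, the value seen first wins."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical strategy names mapped to the function computing the fill value.
STRATEGIES = {"mean": mean, "median": median, "most_frequent": most_frequent}


def impute_column(column, strategy="mean", missing_value=float("nan")):
    """Return a copy of column with missing entries replaced by the fill value."""
    observed = [v for v in column if not is_missing(v, missing_value)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = STRATEGIES[strategy](observed)
    return [fill if is_missing(v, missing_value) else v for v in column]


# Here -1.0 is (by assumption) the marker for a missing entry.
column = [1.0, 2.0, 2.0, -1.0, 7.0]
filled_mean = impute_column(column, "mean", missing_value=-1.0)
filled_median = impute_column(column, "median", missing_value=-1.0)
filled_mode = impute_column(column, "most_frequent", missing_value=-1.0)
```

Keeping the missing-value check in its own function makes the NaN case explicit, since a plain equality test would silently fail to match NaN entries.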
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)