[jira] [Created] (FLINK-6668) Add flink history server to DCOS
Stavros Kontopoulos created FLINK-6668:

Summary: Add flink history server to DCOS
Key: FLINK-6668
URL: https://issues.apache.org/jira/browse/FLINK-6668
Project: Flink
Issue Type: New Feature
Components: Mesos
Reporter: Stavros Kontopoulos

We need a history server within the DC/OS environment, as is already the case for Spark.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5841) Algorithms for each pipeline stage should handle NaN, infinity like in scikit-learn
Stavros Kontopoulos created FLINK-5841:

Summary: Algorithms for each pipeline stage should handle NaN, infinity like in scikit-learn
Key: FLINK-5841
URL: https://issues.apache.org/jira/browse/FLINK-5841
Project: Flink
Issue Type: Bug
Components: Machine Learning Library
Reporter: Stavros Kontopoulos
Assignee: Stavros Kontopoulos

Algorithms in scikit-learn do not accept NaN or Infinity values. Since we are following the scikit-learn approach, we should conform to that behavior.
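The behavior requested above can be illustrated with a minimal sketch in plain Python. It is not FlinkML code: the function name `check_finite` and the choice of `ValueError` are assumptions made for illustration, and the input is assumed to be a list of numeric feature rows.

```python
import math


def check_finite(rows):
    """Return rows unchanged, or raise ValueError on the first NaN/infinite value.

    Hypothetical helper: a pipeline stage could call this before fitting.
    """
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # math.isfinite is False for NaN, +inf and -inf.
            if not math.isfinite(value):
                raise ValueError(
                    f"non-finite value {value!r} at row {i}, column {j}"
                )
    return rows
```

Failing fast at the start of each stage keeps a single bad value from silently propagating into the fitted model.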
[jira] [Created] (FLINK-5785) Add an Imputer for preparing data
Stavros Kontopoulos created FLINK-5785:

Summary: Add an Imputer for preparing data
Key: FLINK-5785
URL: https://issues.apache.org/jira/browse/FLINK-5785
Project: Flink
Issue Type: New Feature
Components: Machine Learning Library
Reporter: Stavros Kontopoulos

We need to add an Imputer as described in [1].

"The Imputer class provides basic strategies for imputing missing values, either using the mean, the median or the most frequent value of the row or column in which the missing values are located. This class also allows for different missing values encodings."

References
1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py
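The three strategies quoted above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the scikit-learn or FlinkML implementation: the names `impute_columns` and `_is_missing` are hypothetical, imputation is done per column only, and the `missing` parameter stands in for the "different missing values encodings" mentioned in the quote.

```python
import math
import statistics


def _is_missing(value, missing):
    # NaN never compares equal to itself, so it needs a dedicated check.
    if isinstance(missing, float) and math.isnan(missing):
        return isinstance(value, float) and math.isnan(value)
    return value == missing


def impute_columns(rows, strategy="mean", missing=float("nan")):
    """Replace missing entries with the column mean, median or most frequent value."""
    fill_fn = {
        "mean": statistics.mean,
        "median": statistics.median,
        "most_frequent": statistics.mode,
    }[strategy]
    fills = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not _is_missing(row[j], missing)]
        # Raises statistics.StatisticsError if a column has no observed values.
        fills.append(fill_fn(observed))
    return [
        [fills[j] if _is_missing(v, missing) else v for j, v in enumerate(row)]
        for row in rows
    ]
```

A distributed version would compute the per-column statistics in a fit step and apply them in a separate transform step; the sketch merges both for brevity.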
[jira] [Created] (FLINK-5588) Add a unit scaler based on different norms
Stavros Kontopoulos created FLINK-5588:

Summary: Add a unit scaler based on different norms
Key: FLINK-5588
URL: https://issues.apache.org/jira/browse/FLINK-5588
Project: Flink
Issue Type: New Feature
Components: Machine Learning Library
Reporter: Stavros Kontopoulos
Priority: Minor

So far ML has two scalers: min-max and standard. A third commonly used one is scaling to unit length. We could implement a transformer for this type of scaling that lets the user choose among different norms.

Resources
[1] https://en.wikipedia.org/wiki/Feature_scaling

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
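As a rough sketch of what such a transformer would compute, the following plain-Python function divides a feature vector by its norm. The function name `scale_to_unit`, the `NORMS` table, and the choice of the L1, L2 and max norms are assumptions for illustration; the issue does not specify which norms should be offered.

```python
import math

# Hypothetical set of supported norms; the issue only says "different norms".
NORMS = {
    "l1": lambda v: sum(abs(x) for x in v),
    "l2": lambda v: math.sqrt(sum(x * x for x in v)),
    "max": lambda v: max(abs(x) for x in v),
}


def scale_to_unit(vector, norm="l2"):
    """Return a copy of vector divided by its norm, so the result has norm 1."""
    length = NORMS[norm](vector)
    if length == 0.0:
        # A zero vector has no direction; leave it unchanged.
        return list(vector)
    return [x / length for x in vector]
```

Unlike the min-max and standard scalers, this operates on each vector independently, so it needs no fit step over the whole data set.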
[jira] [Created] (FLINK-5525) Streaming Version of a Linear Regression model
Stavros Kontopoulos created FLINK-5525:

Summary: Streaming Version of a Linear Regression model
Key: FLINK-5525
URL: https://issues.apache.org/jira/browse/FLINK-5525
Project: Flink
Issue Type: New Feature
Components: Machine Learning Library
Reporter: Stavros Kontopoulos

Given the nature of Flink, we should have a streaming version of the algorithms when possible. The model should be updated on a per-window basis. An extreme case is online learning, where the model is updated with each new element: https://en.wikipedia.org/wiki/Online_machine_learning

Resources
[1] http://scikit-learn.org/dev/modules/scaling_strategies.html#incremental-learning
[2] http://stats.stackexchange.com/questions/6920/efficient-online-linear-regression
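One way to picture the per-window update is a single gradient step per window. The sketch below is plain Python, not Flink code, and makes several assumptions: the class name `StreamingLinearRegression`, the use of gradient descent (the issue does not name an update rule), the fixed learning rate, and a window represented as a list of `(features, target)` pairs.

```python
class StreamingLinearRegression:
    """Linear model updated with one gradient step per window (illustrative only)."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, window):
        """Take one gradient step on the half mean squared error of the window."""
        if not window:
            return  # an empty window carries no information
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for x, y in window:
            err = self.predict(x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        step = self.learning_rate / len(window)
        self.bias -= step * grad_b
        self.weights = [w - step * g for w, g in zip(self.weights, grad_w)]
```

Calling `update` with windows of size one gives the online-learning extreme; larger windows trade update latency for a less noisy gradient.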