[jira] [Created] (FLINK-5841) Algorithms for each pipeline stage should handle NaN, infinity like in scikit-learn

2017-02-18 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created FLINK-5841:
--

 Summary: Algorithms for each pipeline stage should handle NaN, 
infinity like in scikit-learn
 Key: FLINK-5841
 URL: https://issues.apache.org/jira/browse/FLINK-5841
 Project: Flink
  Issue Type: Bug
  Components: Machine Learning Library
Reporter: Stavros Kontopoulos
Assignee: Stavros Kontopoulos


Algorithms in scikit-learn do not accept NaN or infinity values. Since we are 
following the scikit-learn approach, we should conform to that behavior.
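
A minimal sketch of the kind of input check this implies, written in plain 
Python for illustration only. The helper name check_all_finite and the error 
message are assumptions, not an existing Flink or scikit-learn API.

```python
import math
from typing import Iterable, Sequence


def check_all_finite(rows: Iterable[Sequence[float]]) -> None:
    """Raise ValueError on the first NaN or infinite entry.

    Hypothetical helper: each pipeline stage would run a check like this
    on its input before fitting or transforming.
    """
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # math.isfinite is False for NaN, +inf and -inf.
            if not math.isfinite(value):
                raise ValueError(
                    f"non-finite value {value!r} at row {i}, column {j}"
                )
```

Rejecting the input with a clear error, rather than silently propagating NaN 
through the pipeline, is the design choice assumed here.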



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5785) Add an Imputer for preparing data

2017-02-13 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created FLINK-5785:
--

 Summary: Add an Imputer for preparing data
 Key: FLINK-5785
 URL: https://issues.apache.org/jira/browse/FLINK-5785
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Stavros Kontopoulos


We need to add an Imputer as described in [1].

"The Imputer class provides basic strategies for imputing missing values, 
either using the mean, the median or the most frequent value of the row or 
column in which the missing values are located. This class also allows for 
different missing values encodings."
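
The quoted strategies can be sketched in plain Python as follows. This is an 
illustration under assumptions, not the proposed Flink API: the names 
impute_columns, is_missing and STRATEGIES are hypothetical, imputation is 
done per column only, and NaN is assumed as the default missing-value encoding.

```python
import math
import statistics
from typing import Callable, Dict, List

# Strategy names follow the quoted description; the mapping is illustrative.
STRATEGIES: Dict[str, Callable[[List[float]], float]] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "most_frequent": statistics.mode,
}


def is_missing(value: float, missing: float) -> bool:
    """True if value matches the missing-value encoding."""
    # NaN never compares equal to itself, so it needs a dedicated test.
    if math.isnan(missing):
        return math.isnan(value)
    return value == missing


def impute_columns(
    rows: List[List[float]],
    strategy: str = "mean",
    missing: float = float("nan"),
) -> List[List[float]]:
    """Return a copy of rows with missing entries replaced column by column.

    A column with no observed values makes the statistics function raise
    StatisticsError; this sketch does not handle that case.
    """
    fill = STRATEGIES[strategy]
    replacements = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not is_missing(r[j], missing)]
        replacements.append(fill(observed))
    return [
        [replacements[j] if is_missing(v, missing) else v for j, v in enumerate(r)]
        for r in rows
    ]
```

The missing parameter covers the "different missing values encodings" part of 
the description, for example impute_columns(rows, "median", missing=-1.0).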

References
1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py





[jira] [Created] (FLINK-5588) Add a unit scaler based on different norms

2017-01-20 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created FLINK-5588:
--

 Summary: Add a unit scaler based on different norms
 Key: FLINK-5588
 URL: https://issues.apache.org/jira/browse/FLINK-5588
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Stavros Kontopoulos
Priority: Minor


So far the ML library has two scalers: the min-max scaler and the standard 
scaler. A third commonly used one scales each vector to unit length.
We could implement a transformer for this type of scaling, with a choice of 
norms available to the user.
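
As a rough illustration of scaling to unit length under different norms, here 
is a plain-Python sketch. The function name scale_to_unit, the NORMS table and 
the choice of L1, L2 and max norms are assumptions made for the example, not 
the proposed transformer interface.

```python
import math
from typing import Callable, Dict, List, Sequence

# Norms offered in this sketch; the set is an assumption for illustration.
NORMS: Dict[str, Callable[[Sequence[float]], float]] = {
    "l1": lambda v: sum(abs(x) for x in v),
    "l2": lambda v: math.sqrt(sum(x * x for x in v)),
    "max": lambda v: max(abs(x) for x in v),
}


def scale_to_unit(vector: Sequence[float], norm: str = "l2") -> List[float]:
    """Divide every component by the vector's norm so the result has norm 1."""
    length = NORMS[norm](vector)
    if length == 0.0:
        # A zero vector has no direction; return it unchanged.
        return list(vector)
    return [x / length for x in vector]
```

Unlike the min-max and standard scalers, this operates on each vector 
independently and needs no fitting step over the whole data set.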

Resources
[1] https://en.wikipedia.org/wiki/Feature_scaling





[jira] [Created] (FLINK-5525) Streaming Version of a Linear Regression model

2017-01-17 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created FLINK-5525:
--

 Summary: Streaming Version of a Linear Regression model
 Key: FLINK-5525
 URL: https://issues.apache.org/jira/browse/FLINK-5525
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Stavros Kontopoulos


Given the streaming nature of Flink, we should provide streaming versions of 
the algorithms where possible.
The model should be updated on a per-window basis.
The extreme case, where the model is updated on every element, is online 
learning: https://en.wikipedia.org/wiki/Online_machine_learning
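
One way a per-window update could look, sketched in plain Python for a single 
feature. This is an assumption-laden illustration, not the proposed design: 
the class StreamingSimpleRegression is hypothetical, and it keeps running sums 
(sufficient statistics) for ordinary least squares instead of using Flink's 
windowing API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class StreamingSimpleRegression:
    """Least-squares fit of y = slope * x + intercept, updated per window."""

    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, window: Iterable[Tuple[float, float]]) -> None:
        """Fold one window of (x, y) pairs into the running sums."""
        for x, y in window:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self) -> Tuple[float, float]:
        """Return (slope, intercept) from all windows seen so far."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept
```

Because only five numbers are kept, the state stays constant in size however 
many windows arrive; a window holding a single element gives the online case.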

Resources

[1] http://scikit-learn.org/dev/modules/scaling_strategies.html#incremental-learning
[2] http://stats.stackexchange.com/questions/6920/efficient-online-linear-regression



