[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578569#comment-14578569
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/798


> Add Normaliser to ML library
>
>                 Key: FLINK-1844
>                 URL: https://issues.apache.org/jira/browse/FLINK-1844
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Faye Beligianni
>            Assignee: Faye Beligianni
>            Priority: Minor
>              Labels: ML, Starter
>
> In many ML algorithms, it is preferable for the feature values to lie within 
> a given range, usually (0,1) [1]. Therefore, a 
> {{Transformer}} could be implemented to perform that normalisation.
> Resources: 
> [1][http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577360#comment-14577360
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31927083
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of [[Vector]] of values and maps it to a
+  * scaled subtype of [[Vector]] such that each feature lies between a user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], linalg.Vector[Double])]] = None
--- End diff --

Yes ^^




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577346#comment-14577346
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31926077
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala ---
@@ -0,0 +1,254 @@
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], linalg.Vector[Double])]] = None
--- End diff --

Package private should be ok, since the test is in the same package, right?
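
For readers following the visibility discussion: Scala's qualified access modifier `private[name]` restricts a member to the named enclosing package (or class/object). The following is a minimal self-contained sketch under stated assumptions, not the code from the pull request. The names `preprocessing` (an object standing in for the package `org.apache.flink.ml.preprocessing`), `Scaler`, `ScalerSuite`, and `PackagePrivateDemo` are hypothetical, and plain `Seq[Double]` replaces the Breeze vectors.

```scala
// Hypothetical stand-in: an enclosing object plays the role of the
// org.apache.flink.ml.preprocessing package so the snippet is self-contained.
object preprocessing {

  class Scaler {
    // Readable and writable anywhere inside `preprocessing`, hidden outside it.
    private[preprocessing] var metricsOption: Option[(Seq[Double], Seq[Double])] = None

    // Learns the element-wise minimum and maximum over all observations.
    def fit(data: Seq[Seq[Double]]): Unit = {
      val mins = data.reduce((a, b) => a.zip(b).map { case (x, y) => math.min(x, y) })
      val maxs = data.reduce((a, b) => a.zip(b).map { case (x, y) => math.max(x, y) })
      metricsOption = Some((mins, maxs))
    }
  }

  // Plays the role of a test suite that lives in the same package.
  object ScalerSuite {
    def learnedMetrics(s: Scaler): Option[(Seq[Double], Seq[Double])] = s.metricsOption
  }
}

object PackagePrivateDemo {
  val scaler = new preprocessing.Scaler
  scaler.fit(Seq(Seq(1.0, 5.0), Seq(3.0, 2.0)))

  // Access goes through the suite, which sits inside the same scope.
  val metrics: Option[(Seq[Double], Seq[Double])] =
    preprocessing.ScalerSuite.learnedMetrics(scaler)
}
```

Code outside `preprocessing` that reads `scaler.metricsOption` directly would be rejected by the compiler, while the suite inside the same scope keeps access.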




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577329#comment-14577329
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31924947
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala ---
@@ -0,0 +1,254 @@
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], linalg.Vector[Double])]] = None
--- End diff --

Hey, if the {{metricsOption}} field is package private then my tests will 
fail, because in the {{MinMaxScalerITSuite}} I am also testing whether the min and max of 
each feature have been calculated correctly.




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577298#comment-14577298
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31922838
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala ---
@@ -0,0 +1,254 @@
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], linalg.Vector[Double])]] = None
--- End diff --

Will make the field package private.




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577292#comment-14577292
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31922171
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala ---
@@ -0,0 +1,254 @@
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], linalg.Vector[Double])]] = None
--- End diff --

As private state, the developer should be able to choose any type. Thus, a 
`BreezeVector` should be fine here. I was just wondering whether a 
`DenseVector` would make more sense here. Is it safe to assume that every 
feature has at least 2 non-zero values?
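
One way to look at the density question is to count how many entries of the learned minimum and maximum vectors are non-zero for a given data set. The sketch below does this on toy data; it is an illustration under assumptions, not code from the pull request. `MetricsDensityDemo`, the `Map[Int, Double]` encoding of sparse observations, and the sample values are all hypothetical, and plain arrays stand in for Breeze vectors.

```scala
object MetricsDensityDemo {
  // Sparse observations as (index -> value) maps over `numFeatures` features;
  // absent indices are implicit zeros. Hypothetical stand-in for sparse vectors.
  val numFeatures = 4
  val observations: Seq[Map[Int, Double]] = Seq(
    Map(0 -> 2.0, 3 -> -1.0),
    Map(0 -> 5.0, 1 -> 4.0),
    Map(0 -> 3.0)
  )

  // Expands one sparse observation into a dense array.
  private def toDense(obs: Map[Int, Double]): Array[Double] =
    Array.tabulate(numFeatures)(i => obs.getOrElse(i, 0.0))

  private val dense: Seq[Array[Double]] = observations.map(toDense)

  // Element-wise minimum and maximum across all observations.
  val featureMin: Array[Double] =
    dense.reduce((a, b) => a.zip(b).map { case (x, y) => math.min(x, y) })
  val featureMax: Array[Double] =
    dense.reduce((a, b) => a.zip(b).map { case (x, y) => math.max(x, y) })

  // Number of entries that a sparse representation would have to store.
  val nonZeroMin: Int = featureMin.count(_ != 0.0)
  val nonZeroMax: Int = featureMax.count(_ != 0.0)
}
```

In this toy data, feature 0 is non-zero in every observation and feature 3 has a negative value, so both keep a non-zero minimum, while feature 2 is zero throughout. How dense the learned vectors are in practice depends on the data.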




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577286#comment-14577286
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31921747
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala ---
@@ -0,0 +1,254 @@
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], linalg.Vector[Double])]] = None
+
+  /** Sets the minimum for the range of the transformed data
+    *
+    * @param min the user-specified minimum value.
+    * @return the MinMaxScaler instance with its minimum value set to the user-specified value.
+    */
+  def setMin(min: Double): MinMaxScaler = {
+    parameters.add(Min, min)
+    this
+  }
+
+  /** Sets the maximum for the range of the transformed data
+    *
+    * @param max the user-specified maximum value.
+    * @return the MinMaxScaler instance with its maximum value set to the user-specified value.
+    */
+  def setMax(max: Double): MinMaxScaler = {
+    parameters.add(Max, max)
+    this
+  }
+}
+
+object MinMaxScaler {
+
+  // ==== Parameters ====
+
+  case object Min extends Parameter[Double] {
+    override val defaultValue: Option[Double] = Some(0.0)
+  }
+
+  case object Max extends Parameter[Double] {
+    override val defaultValue: Option[Double] = Some(1.0)
+  }
+
+  // ==== Factory methods ====
+
+  def apply(): MinMaxScaler = {
+    new MinMaxScaler()
+  }
+
+  // ==== Operations ====
+
+  /** Trains the [[org.apache.flink.ml.preprocessing.MinMaxScaler]] by learning the minimum and
+    * maximum of each feature of the training data. These 

[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577284#comment-14577284
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31921716
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala ---
@@ -0,0 +1,254 @@
+  /** Trains the [[org.apache.flink.ml.preprocessing.MinMaxScaler]] by learning the minimum and
+    * maximum of each feature of the training data. These 

[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577280#comment-14577280
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31921435
  
--- Diff: docs/libs/ml/minMax_scaler.md ---
@@ -0,0 +1,112 @@
+---
+mathjax: include
+htmlTitle: FlinkML - MinMax Scaler
+title: FlinkML - MinMax Scaler
+---
+
+
+* This will be replaced by the TOC
+{:toc}
+
+## Description
+
+ The MinMax scaler scales the given data set, so that all values will lie within a user-specified range [min,max].
+ In case the user does not provide a specific minimum and maximum value for the scaling range, the MinMax scaler transforms the features of the input data set to lie in the [0,1] interval.
+ Given a set of input data $x_1, x_2, \ldots, x_n$, with minimum value:
+
+ $$x_{min} = \min(\{x_1, x_2, \ldots, x_n\})$$
+
+ and maximum value:
+
+ $$x_{max} = \max(\{x_1, x_2, \ldots, x_n\})$$
+
+The scaled data set $z_1, z_2, \ldots, z_n$ will be:
+
+ $$z_{i}= \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \left ( \textit{max} - \textit{min} \right ) + \textit{min}$$
+
+where $\textit{min}$ and $\textit{max}$ are the user-specified minimum and maximum values of the range to scale.
+
+## Operations
+
+`MinMaxScaler` is a `Transformer`.
+As such, it supports the `fit` and `transform` operations.
+
+### Fit
+
+MinMaxScaler is trained on all subtypes of `Vector` or `LabeledVector`:
+
+* `fit[T <: Vector]: DataSet[T] => Unit`
+* `fit: DataSet[LabeledVector] => Unit`
+
+### Transform
+
+MinMaxScaler transforms all subtypes of `Vector` or `LabeledVector` into the respective type:
+
+* `transform[T <: Vector]: DataSet[T] => DataSet[T]`
+* `transform: DataSet[LabeledVector] => DataSet[LabeledVector]`
+
+## Parameters
+
+The MinMax scaler implementation can be controlled by the following two parameters:
+
+  Parameters | Description
+  Min        | The minimum value of the range for the scaled data set. (Default value: 0.0)
+  Max        | The maximum value of the range for the scaled data set. (Default value: 1.0)
+
+## Examples
+
+{% highlight scala %}
+// Create MinMax scaler transformer
+val minMaxscaler = MinMaxScaler()
+.setMin(-1.0)
--- End diff --

Will address this when merging.
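
The scaling formula in the quoted documentation can be checked with a few lines of plain Scala. This is a minimal sketch of the arithmetic only, not the Flink implementation. `MinMaxScalingSketch`, `scale`, and the sample values are hypothetical, and mapping a constant feature to the lower bound is an assumption made here to avoid a division by zero.

```scala
object MinMaxScalingSketch {

  /** Rescales `x` from the observed range [xMin, xMax] to the target range [min, max]. */
  def scale(x: Double, xMin: Double, xMax: Double,
            min: Double = 0.0, max: Double = 1.0): Double = {
    // Assumption: a constant feature (xMax == xMin) is mapped to the lower bound.
    if (xMax == xMin) min
    else (x - xMin) / (xMax - xMin) * (max - min) + min
  }

  // One feature observed over three samples, rescaled to [-1, 1].
  val feature: Seq[Double] = Seq(2.0, 4.0, 10.0)
  val scaled: Seq[Double] =
    feature.map(x => scale(x, feature.min, feature.max, min = -1.0, max = 1.0))
}
```

With these inputs the observed range is [2, 10], so 2 maps to -1, 10 maps to 1, and 4 (a quarter of the way through the range) maps to -0.5.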


> Add Normaliser to ML library
> 
>
> Key: FLINK-1844
> URL: https://issues.apache.org/jira/browse/FLINK-1844
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Faye Beligianni
>Assignee: Faye Beligianni
>Priority: Minor
>  Labels: ML, Starter
>
> In many algorithms in ML, the features' values would be better to lie between 
> a given range of values, usually in the range (0,1) [1]. Therefore, a 
> {{Transformer}} could be implemented to achieve that normalisation.
> Resources: 
> [1][http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577279#comment-14577279
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31921419
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
--- End diff --

You're right. Will add it when I merge it.




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577193#comment-14577193
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31914466
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
--- End diff --

Not right now, so these can remain. I was mostly concerned that this 
parameter was user-facing, meaning the user had to provide Breeze vectors as 
parameters, but that is not the case.




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577187#comment-14577187
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31913806
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
+
+  /** Sets the minimum for the range of the transformed data
+    *
+    * @param min the user-specified minimum value.
+    * @return the MinMaxScaler instance with its minimum value set to the user-specified value.
+    */
+  def setMin(min: Double): MinMaxScaler = {
+    parameters.add(Min, min)
+    this
+  }
+
+  /** Sets the maximum for the range of the transformed data
+    *
+    * @param max the user-specified maximum value.
+    * @return the MinMaxScaler instance with its maximum value set to the user-specified value.
+    */
+  def setMax(max: Double): MinMaxScaler = {
+    parameters.add(Max, max)
+    this
+  }
+}
+
+object MinMaxScaler {
+
+  // ==================== Parameters ====================
+
+  case object Min extends Parameter[Double] {
+    override val defaultValue: Option[Double] = Some(0.0)
+  }
+
+  case object Max extends Parameter[Double] {
+    override val defaultValue: Option[Double] = Some(1.0)
+  }
+
+  // ==================== Factory methods ====================
+
+  def apply(): MinMaxScaler = {
+    new MinMaxScaler()
+  }
+
+  // ==================== Operations ====================
+
+  /** Trains the [[org.apache.flink.ml.preprocessing.MinMaxScaler]] by 
learning the minimum and
+* maximum of each feature of the training data. These valu

[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577185#comment-14577185
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31913634
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
--- End diff --

I use the metricsOption vectors internally in the transformer for 
elementwise subtraction and division, so rather than converting between 
Breeze and flink.ml.math.Vector I keep them as breeze.linalg.Vector. 
Can I perform the same operations with flink.ml.math.Vector, or do you 
think it would be better to do the conversions (to/from Breeze vectors) 
inside the functions?
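
To make the elementwise subtraction and division concrete, here is a rough sketch using plain Scala arrays, not Breeze and not flink.ml.math.Vector. The names `PerFeatureMinMax`, `Vec`, `fit` and `transform` are placeholders for illustration and do not mirror the signatures in the pull request; treating a constant feature as `lo` is an assumption.

```scala
object PerFeatureMinMax {
  // Illustrative alias: one row of feature values.
  type Vec = Array[Double]

  /** "Fit" step: learn the per-feature minimum and maximum over all rows. */
  def fit(rows: Seq[Vec]): (Vec, Vec) = {
    require(rows.nonEmpty, "need at least one row")
    val mins = rows.reduce((a, b) => a.zip(b).map { case (x, y) => math.min(x, y) })
    val maxs = rows.reduce((a, b) => a.zip(b).map { case (x, y) => math.max(x, y) })
    (mins, maxs)
  }

  /** "Transform" step: elementwise (x - min) / (max - min) * (hi - lo) + lo. */
  def transform(row: Vec, mins: Vec, maxs: Vec, lo: Double, hi: Double): Vec =
    row.indices.map { i =>
      val span = maxs(i) - mins(i)
      // Assumption: a feature with zero spread is mapped to lo.
      if (span == 0.0) lo else (row(i) - mins(i)) / span * (hi - lo) + lo
    }.toArray
}
```

The point of the sketch is only that the learned statistics are two vectors (per-feature minima and maxima) and that the transform is a purely elementwise operation on them.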




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577145#comment-14577145
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31911384
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
+
+  /** Sets the minimum for the range of the transformed data
+    *
+    * @param min the user-specified minimum value.
+    * @return the MinMaxScaler instance with its minimum value set to the user-specified value.
+    */
+  def setMin(min: Double): MinMaxScaler = {
+    parameters.add(Min, min)
+    this
+  }
+
+  /** Sets the maximum for the range of the transformed data
+    *
+    * @param max the user-specified maximum value.
+    * @return the MinMaxScaler instance with its maximum value set to the user-specified value.
+    */
+  def setMax(max: Double): MinMaxScaler = {
+    parameters.add(Max, max)
+    this
+  }
+}
+
+object MinMaxScaler {
+
+  // ==================== Parameters ====================
+
+  case object Min extends Parameter[Double] {
+    override val defaultValue: Option[Double] = Some(0.0)
+  }
+
+  case object Max extends Parameter[Double] {
+    override val defaultValue: Option[Double] = Some(1.0)
+  }
+
+  // ==================== Factory methods ====================
+
+  def apply(): MinMaxScaler = {
+    new MinMaxScaler()
+  }
+
+  // ==================== Operations ====================
+
+  /** Trains the [[org.apache.flink.ml.preprocessing.MinMaxScaler]] by 
learning the minimum and
+* maximum of each feature of the training data. These valu

[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577142#comment-14577142
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31911306
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
--- End diff --

Are these of type breeze.linalg.Vector? If so, why not use 
flink.ml.math.Vector?




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577141#comment-14577141
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31911162
  
--- Diff: docs/libs/ml/minMax_scaler.md ---
@@ -0,0 +1,112 @@
+---
+mathjax: include
+htmlTitle: FlinkML - MinMax Scaler
+title: FlinkML - MinMax Scaler
+---
+
+
+* This will be replaced by the TOC
+{:toc}
+
+## Description
+
+ The MinMax scaler scales the given data set so that all values lie within a user-specified range [min,max].
+ If the user does not provide minimum and maximum values for the scaling range, the MinMax scaler transforms the features of the input data set to lie in the [0,1] interval.
+ Given a set of input data $x_1, x_2, \ldots, x_n$, with minimum value:
+
+ $$x_{min} = \min(\{x_1, x_2, \ldots, x_n\})$$
+
+ and maximum value:
+
+ $$x_{max} = \max(\{x_1, x_2, \ldots, x_n\})$$
+
+The scaled data set $z_1, z_2, \ldots, z_n$ will be:
+
+ $$z_{i} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \left( \textit{max} - \textit{min} \right) + \textit{min}$$
+
+where $\textit{min}$ and $\textit{max}$ are the user-specified minimum and maximum values of the range to scale to.
+
+## Operations
+
+`MinMaxScaler` is a `Transformer`.
+As such, it supports the `fit` and `transform` operations.
+
+### Fit
+
+MinMaxScaler is trained on all subtypes of `Vector` or `LabeledVector`:
+
+* `fit[T <: Vector]: DataSet[T] => Unit`
+* `fit: DataSet[LabeledVector] => Unit`
+
+### Transform
+
+MinMaxScaler transforms all subtypes of `Vector` or `LabeledVector` into 
the respective type:
+
+* `transform[T <: Vector]: DataSet[T] => DataSet[T]`
+* `transform: DataSet[LabeledVector] => DataSet[LabeledVector]`
+
+## Parameters
+
+The MinMax scaler implementation can be controlled by the following two 
parameters:
+
+| Parameters | Description |
+|------------|-------------|
+| Min | The minimum value of the range for the scaled data set. (Default value: 0.0) |
+| Max | The maximum value of the range for the scaled data set. (Default value: 1.0) |
+
+## Examples
+
+{% highlight scala %}
+// Create MinMax scaler transformer
+val minMaxscaler = MinMaxScaler()
+.setMin(-1.0)
--- End diff --

Indent 2 spaces




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577140#comment-14577140
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31911138
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
--- End diff --

Doesn't LabeledVector apply here as well?
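
As a rough illustration of what the labeled case could look like, the sketch below scales only the features and passes the label through unchanged. That behaviour is an assumption here, not something confirmed in this thread. `Labeled` is a hypothetical stand-in and not FlinkML's `LabeledVector`, and `Vector` in the code is Scala's own immutable collection.

```scala
object LabeledScaling {
  // Hypothetical stand-in for a labeled example; not FlinkML's LabeledVector.
  // Vector here is scala.collection.immutable.Vector.
  final case class Labeled(label: Double, features: Vector[Double])

  /** Scales every feature into [lo, hi]; labels are left untouched (assumption). */
  def scale(data: Seq[Labeled], lo: Double = 0.0, hi: Double = 1.0): Seq[Labeled] = {
    require(data.nonEmpty, "need at least one example")
    val dims = data.head.features.indices
    // Per-feature minimum and maximum over the whole data set.
    val mins = dims.map(i => data.map(_.features(i)).min)
    val maxs = dims.map(i => data.map(_.features(i)).max)
    data.map { p =>
      val scaled = dims.map { i =>
        val span = maxs(i) - mins(i)
        // Assumption: a feature with zero spread is mapped to lo.
        if (span == 0.0) lo else (p.features(i) - mins(i)) / span * (hi - lo) + lo
      }.toVector
      p.copy(features = scaled) // the label is passed through unchanged
    }
  }
}
```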




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577131#comment-14577131
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/798#issuecomment-109983940
  
The documentation change must also update index.html (the FlinkML landing 
page) so that the new page is linked from somewhere.




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577122#comment-14577122
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/798#issuecomment-109982291
  
LGTM. Will merge once Travis gives the green light.




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576962#comment-14576962
 ] 

ASF GitHub Bot commented on FLINK-1844:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/798#issuecomment-109937398
  
Note: you might want to rename this to *[FLINK-1844] [ml] - Add Normaliser 
to ML library* so that JIRA picks up on the issue.




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-07 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576282#comment-14576282
 ] 

Faye Beligianni commented on FLINK-1844:


Hey [~tvas],
I opened a PR for the normaliser, which I named MinMaxScaler.
Any comments are welcome!
Regarding the two tests that I wrote, I think they may be too simple, 
as I am only checking whether the scaled values lie in the user-specified range.
Cross-checking the result against a dataset of "expectedScaledVectors" would 
require calculating the "expectedScaledVectors" with the same method that I 
used in the implementation of the MinMaxScaler (I wasn't sure whether that 
would be correct).
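
One way around the circularity described above is to pick a dataset small enough that the expected values can be worked out by hand from the min-max formula, and then compare with a tolerance. The sketch below is plain Scala rather than the FlinkML test code; `minMaxScale`, the sample data, and the tolerance are all assumptions made for illustration.

```scala
object MinMaxScalerCheck {
  // Reference formula: (x - min) / (max - min) * (hi - lo) + lo, applied per feature.
  // Assumes non-empty input, equal row lengths, and no constant features.
  def minMaxScale(data: Seq[Seq[Double]], lo: Double, hi: Double): Seq[Seq[Double]] = {
    val columns = data.transpose
    val mins = columns.map(_.min)
    val maxs = columns.map(_.max)
    data.map { row =>
      row.indices.map { j =>
        (row(j) - mins(j)) / (maxs(j) - mins(j)) * (hi - lo) + lo
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Seq(1.0, 10.0), Seq(2.0, 30.0), Seq(5.0, 50.0))

    // Worked out by hand for the target range [-1, 1]:
    // feature 0 (min 1, max 5):   (1-1)/4*2-1 = -1.0, (2-1)/4*2-1 = -0.5, (5-1)/4*2-1 = 1.0
    // feature 1 (min 10, max 50): (10-10)/40*2-1 = -1.0, (30-10)/40*2-1 = 0.0, (50-10)/40*2-1 = 1.0
    val expected = Seq(Seq(-1.0, -1.0), Seq(-0.5, 0.0), Seq(1.0, 1.0))

    val scaled = minMaxScale(data, -1.0, 1.0)
    val tolerance = 1e-9
    for ((row, expectedRow) <- scaled.zip(expected); (x, e) <- row.zip(expectedRow))
      assert(math.abs(x - e) < tolerance, s"expected $e but got $x")
  }
}
```

Because the expected values come from arithmetic done on paper rather than from the implementation under test, such a check can catch a wrong formula that a pure range check would let through.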



[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-05 Thread Theodore Vasiloudis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574166#comment-14574166
 ] 

Theodore Vasiloudis commented on FLINK-1844:


No worries [~fobeligi], thank you for your contribution. Keep us updated.



[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-04 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573629#comment-14573629
 ] 

Faye Beligianni commented on FLINK-1844:


Hello [~tvas], 
I have implemented the main algorithm, but I still have to migrate it to the 
new ML pipeline and create a test.
I am sorry for not looking at this issue for a while; I will finalise it during 
the weekend.
 




[jira] [Commented] (FLINK-1844) Add Normaliser to ML library

2015-06-04 Thread Theodore Vasiloudis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572941#comment-14572941
 ] 

Theodore Vasiloudis commented on FLINK-1844:


Hello Faye, have you managed to make any progress on this?

We could really use it for the quickstart examples.
