[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68848969
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String 
edgesPath, ExecutionEnvironmen
}
 
/**
+* Creates a graph from an Adjacency List text file with Vertex Key values. Edges will be created automatically.
+*
+* @param filePath a path to an Adjacency List text file with the Vertex data
+* @param context  the execution environment.
+* @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader},
+* on which calling the methods that specify the types of the Vertex ID, Vertex value, and Edge value returns a Graph.
+*/
+   public static GraphAdjacencyListReader fromAdjacencyListFile(String 
filePath, ExecutionEnvironment context) {
+   return new GraphAdjacencyListReader(filePath, context);
+   }
+
+   /**
+* Writes a graph as an Adjacency List formatted text file in a user-specified folder.
+*
+* @param filePath   the path that the Adjacency List formatted text file should be written in
+* @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text
+*   file. Delimiters should be provided in the following order:
+*   NEIGHBOR_DELIMITER : separating a source vertex from its neighbors
+*   VERTICES_DELIMITER : separating the different neighbors of a source vertex
+*   VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the
+*   target vertex-ids from the edge value.
+*/
+   public void writeAsAdjacencyList(String filePath, String... delimiters) 
{
+
+   final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? 
delimiters[0] : "\t";
+
+   final String VERTICES_DELIMITER = delimiters.length > 1 ? 
delimiters[1] : ",";
+
+   final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-";
--- End diff --

You mean the error in this declaration:
```java
final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-";
```
and not that I should check directly for a length greater than two, since in that case the user would have to provide either all three delimiters or none.
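One way to get both behaviors (fix the off-by-one bug and still let the caller pass any prefix of the delimiters) is cascading per-position defaults. A hypothetical sketch, not the PR's actual code; the helper name `resolve` is invented:

```java
import java.util.Arrays;

// Hypothetical sketch: resolve up to three delimiters against per-position
// defaults (tab, ",", "-"), so callers may pass zero, one, two, or all three.
public class DelimiterDefaults {

    static String[] resolve(String... delimiters) {
        // Defaults for NEIGHBOR, VERTICES, VERTEX_VALUE delimiters.
        String[] resolved = {"\t", ",", "-"};
        for (int i = 0; i < delimiters.length && i < resolved.length; i++) {
            resolved[i] = delimiters[i];
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolve()));         // all defaults
        System.out.println(Arrays.toString(resolve(";", " "))); // first two overridden
    }
}
```

This avoids both the `delimiters[2]` out-of-bounds access when only two delimiters are given and the all-or-nothing restriction of checking `length > 2` per position.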


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68848469
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String 
edgesPath, ExecutionEnvironmen
}
 
/**
+* Creates a graph from an Adjacency List text file with Vertex Key values. Edges will be created automatically.
+*
+* @param filePath a path to an Adjacency List text file with the Vertex data
+* @param context  the execution environment.
+* @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader},
+* on which calling the methods that specify the types of the Vertex ID, Vertex value, and Edge value returns a Graph.
+*/
+   public static GraphAdjacencyListReader fromAdjacencyListFile(String 
filePath, ExecutionEnvironment context) {
+   return new GraphAdjacencyListReader(filePath, context);
+   }
+
+   /**
+* Writes a graph as an Adjacency List formatted text file in a user-specified folder.
+*
+* @param filePath   the path that the Adjacency List formatted text file should be written in
+* @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text
+*   file. Delimiters should be provided in the following order:
+*   NEIGHBOR_DELIMITER : separating a source vertex from its neighbors
+*   VERTICES_DELIMITER : separating the different neighbors of a source vertex
+*   VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the
+*   target vertex-ids from the edge value.
+*/
+   public void writeAsAdjacencyList(String filePath, String... delimiters) 
{
+
+   final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? 
delimiters[0] : "\t";
+
+   final String VERTICES_DELIMITER = delimiters.length > 1 ? 
delimiters[1] : ",";
+
+   final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-";
+
+
+   DataSet<Tuple2<K, VV>> vertices = this.getVerticesAsTuple2();
+
+   DataSet<Tuple3<K, K, EV>> edgesNValues = 
this.getEdgesAsTuple3();
--- End diff --

As I see now, we don't have to convert the vertex set to a Tuple2 set, so I have already changed that.

Regarding the edges dataset: in order to write the Adjacency List file, I apply the coGroup transformation to the Vertex dataset and the EdgesAsTuple3 dataset, matching where the vertex ID equals the source of the edge.

In that case, even when a vertex is the source of no edges (e.g. it has only incoming edges), I still get the vertex ID in the "coGrouped" dataset (I couldn't achieve that with a join).

I can't see how I could use the Edge dataset in a coGroup or similar transformation.
Please let me know if you have any suggestions.
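The semantic difference being described can be shown with a small plain-Java sketch (illustrative only, not the Flink coGroup API): a coGroup-style pairing keeps every vertex, even one whose edge group is empty, while an inner-join-style pairing drops vertices that are not the source of any edge.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of coGroup vs. join semantics for the adjacency-list writer:
// vertex 3 has only incoming edges, so a join on vertexId == edge.source
// loses it, while a coGroup still emits it with an empty neighbor list.
public class CoGroupVsJoin {

    static Map<Integer, List<Integer>> coGroupLike(List<Integer> vertices,
                                                   Map<Integer, List<Integer>> edgesBySource) {
        Map<Integer, List<Integer>> out = new TreeMap<>();
        for (int v : vertices) {
            out.put(v, edgesBySource.getOrDefault(v, List.of())); // empty group kept
        }
        return out;
    }

    static Map<Integer, List<Integer>> joinLike(List<Integer> vertices,
                                                Map<Integer, List<Integer>> edgesBySource) {
        Map<Integer, List<Integer>> out = new TreeMap<>();
        for (int v : vertices) {
            if (edgesBySource.containsKey(v)) { // inner join: no match, no row
                out.put(v, edgesBySource.get(v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> vertices = List.of(1, 2, 3);
        Map<Integer, List<Integer>> edges = Map.of(1, List.of(2, 3), 2, List.of(3));
        System.out.println(coGroupLike(vertices, edges)); // {1=[2, 3], 2=[3], 3=[]}
        System.out.println(joinLike(vertices, edges));    // {1=[2, 3], 2=[3]}
    }
}
```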




[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68846112
  
--- Diff: 
flink-libraries/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala
 ---
@@ -1127,8 +1194,7 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, 
EV]) {
*
* @param analytic the analytic to run on the Graph
*/
-  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, 
EV, T]):
-  GraphAnalytic[K, VV, EV, T] = {
+  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, 
EV, T])= {
--- End diff --

No, I will revert the change.




[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
GitHub user fobeligi opened a pull request:

https://github.com/apache/flink/pull/2178

[Flink-1815] Add methods to read and write a Graph as adjacency list

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fobeligi/incubator-flink FLINK-1815

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2178


commit 3a9502da61b7758e1383803d5141a16fe3a5777a
Author: fobeligi <faybeligia...@gmail.com>
Date:   2016-06-22T16:11:23Z

[FLINK-1815] Add GraphAdjacencyListReader class to read an Adjacency List 
formatted text file. Moreover, add writeAsAdjacencyList method to Graph. Test 
cases are also added for each new method.

commit 8aab5b40e031b132c46782a5908d58cc6290892f
Author: fobeligi <faybeligia...@gmail.com>
Date:   2016-06-28T08:49:03Z

[FLINK-1815] Add fromAdjacencyListFile and writeAsAdjacencyList methods to 
Graph scala API. Tests are also added.






[GitHub] flink pull request: [Flink 1844] Add Normaliser to ML library

2015-06-08 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31894794
  
--- Diff: docs/libs/ml/minMax_scaler.md ---
@@ -0,0 +1,113 @@
+---
+mathjax: include
+htmlTitle: FlinkML - MinMax Scaler
+title: <a href="../ml">FlinkML</a> - MinMax Scaler
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Description
+
+ The MinMax scaler scales the given data set so that all values lie within a user-specified range [min, max].
+ If the user does not provide specific minimum and maximum values for the scaling range, the MinMax scaler transforms the features of the input data set to lie in the [0, 1] interval.
+ Given a set of input data $x_1, x_2,... x_n$, with minimum value:
+
+ $$x_{min} = min({x_1, x_2,..., x_n})$$
+
+ and maximum value:
+
+ $$x_{max} = max({x_1, x_2,..., x_n})$$
+
+The scaled data set $z_1, z_2,...,z_n$ will be:
+
+ $$z_{i}= \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \left ( max - min 
\right ) + min$$
+
+where $\textit{min}$ and $\textit{max}$ are the user-specified minimum and maximum values of the scaling range.
+
+## Operations
+
+`MinMaxScaler` is a `Transformer`.
+As such, it supports the `fit` and `transform` operations.
+
+### Fit
+
+MinMaxScaler is trained on all subtypes of `Vector` or `LabeledVector`:
+
+* `fit[T <: Vector]: DataSet[T] => Unit`
+* `fit: DataSet[LabeledVector] => Unit`
+
+### Transform
+
+MinMaxScaler transforms all subtypes of `Vector` or `LabeledVector` into 
the respective type:
+
+* `transform[T <: Vector]: DataSet[T] => DataSet[T]`
+* `transform: DataSet[LabeledVector] => DataSet[LabeledVector]`
+
+## Parameters
+
+The MinMax scaler implementation can be controlled by the following two 
parameters:
+
+ <table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Parameters</th>
+      <th class="text-center">Description</th>
+    </tr>
+  </thead>
+
+  <tbody>
+    <tr>
+      <td><strong>Min</strong></td>
+      <td>
+        <p>
+          The minimum value of the range for the scaled data set. (Default value: <strong>0.0</strong>)
+        </p>
+      </td>
+    </tr>
+    <tr>
+      <td><strong>Std</strong></td>
--- End diff --

Yes, you are right!




[GitHub] flink pull request: [Flink 1844] Add Normaliser to ML library

2015-06-08 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31895883
  
--- Diff: 
flink-staging/flink-ml/src/test/scala/org/apache/flink/ml/preprocessing/MinMaxScalerITSuite.scala
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common.LabeledVector
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{DenseVector, Vector}
+import org.apache.flink.test.util.FlinkTestBase
+import org.scalatest.{FlatSpec, Matchers}
+
+
+class MinMaxScalerITSuite
+  extends FlatSpec
+  with Matchers
+  with FlinkTestBase {
+
+  behavior of "Flink's MinMax Scaler"
+
+  import MinMaxScalerData._
+
+  it should "scale the vectors' values to be restricted in the (0.0,1.0) range" in {
+
+val env = ExecutionEnvironment.getExecutionEnvironment
+
+val dataSet = env.fromCollection(data)
+val minMaxScaler = MinMaxScaler()
+minMaxScaler.fit(dataSet)
+val scaledVectors = minMaxScaler.transform(dataSet).collect
+
+scaledVectors.length should equal(data.length)
+
+for (vector <- scaledVectors) {
+  val test = vector.asBreeze.forall(fv => {
+fv >= 0.0 && fv <= 1.0
--- End diff --

In this case I will use the same method as in the implementation of the transformer: calculating the min and max of each feature and then applying the formula that I explain in the documentation. Is that OK?
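The documented formula, $z_i = \frac{x_i - x_{min}}{x_{max} - x_{min}}(max - min) + min$, can be sketched numerically as follows (a plain-Java illustration of the formula only, not the transformer's actual code):

```java
import java.util.Arrays;

// Sketch of the documented MinMax scaling:
// z_i = (x_i - x_min) / (x_max - x_min) * (max - min) + min
public class MinMaxSketch {

    static double[] scale(double[] x, double min, double max) {
        double xMin = Arrays.stream(x).min().getAsDouble();
        double xMax = Arrays.stream(x).max().getAsDouble();
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = (x[i] - xMin) / (xMax - xMin) * (max - min) + min;
        }
        return z;
    }

    public static void main(String[] args) {
        // Default range [0, 1]: the data minimum maps to 0, the maximum to 1.
        System.out.println(Arrays.toString(scale(new double[]{1, 2, 3}, 0.0, 1.0)));
        // [0.0, 0.5, 1.0]
    }
}
```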




[GitHub] flink pull request: [FLINK-1844] [ml] Add Normaliser to ML library

2015-06-08 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31913634
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
--- End diff --

I am using the metricsOption vectors internally in the transformer for element-wise subtractions and divisions, so instead of converting to/from Breeze and flink.ml.math.Vector I keep them as breeze.linalg.Vector.
Can I perform the same operations with flink.ml.math.Vector, or do you think it would be better to perform the conversions (to/from Breeze vectors) inside the functions?
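The element-wise use being described boils down to something like this (a plain-array Java sketch standing in for the breeze.linalg.Vector operations; not the transformer's actual code): fit aggregates per-feature (min, max) vectors, and transform applies an element-wise subtract and divide.

```java
// Sketch: per-feature (min, max) aggregation and the element-wise
// subtract/divide that the transformer applies, using plain arrays
// in place of breeze.linalg.Vector.
public class ElementwiseSketch {

    // Combine two (min, max) summaries element-wise, as a reduce step would.
    static double[][] merge(double[][] a, double[][] b) {
        double[] min = new double[a[0].length];
        double[] max = new double[a[0].length];
        for (int i = 0; i < min.length; i++) {
            min[i] = Math.min(a[0][i], b[0][i]);
            max[i] = Math.max(a[1][i], b[1][i]);
        }
        return new double[][]{min, max};
    }

    // Element-wise (x - min) / (max - min), i.e. scaling each feature to [0, 1].
    static double[] transform(double[] x, double[] min, double[] max) {
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = (x[i] - min[i]) / (max[i] - min[i]);
        }
        return z;
    }

    public static void main(String[] args) {
        // Two partial summaries: {min, max} pairs per feature.
        double[][] m = merge(new double[][]{{1, 4}, {1, 4}},
                             new double[][]{{3, 0}, {3, 0}});
        System.out.println(java.util.Arrays.toString(m[0])); // [1.0, 0.0]
        System.out.println(java.util.Arrays.toString(m[1])); // [3.0, 4.0]
        System.out.println(java.util.Arrays.toString(transform(new double[]{2, 2}, m[0], m[1]))); // [0.5, 0.5]
    }
}
```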




[GitHub] flink pull request: [FLINK-1844] [ml] Add Normaliser to ML library

2015-06-08 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31924947
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
--- End diff --

Hey, if the {{metricsOption}} field is package private then my tests will fail, because I am also testing in the {{MinMaxScalerITSuite}} whether the min and max of each feature have been calculated correctly.




[GitHub] flink pull request: [FLINK-1844] [ml] Add Normaliser to ML library

2015-06-08 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/798#discussion_r31927083
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/MinMaxScaler.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.ml.preprocessing
+
+import breeze.linalg
+import breeze.linalg.{max, min}
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala._
+import org.apache.flink.ml._
+import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
+import org.apache.flink.ml.math.Breeze._
+import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
+import org.apache.flink.ml.pipeline.{FitOperation, TransformOperation, 
Transformer}
+import org.apache.flink.ml.preprocessing.MinMaxScaler.{Max, Min}
+
+import scala.reflect.ClassTag
+
+/** Scales observations, so that all features are in a user-specified 
range.
+  * By default for [[MinMaxScaler]] transformer range = [0,1].
+  *
+  * This transformer takes a subtype of  [[Vector]] of values and maps it 
to a
+  * scaled subtype of [[Vector]] such that each feature lies between a 
user-specified range.
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which 
expect as input a subtype
+  * of [[Vector]].
+  *
+  * @example
+  * {{{
+  *   val trainingDS: DataSet[Vector] = 
env.fromCollection(data)
+  *   val transformer = MinMaxScaler().setMin(-1.0)
+  *
+  *   transformer.fit(trainingDS)
+  *   val transformedDS = transformer.transform(trainingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[Min]]: The minimum value of the range of the transformed data set; 
by default equal to 0
+  * - [[Max]]: The maximum value of the range of the transformed data set; 
by default
+  * equal to 1
+  */
+class MinMaxScaler extends Transformer[MinMaxScaler] {
+
+  var metricsOption: Option[DataSet[(linalg.Vector[Double], 
linalg.Vector[Double])]] = None
--- End diff --

Yes ^^




[GitHub] flink pull request: [Flink 1844] Add Normaliser to ML library

2015-06-05 Thread fobeligi
GitHub user fobeligi opened a pull request:

https://github.com/apache/flink/pull/798

[Flink 1844] Add Normaliser to ML library

Adds a MinMaxScaler to the ML preprocessing package. MinMax scaler scales 
the values to a user-specified range.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fobeligi/incubator-flink FLINK-1844

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/798.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #798


commit 802b9da07a2c3f7c055b4c024aaecbbe647db1cd
Author: fobeligi faybeligia...@gmail.com
Date:   2015-06-05T21:12:43Z

[FLINK-1844] Add MinMaxScaler implementation in the preprocessing package, tests for the corresponding functionality, and documentation.

commit e639185108f9bda253e296bae4c6c4269a30d1d0
Author: fobeligi faybeligia...@gmail.com
Date:   2015-06-05T22:12:33Z

[FLINK-1844] Change second test to use LabeledVectors instead of Vectors






[GitHub] flink pull request: Ml branch

2015-04-08 Thread fobeligi
GitHub user fobeligi opened a pull request:

https://github.com/apache/flink/pull/579

Ml branch

Implementation of StandardScaler and respective tests for FLINK-1809 JIRA.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fobeligi/incubator-flink ml-branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/579.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #579


commit 96cb2f5676e945d7bc414987934e5c854de70584
Author: fobeligi faybeligia...@gmail.com
Date:   2015-04-01T20:31:38Z

[FLINK-1809] Add Preprocessing package and Standardizer to ML-library

commit 2e8333b74e08f0c48bb58d36f2915a9ad832c456
Author: fobeligi faybeligia...@gmail.com
Date:   2015-04-03T16:52:35Z

[FLINK-1809] Change implementation to use Breeze.linalg library and add 
tests for Standardizer



