[jira] [Commented] (FLINK-1664) Forbid sorting on POJOs

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384904#comment-14384904
 ] 

ASF GitHub Bot commented on FLINK-1664:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/541

[FLINK-1664] Adds checks if selected sort key is sortable

- Adds checks if a sort key can actually be sorted.
  - The POJO type is defined as non-sortable, because an order would depend
    on the undefined order of POJO fields.
- Adds a few more tests for API sort functions

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink sortOnPojo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #541


commit e26d934eb1b2c14298900c53e8413487ce43a17a
Author: Fabian Hueske 
Date:   2015-03-27T20:37:59Z

[FLINK-1664] Adds check if a selected sort key is sortable




> Forbid sorting on POJOs
> ---
>
> Key: FLINK-1664
> URL: https://issues.apache.org/jira/browse/FLINK-1664
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 0.8.0, 0.9
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Flink's groupSort, partitionSort, and outputSort operators allow sorting the 
> partitions or groups of a DataSet.
> If the sort is defined on a POJO field, the sort order is not well defined. 
> Internally, the POJO is recursively decomposed into atomic fields (primitives 
> or generic types) and sorted by sorting these atomic fields. However, the 
> order of these atomic fields is not well defined (I believe it is the 
> lexicographic order of the POJO's member names).
> IMO, the best approach is to forbid sorting on POJO types for now. Instead, 
> it is always possible to select the nested fields of the POJO that should be 
> used for sorting. Later we can relax this restriction.
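
Editor's sketch — a minimal illustration of the workaround the description suggests, assuming a hypothetical POJO-style class {{Person}}, a {{persons: DataSet[Person]}}, and {{org.apache.flink.api.common.operators.Order}} imported (Scala API):

{code:scala}
// Hypothetical POJO-style class: mutable fields plus a default constructor,
// which Flink analyzes as a POJO type (unlike a case class, whose field
// order is well defined).
class Person {
  var name: String = _
  var age: Int = _
}

// Sorting on the POJO as a whole relies on an undefined field order and is
// what this issue wants to forbid:
//   persons.groupBy("name").sortGroup("*", Order.ASCENDING)

// Selecting an atomic nested field keeps the order well defined:
persons.groupBy("name").sortGroup("age", Order.ASCENDING).first(1)
{code}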



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1664] Adds checks if selected sort key ...

2015-03-27 Thread fhueske
GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/541

[FLINK-1664] Adds checks if selected sort key is sortable

- Adds checks if a sort key can actually be sorted.
  - The POJO type is defined as non-sortable, because an order would depend
    on the undefined order of POJO fields.
- Adds a few more tests for API sort functions

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink sortOnPojo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #541


commit e26d934eb1b2c14298900c53e8413487ce43a17a
Author: Fabian Hueske 
Date:   2015-03-27T20:37:59Z

[FLINK-1664] Adds check if a selected sort key is sortable




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1794] [test-utils] Adds test base for s...

2015-03-27 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/540#discussion_r27329656
  
--- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
@@ -68,7 +68,7 @@
 
 	private final Optimizer compiler;	// the compiler to compile the jobs
 
-	private boolean printStatusDuringExecution = false;
--- End diff --

Is this an intentional change? Doesn't it change the behavior of all 
submission clients based on this class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1794) Add test base for scalatest and adapt flink-ml test cases

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384602#comment-14384602
 ] 

ASF GitHub Bot commented on FLINK-1794:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/540#discussion_r27329656
  
--- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
@@ -68,7 +68,7 @@
 
 	private final Optimizer compiler;	// the compiler to compile the jobs
 
-	private boolean printStatusDuringExecution = false;
--- End diff --

Is this an intentional change? Doesn't it change the behavior of all 
submission clients based on this class?


> Add test base for scalatest and adapt flink-ml test cases
> -
>
> Key: FLINK-1794
> URL: https://issues.apache.org/jira/browse/FLINK-1794
> Project: Flink
>  Issue Type: Improvement
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>
> Currently, the flink-ml test cases use the standard {{ExecutionEnvironment}} 
> which can cause problems in parallel test executions as they happen on 
> Travis. For these tests it would be helpful to have an appropriate Scala test 
> base which instantiates a {{ForkableFlinkMiniCluster}} and sets the 
> {{ExecutionEnvironment}} appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-703) Use complete element as join key.

2015-03-27 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384529#comment-14384529
 ] 

Fabian Hueske commented on FLINK-703:
-

Hi, that's a very good example and shows that the description of this issue is 
incomplete ;-)
Key expressions (such as {{"*"}} and {{"_"}}) are supported for Pojo, Tuple, 
and CaseClass types.

If you changed your example to join against a DataSet of a primitive type 
(such as DataSet), it would no longer work. Since primitive types are not 
composite, the only possible key expression is the wildcard, which selects the 
full type. We handle this case as a special case in several places of the API. 
See for example 
[DataSink.java|https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/operators/DataSink.java]
 in the function {{sortLocalOutput(String fieldExpression, Order order)}} (around 
line 180).

So this issue would mean adding similar functionality to the key definition 
functions of join, coGroup, and grouping. For that, we need to check whether the 
type is a valid key type ({{TypeInformation.isKeyType()}}) and, if the type is 
an atomic type, set the key to {{int[]\{0\}}}.
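
Editor's sketch — the key-selector workaround mentioned in the issue next to the proposed shortcut, assuming the usual {{org.apache.flink.api.scala._}} imports:

{code:scala}
val env = ExecutionEnvironment.getExecutionEnvironment
val left  = env.fromElements("a", "b", "c")
val right = env.fromElements("b", "c", "d")

// Today: a key-selector function that returns the element itself.
val joined = left.join(right).where(w => w).equalTo(w => w)

// Proposed shortcut: the wildcard expression selects the full (atomic) type.
//   left.join(right).where("*").equalTo("*")
{code}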


> Use complete element as join key.
> -
>
> Key: FLINK-703
> URL: https://issues.apache.org/jira/browse/FLINK-703
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Assignee: Chiwan Park
>Priority: Trivial
>  Labels: github-import
> Fix For: pre-apache
>
>
> In some situations such as semi-joins it could make sense to use a complete 
> element as join key. 
> Currently this can be done using a key-selector function, but we could offer 
> a shortcut for that.
> This is not an urgent issue, but might be helpful.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/703
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, user satisfaction, 
> Milestone: Release 0.6 (unplanned)
> Created at: Thu Apr 17 23:40:00 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1796) Local mode TaskManager should have a process reaper

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384458#comment-14384458
 ] 

Stephan Ewen commented on FLINK-1796:
-

I have a fix for this coming up...

> Local mode TaskManager should have a process reaper
> ---
>
> Key: FLINK-1796
> URL: https://issues.apache.org/jira/browse/FLINK-1796
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> We use process reaper actors (a typical Akka design pattern) to shut down the 
> JVM processes when the core actors die, as this is currently unrecoverable.
> The local mode uses the process reaper only for the JobManager actor, not for 
> the TaskManager actor. This may leave stale JVMs behind on critical 
> TaskManager errors and makes debugging harder.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1796) Local mode TaskManager should have a process reaper

2015-03-27 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1796:
---

 Summary: Local mode TaskManager should have a process reaper
 Key: FLINK-1796
 URL: https://issues.apache.org/jira/browse/FLINK-1796
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


We use process reaper actors (a typical Akka design pattern) to shut down the 
JVM processes when the core actors die, as this is currently unrecoverable.

The local mode uses the process reaper only for the JobManager actor, not for 
the TaskManager actor. This may leave stale JVMs behind on critical TaskManager 
errors and makes debugging harder.
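
Editor's sketch — the generic Akka process-reaper pattern referred to above (not Flink's actual implementation):

{code:scala}
import akka.actor.{Actor, ActorRef, Terminated}

// Watches a vital actor (e.g. the TaskManager actor) and terminates the
// whole JVM once it dies, since such a failure is unrecoverable.
class ProcessReaper(watched: ActorRef, exitCode: Int) extends Actor {

  override def preStart(): Unit = context.watch(watched)

  override def receive: Receive = {
    case Terminated(_) => System.exit(exitCode)
  }
}
{code}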



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1795) Solution set allows duplicates upon construction.

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384456#comment-14384456
 ] 

Stephan Ewen commented on FLINK-1795:
-

I have a fix and test coming up...

> Solution set allows duplicates upon construction.
> -
>
> Key: FLINK-1795
> URL: https://issues.apache.org/jira/browse/FLINK-1795
> Project: Flink
>  Issue Type: Bug
>  Components: Iterations
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> The solution set identifies entries uniquely by key.
> During construction, it does not eliminate duplicates. The duplicates do not 
> get updated during the iterations (since only the first match is considered), 
> but are contained in the final result.
> This contradicts the definition of the solution set. It should not contain 
> duplicates to begin with.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1754) Deadlock in job execution

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384450#comment-14384450
 ] 

Stephan Ewen commented on FLINK-1754:
-

Is this issue resolved?

The issue with deadlocks in 0.8.1 is (I think) that the runtime does not obey 
the assumptions from the optimizer. The hash table building requires (for some 
reason) data availability on the probe side as well.

> Deadlock in job execution
> -
>
> Key: FLINK-1754
> URL: https://issues.apache.org/jira/browse/FLINK-1754
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Sebastian Kruse
>
> I have encountered a reproducible deadlock in the execution of one of my 
> jobs. The part of the plan, where this happens, is the following:
> {code:java}
> /** Performs the reduction via creating transitive INDs and removing them from the original IND set. */
> private DataSet> calculateTransitiveReduction1(DataSet> inclusionDependencies) {
>     // Concatenate INDs (only one hop).
>     DataSet> transitiveInds = inclusionDependencies
>             .flatMap(new SplitInds())
>             .joinWithTiny(inclusionDependencies)
>             .where(1).equalTo(0)
>             .with(new ConcatenateInds());
>     // Remove the concatenated INDs to come up with a transitive reduction of the INDs.
>     return inclusionDependencies
>             .coGroup(transitiveInds)
>             .where(0).equalTo(0)
>             .with(new RemoveTransitiveInds());
> }
> {code}
> Seemingly, the flatmap operator waits infinitely for a free buffer to write 
> on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Fix issue where Windows paths were not recogni...

2015-03-27 Thread balidani
Github user balidani commented on the pull request:

https://github.com/apache/flink/pull/491#issuecomment-87062926
  
So most of the jobs passed, except for PROFILE="-Dhadoop.version=2.6.0 
-Dscala-2.11", where it says 

No output has been received in the last 10 minutes, this potentially 
indicates a stalled build or something wrong with the build itself.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-1781) Quickstarts broken due to Scala Version Variables

2015-03-27 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1781.
-
Resolution: Fixed

Fixed in 1aba942c1fd7c4dbf1c4d4f30602d69c2cb3540e

> Quickstarts broken due to Scala Version Variables
> -
>
> Key: FLINK-1781
> URL: https://issues.apache.org/jira/browse/FLINK-1781
> Project: Flink
>  Issue Type: Bug
>  Components: Quickstarts
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 0.9
>
>
> The quickstart archetype resources refer to the scala version variables.
> When creating a maven project standalone, these variables are not defined, 
> and the pom is invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1783) Quickstart shading should not create shaded jar and dependency-reduced pom

2015-03-27 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1783.
-
Resolution: Fixed
  Assignee: Stephan Ewen

Fixed in d11e0910880a48bbd5c452e4c76ffdca000f5614

> Quickstart shading should not create shaded jar and dependency-reduced pom
> ---
>
> Key: FLINK-1783
> URL: https://issues.apache.org/jira/browse/FLINK-1783
> Project: Flink
>  Issue Type: Improvement
>  Components: Quickstarts
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1795) Solution set allows duplicates upon construction.

2015-03-27 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1795:
---

 Summary: Solution set allows duplicates upon construction.
 Key: FLINK-1795
 URL: https://issues.apache.org/jira/browse/FLINK-1795
 Project: Flink
  Issue Type: Bug
  Components: Iterations
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


The solution set identifies entries uniquely by key.

During construction, it does not eliminate duplicates. The duplicates do not 
get updated during the iterations (since only the first match is considered), 
but are contained in the final result.

This contradicts the definition of the solution set. It should not contain 
duplicates to begin with.
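
Editor's sketch — a plain-Scala illustration of the intended semantics (not Flink code): construction should keep at most one entry per key, e.g. the first match.

{code:scala}
// Entries keyed by the first field; key 1 appears twice.
val entries = Seq((1, "a"), (1, "b"), (2, "c"))

// Expected after construction: one entry per key, first match wins.
val solutionSet = entries.foldLeft(Map.empty[Int, String]) {
  case (acc, (k, v)) => if (acc.contains(k)) acc else acc + (k -> v)
}
// solutionSet == Map(1 -> "a", 2 -> "c")
{code}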



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1718) Add sparse vector and sparse matrix types to machine learning library

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384410#comment-14384410
 ] 

ASF GitHub Bot commented on FLINK-1718:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/539#issuecomment-87060638
  
I combined some of the sanity checks of the matrix/vector accessors.


> Add sparse vector and sparse matrix types to machine learning library
> -
>
> Key: FLINK-1718
> URL: https://issues.apache.org/jira/browse/FLINK-1718
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>  Labels: ML
>
> Currently, the machine learning library only supports dense matrix and dense 
> vectors. For future algorithms it would be beneficial to also support sparse 
> vectors and matrices.
> I'd propose to use the compressed sparse column (CSC) representation, because 
> it allows rather efficient operations compared to a map-backed sparse 
> matrix/vector implementation. Furthermore, this is also the format the Breeze 
> library expects for sparse matrices/vectors. Thus, it is easy to convert to a 
> sparse Breeze data structure, which provides us with many linear algebra 
> operations.
> BIDMat [1] uses the same data representation.
> Resources:
> [1] [https://github.com/BIDData/BIDMat]
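
Editor's sketch — a small worked example of the CSC layout, matching the {{SparseMatrix(numRows, numCols, rowIndices, colPtrs, data)}} constructor discussed in the pull request:

{code:scala}
// CSC layout of the 3x3 matrix
//   [ 1.0  0.0  2.0 ]
//   [ 0.0  0.0  3.0 ]
//   [ 0.0  4.0  0.0 ]
val data       = Array(1.0, 4.0, 2.0, 3.0)  // non-zeros in column-major order
val rowIndices = Array(0, 2, 0, 1)          // row of each entry in data
val colPtrs    = Array(0, 1, 2, 4)          // column c owns data(colPtrs(c) until colPtrs(c + 1))

// Example: column 2 owns data indices 2 until 4, i.e. 2.0 at row 0 and 3.0 at row 1.
{code}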



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1718] Adds sparse matrix and sparse vec...

2015-03-27 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/539#issuecomment-87060638
  
I combined some of the sanity checks of the matrix/vector accessors.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1718] Adds sparse matrix and sparse vec...

2015-03-27 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27321365
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseMatrix.scala ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math
+
+import scala.util.Sorting
+
+/** Sparse matrix using the compressed sparse column (CSC) representation.
+  *
+  * More details concerning the compressed sparse column (CSC) representation can be found
+  * [http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29].
+  *
+  * @param numRows Number of rows
+  * @param numCols Number of columns
+  * @param rowIndices Array containing the row indices of non-zero entries
+  * @param colPtrs Array containing the starting offsets in data for each column
+  * @param data Array containing the non-zero entries in column-major order
+  */
+class SparseMatrix(
+    val numRows: Int,
+    val numCols: Int,
+    val rowIndices: Array[Int],
+    val colPtrs: Array[Int],
+    val data: Array[Double])
+  extends Matrix {
+
+  /** Element wise access function
+    *
+    * @param row row index
+    * @param col column index
+    * @return matrix entry at (row, col)
+    */
+  override def apply(row: Int, col: Int): Double = {
+
+    val index = locate(row, col)
+
+    if(index < 0) {
+      0
+    } else {
+      data(index)
+    }
+  }
+
+  def toDenseMatrix: DenseMatrix = {
+    val result = DenseMatrix.zeros(numRows, numCols)
+
+    for(row <- 0 until numRows; col <- 0 until numCols) {
+      result(row, col) = apply(row, col)
+    }
+
+    result
+  }
+
+  /** Element wise update function
+    *
+    * @param row row index
+    * @param col column index
+    * @param value value to set at (row, col)
+    */
+  override def update(row: Int, col: Int, value: Double): Unit = {
+    val index = locate(row, col)
+
+    if(index < 0) {
+      throw new IllegalArgumentException("Cannot update zero value of sparse matrix at index " +
+        s"($row, $col)")
+    } else {
+      data(index) = value
+    }
+  }
+
+  override def toString: String = {
+    val result = StringBuilder.newBuilder
+
+    result.append(s"SparseMatrix($numRows, $numCols)\n")
+
+    var columnIndex = 0
+
+    val fieldWidth = math.max(numRows, numCols).toString.length
+    val valueFieldWidth = data.map(_.toString.length).max + 2
+
+    for(index <- 0 until colPtrs.last) {
+      while(colPtrs(columnIndex + 1) <= index) {
+        columnIndex += 1
+      }
+
+      val rowStr = rowIndices(index).toString
+      val columnStr = columnIndex.toString
+      val valueStr = data(index).toString
+
+      result.append("(" + " " * (fieldWidth - rowStr.length) + rowStr + "," +
+        " " * (fieldWidth - columnStr.length) + columnStr + ")")
+      result.append(" " * (valueFieldWidth - valueStr.length) + valueStr)
+      result.append("\n")
+    }
+
+    result.toString
+  }
+
+  private def locate(row: Int, col: Int): Int = {
+    require(0 <= row && row < numRows, s"Row $row is out of bounds [0, $numRows).")
+    require(0 <= col && col < numCols, s"Col $col is out of bounds [0, $numCols).")
--- End diff --

Well, actually the string interpolations do not cost anything, because 
the message parameter is passed by-name, meaning that it is only evaluated in 
case of a failed requirement.
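
Editor's sketch — `Predef.require` is declared as `def require(requirement: Boolean, message: => Any)`, so the by-name message can be verified like this:

```scala
var evaluated = false
def message: String = { evaluated = true; "row out of bounds" }

require(true, message)  // requirement holds, message is never evaluated
assert(!evaluated)      // still false: the interpolation cost nothing

// Only a failing requirement forces the message:
// require(false, message) evaluates it and throws an IllegalArgumentException.
```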


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1718) Add sparse vector and sparse matrix types to machine learning library

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384372#comment-14384372
 ] 

ASF GitHub Bot commented on FLINK-1718:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27321365
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseMatrix.scala ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math
+
+import scala.util.Sorting
+
+/** Sparse matrix using the compressed sparse column (CSC) representation.
+  *
+  * More details concerning the compressed sparse column (CSC) representation can be found
+  * [http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29].
+  *
+  * @param numRows Number of rows
+  * @param numCols Number of columns
+  * @param rowIndices Array containing the row indices of non-zero entries
+  * @param colPtrs Array containing the starting offsets in data for each column
+  * @param data Array containing the non-zero entries in column-major order
+  */
+class SparseMatrix(
+    val numRows: Int,
+    val numCols: Int,
+    val rowIndices: Array[Int],
+    val colPtrs: Array[Int],
+    val data: Array[Double])
+  extends Matrix {
+
+  /** Element wise access function
+    *
+    * @param row row index
+    * @param col column index
+    * @return matrix entry at (row, col)
+    */
+  override def apply(row: Int, col: Int): Double = {
+
+    val index = locate(row, col)
+
+    if(index < 0) {
+      0
+    } else {
+      data(index)
+    }
+  }
+
+  def toDenseMatrix: DenseMatrix = {
+    val result = DenseMatrix.zeros(numRows, numCols)
+
+    for(row <- 0 until numRows; col <- 0 until numCols) {
+      result(row, col) = apply(row, col)
+    }
+
+    result
+  }
+
+  /** Element wise update function
+    *
+    * @param row row index
+    * @param col column index
+    * @param value value to set at (row, col)
+    */
+  override def update(row: Int, col: Int, value: Double): Unit = {
+    val index = locate(row, col)
+
+    if(index < 0) {
+      throw new IllegalArgumentException("Cannot update zero value of sparse matrix at index " +
+        s"($row, $col)")
+    } else {
+      data(index) = value
+    }
+  }
+
+  override def toString: String = {
+    val result = StringBuilder.newBuilder
+
+    result.append(s"SparseMatrix($numRows, $numCols)\n")
+
+    var columnIndex = 0
+
+    val fieldWidth = math.max(numRows, numCols).toString.length
+    val valueFieldWidth = data.map(_.toString.length).max + 2
+
+    for(index <- 0 until colPtrs.last) {
+      while(colPtrs(columnIndex + 1) <= index) {
+        columnIndex += 1
+      }
+
+      val rowStr = rowIndices(index).toString
+      val columnStr = columnIndex.toString
+      val valueStr = data(index).toString
+
+      result.append("(" + " " * (fieldWidth - rowStr.length) + rowStr + "," +
+        " " * (fieldWidth - columnStr.length) + columnStr + ")")
+      result.append(" " * (valueFieldWidth - valueStr.length) + valueStr)
+      result.append("\n")
+    }
+
+    result.toString
+  }
+
+  private def locate(row: Int, col: Int): Int = {
+    require(0 <= row && row < numRows, s"Row $row is out of bounds [0, $numRows).")
+    require(0 <= col && col < numCols, s"Col $col is out of bounds [0, $numCols).")
--- End diff --

Well, actually the string interpolations do not cost anything, because 
the message parameter is passed by-name, meaning that it is only evaluated in 
case of a failed requirement.


> Add sparse vector and sparse matrix types to machine learning library

[GitHub] flink pull request: [FLINK-1718] Adds sparse matrix and sparse vec...

2015-03-27 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27321156
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseMatrix.scala ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math
+
+import scala.util.Sorting
+
+/** Sparse matrix using the compressed sparse column (CSC) representation.
+  *
+  * More details concerning the compressed sparse column (CSC) representation can be found
+  * [http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29].
+  *
+  * @param numRows Number of rows
+  * @param numCols Number of columns
+  * @param rowIndices Array containing the row indices of non-zero entries
+  * @param colPtrs Array containing the starting offsets in data for each column
+  * @param data Array containing the non-zero entries in column-major order
+  */
+class SparseMatrix(
+    val numRows: Int,
+    val numCols: Int,
+    val rowIndices: Array[Int],
+    val colPtrs: Array[Int],
+    val data: Array[Double])
+  extends Matrix {
+
+  /** Element wise access function
+    *
+    * @param row row index
+    * @param col column index
+    * @return matrix entry at (row, col)
+    */
+  override def apply(row: Int, col: Int): Double = {
+
+    val index = locate(row, col)
+
+    if(index < 0) {
+      0
+    } else {
+      data(index)
+    }
+  }
+
+  def toDenseMatrix: DenseMatrix = {
+    val result = DenseMatrix.zeros(numRows, numCols)
+
+    for(row <- 0 until numRows; col <- 0 until numCols) {
+      result(row, col) = apply(row, col)
+    }
+
+    result
+  }
+
+  /** Element wise update function
+    *
+    * @param row row index
+    * @param col column index
+    * @param value value to set at (row, col)
+    */
+  override def update(row: Int, col: Int, value: Double): Unit = {
+    val index = locate(row, col)
+
+    if(index < 0) {
+      throw new IllegalArgumentException("Cannot update zero value of sparse matrix at index " +
+        s"($row, $col)")
+    } else {
+      data(index) = value
+    }
+  }
+
+  override def toString: String = {
+    val result = StringBuilder.newBuilder
+
+    result.append(s"SparseMatrix($numRows, $numCols)\n")
+
+    var columnIndex = 0
+
+    val fieldWidth = math.max(numRows, numCols).toString.length
+    val valueFieldWidth = data.map(_.toString.length).max + 2
+
+    for(index <- 0 until colPtrs.last) {
+      while(colPtrs(columnIndex + 1) <= index) {
+        columnIndex += 1
+      }
+
+      val rowStr = rowIndices(index).toString
+      val columnStr = columnIndex.toString
+      val valueStr = data(index).toString
+
+      result.append("(" + " " * (fieldWidth - rowStr.length) + rowStr + "," +
+        " " * (fieldWidth - columnStr.length) + columnStr + ")")
+      result.append(" " * (valueFieldWidth - valueStr.length) + valueStr)
+      result.append("\n")
+    }
+
+    result.toString
+  }
+
+  private def locate(row: Int, col: Int): Int = {
+    require(0 <= row && row < numRows, s"Row $row is out of bounds [0, $numRows).")
+    require(0 <= col && col < numCols, s"Col $col is out of bounds [0, $numCols).")
--- End diff --

That's right. I'll probably remove the checks and expect the user to know 
what he's doing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1794) Add test base for scalatest and adapt flink-ml test cases

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384362#comment-14384362
 ] 

ASF GitHub Bot commented on FLINK-1794:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/540

[FLINK-1794] [test-utils] Adds test base for scala tests and adapts 
existing flink-ml tests

The test base for scala tests is implemented as a trait. The trait expects 
```Suite``` as a self-type. Mixing this trait into a test suite will start a 
```ForkableFlinkMiniCluster``` upon class loading and stop the respective 
cluster after all tests have been run.
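
Editor's sketch — a minimal version of such a trait (names and startup logic assumed; using `beforeAll`/`afterAll` rather than the class-loading hook described above):

```scala
import org.scalatest.{BeforeAndAfterAll, Suite}

trait ClusterTestBase extends BeforeAndAfterAll { this: Suite =>

  // Hypothetical cluster handle; the actual trait uses a ForkableFlinkMiniCluster.
  private var cluster: Option[ForkableFlinkMiniCluster] = None

  // Stand-in for starting the cluster and wiring up the ExecutionEnvironment.
  protected def startCluster(): ForkableFlinkMiniCluster

  override protected def beforeAll(): Unit = {
    super.beforeAll()
    cluster = Some(startCluster())
  }

  override protected def afterAll(): Unit = {
    cluster.foreach(_.stop())
    super.afterAll()
  }
}
```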

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixClusterMLITCases

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/540.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #540


commit 25542ddc1c88a92f5b50df7d3384ad2820560510
Author: Till Rohrmann 
Date:   2015-03-27T18:40:42Z

[FLINK-1794] [test-utils] Adds test base for scala tests and adapts 
existing flink-ml tests




> Add test base for scalatest and adapt flink-ml test cases
> -
>
> Key: FLINK-1794
> URL: https://issues.apache.org/jira/browse/FLINK-1794
> Project: Flink
>  Issue Type: Improvement
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>
> Currently, the flink-ml test cases use the standard {{ExecutionEnvironment}} 
> which can cause problems in parallel test executions as they happen on 
> Travis. For these tests it would be helpful to have an appropriate Scala test 
> base which instantiates a {{ForkableFlinkMiniCluster}} and sets the 
> {{ExecutionEnvironment}} appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1718) Add sparse vector and sparse matrix types to machine learning library

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384365#comment-14384365
 ] 

ASF GitHub Bot commented on FLINK-1718:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27321156
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseMatrix.scala ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math
+
+import scala.util.Sorting
+
+/** Sparse matrix using the compressed sparse column (CSC) representation.
+  *
+  * More details concerning the compressed sparse column (CSC) representation can be found
+  * [http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29].
+  *
+  * @param numRows Number of rows
+  * @param numCols Number of columns
+  * @param rowIndices Array containing the row indices of non-zero entries
+  * @param colPtrs Array containing the starting offsets in data for each column
+  * @param data Array containing the non-zero entries in column-major order
+  */
+class SparseMatrix(
+    val numRows: Int,
+    val numCols: Int,
+    val rowIndices: Array[Int],
+    val colPtrs: Array[Int],
+    val data: Array[Double])
+  extends Matrix {
+
+  /** Element wise access function
+    *
+    * @param row row index
+    * @param col column index
+    * @return matrix entry at (row, col)
+    */
+  override def apply(row: Int, col: Int): Double = {
+
+    val index = locate(row, col)
+
+    if(index < 0) {
+      0
+    } else {
+      data(index)
+    }
+  }
+
+  def toDenseMatrix: DenseMatrix = {
+    val result = DenseMatrix.zeros(numRows, numCols)
+
+    for(row <- 0 until numRows; col <- 0 until numCols) {
+      result(row, col) = apply(row, col)
+    }
+
+    result
+  }
+
+  /** Element wise update function
+    *
+    * @param row row index
+    * @param col column index
+    * @param value value to set at (row, col)
+    */
+  override def update(row: Int, col: Int, value: Double): Unit = {
+    val index = locate(row, col)
+
+    if(index < 0) {
+      throw new IllegalArgumentException("Cannot update zero value of sparse matrix at index " +
+        s"($row, $col)")
+    } else {
+      data(index) = value
+    }
+  }
+
+  override def toString: String = {
+    val result = StringBuilder.newBuilder
+
+    result.append(s"SparseMatrix($numRows, $numCols)\n")
+
+    var columnIndex = 0
+
+    val fieldWidth = math.max(numRows, numCols).toString.length
+    val valueFieldWidth = data.map(_.toString.length).max + 2
+
+    for(index <- 0 until colPtrs.last) {
+      while(colPtrs(columnIndex + 1) <= index) {
+        columnIndex += 1
+      }
+
+      val rowStr = rowIndices(index).toString
+      val columnStr = columnIndex.toString
+      val valueStr = data(index).toString
+
+      result.append("(" + " " * (fieldWidth - rowStr.length) + rowStr + "," +
+        " " * (fieldWidth - columnStr.length) + columnStr + ")")
+      result.append(" " * (valueFieldWidth - valueStr.length) + valueStr)
+      result.append("\n")
+    }
+
+    result.toString
+  }
+
+  private def locate(row: Int, col: Int): Int = {
+    require(0 <= row && row < numRows, s"Row $row is out of bounds [0, $numRows).")
+    require(0 <= col && col < numCols, s"Col $col is out of bounds [0, $numCols).")
--- End diff --

That's right. I'll probably remove the checks and expect the user to know 
what he's doing.


> Add sparse vector and sparse matrix types to machine learning library
> -
>
>

[GitHub] flink pull request: [FLINK-1794] [test-utils] Adds test base for s...

2015-03-27 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/540

[FLINK-1794] [test-utils] Adds test base for scala tests and adapts 
existing flink-ml tests

The test base for scala tests is implemented as a trait. The trait expects 
```Suite``` as a self-type. Mixing this trait into a test suite will start a 
```ForkableFlinkMiniCluster``` upon class loading and stop the respective 
cluster after all tests have been run.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixClusterMLITCases

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/540.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #540


commit 25542ddc1c88a92f5b50df7d3384ad2820560510
Author: Till Rohrmann 
Date:   2015-03-27T18:40:42Z

[FLINK-1794] [test-utils] Adds test base for scala tests and adapts 
existing flink-ml tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-1790) Remove the redundant import code

2015-03-27 Thread Henry Saputra (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Saputra resolved FLINK-1790.
--
   Resolution: Fixed
Fix Version/s: 0.9

PR merged to master. Thanks!

> Remove the redundant import code
> 
>
> Key: FLINK-1790
> URL: https://issues.apache.org/jira/browse/FLINK-1790
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: master
>Reporter: Sibao Hong
>Assignee: Sibao Hong
>Priority: Minor
> Fix For: 0.9
>
>
> Remove the redundant import code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1790) Remove the redundant import code

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384325#comment-14384325
 ] 

ASF GitHub Bot commented on FLINK-1790:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/538


> Remove the redundant import code
> 
>
> Key: FLINK-1790
> URL: https://issues.apache.org/jira/browse/FLINK-1790
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: master
>Reporter: Sibao Hong
>Assignee: Sibao Hong
>Priority: Minor
>
> Remove the redundant import code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1790]Remove the redundant import code

2015-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/538


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-1794) Add test base for scalatest and adapt flink-ml test cases

2015-03-27 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1794:


 Summary: Add test base for scalatest and adapt flink-ml test cases
 Key: FLINK-1794
 URL: https://issues.apache.org/jira/browse/FLINK-1794
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann


Currently, the flink-ml test cases use the standard {{ExecutionEnvironment}} 
which can cause problems in parallel test executions as they happen on Travis. 
For these tests it would be helpful to have an appropriate Scala test base 
which instantiates a {{ForkableFlinkMiniCluster}} and sets the 
{{ExecutionEnvironment}} appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1782) Change Quickstart Java version to 1.7

2015-03-27 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1782.
-
   Resolution: Won't Fix
Fix Version/s: (was: 0.9)

> Change Quickstart Java version to 1.7
> -
>
> Key: FLINK-1782
> URL: https://issues.apache.org/jira/browse/FLINK-1782
> Project: Flink
>  Issue Type: Improvement
>  Components: Quickstarts
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>
> The quickstarts refer to the outdated Java 1.6 source and bin version. We 
> should upgrade this to 1.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1782) Change Quickstart Java version to 1.7

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384254#comment-14384254
 ] 

Stephan Ewen commented on FLINK-1782:
-

This is currently not possible, since the builds with Java 6 fail to run the 
quickstart tests when the quickstart pom specifies compiler version 1.7 (Java 7).

> Change Quickstart Java version to 1.7
> -
>
> Key: FLINK-1782
> URL: https://issues.apache.org/jira/browse/FLINK-1782
> Project: Flink
>  Issue Type: Improvement
>  Components: Quickstarts
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>
> The quickstarts refer to the outdated Java 1.6 source and bin version. We 
> should upgrade this to 1.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384210#comment-14384210
 ] 

ASF GitHub Bot commented on FLINK-1633:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/452#issuecomment-87024583
  
Ha! It seems like Travis is pending but it has actually finished o.O

There is the following failure for hadoop1 / openjdk 6: 
`Failed tests: 
  ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:266 The 
program encountered a AssertionError : expected:<55> but 
was:<6239784976>`

Shall I ignore it and merge? Thanks!


> Add getTriplets() Gelly method
> --
>
> Key: FLINK-1633
> URL: https://issues.apache.org/jira/browse/FLINK-1633
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Andra Lungu
>Priority: Minor
>  Labels: starter
>
> In some graph algorithms, it is required to access the graph edges together 
> with the vertex values of the source and target vertices. For example, 
> several graph weighting schemes compute some kind of similarity weights for 
> edges, based on the attributes of the source and target vertices. This issue 
> proposes adding a convenience Gelly method that generates a DataSet of 
>  triplets from the input graph.
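
Editor's sketch — what such a method boils down to, using plain tuples instead of Gelly's Vertex/Edge types (vertices as (id, value), edges as (srcId, trgId, value); the usual {{org.apache.flink.api.scala._}} imports and an {{env}} are assumed):

{code:scala}
val vertices = env.fromElements((1L, "a"), (2L, "b"))
val edges    = env.fromElements((1L, 2L, 0.5))

// Attach the source vertex value, then the target vertex value.
val withSrc = edges.join(vertices).where(0).equalTo(0) {
  (e, v) => (e._1, e._2, e._3, v._2)        // (src, trg, edgeVal, srcVal)
}
val triplets = withSrc.join(vertices).where(1).equalTo(0) {
  (t, v) => (t._1, t._4, t._2, v._2, t._3)  // (src, srcVal, trg, trgVal, edgeVal)
}
{code}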



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1633][gelly] Added getTriplets() method...

2015-03-27 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/452#issuecomment-87024583
  
Ha! It seems like Travis is pending but it has actually finished o.O

There is the following failure for hadoop1 / openjdk 6: 
`Failed tests: 
  ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:266 The 
program encountered a AssertionError : expected:<55> but 
was:<6239784976>`

Shall I ignore it and merge? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1718) Add sparse vector and sparse matrix types to machine learning library

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384092#comment-14384092
 ] 

ASF GitHub Bot commented on FLINK-1718:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/539#issuecomment-86996068
  
Looks good to merge.
There are some performance improvements possible.


> Add sparse vector and sparse matrix types to machine learning library
> -
>
> Key: FLINK-1718
> URL: https://issues.apache.org/jira/browse/FLINK-1718
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>  Labels: ML
>
> Currently, the machine learning library only supports dense matrix and dense 
> vectors. For future algorithms it would be beneficial to also support sparse 
> vectors and matrices.
> I'd propose to use the compressed sparse column (CSC) representation, because 
> it allows rather efficient operations compared to a map-backed sparse 
> matrix/vector implementation. Furthermore, this is also the format the Breeze 
> library expects for sparse matrices/vectors. Thus, it is easy to convert to a 
> sparse Breeze data structure, which provides us with many linear algebra 
> operations.
> BIDMat [1] uses the same data representation.
> Resources:
> [1] [https://github.com/BIDData/BIDMat]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1718] Adds sparse matrix and sparse vec...

2015-03-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/539#issuecomment-86996068
  
Looks good to merge.
There are some performance improvements possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1718) Add sparse vector and sparse matrix types to machine learning library

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384080#comment-14384080
 ] 

ASF GitHub Bot commented on FLINK-1718:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27308257
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseMatrix.scala ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math
+
+import scala.util.Sorting
+
+/** Sparse matrix using the compressed sparse column (CSC) representation.
+  *
+  * More details concerning the compressed sparse column (CSC) representation can be found
+  * [http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29].
+  *
+  * @param numRows Number of rows
+  * @param numCols Number of columns
+  * @param rowIndices Array containing the row indices of non-zero entries
+  * @param colPtrs Array containing the starting offsets in data for each column
+  * @param data Array containing the non-zero entries in column-major order
+  */
+class SparseMatrix(
+    val numRows: Int,
+    val numCols: Int,
+    val rowIndices: Array[Int],
+    val colPtrs: Array[Int],
+    val data: Array[Double])
+  extends Matrix {
+
+  /** Element wise access function
+    *
+    * @param row row index
+    * @param col column index
+    * @return matrix entry at (row, col)
+    */
+  override def apply(row: Int, col: Int): Double = {
+
+    val index = locate(row, col)
+
+    if(index < 0) {
+      0
+    } else {
+      data(index)
+    }
+  }
+
+  def toDenseMatrix: DenseMatrix = {
+    val result = DenseMatrix.zeros(numRows, numCols)
+
+    for(row <- 0 until numRows; col <- 0 until numCols) {
+      result(row, col) = apply(row, col)
+    }
+
+    result
+  }
+
+  /** Element wise update function
+    *
+    * @param row row index
+    * @param col column index
+    * @param value value to set at (row, col)
+    */
+  override def update(row: Int, col: Int, value: Double): Unit = {
+    val index = locate(row, col)
+
+    if(index < 0) {
+      throw new IllegalArgumentException("Cannot update zero value of sparse matrix at index " +
+        s"($row, $col)")
+    } else {
+      data(index) = value
+    }
+  }
+
+  override def toString: String = {
+    val result = StringBuilder.newBuilder
+
+    result.append(s"SparseMatrix($numRows, $numCols)\n")
+
+    var columnIndex = 0
+
+    val fieldWidth = math.max(numRows, numCols).toString.length
+    val valueFieldWidth = data.map(_.toString.length).max + 2
+
+    for(index <- 0 until colPtrs.last) {
+      while(colPtrs(columnIndex + 1) <= index) {
+        columnIndex += 1
+      }
+
+      val rowStr = rowIndices(index).toString
+      val columnStr = columnIndex.toString
+      val valueStr = data(index).toString
+
+      result.append("(" + " " * (fieldWidth - rowStr.length) + rowStr + "," +
+        " " * (fieldWidth - columnStr.length) + columnStr + ")")
+      result.append(" " * (valueFieldWidth - valueStr.length) + valueStr)
+      result.append("\n")
+    }
+
+    result.toString
+  }
+
+  private def locate(row: Int, col: Int): Int = {
+    require(0 <= row && row < numRows, s"Row $row is out of bounds [0, $numRows).")
+    require(0 <= col && col < numCols, s"Col $col is out of bounds [0, $numCols).")
--- End diff --

Many ops in this class seem to use this method. I'm not sure how expensive 
the string interpolation is.


> Add sparse vector and sparse matrix types to machine learning library
> -
>
>   

[GitHub] flink pull request: [FLINK-1718] Adds sparse matrix and sparse vec...

2015-03-27 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27308257
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseMatrix.scala ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math
+
+import scala.util.Sorting
+
+/** Sparse matrix using the compressed sparse column (CSC) representation.
+  *
+  * More details concerning the compressed sparse column (CSC) representation can be found
+  * [http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29].
+  *
+  * @param numRows Number of rows
+  * @param numCols Number of columns
+  * @param rowIndices Array containing the row indices of non-zero entries
+  * @param colPtrs Array containing the starting offsets in data for each column
+  * @param data Array containing the non-zero entries in column-major order
+  */
+class SparseMatrix(
+    val numRows: Int,
+    val numCols: Int,
+    val rowIndices: Array[Int],
+    val colPtrs: Array[Int],
+    val data: Array[Double])
+  extends Matrix {
+
+  /** Element wise access function
+    *
+    * @param row row index
+    * @param col column index
+    * @return matrix entry at (row, col)
+    */
+  override def apply(row: Int, col: Int): Double = {
+
+    val index = locate(row, col)
+
+    if(index < 0) {
+      0
+    } else {
+      data(index)
+    }
+  }
+
+  def toDenseMatrix: DenseMatrix = {
+    val result = DenseMatrix.zeros(numRows, numCols)
+
+    for(row <- 0 until numRows; col <- 0 until numCols) {
+      result(row, col) = apply(row, col)
+    }
+
+    result
+  }
+
+  /** Element wise update function
+    *
+    * @param row row index
+    * @param col column index
+    * @param value value to set at (row, col)
+    */
+  override def update(row: Int, col: Int, value: Double): Unit = {
+    val index = locate(row, col)
+
+    if(index < 0) {
+      throw new IllegalArgumentException("Cannot update zero value of sparse matrix at index " +
+        s"($row, $col)")
+    } else {
+      data(index) = value
+    }
+  }
+
+  override def toString: String = {
+    val result = StringBuilder.newBuilder
+
+    result.append(s"SparseMatrix($numRows, $numCols)\n")
+
+    var columnIndex = 0
+
+    val fieldWidth = math.max(numRows, numCols).toString.length
+    val valueFieldWidth = data.map(_.toString.length).max + 2
+
+    for(index <- 0 until colPtrs.last) {
+      while(colPtrs(columnIndex + 1) <= index) {
+        columnIndex += 1
+      }
+
+      val rowStr = rowIndices(index).toString
+      val columnStr = columnIndex.toString
+      val valueStr = data(index).toString
+
+      result.append("(" + " " * (fieldWidth - rowStr.length) + rowStr + "," +
+        " " * (fieldWidth - columnStr.length) + columnStr + ")")
+      result.append(" " * (valueFieldWidth - valueStr.length) + valueStr)
+      result.append("\n")
+    }
+
+    result.toString
+  }
+
+  private def locate(row: Int, col: Int): Int = {
+    require(0 <= row && row < numRows, s"Row $row is out of bounds [0, $numRows).")
+    require(0 <= col && col < numCols, s"Col $col is out of bounds [0, $numCols).")
--- End diff --

Many ops in this class seem to use this method. I'm not sure how expensive 
the string interpolation is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1718) Add sparse vector and sparse matrix types to machine learning library

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384070#comment-14384070
 ] 

ASF GitHub Bot commented on FLINK-1718:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27307815
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/DenseVector.scala ---
@@ -67,7 +63,25 @@ case class DenseVector(val values: Array[Double]) extends Vector {
     * @return Copy of the vector instance
     */
   override def copy: Vector = {
-    DenseVector(values.clone())
+    DenseVector(data.clone())
+  }
+
+  /** Updates the element at the given index with the provided value
+    *
+    * @param index index of the element to update
+    * @param value the new value
+    */
+  override def update(index: Int, value: Double): Unit = {
+    require(0 <= index && index < data.length, s"Index $index is out of bounds " +
--- End diff --

Might be inefficient.


> Add sparse vector and sparse matrix types to machine learning library
> -
>
> Key: FLINK-1718
> URL: https://issues.apache.org/jira/browse/FLINK-1718
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>  Labels: ML
>
> Currently, the machine learning library only supports dense matrices and dense 
> vectors. For future algorithms it would be beneficial to also support sparse 
> vectors and matrices.
> I'd propose to use the compressed sparse column (CSC) representation, because 
> it allows rather efficient operations compared to a map-backed sparse 
> matrix/vector implementation. Furthermore, this is also the format the Breeze 
> library expects for sparse matrices/vectors. Thus, it is easy to convert to a 
> sparse Breeze data structure, which provides us with many linear algebra 
> operations.
> BIDMat [1] uses the same data representation.
> Resources:
> [1] [https://github.com/BIDData/BIDMat]
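
To make the CSC layout concrete, here is a small worked example consistent with
the SparseMatrix class in the diff above (the locate implementation is cut off
in the diff, so the binary search below is an assumption):

    import java.util.Arrays

    object CscSketch {
      def main(args: Array[String]): Unit = {
        // The 3x3 matrix             its non-zeros in column-major order:
        //   [ 1.0  0.0  2.0 ]        (0,0)=1.0  (2,1)=4.0  (0,2)=2.0  (1,2)=3.0
        //   [ 0.0  0.0  3.0 ]
        //   [ 0.0  4.0  0.0 ]
        val colPtrs    = Array(0, 1, 2, 4)          // column j owns data(colPtrs(j) until colPtrs(j + 1))
        val rowIndices = Array(0, 2, 0, 1)          // row index of each stored entry
        val data       = Array(1.0, 4.0, 2.0, 3.0)  // the stored entries

        // One plausible locate: binary-search the row inside the column's slice.
        def locate(row: Int, col: Int): Int = {
          val i = Arrays.binarySearch(rowIndices, colPtrs(col), colPtrs(col + 1), row)
          if (i >= 0) i else -1
        }

        println(data(locate(2, 1)))   // 4.0
        println(locate(1, 0))         // -1, i.e. a structural zero
      }
    }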



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1718] Adds sparse matrix and sparse vec...

2015-03-27 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/539#discussion_r27307815
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/DenseVector.scala ---
@@ -67,7 +63,25 @@ case class DenseVector(val values: Array[Double]) extends Vector {
     * @return Copy of the vector instance
     */
   override def copy: Vector = {
-    DenseVector(values.clone())
+    DenseVector(data.clone())
+  }
+
+  /** Updates the element at the given index with the provided value
+    *
+    * @param index index of the element to update
+    * @param value the new value
+    */
+  override def update(index: Int, value: Double): Unit = {
+    require(0 <= index && index < data.length, s"Index $index is out of bounds " +
--- End diff --

Might be inefficient.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1650] Configure Netty (akka) to use Slf...

2015-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/518


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1589) Add option to pass Configuration to LocalExecutor

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384021#comment-14384021
 ] 

ASF GitHub Bot commented on FLINK-1589:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/427#issuecomment-86979441
  
I've updated the PR. It is now ready for review again.


> Add option to pass Configuration to LocalExecutor
> -
>
> Key: FLINK-1589
> URL: https://issues.apache.org/jira/browse/FLINK-1589
> Project: Flink
>  Issue Type: New Feature
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> Right now it's not possible for users to pass custom configuration values to 
> Flink when running it from within an IDE.
> It would be very convenient to be able to create a local execution 
> environment that allows passing configuration files.
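
A sketch of the usage this enables (the factory method is an assumption about
the PR's API shape, and the config key is only an example):

    import org.apache.flink.api.java.ExecutionEnvironment
    import org.apache.flink.configuration.Configuration

    object LocalConfigSketch {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        conf.setInteger("taskmanager.numberOfTaskSlots", 4)  // example key/value

        // Assumed API: a local environment that honors the custom configuration.
        val env = ExecutionEnvironment.createLocalEnvironment(conf)
        env.fromElements(1, 2, 3).print()
      }
    }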



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1589] Add option to pass configuration ...

2015-03-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/427#issuecomment-86979441
  
I've updated the PR. It is now ready for review again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-1650) Suppress Akka's Netty Shutdown Errors through the log config

2015-03-27 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1650.
---
Resolution: Fixed

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/c9d29f26

> Suppress Akka's Netty Shutdown Errors through the log config
> 
>
> Key: FLINK-1650
> URL: https://issues.apache.org/jira/browse/FLINK-1650
> Project: Flink
>  Issue Type: Bug
>  Components: other
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> I suggest setting the logger for 
> `org.jboss.netty.channel.DefaultChannelPipeline` to ERROR, in order to get 
> rid of the misleading stack trace caused by an akka/netty hiccup on shutdown.
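
The change amounts to a single logger line; a sketch of what it looks like in
log4j.properties syntax (the surrounding configuration is assumed):

    # Raise the noisy pipeline logger to ERROR so the harmless shutdown
    # stack trace from akka's embedded Netty is no longer reported.
    log4j.logger.org.jboss.netty.channel.DefaultChannelPipeline=ERROR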



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1501) Integrate metrics library and report basic metrics to JobManager web interface

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384009#comment-14384009
 ] 

ASF GitHub Bot commented on FLINK-1501:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/421


> Integrate metrics library and report basic metrics to JobManager web interface
> --
>
> Key: FLINK-1501
> URL: https://issues.apache.org/jira/browse/FLINK-1501
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.9
>
>
> As per mailing list, the library: https://github.com/dropwizard/metrics
> The goal of this task is to get the basic infrastructure in place.
> Subsequent issues will integrate more features into the system.
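
For a sense of the library's basic pattern, a minimal sketch (a registry plus
a heap gauge; this is not Flink's actual integration):

    import com.codahale.metrics.{Gauge, MetricRegistry}

    object MetricsSketch {
      def main(args: Array[String]): Unit = {
        val registry = new MetricRegistry

        // A gauge for current heap usage, the kind of stat a web frontend polls.
        registry.register("heap.used", new Gauge[Long] {
          override def getValue: Long = {
            val rt = Runtime.getRuntime
            rt.totalMemory() - rt.freeMemory()
          }
        })

        println(registry.getGauges.get("heap.used").getValue)
      }
    }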



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1650) Suppress Akka's Netty Shutdown Errors through the log config

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384008#comment-14384008
 ] 

ASF GitHub Bot commented on FLINK-1650:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/518


> Suppress Akka's Netty Shutdown Errors through the log config
> 
>
> Key: FLINK-1650
> URL: https://issues.apache.org/jira/browse/FLINK-1650
> Project: Flink
>  Issue Type: Bug
>  Components: other
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> I suggest setting the logger for 
> `org.jboss.netty.channel.DefaultChannelPipeline` to ERROR, in order to get 
> rid of the misleading stack trace caused by an akka/netty hiccup on shutdown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1501] Add metrics library for monitorin...

2015-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/421


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-1771) Add support for submitting single jobs to a detached YARN session

2015-03-27 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1771:
--
Summary: Add support for submitting single jobs to a detached YARN session  
(was: Add support for submitting single jobs detached YARN)

> Add support for submitting single jobs to a detached YARN session
> -
>
> Key: FLINK-1771
> URL: https://issues.apache.org/jira/browse/FLINK-1771
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> We need tests ensuring that the processing slots are set properly when 
> starting Flink on YARN, in particular with the per-job YARN session feature.
> Also, the YARN tests for detached YARN sessions / per-job YARN clusters are 
> polluting the local home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1771) Add support for submitting single jobs detached YARN

2015-03-27 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1771:
--
Summary: Add support for submitting single jobs detached YARN  (was: Add 
tests for setting the processing slots for the YARN client)

> Add support for submitting single jobs detached YARN
> 
>
> Key: FLINK-1771
> URL: https://issues.apache.org/jira/browse/FLINK-1771
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> We need tests ensuring that the processing slots are set properly when 
> starting Flink on YARN, in particular with the per-job YARN session feature.
> Also, the YARN tests for detached YARN sessions / per-job YARN clusters are 
> polluting the local home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1793) Streaming jobs can't be canceled

2015-03-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383878#comment-14383878
 ] 

Márton Balassi commented on FLINK-1793:
---

Just cancelled a streaming job successfully after running it for 10 minutes on 
10 taskmanagers. Cancellation took around 8-10 seconds in this case (the use 
case is the infamous chained mapper though).

> Streaming jobs can't be canceled
> 
>
> Key: FLINK-1793
> URL: https://issues.apache.org/jira/browse/FLINK-1793
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Maximilian Michels
> Fix For: 0.9
>
>
> The streaming WordCount gets stuck in the "CANCELED" state after it has been 
> canceled using either the web interface or the command line interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1793) Streaming jobs can't be canceled

2015-03-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383850#comment-14383850
 ] 

Maximilian Michels commented on FLINK-1793:
---

Some minutes. Can't give you an exact time period though. The parallelism was 
set to 20 in the experiments. 10 task managers with two task slots each.

> Streaming jobs can't be canceled
> 
>
> Key: FLINK-1793
> URL: https://issues.apache.org/jira/browse/FLINK-1793
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Maximilian Michels
> Fix For: 0.9
>
>
> The streaming WordCount gets stuck in the "CANCELED" state after it has been 
> canceled using either the web interface or the command line interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1650] Configure Netty (akka) to use Slf...

2015-03-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/518#issuecomment-86944772
  
I'll merge the change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-1781) Quickstarts broken due to Scala Version Variables

2015-03-27 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1781:
--
Assignee: Stephan Ewen  (was: Robert Metzger)

> Quickstarts broken due to Scala Version Variables
> -
>
> Key: FLINK-1781
> URL: https://issues.apache.org/jira/browse/FLINK-1781
> Project: Flink
>  Issue Type: Bug
>  Components: Quickstarts
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 0.9
>
>
> The quickstart archetype resources refer to the scala version variables.
> When creating a maven project standalone, these variables are not defined, 
> and the pom is invalid.
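
To illustrate, the archetype poms reference properties of roughly this shape,
which a standalone Maven project never defines unless they are declared
explicitly (property names and values here are examples, not taken from the
quickstart):

    <properties>
      <scala.version>2.10.4</scala.version>
      <scala.binary.version>2.10</scala.binary.version>
    </properties>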



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1650) Suppress Akka's Netty Shutdown Errors through the log config

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383851#comment-14383851
 ] 

ASF GitHub Bot commented on FLINK-1650:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/518#issuecomment-86944772
  
I'll merge the change


> Suppress Akka's Netty Shutdown Errors through the log config
> 
>
> Key: FLINK-1650
> URL: https://issues.apache.org/jira/browse/FLINK-1650
> Project: Flink
>  Issue Type: Bug
>  Components: other
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> I suggest setting the logger for 
> `org.jboss.netty.channel.DefaultChannelPipeline` to ERROR, in order to get 
> rid of the misleading stack trace caused by an akka/netty hiccup on shutdown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1793) Streaming jobs can't be canceled

2015-03-27 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383829#comment-14383829
 ] 

Robert Metzger commented on FLINK-1793:
---

How long did you wait after cancelling the Job?
With higher parallelism, this can take minutes :(

> Streaming jobs can't be canceled
> 
>
> Key: FLINK-1793
> URL: https://issues.apache.org/jira/browse/FLINK-1793
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Maximilian Michels
> Fix For: 0.9
>
>
> The streaming WordCount gets stuck in the "CANCELED" state after it has been 
> canceled using either the web interface or the command line interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1790) Remove the redundant import code

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383782#comment-14383782
 ] 

ASF GitHub Bot commented on FLINK-1790:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/538#issuecomment-86930773
  
:+1:


> Remove the redundant import code
> 
>
> Key: FLINK-1790
> URL: https://issues.apache.org/jira/browse/FLINK-1790
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: master
>Reporter: Sibao Hong
>Assignee: Sibao Hong
>Priority: Minor
>
> Remove the redundant import code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1790]Remove the redundant import code

2015-03-27 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/538#issuecomment-86930773
  
:+1:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1669) Streaming tests for recovery with distributed process failure

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383767#comment-14383767
 ] 

ASF GitHub Bot commented on FLINK-1669:
---

Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/496#issuecomment-86927956
  
When merging this, please put the outputs into a TempFolder; right now they 
are only deleted if the output check is reached.
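
A sketch of the suggested pattern (JUnit's TemporaryFolder rule; class, field
and file names are illustrative):

    import org.junit.{Rule, Test}
    import org.junit.rules.TemporaryFolder

    class RecoveryOutputSketch {
      val _temp = new TemporaryFolder

      @Rule
      def temp: TemporaryFolder = _temp

      @Test
      def writesOutputToTempFolder(): Unit = {
        val out = _temp.newFile("output.txt")
        // run the job writing to out.getAbsolutePath; JUnit removes the
        // folder after the test, whether or not the output check is reached
      }
    }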


> Streaming tests for recovery with distributed process failure
> -
>
> Key: FLINK-1669
> URL: https://issues.apache.org/jira/browse/FLINK-1669
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Márton Balassi
>Assignee: Márton Balassi
> Fix For: 0.9
>
>
> Multiple JVM test for streaming recovery from failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1669] [wip] Test for streaming recovery...

2015-03-27 Thread mbalassi
Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/496#issuecomment-86927956
  
When merging this, please put the outputs into a TempFolder; right now they 
are only deleted if the output check is reached.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request:

2015-03-27 Thread mbalassi
Github user mbalassi commented on the pull request:


https://github.com/apache/flink/commit/83db1db1c93b45943a788a8bd61f023b389d4af2#commitcomment-10430523
  
When merging this, please put the outputs into a `TempFolder`; right now they 
are only deleted if the output check is reached.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-1792) Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only

2015-03-27 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1792:
--
Fix Version/s: (was: pre-apache)

> Improve TM Monitoring: CPU utilization, hide graphs by default and show 
> summary only
> 
>
> Key: FLINK-1792
> URL: https://issues.apache.org/jira/browse/FLINK-1792
> Project: Flink
>  Issue Type: Sub-task
>  Components: Webfrontend
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> As per https://github.com/apache/flink/pull/421 from FLINK-1501, some 
> enhancements to the current monitoring are required:
> - Get the CPU utilization in % from each TaskManager process
> - Remove the metrics graph from the overview and only show the current stats 
> as numbers (cpu load, heap utilization) and add a button to enable the 
> detailed graph.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1793) Streaming jobs can't be canceled

2015-03-27 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-1793:
-

 Summary: Streaming jobs can't be canceled
 Key: FLINK-1793
 URL: https://issues.apache.org/jira/browse/FLINK-1793
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Maximilian Michels
 Fix For: 0.9


The streaming WordCount gets stuck in the "CANCELED" state after it has been 
canceled using either the web interface or the command line interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1501) Integrate metrics library and report basic metrics to JobManager web interface

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383646#comment-14383646
 ] 

ASF GitHub Bot commented on FLINK-1501:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/421#issuecomment-86898969
  
I've filed a JIRA for the changes requested here: 
https://issues.apache.org/jira/browse/FLINK-1792


> Integrate metrics library and report basic metrics to JobManager web interface
> --
>
> Key: FLINK-1501
> URL: https://issues.apache.org/jira/browse/FLINK-1501
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.9
>
>
> As per mailing list, the library: https://github.com/dropwizard/metrics
> The goal of this task is to get the basic infrastructure in place.
> Subsequent issues will integrate more features into the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1792) Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only

2015-03-27 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1792:
-

 Summary: Improve TM Monitoring: CPU utilization, hide graphs by 
default and show summary only
 Key: FLINK-1792
 URL: https://issues.apache.org/jira/browse/FLINK-1792
 Project: Flink
  Issue Type: Sub-task
  Components: Webfrontend
Affects Versions: 0.9
Reporter: Robert Metzger


As per https://github.com/apache/flink/pull/421 from FLINK-1501, some 
enhancements to the current monitoring are required:

- Get the CPU utilization in % from each TaskManager process (see the sketch 
below)
- Remove the metrics graph from the overview and only show the current stats as 
numbers (CPU load, heap utilization), and add a button to enable the detailed 
graph.
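
One way to obtain the per-process CPU number asked for in the first item,
sketched with the JMX extension bean (whether Flink ended up using this is not
established here):

    import java.lang.management.ManagementFactory
    import com.sun.management.OperatingSystemMXBean

    object CpuLoadSketch {
      def main(args: Array[String]): Unit = {
        val os = ManagementFactory.getPlatformMXBean(classOf[OperatingSystemMXBean])
        // getProcessCpuLoad returns a value in [0.0, 1.0], or a negative
        // value if the platform does not support it.
        val load = os.getProcessCpuLoad
        println(f"process CPU: ${load * 100}%.1f %%")
      }
    }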



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1501] Add metrics library for monitorin...

2015-03-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/421#issuecomment-86898969
  
I've filed a JIRA for the changes requested here: 
https://issues.apache.org/jira/browse/FLINK-1792


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1501) Integrate metrics library and report basic metrics to JobManager web interface

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383645#comment-14383645
 ] 

ASF GitHub Bot commented on FLINK-1501:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/421#issuecomment-86898299
  
Hey @bhatsachin,
I've merged the change to master.

If you want, we can do a quick hangout or skype call to discuss potential 
contributions from your side.


> Integrate metrics library and report basic metrics to JobManager web interface
> --
>
> Key: FLINK-1501
> URL: https://issues.apache.org/jira/browse/FLINK-1501
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.9
>
>
> As per mailing list, the library: https://github.com/dropwizard/metrics
> The goal of this task is to get the basic infrastructure in place.
> Subsequent issues will integrate more features into the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1501) Integrate metrics library and report basic metrics to JobManager web interface

2015-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383644#comment-14383644
 ] 

ASF GitHub Bot commented on FLINK-1501:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/421#issuecomment-86898282
  
Hey @bhatsachin,
I've merged the change to master.

If you want, we can do a quick hangout or skype call to discuss potential 
contributions from your side.


> Integrate metrics library and report basic metrics to JobManager web interface
> --
>
> Key: FLINK-1501
> URL: https://issues.apache.org/jira/browse/FLINK-1501
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.9
>
>
> As per mailing list, the library: https://github.com/dropwizard/metrics
> The goal of this task is to get the basic infrastructure in place.
> Subsequent issues will integrate more features into the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1501] Add metrics library for monitorin...

2015-03-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/421#issuecomment-86898299
  
Hey @bhatsachin,
I've merged the change to master.

If you want, we can do a quick hangout or skype call to discuss potential 
contributions from your side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1501] Add metrics library for monitorin...

2015-03-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/421#issuecomment-86898282
  
Hey @bhatsachin,
I've merged the change to master.

If you want, we can do a quick hangout or skype call to discuss potential 
contributions from your side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-1501) Integrate metrics library and report basic metrics to JobManager web interface

2015-03-27 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1501.
---
   Resolution: Fixed
Fix Version/s: 0.9 (was: pre-apache)

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/2d1f8b07

> Integrate metrics library and report basic metrics to JobManager web interface
> --
>
> Key: FLINK-1501
> URL: https://issues.apache.org/jira/browse/FLINK-1501
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.9
>
>
> As per mailing list, the library: https://github.com/dropwizard/metrics
> The goal of this task is to get the basic infrastructure in place.
> Subsequent issues will integrate more features into the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Flink dev cluster on Docker with Docker Compos...

2015-03-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/533#issuecomment-86861202
  
Thank you very much for the contribution.

I've merged the pull request to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Flink dev cluster on Docker with Docker Compos...

2015-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/533


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---