[jira] [Commented] (FLINK-5967) Document AggregatingState

2017-05-06 Thread yanghua (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999673#comment-15999673
 ] 

yanghua commented on FLINK-5967:


Hi [~aljoscha], where should the documentation be added?
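
For reference, a minimal sketch of the kind of example such documentation could 
include (the running-average class below is purely illustrative, and the 
AggregateFunction signatures assume a recent Flink release):

{code}
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.AggregatingState;
import org.apache.flink.api.common.state.AggregatingStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Illustrative only: keeps a per-key running average in AggregatingState.
public class RunningAverage extends RichFlatMapFunction<Long, Double> {

    private transient AggregatingState<Long, Double> avgState;

    @Override
    public void open(Configuration parameters) {
        AggregatingStateDescriptor<Long, long[], Double> descriptor =
            new AggregatingStateDescriptor<>(
                "running-average",
                new AggregateFunction<Long, long[], Double>() {
                    // accumulator: acc[0] = sum, acc[1] = count
                    @Override public long[] createAccumulator() { return new long[] {0L, 0L}; }
                    @Override public long[] add(Long value, long[] acc) { acc[0] += value; acc[1]++; return acc; }
                    @Override public Double getResult(long[] acc) { return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]; }
                    @Override public long[] merge(long[] a, long[] b) { a[0] += b[0]; a[1] += b[1]; return a; }
                },
                long[].class);
        avgState = getRuntimeContext().getAggregatingState(descriptor);
    }

    @Override
    public void flatMap(Long value, Collector<Double> out) throws Exception {
        avgState.add(value);         // runs the AggregateFunction's add()
        out.collect(avgState.get()); // getResult() over the current accumulator
    }
}
{code}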

> Document AggregatingState
> -
>
> Key: FLINK-5967
> URL: https://issues.apache.org/jira/browse/FLINK-5967
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API, Documentation
>Reporter: Aljoscha Krettek
>Priority: Critical
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (FLINK-5967) Document AggregatingState

2017-05-06 Thread yanghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanghua reassigned FLINK-5967:
--

Assignee: (was: yanghua)

> Document AggregatingState
> -
>
> Key: FLINK-5967
> URL: https://issues.apache.org/jira/browse/FLINK-5967
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API, Documentation
>Reporter: Aljoscha Krettek
>Priority: Critical
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (FLINK-5967) Document AggregatingState

2017-05-06 Thread yanghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanghua reassigned FLINK-5967:
--

Assignee: yanghua

> Document AggregatingState
> -
>
> Key: FLINK-5967
> URL: https://issues.apache.org/jira/browse/FLINK-5967
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API, Documentation
>Reporter: Aljoscha Krettek
>Assignee: yanghua
>Priority: Critical
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6397) MultipleProgramsTestBase does not reset ContextEnvironment

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999661#comment-15999661
 ] 

ASF GitHub Bot commented on FLINK-6397:
---

Github user ifndef-SleePy commented on the issue:

https://github.com/apache/flink/pull/3810
  
I think Till has implemented a static unsetAsContext method in 
TestEnvironment recently. I rebased the implementation onto master and 
re-committed the pull request.


> MultipleProgramsTestBase does not reset ContextEnvironment
> --
>
> Key: FLINK-6397
> URL: https://issues.apache.org/jira/browse/FLINK-6397
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Biao Liu
>
> The MultipleProgramsTestBase sets a new TestEnvironment as a context 
> environment but never explicitly unsets it, which can result in subsequent 
> tests categorically failing.
> The CustomDistributionITCase doesn't unset the context either; nor does some 
> streaming test that I haven't quite nailed down yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink issue #3810: [FLINK-6397] MultipleProgramsTestBase does not reset Cont...

2017-05-06 Thread ifndef-SleePy
Github user ifndef-SleePy commented on the issue:

https://github.com/apache/flink/pull/3810
  
I think Till has implemented a static unsetAsContext method in 
TestEnvironment recently. I rebased the implementation onto master and 
re-committed the pull request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6397) MultipleProgramsTestBase does not reset ContextEnvironment

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999660#comment-15999660
 ] 

ASF GitHub Bot commented on FLINK-6397:
---

Github user ifndef-SleePy commented on a diff in the pull request:

https://github.com/apache/flink/pull/3810#discussion_r115133175
  
--- Diff: 
flink-test-utils-parent/flink-test-utils/src/main/java/org/apache/flink/test/util/MultipleProgramsTestBase.java
 ---
@@ -80,29 +80,36 @@

protected final TestExecutionMode mode;
 
-   
+   private TestEnvironment testEnvironment;
+
+   private CollectionTestEnvironment collectionTestEnvironment;
+
public MultipleProgramsTestBase(TestExecutionMode mode) {
this.mode = mode;
-   
+
switch(mode){
case CLUSTER:
-   new TestEnvironment(cluster, 4).setAsContext();
+   testEnvironment = new TestEnvironment(cluster, 
4);
--- End diff --

@zentol I just found that I cannot move this code into the \@Before method, 
because it will not work with JUnit's Parameterized runner. I propose to keep 
this code in the constructor. What do you think?
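
A minimal, self-contained sketch of the lifecycle constraint discussed above 
(all names here are made up): with JUnit's Parameterized runner the constructor 
receives the parameters, so per-parameter context setup naturally lives in the 
constructor, while cleanup can still live in an \@After method, mirroring the 
static unsetAsContext mentioned in the thread.

{code}
import java.util.Arrays;
import java.util.Collection;

import org.junit.After;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Illustrative only: constructor-based setup per parameter set, cleanup in @After.
@RunWith(Parameterized.class)
public class ContextLifecycleTest {

    private static String context; // stand-in for the global execution context

    public ContextLifecycleTest(String mode) {
        context = mode; // the "setAsContext()" step, done once per parameter set
    }

    @After
    public void unsetContext() {
        context = null; // the "unsetAsContext()" step
    }

    @Parameters
    public static Collection<Object[]> modes() {
        return Arrays.asList(new Object[][] {{"CLUSTER"}, {"COLLECTION"}});
    }

    @Test
    public void contextIsSetForEachParameter() {
        Assert.assertNotNull(context);
    }
}
{code}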


> MultipleProgramsTestBase does not reset ContextEnvironment
> --
>
> Key: FLINK-6397
> URL: https://issues.apache.org/jira/browse/FLINK-6397
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Biao Liu
>
> The MultipleProgramsTestBase sets a new TestEnvironment as a context 
> environment but never explicitly unsets it, which can result in subsequent 
> tests categorically failing.
> The CustomDistributionITCase doesn't unset the context either; nor does some 
> streaming test that I haven't quite nailed down yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3810: [FLINK-6397] MultipleProgramsTestBase does not res...

2017-05-06 Thread ifndef-SleePy
Github user ifndef-SleePy commented on a diff in the pull request:

https://github.com/apache/flink/pull/3810#discussion_r115133175
  
--- Diff: 
flink-test-utils-parent/flink-test-utils/src/main/java/org/apache/flink/test/util/MultipleProgramsTestBase.java
 ---
@@ -80,29 +80,36 @@

protected final TestExecutionMode mode;
 
-   
+   private TestEnvironment testEnvironment;
+
+   private CollectionTestEnvironment collectionTestEnvironment;
+
public MultipleProgramsTestBase(TestExecutionMode mode) {
this.mode = mode;
-   
+
switch(mode){
case CLUSTER:
-   new TestEnvironment(cluster, 4).setAsContext();
+   testEnvironment = new TestEnvironment(cluster, 
4);
--- End diff --

@zentol I just found that I cannot move this code into the \@Before method, 
because it will not work with JUnit's Parameterized runner. I propose to keep 
this code in the constructor. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3793: flink-6033 Support UNNEST query in the stream SQL API

2017-05-06 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/3793
  
Thanks for the update @suez1224!
Looks good to me. Will do some minor refactoring and merge this PR.

Best, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (FLINK-5536) Config option: HA

2017-05-06 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999615#comment-15999615
 ] 

Stavros Kontopoulos edited comment on FLINK-5536 at 5/6/17 10:31 PM:
-

I tested it; it seems to work fine. You need to set up flink-conf.yaml plus 
the ZooKeeper namespace on the client side.
I will work on the PR. There are several properties we need to expose.


was (Author: skonto):
I tested it; it seems to work fine. You need to set up flink-conf.yaml plus the 
ZooKeeper namespace on the client side.
I will work on the PR. There are several properties we need to expose.

> Config option: HA
> -
>
> Key: FLINK-5536
> URL: https://issues.apache.org/jira/browse/FLINK-5536
> Project: Flink
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Eron Wright 
>Assignee: Stavros Kontopoulos
>
> Configure Flink HA through package options plus good defaults. The main 
> components are ZK configuration and state backend configuration.
> - The ZK information can be defaulted to `master.mesos`, as with other packages.
> - Evaluate whether ZK can be fully configured by default, even if a state 
> backend isn't configured.
> - Use DCOS HDFS as the filesystem for the state backend. Evaluate whether to 
> assume that DCOS HDFS is installed by default, or whether to make it explicit.
> - To use DCOS HDFS, the init script should download the core-site.xml and 
> hdfs-site.xml from the HDFS 'connection' endpoint. Supply a default value 
> for the endpoint address; see 
> [https://docs.mesosphere.com/service-docs/hdfs/connecting-clients/].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5536) Config option: HA

2017-05-06 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999615#comment-15999615
 ] 

Stavros Kontopoulos commented on FLINK-5536:


I tested it; it seems to work fine. You need to set up flink-conf.yaml plus the 
ZooKeeper namespace on the client side.
I will work on the PR. There are several properties we need to expose.

> Config option: HA
> -
>
> Key: FLINK-5536
> URL: https://issues.apache.org/jira/browse/FLINK-5536
> Project: Flink
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Eron Wright 
>Assignee: Stavros Kontopoulos
>
> Configure Flink HA through package options plus good defaults. The main 
> components are ZK configuration and state backend configuration.
> - The ZK information can be defaulted to `master.mesos`, as with other packages.
> - Evaluate whether ZK can be fully configured by default, even if a state 
> backend isn't configured.
> - Use DCOS HDFS as the filesystem for the state backend. Evaluate whether to 
> assume that DCOS HDFS is installed by default, or whether to make it explicit.
> - To use DCOS HDFS, the init script should download the core-site.xml and 
> hdfs-site.xml from the HDFS 'connection' endpoint. Supply a default value 
> for the endpoint address; see 
> [https://docs.mesosphere.com/service-docs/hdfs/connecting-clients/].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3511: [Flink-5734] code generation for normalizedkey sor...

2017-05-06 Thread heytitle
Github user heytitle commented on a diff in the pull request:

https://github.com/apache/flink/pull/3511#discussion_r115130267
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/NormalizedKeySorter.java
 ---
@@ -47,61 +47,61 @@

public static final int MAX_NORMALIZED_KEY_LEN_PER_ELEMENT = 8;

-   private static final int MIN_REQUIRED_BUFFERS = 3;
+   public static final int MIN_REQUIRED_BUFFERS = 3;

-   private static final int LARGE_RECORD_THRESHOLD = 10 * 1024 * 1024;
+   public static final int LARGE_RECORD_THRESHOLD = 10 * 1024 * 1024;

-   private static final long LARGE_RECORD_TAG = 1L << 63;
+   public static final long LARGE_RECORD_TAG = 1L << 63;

-   private static final long POINTER_MASK = LARGE_RECORD_TAG - 1;
+   public static final long POINTER_MASK = LARGE_RECORD_TAG - 1;
 
--- End diff --

The reason `public` is used here is that `Janino` first checks the 
accessibility of these variables; it seems unable to access them when 
`protected` is used, and it throws the error below.
```
org.codehaus.commons.compiler.CompileException: Field 
"LARGE_RECORD_THRESHOLD" is not accessible

at 
org.codehaus.janino.ReflectionIClass$ReflectionIField.getConstantValue(ReflectionIClass.java:340)
at 
org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:4433)
at org.codehaus.janino.UnitCompiler.access$1(UnitCompiler.java:182)
at 
org.codehaus.janino.UnitCompiler$11.visitFieldAccess(UnitCompiler.java:4407)
at org.codehaus.janino.Java$FieldAccess.accept(Java.java:3229)
```
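
A minimal sketch of the accessibility behavior (assuming Janino's 
`ExpressionEvaluator`; the demo class and constant are made up): Janino 
resolves the field's constant value at compile time, so the field must be 
visible to the generated code, and `cook()` fails as in the stack trace above 
when it is not.

```
package demo;

import org.codehaus.janino.ExpressionEvaluator;

public class JaninoAccessDemo {

    // must be public: Janino reads the field's constant value at compile time
    public static final int LARGE_RECORD_THRESHOLD = 10 * 1024 * 1024;

    public static void main(String[] args) throws Exception {
        ExpressionEvaluator ee = new ExpressionEvaluator();
        // cook() fails with "Field ... is not accessible" if the field is protected
        ee.cook("demo.JaninoAccessDemo.LARGE_RECORD_THRESHOLD / 2");
        System.out.println(ee.evaluate(new Object[0])); // prints 5242880
    }
}
```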



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5867) The implementation of RestartPipelinedRegionStrategy

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999589#comment-15999589
 ] 

ASF GitHub Bot commented on FLINK-5867:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3773


> The implementation of RestartPipelinedRegionStrategy
> 
>
> Key: FLINK-5867
> URL: https://issues.apache.org/jira/browse/FLINK-5867
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: shuai.xu
>Assignee: shuai.xu
>
> The RestartPipelinedRegionStrategy's responsibility is the following:
> 1. Calculate all FailoverRegions and their relations when initializing.
> 2. Listen for the failure of the job and executions, and find corresponding 
> FailoverRegions to do the failover.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3773: [FLINK-5867] [FLINK-5866] [flip-1] Implement Failo...

2017-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3773


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6221) Add Prometheus support to metrics

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999521#comment-15999521
 ] 

ASF GitHub Bot commented on FLINK-6221:
---

Github user mbode commented on the issue:

https://github.com/apache/flink/pull/3833
  
Thanks for looking at it so quickly! I had somewhat the same instinct as 
far as your first point is concerned and thought about pulling out a 
`DropwizardReporter` without scheduling, but decided against it so as not to 
touch too many places. I like your suggestion of converting metrics directly, 
without Dropwizard as an intermediate step, and am going to try that.
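
A sketch of that direct conversion, assuming the Prometheus simpleclient 
`Collector` API and Flink's `Gauge` interface (the class name is made up); the 
fixed parts of the metric are captured once at construction time instead of 
being rebuilt from a Dropwizard intermediate on every scrape:

{code}
import java.util.Collections;
import java.util.List;

import io.prometheus.client.Collector;
import org.apache.flink.metrics.Gauge;

// Illustrative only: expose a Flink Gauge to Prometheus without the Dropwizard bridge.
public class FlinkGaugeCollector extends Collector {

    private final String name; // assumed already sanitized for Prometheus
    private final String help; // built once, not per scrape
    private final Gauge<? extends Number> gauge;

    public FlinkGaugeCollector(String name, Gauge<? extends Number> gauge) {
        this.name = name;
        this.help = "Flink gauge " + name;
        this.gauge = gauge;
    }

    @Override
    public List<MetricFamilySamples> collect() {
        // only the sample value is read per scrape; name and help are reused
        MetricFamilySamples.Sample sample = new MetricFamilySamples.Sample(
            name,
            Collections.<String>emptyList(),
            Collections.<String>emptyList(),
            gauge.getValue().doubleValue());
        return Collections.singletonList(new MetricFamilySamples(
            name, Type.GAUGE, help, Collections.singletonList(sample)));
    }
}
{code}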


> Add Prometheus support to metrics
> 
>
> Key: FLINK-6221
> URL: https://issues.apache.org/jira/browse/FLINK-6221
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.2.0
>Reporter: Joshua Griffith
>Assignee: Maximilian Bode
>Priority: Minor
>
> [Prometheus|https://prometheus.io/] is becoming popular for metrics and 
> alerting. It's possible to use 
> [statsd-exporter|https://github.com/prometheus/statsd_exporter] to load Flink 
> metrics into Prometheus but it would be far easier if Flink supported 
> Prometheus as a metrics reporter. A [dropwizard 
> client|https://github.com/prometheus/client_java/tree/master/simpleclient_dropwizard]
>  exists that could be integrated into the existing metrics system.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6438) Expand docs home page a little

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999520#comment-15999520
 ] 

ASF GitHub Bot commented on FLINK-6438:
---

Github user alpinegizmo commented on the issue:

https://github.com/apache/flink/pull/3823
  
@greghogan I'd be happy to list other blogs; any suggestions? I've 
looked at a few, but the problem with the ones I've examined is that the 
proportion of Flink-related posts is rather small, and the posts aren't tagged, 
so we can't link directly to the subset that's relevant.


> Expand docs home page a little
> --
>
> Key: FLINK-6438
> URL: https://issues.apache.org/jira/browse/FLINK-6438
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Minor
>
> The idea is to improve the documentation home page by adding a few links to 
> valuable items that are too easily overlooked.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink issue #3833: [FLINK-6221] Add PrometheusReporter

2017-05-06 Thread mbode
Github user mbode commented on the issue:

https://github.com/apache/flink/pull/3833
  
Thanks for looking at it so quickly! I had somewhat the same instinct as 
far as your first point is concerned and thought about pulling out a 
`DropwizardReporter` without scheduling, but decided against it so as not to 
touch too many places. I like your suggestion of converting metrics directly, 
without Dropwizard as an intermediate step, and am going to try that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3823: [FLINK-6438] Added a few links to the docs home page ...

2017-05-06 Thread alpinegizmo
Github user alpinegizmo commented on the issue:

https://github.com/apache/flink/pull/3823
  
@greghogan I'd be happy to list other blogs; any suggestions? I've 
looked at a few, but the problem with the ones I've examined is that the 
proportion of Flink-related posts is rather small, and the posts aren't tagged, 
so we can't link directly to the subset that's relevant.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3823: [FLINK-6438] Added a few links to the docs home pa...

2017-05-06 Thread alpinegizmo
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/3823#discussion_r115125659
  
--- Diff: docs/index.md ---
@@ -33,9 +33,13 @@ Apache Flink is an open source platform for distributed 
stream and batch data pr
 
 - **Concepts**: Start with the basic concepts of Flink's [Dataflow 
Programming Model](concepts/programming-model.html) and [Distributed Runtime 
Environment](concepts/runtime.html). This will help you to fully understand the 
other parts of the documentation, including the setup and programming guides. 
It is highly recommended to read these sections first.
 
-- **Quickstarts**: [Run an example 
program](quickstart/setup_quickstart.html) on your local machine or [write a 
simple program](quickstart/run_example_quickstart.html) working on live 
Wikipedia edits.
+- **Quickstarts**: [Run an example 
program](quickstart/setup_quickstart.html) on your local machine or [study some 
examples](examples/index.html).
 
-- **Programming Guides**: You can check out our guides about [basic 
concepts](dev/api_concepts.html) and the [DataStream 
API](dev/datastream_api.html) or [DataSet API](dev/batch/index.html) to learn 
how to write your first Flink programs.
+- **Programming Guides**: You can check out our guides about [basic api 
concepts](dev/api_concepts.html) and the [DataStream 
API](dev/datastream_api.html) or [DataSet API](dev/batch/index.html) to learn 
how to write your first Flink programs.
+
+## Deployment
+
+Before putting your Flink job into production, be sure to read the 
[Production Readiness 
Checklist](http://localhost:4000/ops/production_ready.html).
--- End diff --

That's embarrassing; glad you caught that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6438) Expand docs home page a little

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999512#comment-15999512
 ] 

ASF GitHub Bot commented on FLINK-6438:
---

Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/3823#discussion_r115125659
  
--- Diff: docs/index.md ---
@@ -33,9 +33,13 @@ Apache Flink is an open source platform for distributed 
stream and batch data pr
 
 - **Concepts**: Start with the basic concepts of Flink's [Dataflow 
Programming Model](concepts/programming-model.html) and [Distributed Runtime 
Environment](concepts/runtime.html). This will help you to fully understand the 
other parts of the documentation, including the setup and programming guides. 
It is highly recommended to read these sections first.
 
-- **Quickstarts**: [Run an example 
program](quickstart/setup_quickstart.html) on your local machine or [write a 
simple program](quickstart/run_example_quickstart.html) working on live 
Wikipedia edits.
+- **Quickstarts**: [Run an example 
program](quickstart/setup_quickstart.html) on your local machine or [study some 
examples](examples/index.html).
 
-- **Programming Guides**: You can check out our guides about [basic 
concepts](dev/api_concepts.html) and the [DataStream 
API](dev/datastream_api.html) or [DataSet API](dev/batch/index.html) to learn 
how to write your first Flink programs.
+- **Programming Guides**: You can check out our guides about [basic api 
concepts](dev/api_concepts.html) and the [DataStream 
API](dev/datastream_api.html) or [DataSet API](dev/batch/index.html) to learn 
how to write your first Flink programs.
+
+## Deployment
+
+Before putting your Flink job into production, be sure to read the 
[Production Readiness 
Checklist](http://localhost:4000/ops/production_ready.html).
--- End diff --

That's embarrassing; glad you caught that.


> Expand docs home page a little
> --
>
> Key: FLINK-6438
> URL: https://issues.apache.org/jira/browse/FLINK-6438
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Minor
>
> The idea is to improve the documentation home page by adding a few links to 
> valuable items that are too easily overlooked.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (FLINK-4545) Flink automatically manages TM network buffer

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-4545.
-
Resolution: Fixed

Finalized in 0bb49e538c118b8265377355a9667789a3971966

> Flink automatically manages TM network buffer
> -
>
> Key: FLINK-4545
> URL: https://issues.apache.org/jira/browse/FLINK-4545
> Project: Flink
>  Issue Type: Wish
>  Components: Network
>Reporter: Zhenzhong Xu
>Assignee: Nico Kruber
>Priority: Critical
> Fix For: 1.3.0
>
>
> Currently, the number of network buffers per task manager is preconfigured and 
> the memory is pre-allocated through the taskmanager.network.numberOfBuffers 
> config. In a job DAG with a shuffle phase, this number can go very high 
> depending on the TM cluster size. The formula for calculating the buffer count 
> is documented here 
> (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers):
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster 
> size dynamically and then leverage the upcoming Flink feature to support 
> scaling job parallelism/rescaling at runtime.
> If the buffer count config is static at runtime and cannot be changed without 
> restarting the task manager process, this may add latency and complexity to the 
> scaling process. I am wondering if there is already any discussion around 
> whether the network buffers should be automatically managed by Flink, or 
> whether some API could at least be exposed to allow them to be reconfigured. 
> Let me know if there is any existing JIRA that I should follow.
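
As a worked instance of the formula above (with illustrative numbers): 8 slots 
per TM and 16 TMs give 8^2 * 16 * 4 = 4096 buffers per TaskManager; assuming 
the default 32 KB memory segment size, that is 128 MB of pre-allocated network 
memory per TaskManager.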



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4545) Flink automatically manages TM network buffer

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-4545.
---

> Flink automatically manages TM network buffer
> -
>
> Key: FLINK-4545
> URL: https://issues.apache.org/jira/browse/FLINK-4545
> Project: Flink
>  Issue Type: Wish
>  Components: Network
>Reporter: Zhenzhong Xu
>Assignee: Nico Kruber
>Priority: Critical
> Fix For: 1.3.0
>
>
> Currently, the number of network buffers per task manager is preconfigured and 
> the memory is pre-allocated through the taskmanager.network.numberOfBuffers 
> config. In a job DAG with a shuffle phase, this number can go very high 
> depending on the TM cluster size. The formula for calculating the buffer count 
> is documented here 
> (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers):
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster 
> size dynamically and then leverage the upcoming Flink feature to support 
> scaling job parallelism/rescaling at runtime.
> If the buffer count config is static at runtime and cannot be changed without 
> restarting the task manager process, this may add latency and complexity to the 
> scaling process. I am wondering if there is already any discussion around 
> whether the network buffers should be automatically managed by Flink, or 
> whether some API could at least be exposed to allow them to be reconfigured. 
> Let me know if there is any existing JIRA that I should follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (FLINK-6401) RocksDBPerformanceTest.testRocksDbRangeGetPerformance fails on Travis

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-6401.
-
   Resolution: Fixed
 Assignee: Stephan Ewen
Fix Version/s: 1.3.0

Fixed via 4e1b48ec33c084c98ef68a126736c6628f6b3fa5

> RocksDBPerformanceTest.testRocksDbRangeGetPerformance fails on Travis
> -
>
> Key: FLINK-6401
> URL: https://issues.apache.org/jira/browse/FLINK-6401
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing, Tests
>Affects Versions: 1.3.0
>Reporter: Till Rohrmann
>Assignee: Stephan Ewen
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.3.0
>
>
> The test case {{RocksDBPerformanceTest.testRocksDbRangeGetPerformance}} 
> failed on Travis. Not sure whether the timeouts are simply set too tight.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/226347608/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6401) RocksDBPerformanceTest.testRocksDbRangeGetPerformance fails on Travis

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-6401.
---

> RocksDBPerformanceTest.testRocksDbRangeGetPerformance fails on Travis
> -
>
> Key: FLINK-6401
> URL: https://issues.apache.org/jira/browse/FLINK-6401
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing, Tests
>Affects Versions: 1.3.0
>Reporter: Till Rohrmann
>Assignee: Stephan Ewen
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.3.0
>
>
> The test case {{RocksDBPerformanceTest.testRocksDbRangeGetPerformance}} 
> failed on Travis. Not sure whether the timeouts are simply set too tight.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/226347608/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6470) Add a utility to parse memory sizes with units

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-6470.
---

> Add a utility to parse memory sizes with units
> --
>
> Key: FLINK-6470
> URL: https://issues.apache.org/jira/browse/FLINK-6470
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (FLINK-6447) AWS/EMR docs are out-of-date

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-6447.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed via 6c48f9bb0e27b86f57b940aac67db12c17b4f5bc

> AWS/EMR docs are out-of-date
> 
>
> Key: FLINK-6447
> URL: https://issues.apache.org/jira/browse/FLINK-6447
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
> Fix For: 1.3.0
>
>
> EMR now has explicit Flink support, so there's no need to install Flink by 
> hand.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (FLINK-6443) Add more doc links in concepts sections

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-6443.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in b01d737ae1452dbfafd4696ff14d52dce5b60efd

> Add more doc links in concepts sections
> ---
>
> Key: FLINK-6443
> URL: https://issues.apache.org/jira/browse/FLINK-6443
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Minor
> Fix For: 1.3.0
>
>
> Some sections in the high-level concepts docs don't have any pointers to help 
> you learn more. It can be useful to point people to these concept sections 
> when answering questions on stackoverflow and the mailing list, but that 
> doesn't work well if the writeup there is a dead-end.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (FLINK-6470) Add a utility to parse memory sizes with units

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-6470.
-
Resolution: Fixed

Fixed via 50b8dda37d5194297e4a6e41460fbc13e67a393b

> Add a utility to parse memory sizes with units
> --
>
> Key: FLINK-6470
> URL: https://issues.apache.org/jira/browse/FLINK-6470
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6443) Add more doc links in concepts sections

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-6443.
---

> Add more doc links in concepts sections
> ---
>
> Key: FLINK-6443
> URL: https://issues.apache.org/jira/browse/FLINK-6443
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Minor
> Fix For: 1.3.0
>
>
> Some sections in the high-level concepts docs don't have any pointers to help 
> you learn more. It can be useful to point people to these concept sections 
> when answering questions on stackoverflow and the mailing list, but that 
> doesn't work well if the writeup there is a dead-end.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6447) AWS/EMR docs are out-of-date

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-6447.
---

> AWS/EMR docs are out-of-date
> 
>
> Key: FLINK-6447
> URL: https://issues.apache.org/jira/browse/FLINK-6447
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
> Fix For: 1.3.0
>
>
> EMR now has explicit Flink support, so there's no need to install Flink by 
> hand.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3828: [FLINK-6447] update aws/emr docs

2017-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3828


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3822: [FLINK-6443] add more links to concepts docs

2017-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3822


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6447) AWS/EMR docs are out-of-date

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999501#comment-15999501
 ] 

ASF GitHub Bot commented on FLINK-6447:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3828


> AWS/EMR docs are out-of-date
> 
>
> Key: FLINK-6447
> URL: https://issues.apache.org/jira/browse/FLINK-6447
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>
> EMR now has explicit Flink support, so there's no need to install Flink by 
> hand.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3721: [FLINK-4545] replace the network buffers parameter

2017-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3721


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6443) Add more doc links in concepts sections

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999503#comment-15999503
 ] 

ASF GitHub Bot commented on FLINK-6443:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3822


> Add more doc links in concepts sections
> ---
>
> Key: FLINK-6443
> URL: https://issues.apache.org/jira/browse/FLINK-6443
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Minor
>
> Some sections in the high-level concepts docs don't have any pointers to help 
> you learn more. It can be useful to point people to these concept sections 
> when answering questions on stackoverflow and the mailing list, but that 
> doesn't work well if the writeup there is a dead-end.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-4545) Flink automatically manages TM network buffer

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999502#comment-15999502
 ] 

ASF GitHub Bot commented on FLINK-4545:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3721


> Flink automatically manages TM network buffer
> -
>
> Key: FLINK-4545
> URL: https://issues.apache.org/jira/browse/FLINK-4545
> Project: Flink
>  Issue Type: Wish
>  Components: Network
>Reporter: Zhenzhong Xu
>Assignee: Nico Kruber
>Priority: Critical
> Fix For: 1.3.0
>
>
> Currently, the number of network buffers per task manager is preconfigured and 
> the memory is pre-allocated through the taskmanager.network.numberOfBuffers 
> config. In a job DAG with a shuffle phase, this number can go very high 
> depending on the TM cluster size. The formula for calculating the buffer count 
> is documented here 
> (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers):
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster 
> size dynamically and then leverage the upcoming Flink feature to support 
> scaling job parallelism/rescaling at runtime.
> If the buffer count config is static at runtime and cannot be changed without 
> restarting the task manager process, this may add latency and complexity to the 
> scaling process. I am wondering if there is already any discussion around 
> whether the network buffers should be automatically managed by Flink, or 
> whether some API could at least be exposed to allow them to be reconfigured. 
> Let me know if there is any existing JIRA that I should follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6221) Add Prometheus support to metrics

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999494#comment-15999494
 ] 

ASF GitHub Bot commented on FLINK-6221:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3833
  
I've glanced over this for now, but it looks pretty good.

If possible, though, I would like to not use the DropwizardExports class, for 
two reasons:

1) We now introduce this odd pattern of extending 
`ScheduledDropwizardReporter` without actually being scheduled or creating an 
actual Dropwizard reporter.

2) I've looked into the source code, and it has the typical Dropwizard 
reporter problem where absolutely nothing gets cached and effectively constant 
objects are re-created again and again. We can make this way more efficient.


> Add Prometheus support to metrics
> 
>
> Key: FLINK-6221
> URL: https://issues.apache.org/jira/browse/FLINK-6221
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.2.0
>Reporter: Joshua Griffith
>Assignee: Maximilian Bode
>Priority: Minor
>
> [Prometheus|https://prometheus.io/] is becoming popular for metrics and 
> alerting. It's possible to use 
> [statsd-exporter|https://github.com/prometheus/statsd_exporter] to load Flink 
> metrics into Prometheus but it would be far easier if Flink supported 
> Prometheus as a metrics reporter. A [dropwizard 
> client|https://github.com/prometheus/client_java/tree/master/simpleclient_dropwizard]
>  exists that could be integrated into the existing metrics system.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink issue #3833: [FLINK-6221] Add PrometheusReporter

2017-05-06 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3833
  
I've glanced over this for now, but it looks pretty good.

If possible, though, I would like to not use the DropwizardExports class, for 
two reasons:

1) We now introduce this odd pattern of extending 
`ScheduledDropwizardReporter` without actually being scheduled or creating an 
actual Dropwizard reporter.

2) I've looked into the source code, and it has the typical Dropwizard 
reporter problem where absolutely nothing gets cached and effectively constant 
objects are re-created again and again. We can make this way more efficient.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6471) RocksDBStateBackendTest#testCancelRunningSnapshot sometimes fails

2017-05-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999482#comment-15999482
 ] 

Ted Yu commented on FLINK-6471:
---

Running RocksDBStateBackendTest under flink-contrib/flink-statebackend-rocksdb 
passes.

> RocksDBStateBackendTest#testCancelRunningSnapshot sometimes fails
> -
>
> Key: FLINK-6471
> URL: https://issues.apache.org/jira/browse/FLINK-6471
> Project: Flink
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>
> I got the following test failure based on commit 
> f37988c19adc30d324cde83c54f2fa5d36efb9e7 :
> {code}
> testCancelRunningSnapshot[1](org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest)
>   Time elapsed: 0.166 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest.testCancelRunningSnapshot(RocksDBStateBackendTest.java:313)
> Results :
> Failed tests:
>   RocksDBStateBackendTest.testCancelRunningSnapshot:313 null
> {code}
> The following assertion fails:
> {code}
> assertTrue(testStreamFactory.getLastCreatedStream().isClosed());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-6471) RocksDBStateBackendTest#testCancelRunningSnapshot sometimes fails

2017-05-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-6471:
--
Summary: RocksDBStateBackendTest#testCancelRunningSnapshot sometimes fails  
(was: RocksDBStateBackendTest#testCancelRunningSnapshot fails)

> RocksDBStateBackendTest#testCancelRunningSnapshot sometimes fails
> -
>
> Key: FLINK-6471
> URL: https://issues.apache.org/jira/browse/FLINK-6471
> Project: Flink
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>
> I got the following test failure based on commit 
> f37988c19adc30d324cde83c54f2fa5d36efb9e7 :
> {code}
> testCancelRunningSnapshot[1](org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest)
>   Time elapsed: 0.166 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest.testCancelRunningSnapshot(RocksDBStateBackendTest.java:313)
> Results :
> Failed tests:
>   RocksDBStateBackendTest.testCancelRunningSnapshot:313 null
> {code}
> The following assertion fails:
> {code}
> assertTrue(testStreamFactory.getLastCreatedStream().isClosed());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6472) BoundedOutOfOrdernessTimestampExtractor does not bound out of orderliness

2017-05-06 Thread Elias Levy (JIRA)
Elias Levy created FLINK-6472:
-

 Summary: BoundedOutOfOrdernessTimestampExtractor does not bound 
out of orderliness
 Key: FLINK-6472
 URL: https://issues.apache.org/jira/browse/FLINK-6472
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.3.0
Reporter: Elias Levy


{{BoundedOutOfOrdernessTimestampExtractor}} attempts to emit watermarks that 
lag behind the largest observed timestamp by a configurable time delta. It 
fails to do so in some circumstances.

The class extends {{AssignerWithPeriodicWatermarks}}, which generates 
watermarks at periodic intervals. The timer for these intervals is a 
processing-time timer.

In circumstances where there is a rush of events (restarting Flink, unpausing 
an upstream producer, loading events from a file, etc.), many events with 
timestamps much larger than what the configured bound would normally allow will 
be sent downstream without a watermark. This can have negative effects 
downstream, as operators may be buffering the events waiting for a watermark to 
process them, thus leading to memory growth and possible out-of-memory 
conditions.

It is probably best to have a bounded out-of-orderness extractor that is 
based on the punctuated timestamp extractor, so we can ensure that watermarks 
are generated in a timely fashion in event time, with the addition of a 
processing-time timer to generate a watermark if there has been a lull in 
events, thus also bounding the delay of generating a watermark in processing 
time.
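
A minimal sketch of the punctuated variant proposed above, without the 
processing-time lull timer ({{MyEvent}} and its timestamp getter are made up); 
emitting a watermark with every element keeps the watermark within the 
configured bound in event time regardless of the element rate:

{code}
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

// made-up event type carrying an event-time timestamp in epoch millis
class MyEvent {
    private final long timestamp;
    MyEvent(long timestamp) { this.timestamp = timestamp; }
    long getTimestamp() { return timestamp; }
}

// Illustrative only: bounded out-of-orderness enforced per element,
// not per processing-time interval.
public class PunctuatedBoundedAssigner implements AssignerWithPunctuatedWatermarks<MyEvent> {

    private final long maxOutOfOrderness = 3500L; // 3.5 seconds, illustrative
    private long currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrderness;

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        long ts = element.getTimestamp();
        currentMaxTimestamp = Math.max(currentMaxTimestamp, ts);
        return ts;
    }

    @Override
    public Watermark checkAndGetNextWatermark(MyEvent lastElement, long extractedTimestamp) {
        // a watermark per element bounds the watermark lag in event time
        return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
    }
}
{code}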



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-6471) RocksDBStateBackendTest#testCancelRunningSnapshot fails

2017-05-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-6471:
--
Description: 
I got the following test failure based on commit 
f37988c19adc30d324cde83c54f2fa5d36efb9e7 :
{code}
testCancelRunningSnapshot[1](org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest)
  Time elapsed: 0.166 sec  <<< FAILURE!
java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at 
org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest.testCancelRunningSnapshot(RocksDBStateBackendTest.java:313)


Results :

Failed tests:
  RocksDBStateBackendTest.testCancelRunningSnapshot:313 null
{code}
The following assertion fails:
{code}
assertTrue(testStreamFactory.getLastCreatedStream().isClosed());
{code}

  was:
I got the following test failure based on commit 
f37988c19adc30d324cde83c54f2fa5d36efb9e7 :
{code}
testCancelRunningSnapshot[1](org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest)
  Time elapsed: 0.166 sec  <<< FAILURE!
java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at 
org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest.testCancelRunningSnapshot(RocksDBStateBackendTest.java:313)


Results :

Failed tests:
  RocksDBStateBackendTest.testCancelRunningSnapshot:313 null
{code}


> RocksDBStateBackendTest#testCancelRunningSnapshot fails
> ---
>
> Key: FLINK-6471
> URL: https://issues.apache.org/jira/browse/FLINK-6471
> Project: Flink
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>
> I got the following test failure based on commit 
> f37988c19adc30d324cde83c54f2fa5d36efb9e7 :
> {code}
> testCancelRunningSnapshot[1](org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest)
>   Time elapsed: 0.166 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest.testCancelRunningSnapshot(RocksDBStateBackendTest.java:313)
> Results :
> Failed tests:
>   RocksDBStateBackendTest.testCancelRunningSnapshot:313 null
> {code}
> The following assertion fails:
> {code}
> assertTrue(testStreamFactory.getLastCreatedStream().isClosed());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6471) RocksDBStateBackendTest#testCancelRunningSnapshot fails

2017-05-06 Thread Ted Yu (JIRA)
Ted Yu created FLINK-6471:
-

 Summary: RocksDBStateBackendTest#testCancelRunningSnapshot fails
 Key: FLINK-6471
 URL: https://issues.apache.org/jira/browse/FLINK-6471
 Project: Flink
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


I got the following test failure based on commit 
f37988c19adc30d324cde83c54f2fa5d36efb9e7 :
{code}
testCancelRunningSnapshot[1](org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest)
  Time elapsed: 0.166 sec  <<< FAILURE!
java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at 
org.apache.flink.contrib.streaming.state.RocksDBStateBackendTest.testCancelRunningSnapshot(RocksDBStateBackendTest.java:313)


Results :

Failed tests:
  RocksDBStateBackendTest.testCancelRunningSnapshot:313 null
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6425) Integrate serializer reconfiguration into state restore flow to activate serializer upgrades

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999461#comment-15999461
 ] 

ASF GitHub Bot commented on FLINK-6425:
---

GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/3834

[FLINK-6425] [runtime] Activate serializer upgrades in state backends

This is a follow-up PR that finalizes serializer upgrades, and is based on 
#3804 (therefore, only the 2nd and 3rd commits, ed82173 and e77096a, are 
relevant).

This PR includes the following changes:
1. Write configuration snapshots of serializers along with checkpoints 
(this changes serialization format of checkpoints).
2. On restore, confront configuration snapshots with newly registered 
serializers using the new 
`TypeSerializer#getMigrationStrategy(TypeSerializerConfigSnapshot)` method.
3. The serializer upgrade is complete if the confrontation determines that no 
migration is needed. The confrontation reconfigures the new serializer if 
required. If the serializer cannot be reconfigured to avoid state 
migration, the job simply fails (as we currently do not have the actual state 
migration feature).

Note that the confrontation of config snapshots is currently only performed 
in the `RocksDBKeyedStateBackend`, which is the only place where this is 
currently needed due to its lazy deserialization characteristic. After we have 
eager state migration in place, the confrontation should happen for all state 
backends on restore.
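
A sketch of that restore-time flow using made-up stand-in types rather than 
Flink's actual classes (only the method names `snapshotConfiguration` and 
`getMigrationStrategy` are taken from the description above):

{code}
// All types here are hypothetical stand-ins, not Flink APIs.
interface ConfigSnapshot {}

enum MigrationStrategy { NO_MIGRATION, MIGRATE }

interface VersionedSerializer<T> {
    ConfigSnapshot snapshotConfiguration();                          // written with the checkpoint
    MigrationStrategy getMigrationStrategy(ConfigSnapshot restored); // consulted on restore
}

final class RestoreFlow {
    static <T> VersionedSerializer<T> confront(ConfigSnapshot restored,
                                               VersionedSerializer<T> newSerializer) {
        if (newSerializer.getMigrationStrategy(restored) == MigrationStrategy.MIGRATE) {
            // actual state migration does not exist yet, so the restore fails
            throw new IllegalStateException("state migration required but not supported");
        }
        return newSerializer; // possibly reconfigured; upgrade complete
    }
}
{code}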

## Tests
- Serialization compatibility of the new checkpoint format is covered with 
existing tests.
- Added a test that makes sure `InvalidClassException` is also caught when 
deserializing old serializers in the checkpoint (which occurs if the old 
serializer implementation was changed and results in a new serialVersionUID).
- Added tests for Java serialization failure resilience when reading the 
new checkpoints, in `SerializerProxiesTest`.
- Added end-to-end snapshot + restore tests which require reconfiguration 
of the `KryoSerializer` and `PojoSerializer` in cases where registration order 
of Kryo classes / Pojo types were changed.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-6425

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3834


commit 538a7acecce0d72e36e3726c0df2b6b96be35fc3
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-01T13:32:10Z

[FLINK-6190] [core] Migratable TypeSerializers

This commit introduces the user-facing APIs for migratable
TypeSerializers. The new user-facing APIs are:

- new class: TypeSerializerConfigSnapshot
- new class: ForwardCompatibleSerializationFormatConfig
- new method: TypeSerializer#snapshotConfiguration()
- new method: TypeSerializer#reconfigure(TypeSerializerConfigSnapshot)
- new enum: ReconfigureResult

commit ed82173fe97c6e9fb0784696bc4c49f10cc4e556
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-02T11:35:18Z

[hotfix] [core] Catch InvalidClassException in 
TypeSerializerSerializationProxy

Previously, the TypeSerializerSerializationProxy only uses the dummy
ClassNotFoundDummyTypeSerializer as a placeholder in the case where the
user uses a completely new serializer and deletes the old one.

There is also the case where the user changes the original serializer's
implementation and results in an InvalidClassException when trying to
deserialize the serializer. We should also use the
ClassNotFoundDummyTypeSerializer as a temporary placeholder in this
case.

commit e77096af29b4cbea26113928fe93218c075e4035
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-06T12:40:58Z

[FLINK-6425] [runtime] Activate serializer upgrades in state backends

This commit fully activates state serializer upgrades by changing the
following:
- Include serializer configuration snapshots in checkpoints
- On restore, use configuration snapshots to confront new serializers to
  perform the upgrade




> Integrate serializer reconfiguration into state restore flow to activate 
> serializer upgrades
> 
>
> Key: FLINK-6425
> URL: https://issues.apache.org/jira/browse/FLINK-6425
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>
> With FLINK-6191, {{TypeSerializer}} 

[GitHub] flink pull request #3834: [FLINK-6425] [runtime] Activate serializer upgrade...

2017-05-06 Thread tzulitai
GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/3834

[FLINK-6425] [runtime] Activate serializer upgrades in state backends

This is a follow-up PR that finalizes serializer upgrades, and is based on 
#3804 (therefore, only the 2nd and 3rd commits, ed82173 and e77096a, are 
relevant).

This PR includes the following changes:
1. Write configuration snapshots of serializers along with checkpoints 
(this changes serialization format of checkpoints).
2. On restore, confront configuration snapshots with newly registered 
serializers using the new 
`TypeSerializer#getMigrationStrategy(TypeSerializerConfigSnapshot)` method.
3. The serializer upgrade is complete if the confrontation determines that no 
migration is needed. The confrontation reconfigures the new serializer if 
required. If the serializer cannot be reconfigured to avoid state 
migration, the job simply fails (as we currently do not have the actual state 
migration feature).

Note that the confrontation of config snapshots is currently only performed 
in the `RocksDBKeyedStateBackend`, which is the only place where this is 
currently needed due to its lazy deserialization characteristic. After we have 
eager state migration in place, the confrontation should happen for all state 
backends on restore.

## Tests
- Serialization compatibility of the new checkpoint format is covered with 
existing tests.
- Added a test that makes sure `InvalidClassException` is also caught when 
deserializing old serializers in the checkpoint (which occurs if the old 
serializer implementation was changed and results in a new serialVersionUID).
- Added tests for Java serialization failure resilience when reading the 
new checkpoints, in `SerializerProxiesTest`.
- Added end-to-end snapshot + restore tests which require reconfiguration 
of the `KryoSerializer` and `PojoSerializer` in cases where registration order 
of Kryo classes / Pojo types were changed.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-6425

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3834


commit 538a7acecce0d72e36e3726c0df2b6b96be35fc3
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-01T13:32:10Z

[FLINK-6190] [core] Migratable TypeSerializers

This commit introduces the user-facing APIs for migratable
TypeSerializers. The new user-facing APIs are:

- new class: TypeSerializerConfigSnapshot
- new class: ForwardCompatibleSerializationFormatConfig
- new method: TypeSerializer#snapshotConfiguration()
- new method: TypeSerializer#reconfigure(TypeSerializerConfigSnapshot)
- new enum: ReconfigureResult

commit ed82173fe97c6e9fb0784696bc4c49f10cc4e556
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-02T11:35:18Z

[hotfix] [core] Catch InvalidClassException in 
TypeSerializerSerializationProxy

Previously, the TypeSerializerSerializationProxy only used the dummy
ClassNotFoundDummyTypeSerializer as a placeholder in the case where the
user uses a completely new serializer and deletes the old one.

There is also the case where the user changes the original serializer's
implementation, which results in an InvalidClassException when trying to
deserialize the serializer. We should also use the
ClassNotFoundDummyTypeSerializer as a temporary placeholder in this
case.
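
A minimal sketch of this fallback pattern with plain Java serialization; 
`Placeholder` is an illustrative stand-in for 
`ClassNotFoundDummyTypeSerializer`, and the surrounding code is assumed, not 
taken from the actual proxy:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;

final class SerializerDeserialization {
    /** Illustrative stand-in for the dummy placeholder: keeps the raw bytes. */
    static final class Placeholder {
        final byte[] originalBytes;
        Placeholder(byte[] originalBytes) { this.originalBytes = originalBytes; }
    }

    static Object readSerializerOrPlaceholder(byte[] bytes) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (ClassNotFoundException | InvalidClassException e) {
            // the class was deleted, or its implementation (serialVersionUID)
            // changed: fall back to a placeholder instead of failing the restore
            return new Placeholder(bytes);
        }
    }
}
```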

commit e77096af29b4cbea26113928fe93218c075e4035
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-06T12:40:58Z

[FLINK-6425] [runtime] Activate serializer upgrades in state backends

This commit fully activates state serializer upgrades by changing the
following:
- Include serializer configuration snapshots in checkpoints
- On restore, use configuration snapshots to confront new serializers to
  perform the upgrade






[jira] [Commented] (FLINK-5867) The implementation of RestartPipelinedRegionStrategy

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999455#comment-15999455
 ] 

ASF GitHub Bot commented on FLINK-5867:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3773
  
Thank you @shuai-xu for checking this. Merging this for 1.3, together with 
the updates you proposed.

I think we initially need a small change so that non-pipelined exchanges 
are NOT boundaries of failover regions, until we have the enhancement in 
place that triggers an upstream region restart when a result cannot be 
fetched.


> The implementation of RestartPipelinedRegionStrategy
> 
>
> Key: FLINK-5867
> URL: https://issues.apache.org/jira/browse/FLINK-5867
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: shuai.xu
>Assignee: shuai.xu
>
> The RestartPipelinedRegionStrategy's responsibility is the following:
> 1. Calculate all FailoverRegions and their relations when initializing.
> 2. Listen for the failure of the job and executions, and find corresponding 
> FailoverRegions to do the failover.
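
Assuming failover regions are computed as connected components over pipelined 
edges (so that blocking exchanges act as boundaries), step 1 might look 
roughly like the following illustrative sketch, which is not the actual 
implementation:

{code}
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FailoverRegions {
    /** pipelinedEdges must contain both directions of every pipelined edge;
     *  blocking (non-pipelined) exchanges are simply absent from the map and
     *  thus act as region boundaries. */
    static Map<String, Integer> compute(List<String> vertices,
                                        Map<String, List<String>> pipelinedEdges) {
        Map<String, Integer> regionOf = new HashMap<>();
        int nextRegion = 0;
        for (String start : vertices) {
            if (regionOf.containsKey(start)) {
                continue; // already assigned to a region
            }
            int region = nextRegion++;
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                String v = stack.pop();
                if (regionOf.putIfAbsent(v, region) != null) {
                    continue; // reached earlier via another pipelined path
                }
                for (String n : pipelinedEdges.getOrDefault(v, Collections.<String>emptyList())) {
                    stack.push(n);
                }
            }
        }
        return regionOf;
    }
}
{code}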





[GitHub] flink issue #3773: [FLINK-5867] [FLINK-5866] [flip-1] Implement FailoverStra...

2017-05-06 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3773
  
Thank you @shuai-xu for checking this. Merging this for 1.3, together with 
the updates you proposed.

I think we initially need a small change so that non-pipelined exchanges 
are NOT boundaries of failover regions, until we have the enhancement in 
place that triggers an upstream region restart when a result cannot be 
fetched.




[jira] [Commented] (FLINK-5869) ExecutionGraph use FailoverCoordinator to manage the failover of execution vertexes

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999454#comment-15999454
 ] 

ASF GitHub Bot commented on FLINK-5869:
---

Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/3772


> ExecutionGraph use FailoverCoordinator to manage the failover of execution 
> vertexes
> ---
>
> Key: FLINK-5869
> URL: https://issues.apache.org/jira/browse/FLINK-5869
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: shuai.xu
>Assignee: shuai.xu
> Fix For: 1.3.0
>
>
> The execution graph doesn't manage the failover of executions. It only cares 
> about the state of the whole job, which is CREATED, RUNNING, FAILED, 
> FINISHED, or SUSPENDED. 
> On execution failure, it notifies the FailoverCoordinator to perform the failover.
> It only records the finished job vertices and changes to FINISHED after all 
> vertices have finished.
> It changes to a final FAILED state if the restart strategy fails or an 
> unrecoverable exception occurs.
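
A condensed, illustrative model of this split of responsibilities; the names 
follow the issue text, and none of this is the actual ExecutionGraph code:

{code}
// Simplified model of the job-level behavior described above.
enum JobStatus { CREATED, RUNNING, FAILED, FINISHED, SUSPENDED }

final class JobStateTracker {
    private final int totalVertices;
    private int finishedVertices;
    private JobStatus status = JobStatus.RUNNING;

    JobStateTracker(int totalVertices) {
        this.totalVertices = totalVertices;
    }

    /** The graph only records finished vertices and flips to FINISHED at the end. */
    void onVertexFinished() {
        if (++finishedVertices == totalVertices) {
            status = JobStatus.FINISHED;
        }
    }

    /** Execution failures are delegated; only unrecoverable ones fail the job. */
    void onExecutionFailure(Runnable failoverCoordinator, boolean recoverable) {
        if (recoverable) {
            failoverCoordinator.run(); // the FailoverCoordinator does the failover
        } else {
            status = JobStatus.FAILED; // final fail
        }
    }
}
{code}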





[jira] [Commented] (FLINK-5869) ExecutionGraph use FailoverCoordinator to manage the failover of execution vertexes

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999453#comment-15999453
 ] 

ASF GitHub Bot commented on FLINK-5869:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3772
  
Manually merged in 8ed85fe49b7595546a8f968e0faa1fa7d4da47ec


> ExecutionGraph use FailoverCoordinator to manage the failover of execution 
> vertexes
> ---
>
> Key: FLINK-5869
> URL: https://issues.apache.org/jira/browse/FLINK-5869
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: shuai.xu
>Assignee: shuai.xu
> Fix For: 1.3.0
>
>
> The execution graph doesn't manage the failover of executions. It only cares 
> about the state of the whole job, which is CREATED, RUNNING, FAILED, 
> FINISHED, or SUSPENDED. 
> On execution failure, it notifies the FailoverCoordinator to perform the failover.
> It only records the finished job vertices and changes to FINISHED after all 
> vertices have finished.
> It changes to a final FAILED state if the restart strategy fails or an 
> unrecoverable exception occurs.





[GitHub] flink issue #3772: [FLINK-5869] [flip-1] Introduce abstraction for FailoverS...

2017-05-06 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3772
  
Manually merged in 8ed85fe49b7595546a8f968e0faa1fa7d4da47ec




[GitHub] flink pull request #3772: [FLINK-5869] [flip-1] Introduce abstraction for Fa...

2017-05-06 Thread StephanEwen
Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/3772




[jira] [Commented] (FLINK-4545) Flink automatically manages TM network buffer

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999430#comment-15999430
 ] 

ASF GitHub Bot commented on FLINK-4545:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3721
  
Merging this.
I filed a follow-up JIRA to address the "configuration with units" to make 
sure all memory-related parameters behave the same way, without loss of byte 
precision where needed: https://issues.apache.org/jira/browse/FLINK-6469


> Flink automatically manages TM network buffer
> -
>
> Key: FLINK-4545
> URL: https://issues.apache.org/jira/browse/FLINK-4545
> Project: Flink
>  Issue Type: Wish
>  Components: Network
>Reporter: Zhenzhong Xu
>Assignee: Nico Kruber
>Priority: Critical
> Fix For: 1.3.0
>
>
> Currently, the number of network buffers per task manager is preconfigured and 
> the memory is pre-allocated through the taskmanager.network.numberOfBuffers 
> config. In a job DAG with a shuffle phase, this number can go up very high 
> depending on the TM cluster size. The formula for calculating the buffer count 
> is documented here 
> (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers):
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster 
> size dynamically and then leverage the upcoming Flink feature to support 
> scaling job parallelism/rescaling at runtime. 
> If the buffer count config is static at runtime and cannot be changed without 
> restarting the task manager process, this may add latency and complexity to the 
> scaling process. I am wondering whether there is already any discussion around 
> whether network buffers should be automatically managed by Flink, or whether 
> Flink should at least expose an API to allow them to be reconfigured. Let me 
> know if there is any existing JIRA that I should follow.
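
As a worked example of the formula, with illustrative cluster numbers and 
assuming the commonly used 32 KiB buffer (memory segment) size:

{code}
// Worked example of the formula above (illustrative numbers):
// #buffers = #slots-per-TM^2 * #TMs * 4
int slotsPerTM = 8;
int numTMs = 20;
long buffers = (long) slotsPerTM * slotsPerTM * numTMs * 4; // 5,120 buffers
long bytes = buffers * 32 * 1024;  // assuming 32 KiB per buffer
// => 5,120 buffers, about 160 MiB of pre-allocated memory
{code}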





[GitHub] flink issue #3721: [FLINK-4545] replace the network buffers parameter

2017-05-06 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3721
  
Merging this.
I filed a follow-up JIRA to address the "configuration with units" to make 
sure all memory-related parameters behave the same way, without loss of byte 
precision where needed: https://issues.apache.org/jira/browse/FLINK-6469




[jira] [Updated] (FLINK-6469) Configure Memory Sizes with units

2017-05-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-6469:

Summary: Configure Memory Sizes with units  (was: Configure Memory Sizes 
via units)

> Configure Memory Sizes with units
> -
>
> Key: FLINK-6469
> URL: https://issues.apache.org/jira/browse/FLINK-6469
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stephan Ewen
>
> Currently, memory sizes are configured as pure numbers, and the interpretation 
> differs from one configuration parameter to another.
> For example, heap sizes are configured in megabytes, network buffer memory is 
> configured in bytes, and alignment thresholds are configured in bytes.
> I propose to configure all memory parameters the same way, with units, similar 
> to time. The JVM itself configures heap sizes similarly: {{-Xmx5g}} or 
> {{-Xmx2000m}}.
> {code}
> 1  -> bytes
> 10 kb
> 64 mb
> 1 gb
> ...
> {code}





[jira] [Created] (FLINK-6470) Add a utility to parse memory sizes with units

2017-05-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-6470:
---

 Summary: Add a utility to parse memory sizes with units
 Key: FLINK-6470
 URL: https://issues.apache.org/jira/browse/FLINK-6470
 Project: Flink
  Issue Type: Sub-task
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 1.3.0








[jira] [Created] (FLINK-6469) Configure Memory Sizes via units

2017-05-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-6469:
---

 Summary: Configure Memory Sizes via units
 Key: FLINK-6469
 URL: https://issues.apache.org/jira/browse/FLINK-6469
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Stephan Ewen


Currently, memory sizes are configured as pure numbers, and the interpretation 
differs from one configuration parameter to another.

For example, heap sizes are configured in megabytes, network buffer memory is 
configured in bytes, and alignment thresholds are configured in bytes.

I propose to configure all memory parameters the same way, with units, similar 
to time. The JVM itself configures heap sizes similarly: {{-Xmx5g}} or 
{{-Xmx2000m}}.

{code}
1  -> bytes
10 kb
64 mb
1 gb
...
{code}
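
A minimal sketch of how such parsing could work (a bare number is taken as 
bytes, matching the snippet above); the actual utility, tracked in FLINK-6470, 
may well differ:

{code}
import java.util.Locale;

final class MemorySizeParser {
    /** Parses e.g. "1" (bytes), "10 kb", "64 mb", "1 gb" into a byte count. */
    static long parseBytes(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        long factor = 1L; // bare numbers are interpreted as bytes
        if (s.endsWith("kb")) { factor = 1L << 10; s = s.substring(0, s.length() - 2); }
        else if (s.endsWith("mb")) { factor = 1L << 20; s = s.substring(0, s.length() - 2); }
        else if (s.endsWith("gb")) { factor = 1L << 30; s = s.substring(0, s.length() - 2); }
        return Long.parseLong(s.trim()) * factor;
    }
}
// parseBytes("64 mb") == 67108864, parseBytes("1") == 1
{code}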





[jira] [Commented] (FLINK-6020) Blob Server cannot handle multiple job submits (with same content) parallelly

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999405#comment-15999405
 ] 

ASF GitHub Bot commented on FLINK-6020:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3525
  
@WangTaoTheTonic I have debugged this issue a bit further, and it seems 
there is a bit more to do.
For non-HA blob servers, the atomic rename fix would do it.

For HA cases, we need to do a bit more. A recent change was that the blob 
cache will try to fetch blobs directly from the blob store, which may cause 
premature reads before the blob has been fully written. Because the storage 
systems we target for HA do not all support atomic renames (S3 does not), we 
need to use the `_SUCCESS` file trick to mark completed blobs.

I chatted with @tillrohrmann about that; he agreed to take a look at fixing 
these issues and will make an effort to get them into the 1.3 release. Hope 
that this will work for you.
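
For illustration, the `_SUCCESS` file trick could look like the sketch below; 
the file names and layout are assumptions for this example, not the actual 
blob store code:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MarkedBlobStore {
    static void write(Path blobDir, byte[] data) throws IOException {
        Files.createDirectories(blobDir);
        Files.write(blobDir.resolve("blob"), data);    // 1) write the payload first
        Files.createFile(blobDir.resolve("_SUCCESS")); // 2) publish the marker last
    }

    static byte[] read(Path blobDir) throws IOException {
        // readers trust only blobs whose marker exists, so a concurrent or
        // crashed writer can never be observed half-written
        if (!Files.exists(blobDir.resolve("_SUCCESS"))) {
            throw new IOException("Blob not yet completely written: " + blobDir);
        }
        return Files.readAllBytes(blobDir.resolve("blob"));
    }
}
{code}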


> Blob Server cannot handle multiple job submits (with same content) parallelly
> -
>
> Key: FLINK-6020
> URL: https://issues.apache.org/jira/browse/FLINK-6020
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Tao Wang
>Assignee: Tao Wang
>Priority: Critical
>
> In yarn-cluster mode, if we submit the same job multiple times in parallel, 
> the tasks will encounter class loading problems and lease occupation.
> The blob server stores user jars under names generated from the sha1sum of 
> their contents: it first writes a temp file and then moves it into place to 
> finalize. For recovery it will also put the jars on HDFS under the same file 
> name.
> At the same time, when multiple clients submit the same job with the same jar, 
> the local jar files in the blob server and the files on HDFS are handled by 
> multiple threads (BlobServerConnection) and impact each other.
> It would be better to have a way to handle this; two ideas come to mind:
> 1. lock the write operation, or
> 2. use some unique identifier as the file name instead of (or in addition to) 
> the sha1sum of the file contents.





[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-05-06 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3525
  
@WangTaoTheTonic I have debugged this issue a bit further, and it seems 
there is a bit more to do.
For non-HA blob servers, the atomic rename fix would do it.

For HA cases, we need to do a bit more. A recent change was that the blob 
cache will try to fetch blobs directly from the blob store, which may cause 
premature reads before the blob has been fully written. Because the storage 
systems we target for HA do not all support atomic renames (S3 does not), we 
need to use the `_SUCCESS` file trick to mark completed blobs.

I chatted with @tillrohrmann about that; he agreed to take a look at fixing 
these issues and will make an effort to get them into the 1.3 release. Hope 
that this will work for you.




[jira] [Commented] (FLINK-6020) Blob Server cannot handle multiple job submits (with same content) parallelly

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999404#comment-15999404
 ] 

ASF GitHub Bot commented on FLINK-6020:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3525
  
Yes, @netguy204 - that is definitely one possible way for class loaders to 
leak over...


> Blob Server cannot handle multiple job submits (with same content) parallelly
> -
>
> Key: FLINK-6020
> URL: https://issues.apache.org/jira/browse/FLINK-6020
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Tao Wang
>Assignee: Tao Wang
>Priority: Critical
>
> In yarn-cluster mode, if we submit the same job multiple times in parallel, 
> the tasks will encounter class loading problems and lease occupation.
> The blob server stores user jars under names generated from the sha1sum of 
> their contents: it first writes a temp file and then moves it into place to 
> finalize. For recovery it will also put the jars on HDFS under the same file 
> name.
> At the same time, when multiple clients submit the same job with the same jar, 
> the local jar files in the blob server and the files on HDFS are handled by 
> multiple threads (BlobServerConnection) and impact each other.
> It would be better to have a way to handle this; two ideas come to mind:
> 1. lock the write operation, or
> 2. use some unique identifier as the file name instead of (or in addition to) 
> the sha1sum of the file contents.





[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-05-06 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3525
  
Yes, @netguy204 - that is definitely one possible way for class loaders to 
leak over...




[jira] [Created] (FLINK-6468) release 1.2.1 tag in git

2017-05-06 Thread Petr Novotnik (JIRA)
Petr Novotnik created FLINK-6468:


 Summary: release 1.2.1 tag in git
 Key: FLINK-6468
 URL: https://issues.apache.org/jira/browse/FLINK-6468
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.2.1
Reporter: Petr Novotnik
Priority: Minor


It appears that the `release-1.2.1` tag is missing from the git repository. It 
would be great to have it.





[jira] [Commented] (FLINK-6221) Add Prometheus support to metrics

2017-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999386#comment-15999386
 ] 

ASF GitHub Bot commented on FLINK-6221:
---

GitHub user mbode opened a pull request:

https://github.com/apache/flink/pull/3833

[FLINK-6221] Add PrometheusReporter



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mbode/flink PrometheusReporter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3833.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3833


commit 9c1889abcd5591d89dde3d5b032b6c54d4d518ba
Author: Maximilian Bode 
Date:   2017-05-06T00:49:42Z

[FLINK-6221] Add PrometheusReporter




> Add Prometheus support to metrics
> 
>
> Key: FLINK-6221
> URL: https://issues.apache.org/jira/browse/FLINK-6221
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.2.0
>Reporter: Joshua Griffith
>Assignee: Maximilian Bode
>Priority: Minor
>
> [Prometheus|https://prometheus.io/] is becoming popular for metrics and 
> alerting. It's possible to use 
> [statsd-exporter|https://github.com/prometheus/statsd_exporter] to load Flink 
> metrics into Prometheus, but it would be far easier if Flink supported 
> Prometheus as a metrics reporter. A [dropwizard 
> client|https://github.com/prometheus/client_java/tree/master/simpleclient_dropwizard]
>  exists that could be integrated into the existing metrics system.
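
A hedged sketch of that integration idea: bridge a Dropwizard MetricRegistry 
into the Prometheus Java client via simpleclient_dropwizard and expose a 
scrape endpoint over HTTP. The port and wiring here are assumptions, not the 
reporter proposed in the PR:

{code}
import com.codahale.metrics.MetricRegistry;
import io.prometheus.client.dropwizard.DropwizardExports;
import io.prometheus.client.exporter.HTTPServer;

public class PrometheusBridge {
    public static void main(String[] args) throws Exception {
        MetricRegistry dropwizardRegistry = new MetricRegistry();
        dropwizardRegistry.counter("requests_total").inc();

        // bridge the Dropwizard metrics into the default Prometheus registry
        new DropwizardExports(dropwizardRegistry).register();
        // expose them for scraping, e.g. at http://localhost:9249/metrics
        new HTTPServer(9249);
    }
}
{code}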





[GitHub] flink pull request #3833: [FLINK-6221] Add PrometheusReporter

2017-05-06 Thread mbode
GitHub user mbode opened a pull request:

https://github.com/apache/flink/pull/3833

[FLINK-6221] Add PrometheusReporter



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mbode/flink PrometheusReporter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3833.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3833


commit 9c1889abcd5591d89dde3d5b032b6c54d4d518ba
Author: Maximilian Bode 
Date:   2017-05-06T00:49:42Z

[FLINK-6221] Add PrometheusReporter



