[jira] [Commented] (FLINK-9894) Potential Data Race

2018-07-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/FLINK-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550249#comment-16550249
 ] 

陈梓立 commented on FLINK-9894:


FLINK-9875 would definitely introduce a data race on the current code base. The 
code snippet associated with this issue is not thread-safe and looks weird.

FLINK-9875 would introduce a performance improvement, and the synchronization 
burden here is not that heavy. I will provide a benchmark, but it might be 
pending for some time.
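
For context, a minimal sketch of the synchronization in question; the field and 
constructor shapes are assumptions, not code copied from Flink. The 
unsynchronized check-then-grow on a shared constraint list is the potential 
race, and a synchronized method makes the resize atomic.

{code:java}
// Hedged sketch only; field and constructor shapes are assumptions.
private final List<CoLocationConstraint> constraints = new ArrayList<>();

public synchronized void ensureConstraints(int num) {
    // Grow the shared list atomically; without the monitor, two callers
    // could both observe a stale size and interleave their additions.
    while (constraints.size() < num) {
        constraints.add(new CoLocationConstraint(this));
    }
}
{code}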

> Potential Data Race
> ---
>
> Key: FLINK-9894
> URL: https://issues.apache.org/jira/browse/FLINK-9894
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.5.1
>Reporter: 陈梓立
>Assignee: 陈梓立
>Priority: Major
>  Labels: pull-request-available
>
> CoLocationGroup#ensureConstraints



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9894) Potential Data Race

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550246#comment-16550246
 ] 

ASF GitHub Bot commented on FLINK-9894:
---

Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
@zentol ok

... closing as suggested; this would be resolved in #6353


> Potential Data Race
> ---
>
> Key: FLINK-9894
> URL: https://issues.apache.org/jira/browse/FLINK-9894
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.5.1
>Reporter: 陈梓立
>Assignee: 陈梓立
>Priority: Major
>  Labels: pull-request-available
>
> CoLocationGroup#ensureConstraints



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9894) Potential Data Race

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550247#comment-16550247
 ] 

ASF GitHub Bot commented on FLINK-9894:
---

Github user tison1 closed the pull request at:

https://github.com/apache/flink/pull/6370


> Potential Data Race
> ---
>
> Key: FLINK-9894
> URL: https://issues.apache.org/jira/browse/FLINK-9894
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.5.1
>Reporter: 陈梓立
>Assignee: 陈梓立
>Priority: Major
>  Labels: pull-request-available
>
> CoLocationGroup#ensureConstraints



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 closed the pull request at:

https://github.com/apache/flink/pull/6370


---


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
@zentol ok

... closing as suggested; this would be resolved in #6353


---


[jira] [Commented] (FLINK-9894) Potential Data Race

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550231#comment-16550231
 ] 

ASF GitHub Bot commented on FLINK-9894:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/6370
  
If another PR introduces race conditions, then these race conditions should 
be resolved in that very PR.


> Potential Data Race
> ---
>
> Key: FLINK-9894
> URL: https://issues.apache.org/jira/browse/FLINK-9894
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.5.1
>Reporter: 陈梓立
>Assignee: 陈梓立
>Priority: Major
>  Labels: pull-request-available
>
> CoLocationGroup#ensureConstraints



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/6370
  
If another PR introduces race conditions, then these race conditions should 
be resolved in that very PR.


---


[jira] [Commented] (FLINK-8058) Queryable state should check types

2018-07-19 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550229#comment-16550229
 ] 

Chesnay Schepler commented on FLINK-8058:
-

I don't know much about how queryable state works; this is just something I 
noticed while using it.
Maybe [~kkl0u] can help you.

> Queryable state should check types
> --
>
> Key: FLINK-8058
> URL: https://issues.apache.org/jira/browse/FLINK-8058
> Project: Flink
>  Issue Type: Improvement
>  Components: Queryable State
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Priority: Major
>
> The queryable state currently doesn't do any type checks on the client or 
> server and generally relies on serializers to catch errors.
> Neither the type of state is checked (ValueState, ListState etc.) nor the 
> type of contained values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8058) Queryable state should check types

2018-07-19 Thread Congxian Qiu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550174#comment-16550174
 ] 

Congxian Qiu commented on FLINK-8058:
-

Hi [~Zentol], a quick question: should I check the state type and the type of 
contained values in the JobMaster or in the KvStateServerHandler? I prefer 
adding the StateDescriptor in the JobMaster and checking everything when 
looking up the state location. What is your opinion?

Looking forward to your reply.
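
For illustration, a hypothetical sketch of the kind of check being discussed; 
the method and variable names are assumptions, not existing Flink code:

{code:java}
// Compare the descriptor type registered by the job against the one the
// client supplies, failing fast instead of relying on serializers.
// (StateDescriptor is org.apache.flink.api.common.state.StateDescriptor.)
void checkStateType(StateDescriptor<?, ?> registered, StateDescriptor<?, ?> requested) {
    if (!registered.getClass().equals(requested.getClass())) {
        throw new IllegalArgumentException("Queried state type "
            + requested.getClass().getSimpleName()
            + " does not match registered type "
            + registered.getClass().getSimpleName());
    }
}
{code}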

> Queryable state should check types
> --
>
> Key: FLINK-8058
> URL: https://issues.apache.org/jira/browse/FLINK-8058
> Project: Flink
>  Issue Type: Improvement
>  Components: Queryable State
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Priority: Major
>
> The queryable state currently doesn't do any type checks on the client or 
> server and generally relies on serializers to catch errors.
> Neither the type of state is checked (ValueState, ListState etc.) nor the 
> type of contained values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector

2018-07-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550141#comment-16550141
 ] 

Ted Yu commented on FLINK-9849:
---

I generated the dependency tree, in which I don't see any SNAPSHOT.
Here are some occurrences of the glassfish dependency:
{code}
[INFO] +- org.glassfish:javax.el:jar:3.0.1-b08:compile
[INFO] |  |  \- org.glassfish:javax.el:jar:3.0.1-b08:compile
[INFO] |  |  \- org.glassfish:javax.el:jar:3.0.1-b08:compile
[INFO] |  |  \- org.glassfish:javax.el:jar:3.0.1-b08:compile
{code}
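
Such a tree can be reproduced with a standard Maven invocation, scoped to the 
glassfish group:

{code}
mvn dependency:tree -Dincludes=org.glassfish
{code}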

> Upgrade hbase version to 2.0.1 for hbase connector
> --
>
> Key: FLINK-9849
> URL: https://issues.apache.org/jira/browse/FLINK-9849
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
> Attachments: hbase-2.1.0.dep
>
>
> Currently hbase 1.4.3 is used for hbase connector.
> We should upgrade to 2.0.1 which was recently released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector

2018-07-19 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-9849:
--
Attachment: hbase-2.1.0.dep

> Upgrade hbase version to 2.0.1 for hbase connector
> --
>
> Key: FLINK-9849
> URL: https://issues.apache.org/jira/browse/FLINK-9849
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
> Attachments: hbase-2.1.0.dep
>
>
> Currently hbase 1.4.3 is used for hbase connector.
> We should upgrade to 2.0.1 which was recently released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9901) Refactor InputStreamReader to Channels.newReader

2018-07-19 Thread zhangminglei (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated FLINK-9901:

Description: From this benchmark report 
https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html.
 We can get a better performance boost by using {{Channels.newReader}}.  (was: 
From this benchmark report 
https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html.
 We can get a better performance booth by using {{Channels.newReader}}.)
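
For reference, a minimal sketch of the two approaches side by side; the file 
name is illustrative, and only one of the two readers would be created in 
practice:

{code:java}
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;

FileInputStream in = new FileInputStream("input.txt");
// Current approach: stream-based decoding.
Reader before = new InputStreamReader(in, StandardCharsets.UTF_8);
// Proposed approach: channel-based decoding, reported faster in the benchmark.
Reader after = Channels.newReader(in.getChannel(), StandardCharsets.UTF_8.name());
{code}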

> Refactor InputStreamReader to Channels.newReader
> 
>
> Key: FLINK-9901
> URL: https://issues.apache.org/jira/browse/FLINK-9901
> Project: Flink
>  Issue Type: Sub-task
>Reporter: zhangminglei
>Priority: Major
>
> From this benchmark report 
> https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html.
>  We can get a better performance boost by using {{Channels.newReader}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550139#comment-16550139
 ] 

ASF GitHub Bot commented on FLINK-9849:
---

Github user zhangminglei commented on the issue:

https://github.com/apache/flink/pull/6365
  
Thanks @yanghua for pointing this out!


> Upgrade hbase version to 2.0.1 for hbase connector
> --
>
> Key: FLINK-9849
> URL: https://issues.apache.org/jira/browse/FLINK-9849
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
>
> Currently hbase 1.4.3 is used for hbase connector.
> We should upgrade to 2.0.1 which was recently released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6365: [FLINK-9849] [hbase] Hbase upgrade to 2.0.1

2018-07-19 Thread zhangminglei
Github user zhangminglei commented on the issue:

https://github.com/apache/flink/pull/6365
  
Thanks @yanghua for pointing this out!


---


[jira] [Assigned] (FLINK-9899) Add more metrics to the Kinesis source connector

2018-07-19 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-9899:
---

Assignee: vinoyang

> Add more metrics to the Kinesis source connector
> 
>
> Key: FLINK-9899
> URL: https://issues.apache.org/jira/browse/FLINK-9899
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Reporter: Lakshmi Rao
>Assignee: vinoyang
>Priority: Major
>
> Currently, only sparse metrics are available for the Kinesis connector. Using 
> the 
> [ShardMetricsReporter|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/metrics/ShardMetricsReporter.java],
>  add more stats. For example:
> - sleepTimeMillis 
> - maxNumberOfRecordsPerFetch
> - numberOfAggregatedRecordsPerFetch
> - numberOfDeaggregatedRecordsPerFetch
> - bytesPerFetch
> - averageRecordSizeBytes
> - runLoopTimeNanos
> - loopFrequencyHz
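
For illustration, a hedged sketch of how such stats could be exposed through 
Flink's generic MetricGroup gauge API; the getter names mirror the list above, 
and the wiring itself is an assumption, not code from the connector:

{code:java}
// shardMetrics is assumed to be the per-shard reporter holding the values.
metricGroup.gauge("bytesPerFetch", (Gauge<Long>) shardMetrics::getBytesPerFetch);
metricGroup.gauge("runLoopTimeNanos", (Gauge<Long>) shardMetrics::getRunLoopTimeNanos);
{code}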



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9901) Refactor InputStreamReader to Channels.newReader

2018-07-19 Thread zhangminglei (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated FLINK-9901:

Description: From this benchmark report 
https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html.
 We can get a better performance booth by using {{Channels.newReader}}.

> Refactor InputStreamReader to Channels.newReader
> 
>
> Key: FLINK-9901
> URL: https://issues.apache.org/jira/browse/FLINK-9901
> Project: Flink
>  Issue Type: Sub-task
>Reporter: zhangminglei
>Priority: Major
>
> From this benchmark report 
> https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html.
>  We can get a better performance booth by using {{Channels.newReader}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9901) Refactor InputStreamReader to Channels.newReader

2018-07-19 Thread zhangminglei (JIRA)
zhangminglei created FLINK-9901:
---

 Summary: Refactor InputStreamReader to Channels.newReader
 Key: FLINK-9901
 URL: https://issues.apache.org/jira/browse/FLINK-9901
 Project: Flink
  Issue Type: Sub-task
Reporter: zhangminglei






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-9896) Fix flink documentation error

2018-07-19 Thread zhangminglei (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei reassigned FLINK-9896:
---

Assignee: zhangminglei

> Fix flink documentation error
> -
>
> Key: FLINK-9896
> URL: https://issues.apache.org/jira/browse/FLINK-9896
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Hequn Cheng
>Assignee: zhangminglei
>Priority: Critical
> Attachments: image-2018-07-19-23-19-32-259.png
>
>
> The Flink version on master has been upgraded to the 1.7 snapshot, but the 
> documentation still points to 1.6.
>  !image-2018-07-19-23-19-32-259.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-07-19 Thread zhangminglei (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated FLINK-9900:

Description: 
https://api.travis-ci.org/v3/job/405843617/log.txt

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< 
FAILURE! - in 
org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
 
testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
 Time elapsed: 120.036 sec <<< ERROR!
 org.junit.runners.model.TestTimedOutException: test timed out after 120000 
milliseconds
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
 at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
 at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
 at 
org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)

Results :

Tests in error: 
 ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244 
» TestTimedOut

Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29

  was:
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< 
FAILURE! - in 
org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
  Time elapsed: 120.036 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 120000 
milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)


Results :

Tests in error: 
  
ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244 
» TestTimedOut

Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>Reporter: zhangminglei
>Priority: Major
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 120000 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-07-19 Thread zhangminglei (JIRA)
zhangminglei created FLINK-9900:
---

 Summary: Failed to testRestoreBehaviourWithFaultyStateHandles 
(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
 Key: FLINK-9900
 URL: https://issues.apache.org/jira/browse/FLINK-9900
 Project: Flink
  Issue Type: Bug
Reporter: zhangminglei


Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< 
FAILURE! - in 
org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
  Time elapsed: 120.036 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 120000 
milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)


Results :

Tests in error: 
  
ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244 
» TestTimedOut

Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9899) Add more metrics to the Kinesis source connector

2018-07-19 Thread Lakshmi Rao (JIRA)
Lakshmi Rao created FLINK-9899:
--

 Summary: Add more metrics to the Kinesis source connector
 Key: FLINK-9899
 URL: https://issues.apache.org/jira/browse/FLINK-9899
 Project: Flink
  Issue Type: Improvement
  Components: Kinesis Connector
Reporter: Lakshmi Rao


Currently, only sparse metrics are available for the Kinesis connector. Using 
the 
[ShardMetricsReporter|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/metrics/ShardMetricsReporter.java],
 add more stats. For example:

- sleepTimeMillis 
- maxNumberOfRecordsPerFetch
- numberOfAggregatedRecordsPerFetch
- numberOfDeaggregatedRecordsPerFetch
- bytesPerFetch
- averageRecordSizeBytes
- runLoopTimeNanos
- loopFrequencyHz



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9898) Prometheus metrics reporter doesn't respect `metrics.scope`

2018-07-19 Thread Prithvi Raj (JIRA)
Prithvi Raj created FLINK-9898:
--

 Summary: Prometheus metrics reporter doesn't respect 
`metrics.scope`
 Key: FLINK-9898
 URL: https://issues.apache.org/jira/browse/FLINK-9898
 Project: Flink
  Issue Type: Bug
  Components: Metrics
Affects Versions: 1.5.1, 1.4.2, 1.4.1, 1.5.0, 1.4.0
Reporter: Prithvi Raj


The Apache Flink 
[documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/metrics.html#system-scope]
 details that users may change the default scope of metrics emitted by using a 
scope format. 

Changing the scope format allows end users to store metrics with lower 
cardinality while introducing the drawback of being unable to differentiate 
between metrics from different tasks/operators/etc sharing the same name. 

With the Prometheus reporter, regardless of the scope format used, every 
variable is always emitted. 

Would it be reasonable for the Prometheus reporter to respect the scope format 
and only emit dimensions that are in scope?
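
For reference, the scope format in question is configured in flink-conf.yaml; 
the key below is from the linked docs, while the trimmed value is illustrative. 
It drops the high-cardinality <task_name> variable from task metrics:

{code}
# Default task scope includes <task_name>; this trimmed format omits it.
metrics.scope.task: <host>.taskmanager.<tm_id>.<job_name>.<subtask_index>
{code}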



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9679) Implement AvroSerializationSchema

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549851#comment-16549851
 ] 

ASF GitHub Bot commented on FLINK-9679:
---

Github user medcv commented on the issue:

https://github.com/apache/flink/pull/6259
  
@tillrohrmann Please review 


> Implement AvroSerializationSchema
> -
>
> Key: FLINK-9679
> URL: https://issues.apache.org/jira/browse/FLINK-9679
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Yazdan Shirvany
>Assignee: Yazdan Shirvany
>Priority: Major
>  Labels: pull-request-available
>
> Implement AvroSerializationSchema using Confluent Schema Registry



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6259: [FLINK-9679] Implement AvroSerializationSchema

2018-07-19 Thread medcv
Github user medcv commented on the issue:

https://github.com/apache/flink/pull/6259
  
@tillrohrmann Please review 


---


[jira] [Commented] (FLINK-9679) Implement AvroSerializationSchema

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549847#comment-16549847
 ] 

ASF GitHub Bot commented on FLINK-9679:
---

Github user medcv commented on the issue:

https://github.com/apache/flink/pull/6259
  
@dawidwys Thanks for your suggestions.
The new commit extends `SchemaCoder` with a `writeSchema` method that moves 
the schema-writing logic out of `AvroSerializationSchema`, as you 
suggested.

I totally agree with you that a dynamic `subject` variable would make the 
implementation more generic, but as this is a `Confluent`-specific 
implementation and the variable is only present for 
`ConfluentRegistryAvroSerializationSchema`, I think a user of this method 
should be aware of how `Confluent` requires this variable when they set up 
their Kafka producer and Schema Registry.

I am open to suggestions for fixing the issue (by changing 
`FlinkKafkaProducer`) if you still think this is a blocker for this PR.
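
For illustration, a hedged sketch of the interface shape this describes; it is 
inferred from the comment, not quoted from the merged code:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.avro.Schema;

public interface SchemaCoder {
    Schema readSchema(InputStream in) throws IOException;
    // New method: registry-specific coders own how the writer schema is
    // emitted, keeping AvroSerializationSchema registry-agnostic.
    void writeSchema(Schema schema, OutputStream out) throws IOException;
}
{code}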


> Implement AvroSerializationSchema
> -
>
> Key: FLINK-9679
> URL: https://issues.apache.org/jira/browse/FLINK-9679
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Yazdan Shirvany
>Assignee: Yazdan Shirvany
>Priority: Major
>  Labels: pull-request-available
>
> Implement AvroSerializationSchema using Confluent Schema Registry



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6259: [FLINK-9679] Implement AvroSerializationSchema

2018-07-19 Thread medcv
Github user medcv commented on the issue:

https://github.com/apache/flink/pull/6259
  
@dawidwys Thanks for your suggestions.
The new commit extends `SchemaCoder` with a `writeSchema` method that moves 
the schema-writing logic out of `AvroSerializationSchema`, as you 
suggested.

I totally agree with you that a dynamic `subject` variable would make the 
implementation more generic, but as this is a `Confluent`-specific 
implementation and the variable is only present for 
`ConfluentRegistryAvroSerializationSchema`, I think a user of this method 
should be aware of how `Confluent` requires this variable when they set up 
their Kafka producer and Schema Registry.

I am open to suggestions for fixing the issue (by changing 
`FlinkKafkaProducer`) if you still think this is a blocker for this PR.


---


[jira] [Commented] (FLINK-9641) Pulsar Source Connector

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549828#comment-16549828
 ] 

ASF GitHub Bot commented on FLINK-9641:
---

Github user cckellogg commented on a diff in the pull request:

https://github.com/apache/flink/pull/6200#discussion_r203863637
  
--- Diff: 
flink-connectors/flink-connector-pulsar/src/main/java/org/apache/flink/streaming/connectors/pulsar/PulsarConsumerSource.java
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.pulsar;
+
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.serialization.DeserializationSchema;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import 
org.apache.flink.streaming.api.functions.source.MessageAcknowledgingSourceBase;
+import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;
+import org.apache.flink.util.IOUtils;
+
+import org.apache.pulsar.client.api.Consumer;
+import org.apache.pulsar.client.api.Message;
+import org.apache.pulsar.client.api.MessageId;
+import org.apache.pulsar.client.api.PulsarClient;
+import org.apache.pulsar.client.api.PulsarClientException;
+import org.apache.pulsar.client.api.SubscriptionType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Pulsar source (consumer) which receives messages from a topic and 
acknowledges messages.
+ * When checkpointing is enabled, it guarantees at least once processing 
semantics.
+ *
+ * When checkpointing is disabled, it auto acknowledges messages based 
on the number of messages it has
+ * received. In this mode messages may be dropped.
+ */
+class PulsarConsumerSource<T> extends MessageAcknowledgingSourceBase<T, MessageId> implements PulsarSourceBase<T> {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(PulsarConsumerSource.class);
+
+   private final int messageReceiveTimeoutMs = 100;
+   private final String serviceUrl;
+   private final String topic;
+   private final String subscriptionName;
+   private final DeserializationSchema<T> deserializer;
+
+   private PulsarClient client;
+   private Consumer consumer;
+
+   private boolean isCheckpointingEnabled;
+
+   private final long acknowledgementBatchSize;
+   private long batchCount;
+   private long totalMessageCount;
+
+   private transient volatile boolean isRunning;
+
+   PulsarConsumerSource(PulsarSourceBuilder<T> builder) {
+   super(MessageId.class);
+   this.serviceUrl = builder.serviceUrl;
+   this.topic = builder.topic;
+   this.deserializer = builder.deserializationSchema;
+   this.subscriptionName = builder.subscriptionName;
+   this.acknowledgementBatchSize = 
builder.acknowledgementBatchSize;
+   }
+
+   @Override
+   public void open(Configuration parameters) throws Exception {
+   super.open(parameters);
+
+   final RuntimeContext context = getRuntimeContext();
+   if (context instanceof StreamingRuntimeContext) {
+   isCheckpointingEnabled = ((StreamingRuntimeContext) 
context).isCheckpointingEnabled();
+   }
+
+   client = createClient();
+   consumer = createConsumer(client);
+
+   isRunning = true;
+   }
+
+   @Override
+   protected void acknowledgeIDs(long checkpointId, Set<MessageId> messageIds) {
+   if (consumer == null) {
+   LOG.error("null consumer unable to acknowledge 
messages");
+   throw new 

[GitHub] flink pull request #6200: [FLINK-9641] [streaming-connectors] Flink pulsar s...

2018-07-19 Thread cckellogg
Github user cckellogg commented on a diff in the pull request:

https://github.com/apache/flink/pull/6200#discussion_r203863637
  
--- Diff: 
flink-connectors/flink-connector-pulsar/src/main/java/org/apache/flink/streaming/connectors/pulsar/PulsarConsumerSource.java
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.pulsar;
+
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.serialization.DeserializationSchema;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import 
org.apache.flink.streaming.api.functions.source.MessageAcknowledgingSourceBase;
+import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;
+import org.apache.flink.util.IOUtils;
+
+import org.apache.pulsar.client.api.Consumer;
+import org.apache.pulsar.client.api.Message;
+import org.apache.pulsar.client.api.MessageId;
+import org.apache.pulsar.client.api.PulsarClient;
+import org.apache.pulsar.client.api.PulsarClientException;
+import org.apache.pulsar.client.api.SubscriptionType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Pulsar source (consumer) which receives messages from a topic and 
acknowledges messages.
+ * When checkpointing is enabled, it guarantees at least once processing 
semantics.
+ *
+ * When checkpointing is disabled, it auto acknowledges messages based 
on the number of messages it has
+ * received. In this mode messages may be dropped.
+ */
+class PulsarConsumerSource<T> extends MessageAcknowledgingSourceBase<T, MessageId> implements PulsarSourceBase<T> {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(PulsarConsumerSource.class);
+
+   private final int messageReceiveTimeoutMs = 100;
+   private final String serviceUrl;
+   private final String topic;
+   private final String subscriptionName;
+   private final DeserializationSchema<T> deserializer;
+
+   private PulsarClient client;
+   private Consumer consumer;
+
+   private boolean isCheckpointingEnabled;
+
+   private final long acknowledgementBatchSize;
+   private long batchCount;
+   private long totalMessageCount;
+
+   private transient volatile boolean isRunning;
+
+   PulsarConsumerSource(PulsarSourceBuilder<T> builder) {
+   super(MessageId.class);
+   this.serviceUrl = builder.serviceUrl;
+   this.topic = builder.topic;
+   this.deserializer = builder.deserializationSchema;
+   this.subscriptionName = builder.subscriptionName;
+   this.acknowledgementBatchSize = 
builder.acknowledgementBatchSize;
+   }
+
+   @Override
+   public void open(Configuration parameters) throws Exception {
+   super.open(parameters);
+
+   final RuntimeContext context = getRuntimeContext();
+   if (context instanceof StreamingRuntimeContext) {
+   isCheckpointingEnabled = ((StreamingRuntimeContext) 
context).isCheckpointingEnabled();
+   }
+
+   client = createClient();
+   consumer = createConsumer(client);
+
+   isRunning = true;
+   }
+
+   @Override
+   protected void acknowledgeIDs(long checkpointId, Set<MessageId> messageIds) {
+   if (consumer == null) {
+   LOG.error("null consumer unable to acknowledge 
messages");
+   throw new RuntimeException("null pulsar consumer unable 
to acknowledge messages");
+   }
+
+   if (messageIds.isEmpty()) {
+   LOG.info("no message ids to acknowledge");
+   return;
  

[jira] [Commented] (FLINK-9732) Report more detailed error message on JobSubmissionFailure

2018-07-19 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549805#comment-16549805
 ] 

Chesnay Schepler commented on FLINK-9732:
-

Please read the discussion in the linked PR as to why that is not an option.

> Report more detailed error message on JobSubmissionFailure
> --
>
> Key: FLINK-9732
> URL: https://issues.apache.org/jira/browse/FLINK-9732
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>
> Currently, if the job submission through the {{JobSubmitHandler}} fails the 
> error message returned to the client only says "Job submission failed.".
> As outlined in the discussion in this 
> [PR|https://github.com/apache/flink/pull/6222] we should try to include more 
> information about the actual failure cause.
> The proposed solution is to encode the cause for the failure in the 
> {{Acknowledge}} that is returned by {{DispatcherGateway#submitJob}}.
> {code}
> public class AckOrException {
>   // holds exception, could also be a series of nullable fields
>   private final SuperEither<ExceptionA, ExceptionB, ExceptionC> exception;
>   ...
>   public void throwIfError() throws ExceptionA, ExceptionB, ExceptionC;
> }
> {code}
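
For illustration, a hypothetical usage of the proposed type; every name here 
comes from the snippet above, and none of it is existing Flink API:

{code:java}
AckOrException ack = dispatcherGateway.submitJob(jobGraph, timeout).get();
try {
    ack.throwIfError(); // rethrows the typed failure cause, if any
} catch (ExceptionA | ExceptionB | ExceptionC cause) {
    System.err.println("Job submission failed: " + cause.getMessage());
}
{code}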



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-9435) Remove per-key selection Tuple instantiation via reflection in ComparableKeySelector and ArrayKeySelector

2018-07-19 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber resolved FLINK-9435.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in:
- master: 95eadfe15203ee0ab1459a9ade943234d9d6e7ce
- 1.6: 402745ebad3eaf01622ea85524f7ff029fa8df8b

> Remove per-key selection Tuple instantiation via reflection in 
> ComparableKeySelector and ArrayKeySelector
> -
>
> Key: FLINK-9435
> URL: https://issues.apache.org/jira/browse/FLINK-9435
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0, 1.5.0, 1.4.1, 1.4.2, 
> 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Inside {{KeySelectorUtil}}, every {{ComparableKeySelector#getKey()}} call 
> currently creates a new tuple from 
> {{Tuple.getTupleClass(keyLength).newInstance();}} which seems expensive. 
> Instead, we could get a template tuple and use {{Tuple#copy()}} which copies 
> the right sub-class in a more optimal way.
> Similarly, {{ArrayKeySelector}} instantiates new {{Tuple}} instances via 
> reflection which can be changed the same way.
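
A minimal sketch of the described change, assuming Flink's Tuple API: pay the 
reflective instantiation once at setup, then copy per key.

{code:java}
// import org.apache.flink.api.java.tuple.Tuple;
int keyLength = 2; // example key arity
Class<? extends Tuple> tupleClass = Tuple.getTupleClass(keyLength);
Tuple template;
try {
    template = tupleClass.newInstance(); // reflection happens once, at setup
} catch (InstantiationException | IllegalAccessException e) {
    throw new RuntimeException("Could not instantiate tuple", e);
}
// Per-record hot path in getKey(): copies the concrete TupleN subclass
// without any further reflection.
Tuple key = template.copy();
{code}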



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9897) Further enhance adaptiveReads in Kinesis Connector to read more records in the case of long running loops

2018-07-19 Thread Lakshmi Rao (JIRA)
Lakshmi Rao created FLINK-9897:
--

 Summary: Further enhance adaptiveReads in Kinesis Connector to 
read more records in the case of long running loops
 Key: FLINK-9897
 URL: https://issues.apache.org/jira/browse/FLINK-9897
 Project: Flink
  Issue Type: Improvement
  Components: Kinesis Connector
Reporter: Lakshmi Rao


In FLINK-9692, we introduced the ability for the shardConsumer to adaptively 
read more records based on the current average record size, to optimize for 
the 2 MB/sec shard limit. The feature maximizes maxNumberOfRecordsPerFetch 
assuming 5 reads/sec (as prescribed by Kinesis limits). In cases where the 
application takes more time to process records in the run loop, it is no 
longer able to read at a frequency of 5 reads/sec (even though its 
fetchIntervalMillis may be set to 200 ms). In such a scenario, 
maxNumberOfRecordsPerFetch should be calculated based on the time the run loop 
actually takes, as opposed to fetchIntervalMillis. 
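
Illustrative arithmetic only, with the sample values assumed: the fetch size 
is derived from the measured loop time rather than the configured interval, so 
a slow run loop still targets the 2 MB/sec shard limit.

{code:java}
long runLoopTimeNanos = 400_000_000L;                        // measured: 400 ms loop
double loopFrequencyHz = 1_000_000_000.0 / runLoopTimeNanos; // ~2.5 reads/sec
double averageRecordSizeBytes = 1024.0;                      // measured average
double bytesPerRead = 2.0 * 1024 * 1024 / loopFrequencyHz;   // stay under 2 MB/sec
int maxNumberOfRecordsPerFetch = (int) (bytesPerRead / averageRecordSizeBytes);
{code}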



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-9755) Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible thread

2018-07-19 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber resolved FLINK-9755.

Resolution: Fixed

Fixed in
master: 5857f5543a7d9d3082d2f74342758d5a452a3c13
1.6: 0fec75c03bba0fa85a14e3f73baeb01998c83be0
1.5: 8193d5dc68289760ad68cf0b6b237fd86b0fd906

> Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated 
> to the responsible thread
> -
>
> Key: FLINK-9755
> URL: https://issues.apache.org/jira/browse/FLINK-9755
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> The credit-based flow control implementation of 
> RemoteInputChannel#notifyBufferAvailable() does not forward errors (like the 
> {{IllegalStateException}}) to the thread that is being notified. The calling 
> code at {{LocalBufferPool#recycle}}, however, relies on the callback 
> forwarding errors and completely ignores any failures.
> Therefore, we could end up with a program waiting forever for the callback 
> and not even a failure message in the logs.
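
A sketch of the forwarding pattern the fix calls for; the method shape is 
simplified and the helper names are assumptions, not the actual Flink patch:

{code:java}
@Override
public boolean notifyBufferAvailable(Buffer buffer) {
    try {
        return onBufferAvailable(buffer); // assumed handler for the recycled buffer
    } catch (Throwable t) {
        setError(t); // fail the waiting thread instead of swallowing the error
        return false;
    }
}
{code}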



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9755) Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible thread

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549711#comment-16549711
 ] 

ASF GitHub Bot commented on FLINK-9755:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6272


> Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated 
> to the responsible thread
> -
>
> Key: FLINK-9755
> URL: https://issues.apache.org/jira/browse/FLINK-9755
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> The credit-based flow control implementation of 
> RemoteInputChannel#notifyBufferAvailable() does not forward errors (like the 
> {{IllegalStateException}}) to the thread that is being notified. The calling 
> code at {{LocalBufferPool#recycle}}, however, relies on the callback 
> forwarding errors and completely ignores any failures.
> Therefore, we could end up with a program waiting forever for the callback 
> and not even a failure message in the logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9435) Remove per-key selection Tuple instantiation via reflection in ComparableKeySelector and ArrayKeySelector

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549712#comment-16549712
 ] 

ASF GitHub Bot commented on FLINK-9435:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6115


> Remove per-key selection Tuple instantiation via reflection in 
> ComparableKeySelector and ArrayKeySelector
> -
>
> Key: FLINK-9435
> URL: https://issues.apache.org/jira/browse/FLINK-9435
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0, 1.5.0, 1.4.1, 1.4.2, 
> 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
>
> Inside {{KeySelectorUtil}}, every {{ComparableKeySelector#getKey()}} call 
> currently creates a new tuple from 
> {{Tuple.getTupleClass(keyLength).newInstance();}} which seems expensive. 
> Instead, we could get a template tuple and use {{Tuple#copy()}} which copies 
> the right sub-class in a more optimal way.
> Similarly, {{ArrayKeySelector}} instantiates new {{Tuple}} instances via 
> reflection which can be changed the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #6115: [FLINK-9435][java] Remove per-key selection Tuple ...

2018-07-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6115


---


[GitHub] flink pull request #6272: [FLINK-9755][network] forward exceptions in Remote...

2018-07-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6272


---


[jira] [Comment Edited] (FLINK-9732) Report more detailed error message on JobSubmissionFailure

2018-07-19 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549450#comment-16549450
 ] 

vinoyang edited comment on FLINK-9732 at 7/19/18 3:52 PM:
--

Hi [~Zentol], a little question: if the string "Job submission failed." at 
[this 
line|https://github.com/apache/flink/blob/0cb7706dad74133652983a132d70ba4ded4aff9b/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java#L112]
 is replaced with
{code:java}
exception.getMessage()
{code}
it seems we could get more information?

 


was (Author: yanghua):
Hi [~Zentol] a little question, if the code "Job submission failed." at [this 
line|

https://github.com/apache/flink/blob/0cb7706dad74133652983a132d70ba4ded4aff9b/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java#L112

] is replaced with : 
{code:java}
exception.getMessage()
{code}
seems can get more information? 

 

> Report more detailed error message on JobSubmissionFailure
> --
>
> Key: FLINK-9732
> URL: https://issues.apache.org/jira/browse/FLINK-9732
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>
> Currently, if the job submission through the {{JobSubmitHandler}} fails the 
> error message returned to the client only says "Job submission failed.".
> As outlined in the discussion in this 
> [PR|https://github.com/apache/flink/pull/6222] we should try to include more 
> information about the actual failure cause.
> The proposed solution is to encode the cause for the failure in the 
> {{Acknowledge}} that is returned by {{DispatcherGateway#submitJob}}.
> {code}
> public class AckOrException {
>   // holds exception, could also be a series of nullable fields
>   private final SuperEither<ExceptionA, ExceptionB, ExceptionC> exception;
>   ...
>   public void throwIfError() throws ExceptionA, ExceptionB, ExceptionC;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9732) Report more detailed error message on JobSubmissionFailure

2018-07-19 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549450#comment-16549450
 ] 

vinoyang commented on FLINK-9732:
-

Hi [~Zentol], a little question: if the string "Job submission failed." at 
[this 
line|https://github.com/apache/flink/blob/0cb7706dad74133652983a132d70ba4ded4aff9b/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java#L112]
 is replaced with
{code:java}
exception.getMessage()
{code}
it seems we could get more information?

 

> Report more detailed error message on JobSubmissionFailure
> --
>
> Key: FLINK-9732
> URL: https://issues.apache.org/jira/browse/FLINK-9732
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>
> Currently, if the job submission through the {{JobSubmitHandler}} fails the 
> error message returned to the client only says "Job submission failed.".
> As outlined in the discussion in this 
> [PR|https://github.com/apache/flink/pull/6222] we should try to include more 
> information about the actual failure cause.
> The proposed solution is to encode the cause for the failure in the 
> {{Acknowledge}} that is returned by {{DispatcherGateway#submitJob}}.
> {code}
> public class AckOrException {
>   // holds exception, could also be a series of nullable fields
>   private final SuperEither<ExceptionA, ExceptionB, ExceptionC> exception;
>   ...
>   public void throwIfError() throws ExceptionA, ExceptionB, ExceptionC;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9732) Report more detailed error message on JobSubmissionFailure

2018-07-19 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated FLINK-9732:

Summary: Report more detailed error message on JobSubmissionFailure  (was: 
Report more detailed error message on SobSubmissionFailure)

> Report more detailed error message on JobSubmissionFailure
> --
>
> Key: FLINK-9732
> URL: https://issues.apache.org/jira/browse/FLINK-9732
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>
> Currently, if the job submission through the {{JobSubmitHandler}} fails the 
> error message returned to the client only says "Job submission failed.".
> As outlined in the discussion in this 
> [PR|https://github.com/apache/flink/pull/6222] we should try to include more 
> information about the actual failure cause.
> The proposed solution is to encode the cause for the failure in the 
> {{Acknowledge}} that is returned by {{DispatcherGateway#submitJob}}.
> {code}
> public class AckOrException {
>   // holds exception, could also be a series of nullable fields
>   private final SuperEither<ExceptionA, ExceptionB, ExceptionC> exception;
>   ...
>   public void throwIfError() throws ExceptionA, ExceptionB, ExceptionC;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9814) CsvTableSource "lack of column" warning

2018-07-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/FLINK-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549347#comment-16549347
 ] 

François Lacombe edited comment on FLINK-9814 at 7/19/18 3:19 PM:
--

Hi Fabian,

OK to add an option to check the header.

Regarding the overhead, couldn't it be done before splitting the file and 
handing it to the workers?
 I am thinking of reading the first line of the file with a dedicated file 
system access in the source class at its creation, maybe.

This would be fine for many sources, since headers often contain column and 
type descriptors, and it would prevent doing a lot of work on badly formatted 
files.

All the best


was (Author: flacombe):
Hi Fabian,

Ok to add option to check the header.

Regarding the overhead, can't it be done prior to split the file and give it 
workers?
I think about reading the first line of the file with a dedicated file system 
access in the source class at the source creation maybe.

This would be ok for many sources, since headers often contains columns and 
types descriptors and prevent doing a lot of work with bad formatted files

 

All the best

> CsvTableSource "lack of column" warning
> ---
>
> Key: FLINK-9814
> URL: https://issues.apache.org/jira/browse/FLINK-9814
> Project: Flink
>  Issue Type: Wish
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: François Lacombe
>Assignee: vinoyang
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The CsvTableSource class is built by defining the expected columns to be 
> found in the corresponding csv file.
>  
> It would be great to throw an Exception when the csv file doesn't have the 
> same structure as defined in the source. For retro-compatibility's sake, 
> developers should explicitly set the builder to define columns strictly and 
> expect Exception to be thrown in case of structure difference.
> It can be easily checked with the file header, if it exists.
> Is this possible ?
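
A sketch of the header pre-check suggested in the comment above; the column 
names are examples, and this is not CsvTableSource API:

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;

// Assumes an enclosing method that throws IOException.
String[] expectedColumns = {"id", "name", "amount"}; // declared schema (example)
try (BufferedReader reader = new BufferedReader(new FileReader("input.csv"))) {
    // One dedicated read of the first line, before splits go to workers.
    String header = reader.readLine();
    if (header == null || !header.equals(String.join(",", expectedColumns))) {
        throw new IllegalStateException("CSV header does not match the declared columns");
    }
}
{code}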



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9896) Fix flink documentation error

2018-07-19 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-9896:
--

 Summary: Fix flink documentation error
 Key: FLINK-9896
 URL: https://issues.apache.org/jira/browse/FLINK-9896
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Hequn Cheng
 Attachments: image-2018-07-19-23-19-32-259.png

The Flink version on master has been upgraded to the 1.7 snapshot, but the 
documentation still points to 1.6.
 !image-2018-07-19-23-19-32-259.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9755) Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible thread

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549407#comment-16549407
 ] 

ASF GitHub Bot commented on FLINK-9755:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/6272
  
Thanks for the review, merging...


> Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated 
> to the responsible thread
> -
>
> Key: FLINK-9755
> URL: https://issues.apache.org/jira/browse/FLINK-9755
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> The credit-based flow control implementation of 
> RemoteInputChannel#notifyBufferAvailable() does not forward errors (like the 
> {{IllegalStateException}}) to the thread that is being notified. The calling 
> code at {{LocalBufferPool#recycle}}, however, relies on the callback 
> forwarding errors and completely ignores any failures.
> Therefore, we could end up with a program waiting forever for the callback 
> and not even a failure message in the logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6272: [FLINK-9755][network] forward exceptions in RemoteInputCh...

2018-07-19 Thread NicoK
Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/6272
  
Thanks for the review, merging...


---


[jira] [Commented] (FLINK-9435) Remove per-key selection Tuple instantiation via reflection in ComparableKeySelector and ArrayKeySelector

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549406#comment-16549406
 ] 

ASF GitHub Bot commented on FLINK-9435:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/6115
  
Thanks for the review, merging...


> Remove per-key selection Tuple instantiation via reflection in 
> ComparableKeySelector and ArrayKeySelector
> -
>
> Key: FLINK-9435
> URL: https://issues.apache.org/jira/browse/FLINK-9435
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0, 1.5.0, 1.4.1, 1.4.2, 
> 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
>
> Inside {{KeySelectorUtil}}, every {{ComparableKeySelector#getKey()}} call 
> currently creates a new tuple from 
> {{Tuple.getTupleClass(keyLength).newInstance();}} which seems expensive. 
> Instead, we could get a template tuple and use {{Tuple#copy()}} which copies 
> the right sub-class in a more optimal way.
> Similarly, {{ArrayKeySelector}} instantiates new {{Tuple}} instances via 
> reflection which can be changed the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6115: [FLINK-9435][java] Remove per-key selection Tuple instant...

2018-07-19 Thread NicoK
Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/6115
  
Thanks for the review, merging...


---


[jira] [Commented] (FLINK-9061) add entropy to s3 path for better scalability

2018-07-19 Thread Greg Hogan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549386#comment-16549386
 ] 

Greg Hogan commented on FLINK-9061:
---

Not that we shouldn't implement the general-purpose solution, but Amazon looks 
to have increased the PUT rate from 100 to 3,500 and the GET rate from 300 to 
5,500 requests per second:

"This S3 request rate performance increase removes any previous guidance to 
randomize object prefixes to achieve faster performance. That means you can now 
use logical or sequential naming patterns in S3 object naming without any 
performance implications."

https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/

> add entropy to s3 path for better scalability
> -
>
> Key: FLINK-9061
> URL: https://issues.apache.org/jira/browse/FLINK-9061
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystem, State Backends, Checkpointing
>Affects Versions: 1.5.0, 1.4.2
>Reporter: Jamie Grier
>Assignee: Indrajit Roychoudhury
>Priority: Critical
>  Labels: pull-request-available
>
> I think we need to modify the way we write checkpoints to S3 for high-scale 
> jobs (those with many total tasks).  The issue is that we are writing all the 
> checkpoint data under a common key prefix.  This is the worst case scenario 
> for S3 performance since the key is used as a partition key.
>  
> In the worst case checkpoints fail with a 500 status code coming back from S3 
> and an internal error type of TooBusyException.
>  
> One possible solution would be to add a hook in the Flink filesystem code 
> that allows me to "rewrite" paths.  For example say I have the checkpoint 
> directory set to:
>  
> s3://bucket/flink/checkpoints
>  
> I would hook that and rewrite that path to:
>  
> s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original 
> path
>  
> This would distribute the checkpoint write load around the S3 cluster evenly.
>  
> For reference: 
> https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/
>  
> Any other people hit this issue?  Any other ideas for solutions?  This is a 
> pretty serious problem for people trying to checkpoint to S3.
>  
> -Jamie
>  
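
A minimal sketch of the hash-prefix rewrite proposed above, assuming a simple 
string-based hook; where exactly such a rewrite would plug into Flink's 
filesystem code is the open question of this ticket.
{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class EntropyPathExample {

	/** s3://bucket/flink/checkpoints -> s3://bucket/<hash>/flink/checkpoints */
	public static String rewrite(String path) throws Exception {
		byte[] digest = MessageDigest.getInstance("MD5")
			.digest(path.getBytes(StandardCharsets.UTF_8));
		// Two hash bytes (four hex chars) already spread keys across partitions.
		String hash = String.format("%02x%02x", digest[0], digest[1]);
		return path.replaceFirst("^(s3://[^/]+/)", "$1" + hash + "/");
	}

	public static void main(String[] args) throws Exception {
		System.out.println(rewrite("s3://bucket/flink/checkpoints"));
	}
}
{code}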



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9774) Allow to pass a string-based cluster identifier to command line

2018-07-19 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549382#comment-16549382
 ] 

vinoyang edited comment on FLINK-9774 at 7/19/18 2:55 PM:
--

I think [~twalthr] gave a good example; the existing "-m" option could be 
extended to:
{code:java}
-m ${clusterMode}://${clusterIdentifier}
{code}
what's your opinion? [~StephanEwen] [~till.rohrmann] [~Zentol]

 


was (Author: yanghua):
I think [~twalthr] gave a good example; the existing "-m" option could be 
extended to:
{code:java}
-m ${clusterMode}://${cluster identifier}
{code}
what's your opinion? [~StephanEwen] [~till.rohrmann] [~Zentol]

 

> Allow to pass a string-based cluster identifier to command line
> ---
>
> Key: FLINK-9774
> URL: https://issues.apache.org/jira/browse/FLINK-9774
> Project: Flink
>  Issue Type: Improvement
>  Components: Client, Cluster Management
>Reporter: Timo Walther
>Assignee: vinoyang
>Priority: Major
>
> Whenever a new cluster is deployed for a job from a cluster descriptor, there 
> should be a generic possibility to convert the cluster identifier into a 
> string representation that can be passed to the command line in order to 
> retrieve the status of the running job.
> A possibility would be to extend the existing {{-m}} option. An example 
> design could be:
> {code}
> -m k8s://kubernetesMaster, -m yarn://yarnMaster, -m 
> standalone://standaloneMaster
> {code}
> The exact design is still up for discussion.
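
A hedged sketch of how such a string could be parsed on the client side; the 
{{ClusterTarget}} name and the split logic are illustrative assumptions, since 
the exact design is still open.
{code:java}
public final class ClusterTarget {

	final String mode;       // e.g. "k8s", "yarn", "standalone"
	final String identifier; // cluster-specific identifier

	private ClusterTarget(String mode, String identifier) {
		this.mode = mode;
		this.identifier = identifier;
	}

	static ClusterTarget parse(String value) {
		int idx = value.indexOf("://");
		if (idx < 0) {
			throw new IllegalArgumentException(
				"Expected <clusterMode>://<clusterIdentifier> but got: " + value);
		}
		return new ClusterTarget(value.substring(0, idx), value.substring(idx + 3));
	}
}
{code}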



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9774) Allow to pass a string-based cluster identifier to command line

2018-07-19 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549382#comment-16549382
 ] 

vinoyang commented on FLINK-9774:
-

I think [~twalthr] gave a good example; the existing "-m" option could be 
extended to:
{code:java}
-m ${clusterMode}://${cluster identifier}
{code}
what's your opinion? [~StephanEwen] [~till.rohrmann] [~Zentol]

 

> Allow to pass a string-based cluster identifier to command line
> ---
>
> Key: FLINK-9774
> URL: https://issues.apache.org/jira/browse/FLINK-9774
> Project: Flink
>  Issue Type: Improvement
>  Components: Client, Cluster Management
>Reporter: Timo Walther
>Assignee: vinoyang
>Priority: Major
>
> Whenever a new cluster is deployed for a job from a cluster descriptor, there 
> should be a generic possibility to convert the cluster identifier into a 
> string representation that can be passed to the command line in order to 
> retrieve the status of the running job.
> A possibility would be to extend the existing {{-m}} option. An example 
> design could be:
> {code}
> -m k8s://kubernetesMaster, -m yarn://yarnMaster, -m 
> standalone://standaloneMaster
> {code}
> The exact design is still up for discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9609) Add bucket ready mechanism for BucketingSink when checkpoint complete

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549361#comment-16549361
 ] 

ASF GitHub Bot commented on FLINK-9609:
---

GitHub user zhangminglei opened a pull request:

https://github.com/apache/flink/pull/6375

[FLINK-9609] [connectors] Add bucket ready mechanism for BucketingSin…

## What is the purpose of the change
Currently, BucketingSink only supports ```notifyCheckpointComplete```. 
However, users want to do some extra work when a bucket is ready. It would be 
nice if we could support a BucketReady mechanism, i.e., tell users when a 
bucket is ready for use. For example, if one bucket is created every 5 
minutes, then at the end of each 5-minute window, before the next bucket is 
created, the user might need to do something once the previous bucket is 
ready, like sending the timestamp of the bucket-ready time to a server.

Here, bucket ready means that no part file under the bucket has a .pending or 
.in-progress suffix; then we can consider the bucket ready for use. Just as a 
watermark means that no elements with a timestamp older than or equal to the 
watermark should arrive at the window, we can borrow the watermark concept 
here and call this a BucketWatermark.

## Brief change log
Add an interface ```BucketReady``` .
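
A minimal sketch of what such a callback could look like (hypothetical shape; 
the actual interface in this PR may differ):
```java
/**
 * Invoked once no part file under a bucket carries a .pending or
 * .in-progress suffix anymore, i.e. the bucket is ready for downstream use.
 */
public interface BucketReady {
	void onBucketReady(String bucketPath, long readyTimestampMillis);
}
```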


## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (yes)
  - If yes, how is the feature documented? (not documented)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhangminglei/flink flink-9609-bucketready

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6375.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6375


commit f95894956ac15d09b51b3a232d6f83227582e641
Author: zhangminglei 
Date:   2018-07-19T14:38:45Z

[FLINK-9609] [connectors] Add bucket ready mechanism for BucketingSink when 
checkpoint complete




> Add bucket ready mechanism for BucketingSink when checkpoint complete
> -
>
> Key: FLINK-9609
> URL: https://issues.apache.org/jira/browse/FLINK-9609
> Project: Flink
>  Issue Type: New Feature
>  Components: filesystem-connector, Streaming Connectors
>Affects Versions: 1.5.0, 1.4.2
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
>
> Currently, BucketingSink only supports {{notifyCheckpointComplete}}. However, 
> users want to do some extra work when a bucket is ready. It would be nice if 
> we could support a {{BucketReady}} mechanism, i.e., tell users when a bucket 
> is ready for use. For example, if one bucket is created every 5 minutes, then 
> at the end of each 5-minute window, before the next bucket is created, the 
> user might need to do something once the previous bucket is ready, like 
> sending the timestamp of the bucket-ready time to a server.
> Here, bucket ready means that no part file under the bucket has a {{.pending}} 
> or {{.in-progress}} suffix; then we can consider the bucket ready for use. 
> Just as a watermark means that no elements with a timestamp older than or 
> equal to the watermark should arrive at the window, we can borrow the 
> watermark concept here and call this a *BucketWatermark*.
> Recently, I found a user who wants this functionality.
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Let-BucketingSink-roll-file-on-each-checkpoint-td19034.html
> Below is what he said:
> My use case is: we read data from a message queue and write to HDFS, and our 
> ETL team will use the data in HDFS. *In this case, ETL needs to know whether 
> all data is ready to be read accurately*, so we use a counter to count how 
> much data has been written; if the counter equals the number we received, we 
> consider the HDFS file ready. We send the counter message in a custom sink so 
> ETL can know how much data has been written, but with the current 
> BucketingSink, even though the HDFS file is flushed, ETL may still be unable 
> to read the data. If we can close the file during a checkpoint, the result is 
> accurate. As for the HDFS small-file problem, it can be controlled by using a 
> bigger checkpoint interval.

[jira] [Updated] (FLINK-9609) Add bucket ready mechanism for BucketingSink when checkpoint complete

2018-07-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-9609:
--
Labels: pull-request-available  (was: )

> Add bucket ready mechanism for BucketingSink when checkpoint complete
> -
>
> Key: FLINK-9609
> URL: https://issues.apache.org/jira/browse/FLINK-9609
> Project: Flink
>  Issue Type: New Feature
>  Components: filesystem-connector, Streaming Connectors
>Affects Versions: 1.5.0, 1.4.2
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
>
> Currently, BucketingSink only supports {{notifyCheckpointComplete}}. However, 
> users want to do some extra work when a bucket is ready. It would be nice if 
> we could support a {{BucketReady}} mechanism, i.e., tell users when a bucket 
> is ready for use. For example, if one bucket is created every 5 minutes, then 
> at the end of each 5-minute window, before the next bucket is created, the 
> user might need to do something once the previous bucket is ready, like 
> sending the timestamp of the bucket-ready time to a server.
> Here, bucket ready means that no part file under the bucket has a {{.pending}} 
> or {{.in-progress}} suffix; then we can consider the bucket ready for use. 
> Just as a watermark means that no elements with a timestamp older than or 
> equal to the watermark should arrive at the window, we can borrow the 
> watermark concept here and call this a *BucketWatermark*.
> Recently, I found a user who wants this functionality.
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Let-BucketingSink-roll-file-on-each-checkpoint-td19034.html
> Below is what he said:
> My use case is: we read data from a message queue and write to HDFS, and our 
> ETL team will use the data in HDFS. *In this case, ETL needs to know whether 
> all data is ready to be read accurately*, so we use a counter to count how 
> much data has been written; if the counter equals the number we received, we 
> consider the HDFS file ready. We send the counter message in a custom sink so 
> ETL can know how much data has been written, but with the current 
> BucketingSink, even though the HDFS file is flushed, ETL may still be unable 
> to read the data. If we can close the file during a checkpoint, the result is 
> accurate. As for the HDFS small-file problem, it can be controlled by using a 
> bigger checkpoint interval.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #6375: [FLINK-9609] [connectors] Add bucket ready mechani...

2018-07-19 Thread zhangminglei
GitHub user zhangminglei opened a pull request:

https://github.com/apache/flink/pull/6375

[FLINK-9609] [connectors] Add bucket ready mechanism for BucketingSin…

## What is the purpose of the change
Currently, BucketingSink only supports ```notifyCheckpointComplete```. 
However, users want to do some extra work when a bucket is ready. It would be 
nice if we could support a BucketReady mechanism, i.e., tell users when a 
bucket is ready for use. For example, if one bucket is created every 5 
minutes, then at the end of each 5-minute window, before the next bucket is 
created, the user might need to do something once the previous bucket is 
ready, like sending the timestamp of the bucket-ready time to a server.

Here, bucket ready means that no part file under the bucket has a .pending or 
.in-progress suffix; then we can consider the bucket ready for use. Just as a 
watermark means that no elements with a timestamp older than or equal to the 
watermark should arrive at the window, we can borrow the watermark concept 
here and call this a BucketWatermark.

## Brief change log
Add an interface ```BucketReady``` .


## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (yes)
  - If yes, how is the feature documented? (not documented)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhangminglei/flink flink-9609-bucketready

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6375.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6375


commit f95894956ac15d09b51b3a232d6f83227582e641
Author: zhangminglei 
Date:   2018-07-19T14:38:45Z

[FLINK-9609] [connectors] Add bucket ready mechanism for BucketingSink when 
checkpoint complete




---


[GitHub] flink pull request #:

2018-07-19 Thread tison1
Github user tison1 commented on the pull request:


https://github.com/apache/flink/commit/8231b62ff42aae53ca3a7b552980838ccab824ab#commitcomment-29765803
  
In 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java
 on line 81:
Here is a question from 4 years later.

Why does this method call `ensureCapacity` twice? It seems meant to address 
some concurrency issue, but with the change made in #6353 this method throws 
an out-of-index exception, so I tried adding a synchronized block to make it 
thread-safe in #6370. My code is certainly not perfect, so I come here to ask 
about the original purpose of this code and for advice on the two PRs 
mentioned above.

@StephanEwen looking forward to your advice. Thanks in advance!
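
For reference, a simplified sketch of the synchronization I proposed (not the 
actual `CoLocationGroup` code; `Object` stands in for Flink's 
`CoLocationConstraint`):
```java
import java.util.ArrayList;
import java.util.List;

public class CoLocationGroupSketch {

	private final List<Object> constraints = new ArrayList<>();

	public void ensureConstraints(int num) {
		// Grow-and-read must be atomic if several threads can call this.
		synchronized (constraints) {
			while (constraints.size() < num) {
				constraints.add(new Object()); // stands in for new CoLocationConstraint(this)
			}
		}
	}
}
```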


---


[jira] [Commented] (FLINK-9850) Add a string to the print method to identify output for DataStream

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549354#comment-16549354
 ] 

ASF GitHub Bot commented on FLINK-9850:
---

Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6367
  
@yanghua also +1

this is a net win.


> Add a string to the print method to identify output for DataStream
> --
>
> Key: FLINK-9850
> URL: https://issues.apache.org/jira/browse/FLINK-9850
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API
>Reporter: Hequn Cheng
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> The output of the print method of {{DataSet}} allows the user to supply a 
> String to identify the output (see 
> [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]), but 
> {{DataStream}} doesn't support this yet. It would be valuable to add this 
> feature for {{DataStream}}.
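
A minimal sketch of the requested usage. {{DataSet#print(String)}} exists per 
FLINK-1486 (deprecated in favor of {{printOnTaskManager}}); the {{DataStream}} 
overload in the comment is the proposed feature, not existing API at the time 
of this issue.
{code:java}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class PrintIdentifierExample {
	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
		DataSet<String> words = env.fromElements("a", "b");
		// Existing (deprecated) DataSet API: output lines are prefixed with "words".
		words.print("words");
		// print(String) only attaches a sink, so the job must be triggered:
		env.execute("print-identifier-example");
		// The proposed analogue, hypothetical at the time of this issue:
		// dataStream.print("stream-out");
	}
}
{code}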



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6367: [FLINK-9850] Add a string to the print method to identify...

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6367
  
@yanghua also +1

this is a net win.


---


[jira] [Commented] (FLINK-9894) Potential Data Race

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549352#comment-16549352
 ] 

ASF GitHub Bot commented on FLINK-9894:
---

Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
But the original `ensureConstraints` is weird. For example, it calls 
`ensureCapacity` twice, and the only code path is from `ExecutionJobVertex` 
constructing `ExecutionVertex`, which calls `ensureConstraints` from `0` to 
`N`, so we gain little from `ensureCapacity`. And so on.


> Potential Data Race
> ---
>
> Key: FLINK-9894
> URL: https://issues.apache.org/jira/browse/FLINK-9894
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.5.1
>Reporter: 陈梓立
>Assignee: 陈梓立
>Priority: Major
>  Labels: pull-request-available
>
> CoLocationGroup#ensureConstraints



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
But the original `ensureConstraints` is weird. For example, it calls 
`ensureCapacity` twice, and the only code path is from `ExecutionJobVertex` 
constructing `ExecutionVertex`, which calls `ensureConstraints` from `0` to 
`N`, so we gain little from `ensureCapacity`. And so on.


---


[jira] [Commented] (FLINK-9814) CsvTableSource "lack of column" warning

2018-07-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/FLINK-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549347#comment-16549347
 ] 

François Lacombe commented on FLINK-9814:
-

Hi Fabian,

OK to add an option to check the header.

Regarding the overhead, can't it be done prior to splitting the file and 
handing it to the workers?
I'm thinking of reading the first line of the file with a dedicated file 
system access in the source class, at source creation maybe.

This would be OK for many sources, since headers often contain column and 
type descriptors, and it would prevent doing a lot of work on badly formatted 
files.

 

All the best

> CsvTableSource "lack of column" warning
> ---
>
> Key: FLINK-9814
> URL: https://issues.apache.org/jira/browse/FLINK-9814
> Project: Flink
>  Issue Type: Wish
>  Components: Table API  SQL
>Affects Versions: 1.5.0
>Reporter: François Lacombe
>Assignee: vinoyang
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The CsvTableSource class is built by defining expected columns to be find in 
> the corresponding csv file.
>  
> It would be great to throw an Exception when the csv file doesn't have the 
> same structure as defined in the source. For backward-compatibility's sake, 
> developers should explicitly set the builder to define columns strictly and 
> expect an Exception to be thrown in case of a structure difference.
> It can be easily checked with the file header if it exists.
> Is this possible ?
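
A minimal sketch of the header check discussed here, reading just the first 
line before the file is split for the workers (names are illustrative, not 
Flink API):
{code:java}
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public final class CsvHeaderCheck {

	public static void ensureHeaderMatches(String path, String[] expectedColumns)
			throws Exception {
		try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
			String firstLine = reader.readLine();
			String[] actual = firstLine == null ? new String[0] : firstLine.split(",");
			if (!Arrays.equals(expectedColumns, actual)) {
				throw new IllegalStateException("CSV header " + Arrays.toString(actual)
					+ " does not match expected columns "
					+ Arrays.toString(expectedColumns));
			}
		}
	}
}
{code}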



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9813) Build xTableSource from Avro schemas

2018-07-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/FLINK-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549333#comment-16549333
 ] 

François Lacombe edited comment on FLINK-9813 at 7/19/18 2:17 PM:
--

Hi Fabian,

Nice we agree on that :)

I think the Builder class, or at least the InputFormat class to which the 
Schema is provided, could check whether the schema provides compatible 
elements and throw an Exception if not, to prevent the SourceTable from being 
built.
 Nested Avro types may be supported if a particular column contains a JSON 
string, but that may be out of the scope of the csv format specification.

I'm totally ok with Csv.avroSchema to build a Csv format descriptor.

Do you know the release number which will deliver this rework of Sources ?


was (Author: flacombe):
Hi Fabian,

Nice we agree on that :)

I think the Builder class, or at least the InputFormat class to which the 
Schema is provided, could check whether the schema provides compatible 
elements and throw an Exception if not, to prevent the SourceTable from being 
built.
Nested Avro types may be supported if a particular column contains a JSON 
string, but that may be out of the scope of the csv format specification.

I'm totally ok with Csv.avroSchema to build a Csv format descriptor.

Do you know the release number which will deliver this rework of Sources ?

> Build xTableSource from Avro schemas
> 
>
> Key: FLINK-9813
> URL: https://issues.apache.org/jira/browse/FLINK-9813
> Project: Flink
>  Issue Type: Wish
>  Components: Table API  SQL
>Affects Versions: 1.5.0
>Reporter: François Lacombe
>Priority: Trivial
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> As Avro provides an efficient data schema formalism, it would be great to be 
> able to build Flink table sources from such files.
> More info about Avro schemas: 
> [https://avro.apache.org/docs/1.8.1/spec.html#schemas]
> For instance, with CsvTableSource :
> Schema.Parser schemaParser = new Schema.Parser();
> Schema tableSchema = schemaParser.parse("avro.json");
> Builder bld = CsvTableSource.builder().schema(tableSchema);
>  
> This would give me a fully available CsvTableSource with columns defined in 
> avro.json.
> It may be possible to do so for every TableSource since the Avro format is 
> really common and versatile.
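
A hedged, compilable version of the snippet above. {{Schema.Parser}} and 
{{parse(File)}} are real Avro calls; the {{schema(...)}} builder method on 
CsvTableSource is the wished-for feature and does not exist at the time of 
writing.
{code:java}
import java.io.File;
import org.apache.avro.Schema;

public class AvroSchemaWish {
	public static void main(String[] args) throws Exception {
		// Parse the schema from a file rather than from the literal string.
		Schema tableSchema = new Schema.Parser().parse(new File("avro.json"));
		// Hypothetical: CsvTableSource.builder().schema(tableSchema).build();
		System.out.println(tableSchema);
	}
}
{code}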



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9813) Build xTableSource from Avro schemas

2018-07-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/FLINK-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549333#comment-16549333
 ] 

François Lacombe commented on FLINK-9813:
-

Hi Fabian,

Nice we agree on that :)

I think the Builder class, or at least the InputFormat class to which the 
Schema is provided, could check whether the schema provides compatible 
elements and throw an Exception if not, to prevent the SourceTable from being 
built.
Nested Avro types may be supported if a particular column contains a JSON 
string, but that may be out of the scope of the csv format specification.

I'm totally ok with Csv.avroSchema to build a Csv format descriptor.

Do you know the release number which will deliver this rework of Sources ?

> Build xTableSource from Avro schemas
> 
>
> Key: FLINK-9813
> URL: https://issues.apache.org/jira/browse/FLINK-9813
> Project: Flink
>  Issue Type: Wish
>  Components: Table API  SQL
>Affects Versions: 1.5.0
>Reporter: François Lacombe
>Priority: Trivial
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> As Avro provides an efficient data schema formalism, it would be great to be 
> able to build Flink table sources from such files.
> More info about Avro schemas: 
> [https://avro.apache.org/docs/1.8.1/spec.html#schemas]
> For instance, with CsvTableSource :
> Schema.Parser schemaParser = new Schema.Parser();
> Schema tableSchema = schemaParser.parse("avro.json");
> Builder bld = CsvTableSource.builder().schema(tableSchema);
>  
> This would give me a fully available CsvTableSource with columns defined in 
> avro.json.
> It may be possible to do so for every TableSource since the Avro format is 
> really common and versatile.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9850) Add a string to the print method to identify output for DataStream

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549334#comment-16549334
 ] 

ASF GitHub Bot commented on FLINK-9850:
---

Github user hequn8128 commented on the issue:

https://github.com/apache/flink/pull/6367
  
@yanghua Thanks for your update. +1 to merge 


> Add a string to the print method to identify output for DataStream
> --
>
> Key: FLINK-9850
> URL: https://issues.apache.org/jira/browse/FLINK-9850
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API
>Reporter: Hequn Cheng
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> The output of the print method of {{DataSet}} allows the user to supply a 
> String to identify the output (see 
> [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]), but 
> {{DataStream}} doesn't support this yet. It would be valuable to add this 
> feature for {{DataStream}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6367: [FLINK-9850] Add a string to the print method to identify...

2018-07-19 Thread hequn8128
Github user hequn8128 commented on the issue:

https://github.com/apache/flink/pull/6367
  
@yanghua Thanks for your update. +1 to merge 


---


[jira] [Commented] (FLINK-9675) Avoid FileInputStream/FileOutputStream

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549300#comment-16549300
 ] 

ASF GitHub Bot commented on FLINK-9675:
---

Github user hequn8128 commented on the issue:

https://github.com/apache/flink/pull/6335
  
@zhangminglei  It sounds great :-)


> Avoid FileInputStream/FileOutputStream
> --
>
> Key: FLINK-9675
> URL: https://issues.apache.org/jira/browse/FLINK-9675
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: zhangminglei
>Priority: Minor
>  Labels: filesystem, pull-request-available
>
> They rely on finalizers (before Java 11), which create unnecessary GC load.
> The alternatives, such as Files.newInputStream, are just as easy to use and 
> don't have this issue.
> And here is a benchmark 
> https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html
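
A minimal sketch of the replacement: the java.nio.file factory methods instead 
of the finalizer-backed stream classes (a manual buffer loop is used so the 
example also works on Java 8).
{code:java}
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CopyWithoutFinalizers {
	public static void copy(String src, String dst) throws Exception {
		// Files.newInputStream/newOutputStream return channel-backed streams
		// without finalize(), unlike FileInputStream/FileOutputStream.
		try (InputStream in = Files.newInputStream(Paths.get(src));
				OutputStream out = Files.newOutputStream(Paths.get(dst))) {
			byte[] buffer = new byte[8192];
			int read;
			while ((read = in.read(buffer)) != -1) {
				out.write(buffer, 0, read);
			}
		}
	}
}
{code}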



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6335: [FLINK-9675] [fs] Avoid FileInputStream/FileOutputStream

2018-07-19 Thread hequn8128
Github user hequn8128 commented on the issue:

https://github.com/apache/flink/pull/6335
  
@zhangminglei  It sounds great :-)


---


[jira] [Commented] (FLINK-9894) Potential Data Race

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549255#comment-16549255
 ] 

ASF GitHub Bot commented on FLINK-9894:
---

Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
@yanghua AFAIK, yes.


> Potential Data Race
> ---
>
> Key: FLINK-9894
> URL: https://issues.apache.org/jira/browse/FLINK-9894
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.5.1
>Reporter: 陈梓立
>Assignee: 陈梓立
>Priority: Major
>  Labels: pull-request-available
>
> CoLocationGroup#ensureConstraints



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread tison1
Github user tison1 commented on the issue:

https://github.com/apache/flink/pull/6370
  
@yanghua AFAIK, yes.


---


[jira] [Assigned] (FLINK-9791) Outdated savepoint compatibility table

2018-07-19 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-9791:
---

Assignee: (was: vinoyang)

> Outdated savepoint compatibility table
> --
>
> Key: FLINK-9791
> URL: https://issues.apache.org/jira/browse/FLINK-9791
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.2, 1.5.1, 1.6.0
>Reporter: Dawid Wysakowicz
>Priority: Major
>
> The savepoint compatibility table is outdated; it covers neither 1.4.x nor 
> 1.5.x. We should either update it or remove it, as I think we agreed to 
> support only two versions of backward compatibility, which makes such a 
> table unnecessary.
>  
> You can check the table in version 1.5.x here: 
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/upgrading.html#compatibility-table



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-9806) Add a canonical link element to documentation HTML

2018-07-19 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-9806:
---

Assignee: (was: vinoyang)

> Add a canonical link element to documentation HTML
> --
>
> Key: FLINK-9806
> URL: https://issues.apache.org/jira/browse/FLINK-9806
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Lucas
>Priority: Major
>
> Flink has suffered for a while with non-optimal SEO for its documentation, 
> meaning a web search for a topic covered in the documentation often produces 
> results for many versions of Flink, even preferring older versions since 
> those pages have been around for longer.
> Using a canonical link element (see references) may alleviate this by 
> informing search engines about where to find the latest documentation (i.e. 
> pages hosted under [https://ci.apache.org/projects/flink/flink-docs-master/]).
> I think this is at least worth experimenting with, and if it doesn't cause 
> problems, even backporting it to the older release branches to eventually 
> clean up the Flink docs' SEO and converge on advertising only the latest docs 
> (unless a specific version is specified).
> References:
>  * [https://moz.com/learn/seo/canonicalization]
>  * [https://yoast.com/rel-canonical/]
>  * [https://support.google.com/webmasters/answer/139066?hl=en]
>  * [https://en.wikipedia.org/wiki/Canonical_link_element]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9755) Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible thread

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549216#comment-16549216
 ] 

ASF GitHub Bot commented on FLINK-9755:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/6272
  
rebased to solve the merge conflict (auto-solved by git though)


> Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated 
> to the responsible thread
> -
>
> Key: FLINK-9755
> URL: https://issues.apache.org/jira/browse/FLINK-9755
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> The credit-based flow control implementation of 
> RemoteInputChannel#notifyBufferAvailable() does not forward errors (like the 
> {{IllegalStateException}}) to the thread that is being notified. The calling 
> code at {{LocalBufferPool#recycle}}, however, relies on the callback 
> forwarding errors and completely ignores any failures.
> Therefore, we could end up with a program waiting forever for the callback 
> and not even a failure message in the logs.
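
An illustrative pattern for the fix direction (simplified names, not the 
actual Flink classes): failures raised inside the availability callback are 
forwarded to the listener instead of being swallowed on the notifying thread.
{code:java}
interface BufferListener {
	void notifyBufferAvailable(Object buffer);

	// Error path so the waiting side learns about callback failures.
	void onNotificationError(Throwable cause);
}

final class NotifyingBufferPool {
	void recycle(Object buffer, BufferListener listener) {
		try {
			listener.notifyBufferAvailable(buffer);
		} catch (Throwable t) {
			// Forward instead of ignoring, so no thread waits forever.
			listener.onNotificationError(t);
		}
	}
}
{code}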



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6272: [FLINK-9755][network] forward exceptions in RemoteInputCh...

2018-07-19 Thread NicoK
Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/6272
  
rebased to solve the merge conflict (auto-solved by git though)


---


[jira] [Assigned] (FLINK-6222) YARN: setting environment variables in an easier fashion

2018-07-19 Thread Dawid Wysakowicz (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz reassigned FLINK-6222:
---

Assignee: Dawid Wysakowicz  (was: Craig Foster)

> YARN: setting environment variables in an easier fashion
> 
>
> Key: FLINK-6222
> URL: https://issues.apache.org/jira/browse/FLINK-6222
> Project: Flink
>  Issue Type: Improvement
>  Components: Startup Shell Scripts
>Affects Versions: 1.2.0
> Environment: YARN, EMR
>Reporter: Craig Foster
>Assignee: Dawid Wysakowicz
>Priority: Major
> Attachments: patch0-add-yarn-hadoop-conf.diff
>
>
> Right now we require end-users to set YARN_CONF_DIR or HADOOP_CONF_DIR and 
> sometimes FLINK_CONF_DIR.
> For example, in [1], it is stated: 
> “Please note that the Client requires the YARN_CONF_DIR or HADOOP_CONF_DIR 
> environment variable to be set to read the YARN and HDFS configuration.” 
> In BigTop, we set this with /etc/flink/default and then a wrapper is created 
> to source that. However, this is slightly cumbersome and we don't have a 
> central place within the Flink project itself to source environment 
> variables. config.sh could do this but it doesn't have information about 
> FLINK_CONF_DIR. For YARN and Hadoop variables, I already have a solution that 
> would add "env.yarn.confdir" and "env.hadoop.confdir" variables to the 
> flink-conf.yaml file and then we just symlink /etc/lib/flink/conf/ and 
> /etc/flink/conf. 
> But we could also add a flink-env.sh file to set these variables and decouple 
> them from config.sh entirely. 
> I'd like to know the opinion/preference of others and what would be most 
> agreeable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-9860) Netty resource leak on receiver side

2018-07-19 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-9860.
---
Resolution: Fixed

master: 6a56e48c78caecda84d0995f67bf92dad37b1791
1.6: 49c6f385df97958cda77933edfce93cd71d46600

> Netty resource leak on receiver side
> 
>
> Key: FLINK-9860
> URL: https://issues.apache.org/jira/browse/FLINK-9860
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.5.2, 1.6.0
>
>
> The Hadoop-free Wordcount end-to-end test fails with the following exception:
> {code}
> ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector  - 
> LEAK: ByteBuf.release() was not called before it's garbage-collected. See 
> http://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records: 
> Created at:
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> {code}
> We might have a resource leak on the receiving side of our network stack.
> https://api.travis-ci.org/v3/job/404225956/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (FLINK-9860) Netty resource leak on receiver side

2018-07-19 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reopened FLINK-9860:
-

> Netty resource leak on receiver side
> 
>
> Key: FLINK-9860
> URL: https://issues.apache.org/jira/browse/FLINK-9860
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.5.2, 1.6.0
>
>
> The Hadoop-free Wordcount end-to-end test fails with the following exception:
> {code}
> ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector  - 
> LEAK: ByteBuf.release() was not called before it's garbage-collected. See 
> http://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records: 
> Created at:
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> {code}
> We might have a resource leak on the receiving side of our network stack.
> https://api.travis-ci.org/v3/job/404225956/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-9860) Netty resource leak on receiver side

2018-07-19 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-9860.
---
Resolution: Fixed

1.5: c1c4bcb34b421eff12bc52c2e56219233acbd290

> Netty resource leak on receiver side
> 
>
> Key: FLINK-9860
> URL: https://issues.apache.org/jira/browse/FLINK-9860
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.5.2, 1.6.0
>
>
> The Hadoop-free Wordcount end-to-end test fails with the following exception:
> {code}
> ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector  - 
> LEAK: ByteBuf.release() was not called before it's garbage-collected. See 
> http://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records: 
> Created at:
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> {code}
> We might have a resource leak on the receiving side of our network stack.
> https://api.travis-ci.org/v3/job/404225956/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-9871) Use Description class for ConfigOptions with rich formatting

2018-07-19 Thread Dawid Wysakowicz (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz resolved FLINK-9871.
-
Resolution: Fixed

> Use Description class for ConfigOptions with rich formatting
> 
>
> Key: FLINK-9871
> URL: https://issues.apache.org/jira/browse/FLINK-9871
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9871) Use Description class for ConfigOptions with rich formatting

2018-07-19 Thread Dawid Wysakowicz (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549188#comment-16549188
 ] 

Dawid Wysakowicz commented on FLINK-9871:
-

Fixed in 1.6: 41d4d8d00ee0a9a73a7674a3b3143a5452cd436d
Fixed in master: 0cb7706dad74133652983a132d70ba4ded4aff9b


> Use Description class for ConfigOptions with rich formatting
> 
>
> Key: FLINK-9871
> URL: https://issues.apache.org/jira/browse/FLINK-9871
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9871) Use Description class for ConfigOptions with rich formatting

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549185#comment-16549185
 ] 

ASF GitHub Bot commented on FLINK-9871:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6371


> Use Description class for ConfigOptions with rich formatting
> 
>
> Key: FLINK-9871
> URL: https://issues.apache.org/jira/browse/FLINK-9871
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #6371: [FLINK-9871] Use Description class for ConfigOptio...

2018-07-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6371


---


[jira] [Commented] (FLINK-9888) Remove unsafe defaults from release scripts

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549183#comment-16549183
 ] 

ASF GitHub Bot commented on FLINK-9888:
---

Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6362
  
+1


> Remove unsafe defaults from release scripts
> ---
>
> Key: FLINK-9888
> URL: https://issues.apache.org/jira/browse/FLINK-9888
> Project: Flink
>  Issue Type: Bug
>  Components: Release System
>Affects Versions: 1.4.0, 1.5.0, 1.6.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
>
> Several variables in the release scripts have unsafe defaults, like these 
> from {{create_release_branch.sh}}:
> {code}
> OLD_VERSION=${OLD_VERSION:-1.2-SNAPSHOT}
> NEW_VERSION=${NEW_VERSION:-1.3-SNAPSHOT}
> {code}
> We should not allow the script to run successfully without these being 
> explicitly set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6362: [FLINK-9888][release] Remove unsafe defaults from release...

2018-07-19 Thread yanghua
Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6362
  
+1


---


[jira] [Commented] (FLINK-9895) Ensure correct logging settings for NettyLeakDetectionResource

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549182#comment-16549182
 ] 

ASF GitHub Bot commented on FLINK-9895:
---

Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6374
  
+1


> Ensure correct logging settings for NettyLeakDetectionResource
> --
>
> Key: FLINK-9895
> URL: https://issues.apache.org/jira/browse/FLINK-9895
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.6.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The {{NettyLeakDetectionResource}} only works properly if ERROR logging is 
> enabled for Netty's {{ResourceLeakDetector}}. We should add an assertion to 
> the resource constructor to ensure this is actually set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6374: [FLINK-9895][tests] Ensure error logging for NettyLeakDet...

2018-07-19 Thread yanghua
Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6374
  
+1


---


[jira] [Commented] (FLINK-9895) Ensure correct logging settings for NettyLeakDetectionResource

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549171#comment-16549171
 ] 

ASF GitHub Bot commented on FLINK-9895:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/6374

[FLINK-9895][tests] Ensure error logging for NettyLeakDetectionResource

## What is the purpose of the change

This PR is a small addition to #6363 to ensure that ERROR logging is 
enabled for Netty's `ResourceLeakDetector`, as otherwise the leak will not 
cause test failures.

## Verifying this change

* disable error logging in `flink-runtime` for `ResourceLeakDetector`. (see 
`log4j-test.properties`)
* disable auto-release in `FileUploadHandler`
* run `FileUploadHandlerTest`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 9895

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6374


commit 373d6ef65b50de86897a9da6d403982aae59a3d1
Author: zentol 
Date:   2018-07-19T11:47:53Z

[FLINK-9895][tests] Ensure error logging for NettyLeakDetectionResource




> Ensure correct logging settings for NettyLeakDetectionResource
> --
>
> Key: FLINK-9895
> URL: https://issues.apache.org/jira/browse/FLINK-9895
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.6.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The {{NettyLeakDetectionResource}} only works properly if ERROR logging is 
> enabled for Netty's {{ResourceLeakDetector}}. We should add an assertion to 
> the resource constructor to ensure this is actually set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9894) Potential Data Race

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549170#comment-16549170
 ] 

ASF GitHub Bot commented on FLINK-9894:
---

Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6370
  
@tison1 I think PRs #6353 and #6370 have a causal relationship; the current 
codebase may not trigger this race condition, right?


> Potential Data Race
> ---
>
> Key: FLINK-9894
> URL: https://issues.apache.org/jira/browse/FLINK-9894
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.5.1
>Reporter: 陈梓立
>Assignee: 陈梓立
>Priority: Major
>  Labels: pull-request-available
>
> CoLocationGroup#ensureConstraints



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9895) Ensure correct logging settings for NettyLeakDetectionResource

2018-07-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-9895:
--
Labels: pull-request-available  (was: )

> Ensure correct logging settings for NettyLeakDetectionResource
> --
>
> Key: FLINK-9895
> URL: https://issues.apache.org/jira/browse/FLINK-9895
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.6.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The {{NettyLeakDetectionResource}} only works properly if ERROR logging is 
> enabled for Netty's {{ResourceLeakDetector}}. We should add an assertion to 
> the resource constructor to ensure this is actually set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #6374: [FLINK-9895][tests] Ensure error logging for Netty...

2018-07-19 Thread zentol
GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/6374

[FLINK-9895][tests] Ensure error logging for NettyLeakDetectionResource

## What is the purpose of the change

This PR is a small addition to #6363 to ensure that ERROR logging is 
enabled for Netty's `ResourceLeakDetector`, as otherwise the leak will not 
cause test failures.

## Verifying this change

* disable error logging in `flink-runtime` for `ResourceLeakDetector`. (see 
`log4j-test.properties`)
* disable auto-release in `FileUploadHandler`
* run `FileUploadHandlerTest`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 9895

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6374


commit 373d6ef65b50de86897a9da6d403982aae59a3d1
Author: zentol 
Date:   2018-07-19T11:47:53Z

[FLINK-9895][tests] Ensure error logging for NettyLeakDetectionResource




---


[GitHub] flink issue #6370: [FLINK-9894] [runtime] Potential Data Race

2018-07-19 Thread yanghua
Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6370
  
@tison1 I think PRs #6353 and #6370 have a causal relationship; the current 
codebase may not trigger this race condition, right?


---


[jira] [Created] (FLINK-9895) Ensure correct logging settings for NettyLeakDetectionResource

2018-07-19 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-9895:
---

 Summary: Ensure correct logging settings for 
NettyLeakDetectionResource
 Key: FLINK-9895
 URL: https://issues.apache.org/jira/browse/FLINK-9895
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.6.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.6.0


The {{NettyLeakDetectionResource}} only works properly if ERROR logging is 
enabled for Netty's {{ResourceLeakDetector}}. We should add an assertion to the 
resource constructor to ensure this is actually set.
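
A minimal sketch of the proposed constructor assertion, assuming the 
detector's logging is visible through an SLF4J logger; the actual check in the 
test utility may differ.
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class NettyLeakDetectionResource {

	private static final Logger LEAK_LOGGER = LoggerFactory.getLogger(
		"org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector");

	public NettyLeakDetectionResource() {
		if (!LEAK_LOGGER.isErrorEnabled()) {
			throw new AssertionError("ERROR logging must be enabled for "
				+ "ResourceLeakDetector, otherwise leaks cannot fail the test.");
		}
	}
}
{code}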



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-9762) CoreOptions.TMP_DIRS wrongly managed on Yarn

2018-07-19 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-9762.
--
Resolution: Fixed

Fixed via

master:
6bdec86e31d82d7c38d2509a039a1a03ab9f246e
ec28f92ffd042308494d9661a38ab462738611aa

1.6.0:
fcd266fb439aa9288d6114d9bfb1e22011588d74
877cd7ef6e8a876a2a3579d0761bc2d160a4daf4

1.5.2:
221a2b3833634065c13492ce141a8ba674e630d6
b505e52e3e4e3824b0f75c977b090c6d6cd24c62

> CoreOptions.TMP_DIRS wrongly managed on Yarn
> 
>
> Key: FLINK-9762
> URL: https://issues.apache.org/jira/browse/FLINK-9762
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.0
>Reporter: Oleksandr Nitavskyi
>Assignee: Oleksandr Nitavskyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> The issue on YARN is that it is impossible to have different LOCAL_DIRS for 
> the JobManager and the TaskManager, even though the LOCAL_DIRS value depends 
> on the container.
> CoreOptions.TMP_DIRS is set to its default value during JobManager 
> initialization and added to the configuration object. When a TaskManager is 
> launched, that configuration object is cloned, including a LOCAL_DIRS value 
> that only makes sense for the JobManager container. From the TaskManager 
> container's point of view, CoreOptions.TMP_DIRS is thus always equal either 
> to the path in flink.yml or to the LOCAL_DIRS of the JobManager (the default 
> behaviour). If the TaskManager's container does not have access to the 
> folders allocated by YARN for the JobManager, the TaskManager cannot be 
> started.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9871) Use Description class for ConfigOptions with rich formatting

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549155#comment-16549155
 ] 

ASF GitHub Bot commented on FLINK-9871:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6371#discussion_r203692344
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/configuration/AkkaOptions.java ---
@@ -42,22 +45,27 @@
public static final ConfigOption WATCH_HEARTBEAT_INTERVAL = 
ConfigOptions
.key("akka.watch.heartbeat.interval")
.defaultValue(ASK_TIMEOUT.defaultValue())
-   .withDescription("Heartbeat interval for Akka’s DeathWatch 
mechanism to detect dead TaskManagers. If" +
-   " TaskManagers are wrongly marked dead because of lost 
or delayed heartbeat messages, then you should" +
-   " decrease this value or increase 
akka.watch.heartbeat.pause. A thorough description of Akka’s DeathWatch" +
-   " can be found http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#failure-detector\;>here.");
+   .withDescription(Description.builder()
+   .text("Heartbeat interval for Akka’s DeathWatch 
mechanism to detect dead TaskManagers. If" +
--- End diff --

we could think about adding another version of `withDescription` that works 
like `text`, so that you could write this as
```
withDescription(
"Heartbeat interval for Akka’s DeathWatch mechanism to detect dead 
TaskManagers. If" +
..
" Akka’s DeathWatch can be found %s",

link("http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#failure-detector;,
 "here"));
```

Just a thought.
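
For illustration, such an overload might look roughly like this (a sketch;
the method placement and the `InlineElement` varargs signature are
assumptions, not an existing Flink API):

```java
// Hypothetical convenience overload on ConfigOption: accepts a format string
// plus inline elements (such as link(...)) and forwards them to text().
public ConfigOption<T> withDescription(String format, InlineElement... elements) {
	return withDescription(Description.builder().text(format, elements).build());
}
```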


> Use Description class for ConfigOptions with rich formatting
> 
>
> Key: FLINK-9871
> URL: https://issues.apache.org/jira/browse/FLINK-9871
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #6371: [FLINK-9871] Use Description class for ConfigOptio...

2018-07-19 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6371#discussion_r203692344
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/configuration/AkkaOptions.java ---
@@ -42,22 +45,27 @@
public static final ConfigOption<String> WATCH_HEARTBEAT_INTERVAL = 
ConfigOptions
.key("akka.watch.heartbeat.interval")
.defaultValue(ASK_TIMEOUT.defaultValue())
-   .withDescription("Heartbeat interval for Akka’s DeathWatch 
mechanism to detect dead TaskManagers. If" +
-   " TaskManagers are wrongly marked dead because of lost 
or delayed heartbeat messages, then you should" +
-   " decrease this value or increase 
akka.watch.heartbeat.pause. A thorough description of Akka’s DeathWatch" +
-   " can be found <a href=\"http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#failure-detector\">here</a>.");
+   .withDescription(Description.builder()
+   .text("Heartbeat interval for Akka’s DeathWatch 
mechanism to detect dead TaskManagers. If" +
--- End diff --

we could think about adding another version of `withDescription` that works 
like `text`, so that you could write this as
```
withDescription(
"Heartbeat interval for Akka’s DeathWatch mechanism to detect dead 
TaskManagers. If" +
..
" Akka’s DeathWatch can be found %s",

link("http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#failure-detector;,
 "here"));
```

Just a thought.


---


[GitHub] flink pull request #6372: [Flink 9353] Tests running per job standalone clus...

2018-07-19 Thread yanghua
Github user yanghua commented on a diff in the pull request:

https://github.com/apache/flink/pull/6372#discussion_r203692186
  
--- Diff: flink-end-to-end-tests/README.md ---
@@ -31,6 +31,12 @@ You can also run tests individually via
 $ FLINK_DIR= flink-end-to-end-tests/run-single-test.sh 
your_test.sh arg1 arg2
 ```
 
+### Kubernetes test
+
+Kubernetes test (test_kubernetes_embedded_job.sh) assumes a running 
minikube cluster. Right now we cannot
+execute it on travis. You can run it thought with `run-single-test.sh` in 
your local environment as long 
--- End diff --

does the word "thought" need to be replaced with "through"?


---


[jira] [Commented] (FLINK-9838) Slot request failed Exceptions after completing a job

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549148#comment-16549148
 ] 

ASF GitHub Bot commented on FLINK-9838:
---

Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6373
  
+1


> Slot request failed Exceptions after completing a job
> -
>
> Key: FLINK-9838
> URL: https://issues.apache.org/jira/browse/FLINK-9838
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
>
> Currently, after a job has finished, e.g. the following one, several 
> exceptions about failed slot requests are logged (at INFO level) although 
> the job ran successfully.
> {code}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromElements(1, 2, 3, 4).print();
> env.execute();
> {code}
> {code}
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Closing 
> the SlotManager.
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - 
> Suspending the SlotManager.
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - 
> Unregister TaskManager aa20e76adb9aee0cdadc50dbc06ea208 from the SlotManager.
> 16:28:16,107 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot 
> request with allocation id f99ff6d66f7bc618a9ee6e9470e0cdb1 for job 
> 1bdaafd1072e210790790b99e7741b6a failed.
> org.apache.flink.util.FlinkException: The assigned slot 
> b21f8807-5d0a-4e53-9e55-b6522b4a41c0_0 was removed.
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.suspend(SlotManager.java:234)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.close(SlotManager.java:251)
>   at 
> org.apache.flink.runtime.resourcemanager.ResourceManager.postStop(ResourceManager.java:224)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.postStop(AkkaRpcActor.java:105)
>   at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.postStop(FencedAkkaRpcActor.java:40)
>   at akka.actor.Actor$class.aroundPostStop(Actor.scala:515)
>   at akka.actor.UntypedActor.aroundPostStop(UntypedActor.scala:95)
>   at 
> akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>   at 
> akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>   at akka.actor.ActorCell.terminate(ActorCell.scala:374)
>   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:467)
>   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
>   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:260)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 16:28:16,109 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor 
>- Stopping TaskExecutor akka://flink/user/taskmanager_0.
> 16:28:16,110 INFO  
> org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager  - 
> Shutting down TaskExecutorLocalStateStoresManager.
> 16:28:16,109 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher   
>- Stopping dispatcher 
> akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> 16:28:16,111 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher   
>- Stopping all currently running jobs of dispatcher 
> akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6373: [FLINK-9838][logging] Don't log slot request failures on ...

2018-07-19 Thread yanghua
Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6373
  
+1


---


[jira] [Commented] (FLINK-9860) Netty resource leak on receiver side

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549146#comment-16549146
 ] 

ASF GitHub Bot commented on FLINK-9860:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6363


> Netty resource leak on receiver side
> 
>
> Key: FLINK-9860
> URL: https://issues.apache.org/jira/browse/FLINK-9860
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.5.2, 1.6.0
>
>
> The Hadoop-free Wordcount end-to-end test fails with the following exception:
> {code}
> ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector  - 
> LEAK: ByteBuf.release() was not called before it's garbage-collected. See 
> http://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records: 
> Created at:
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
>   
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
>   
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> {code}
> We might have a resource leak on the receiving side of our network stack.
> https://api.travis-ci.org/v3/job/404225956/log.txt
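
The usual remedy for this class of leak, sketched below for illustration
(this is the general pattern, not necessarily the actual change in the
linked pull request): a handler that terminates the pipeline for a message
must release it.

{code:java}
import org.apache.flink.shaded.netty4.io.netty.channel.ChannelHandlerContext;
import org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter;
import org.apache.flink.shaded.netty4.io.netty.util.ReferenceCountUtil;

public class ReleasingHandler extends ChannelInboundHandlerAdapter {

	@Override
	public void channelRead(ChannelHandlerContext ctx, Object msg) {
		try {
			// ... consume the message without forwarding it ...
		} finally {
			// Without this release, the pooled ByteBuf is garbage-collected
			// with a positive reference count and Netty reports the leak.
			ReferenceCountUtil.release(msg);
		}
	}
}
{code}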



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #6363: [FLINK-9860][REST] fix buffer leak in FileUploadHa...

2018-07-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6363


---


[jira] [Commented] (FLINK-9694) Potentially NPE in CompositeTypeSerializerConfigSnapshot constructor

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549144#comment-16549144
 ] 

ASF GitHub Bot commented on FLINK-9694:
---

Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6231
  
hi @pnowojski I did not call the 
`CompositeTypeSerializerConfigSnapshot(TypeSerializer... nestedSerializers)` 
constructor explicitly; the caller is Flink itself, see 
[here](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/types/CRowSerializer.scala#L123).
 I am just fixing the NPE triggered in this case: 

```scala
def this() = this(null) // Scala
```

This does not mean: 

```
CompositeTypeSerializerConfigSnapshot(null); // Java
```

but rather: 

```
CompositeTypeSerializerConfigSnapshot(new TypeSerializer[] {null});
// Java
```

so it slips past the not-null precondition check: 

```
Preconditions.checkNotNull(nestedSerializers); // Java
```

and then causes the NPE in the `for` loop 
[here](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/typeutils/CompositeTypeSerializerConfigSnapshot.java#L53).

I consider this a defensive check; it works fine in our internal Flink 
version (as mentioned in a previous comment, we customized the Table API to 
support joins between streams and dimension tables).
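
To illustrate the varargs pitfall in plain Java (an illustrative sketch, not
code from the PR):

```java
public class VarargsPitfall {

	static void takesVarargs(Object... args) {
		java.util.Objects.requireNonNull(args); // passes: args is new Object[] {null}
		for (Object a : args) {
			a.toString(); // NPE here: the single element is null
		}
	}

	public static void main(String[] args) {
		// The compiler wraps the lone (cast) null into a one-element array,
		// mirroring what Scala's `this(null)` ends up passing.
		takesVarargs((Object) null);
	}
}
```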



> Potentially NPE in CompositeTypeSerializerConfigSnapshot constructor
> 
>
> Key: FLINK-9694
> URL: https://issues.apache.org/jira/browse/FLINK-9694
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Minor
>  Labels: pull-request-available
>
> the partial specific exception stack trace :
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53)
> at 
> org.apache.flink.table.runtime.types.CRowSerializer$CRowSerializerConfigSnapshot.<init>(CRowSerializer.scala:120)
> at 
> org.apache.flink.table.runtime.types.CRowSerializer$CRowSerializerConfigSnapshot.<init>(CRowSerializer.scala:123)
> at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at java.lang.Class.newInstance(Class.java:442)
> at 
> org.apache.flink.util.InstantiationUtil.instantiate(InstantiationUtil.java:319)
> ... 20 more{code}
> related code is : 
> {code:java}
> public CompositeTypeSerializerConfigSnapshot(TypeSerializer<?>... 
> nestedSerializers) {
>    Preconditions.checkNotNull(nestedSerializers);
>    this.nestedSerializersAndConfigs = new 
> ArrayList<>(nestedSerializers.length);
>    for (TypeSerializer<?> nestedSerializer : nestedSerializers) {
>       TypeSerializerConfigSnapshot configSnapshot = 
> nestedSerializer.snapshotConfiguration();
>       this.nestedSerializersAndConfigs.add(
>          new Tuple2<TypeSerializer<?>, TypeSerializerConfigSnapshot>(
>             nestedSerializer.duplicate(),
>             Preconditions.checkNotNull(configSnapshot)));
>    }
> }
> {code}
> exception happens at : 
> {code:java}
> TypeSerializerConfigSnapshot configSnapshot = 
> nestedSerializer.snapshotConfiguration();
> {code}
> the reason is that the constructor's parameter uses Java's "varargs" 
> feature. The initialization code in *CRowSerializer.scala* is: 
> {code:java}
> def this() = this(null) // Scala code
> {code}
> When this is invoked, the actual value of 
> CompositeTypeSerializerConfigSnapshot's
> nestedSerializers parameter is:
> {code:java}
> TypeSerializer[] nestedSerializers = new TypeSerializer[] {null};
> {code}
> so the checkNotNull precondition statement:
> {code:java}
> Preconditions.checkNotNull(nestedSerializers);
> {code}
> is always ineffective.
> So we should check each element inside the _for_ loop to guard against the 
> NPE. 
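
A hedged sketch of that element-level check (the error message text is
illustrative):

{code:java}
for (TypeSerializer<?> nestedSerializer : nestedSerializers) {
	// The varargs array itself is non-null here, so each element must be
	// checked before snapshotConfiguration() is invoked on it.
	TypeSerializer<?> serializer = Preconditions.checkNotNull(
		nestedSerializer, "The nested serializer must not be null.");
	TypeSerializerConfigSnapshot configSnapshot = serializer.snapshotConfiguration();
	this.nestedSerializersAndConfigs.add(
		new Tuple2<TypeSerializer<?>, TypeSerializerConfigSnapshot>(
			serializer.duplicate(),
			Preconditions.checkNotNull(configSnapshot)));
}
{code}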



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #6231: [FLINK-9694] Potentially NPE in CompositeTypeSerializerCo...

2018-07-19 Thread yanghua
Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/6231
  
hi @pnowojski I did not call the 
`CompositeTypeSerializerConfigSnapshot(TypeSerializer... nestedSerializers)` 
constructor explicitly; the caller is Flink itself, see 
[here](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/types/CRowSerializer.scala#L123).
 I am just fixing the NPE triggered in this case: 

```scala
def this() = this(null) // Scala
```

This does not mean: 

```
CompositeTypeSerializerConfigSnapshot(null); // Java
```

but rather: 

```
CompositeTypeSerializerConfigSnapshot(new TypeSerializer[] {null});
// Java
```

so it slips past the not-null precondition check: 

```
Preconditions.checkNotNull(nestedSerializers); // Java
```

and then causes the NPE in the `for` loop 
[here](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/typeutils/CompositeTypeSerializerConfigSnapshot.java#L53).

I consider this a defensive check; it works fine in our internal Flink 
version (as mentioned in a previous comment, we customized the Table API to 
support joins between streams and dimension tables).



---


[jira] [Updated] (FLINK-9838) Slot request failed Exceptions after completing a job

2018-07-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-9838:
--
Labels: pull-request-available  (was: )

> Slot request failed Exceptions after completing a job
> -
>
> Key: FLINK-9838
> URL: https://issues.apache.org/jira/browse/FLINK-9838
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
>
> Currently, after a job has finished, e.g. the following one, several 
> exceptions about failed slot requests are logged (at INFO level) although 
> the job ran successfully.
> {code}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromElements(1, 2, 3, 4).print();
> env.execute();
> {code}
> {code}
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Closing 
> the SlotManager.
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - 
> Suspending the SlotManager.
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - 
> Unregister TaskManager aa20e76adb9aee0cdadc50dbc06ea208 from the SlotManager.
> 16:28:16,107 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot 
> request with allocation id f99ff6d66f7bc618a9ee6e9470e0cdb1 for job 
> 1bdaafd1072e210790790b99e7741b6a failed.
> org.apache.flink.util.FlinkException: The assigned slot 
> b21f8807-5d0a-4e53-9e55-b6522b4a41c0_0 was removed.
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.suspend(SlotManager.java:234)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.close(SlotManager.java:251)
>   at 
> org.apache.flink.runtime.resourcemanager.ResourceManager.postStop(ResourceManager.java:224)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.postStop(AkkaRpcActor.java:105)
>   at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.postStop(FencedAkkaRpcActor.java:40)
>   at akka.actor.Actor$class.aroundPostStop(Actor.scala:515)
>   at akka.actor.UntypedActor.aroundPostStop(UntypedActor.scala:95)
>   at 
> akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>   at 
> akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>   at akka.actor.ActorCell.terminate(ActorCell.scala:374)
>   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:467)
>   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
>   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:260)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 16:28:16,109 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor 
>- Stopping TaskExecutor akka://flink/user/taskmanager_0.
> 16:28:16,110 INFO  
> org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager  - 
> Shutting down TaskExecutorLocalStateStoresManager.
> 16:28:16,109 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher   
>- Stopping dispatcher 
> akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> 16:28:16,111 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher   
>- Stopping all currently running jobs of dispatcher 
> akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9838) Slot request failed Exceptions after completing a job

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549139#comment-16549139
 ] 

ASF GitHub Bot commented on FLINK-9838:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/6373

[FLINK-9838][logging] Don't log slot request failures on the ResourceManager

## What is the purpose of the change

Reduce log clutter by not logging slot request failures on the 
`ResourceManager`.
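
For illustration, the direction of the change might look like this (a
sketch assuming an SLF4J `log` and surrounding variables; the actual call
site and message in the PR may differ):

```java
// Before (sketch): slot request failures during normal post-job teardown
// were logged at INFO on the ResourceManager, with full stack traces.
log.info("Slot request with allocation id {} for job {} failed.",
	allocationId, jobId, cause);

// After (sketch): demote to DEBUG; the requesting JobMaster still observes
// the failure through the completion of the pending request's future.
log.debug("Slot request with allocation id {} for job {} failed.",
	allocationId, jobId, cause);
```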

## Verifying this change

- Verified manually

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink 
fixSlotAllocationFailureLogging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6373


commit f02f0231ba1fe047086fdf90227bf4dac9697d87
Author: Till Rohrmann 
Date:   2018-07-19T11:07:44Z

[FLINK-9838][logging] Don't log slot request failures on the ResourceManager




> Slot request failed Exceptions after completing a job
> -
>
> Key: FLINK-9838
> URL: https://issues.apache.org/jira/browse/FLINK-9838
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
>
> Currently, after a job has finished, e.g. the following one, several 
> exceptions about failed slot requests are logged (at INFO level) although 
> the job ran successfully.
> {code}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromElements(1, 2, 3, 4).print();
> env.execute();
> {code}
> {code}
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Closing 
> the SlotManager.
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - 
> Suspending the SlotManager.
> 16:28:16,106 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - 
> Unregister TaskManager aa20e76adb9aee0cdadc50dbc06ea208 from the SlotManager.
> 16:28:16,107 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot 
> request with allocation id f99ff6d66f7bc618a9ee6e9470e0cdb1 for job 
> 1bdaafd1072e210790790b99e7741b6a failed.
> org.apache.flink.util.FlinkException: The assigned slot 
> b21f8807-5d0a-4e53-9e55-b6522b4a41c0_0 was removed.
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.suspend(SlotManager.java:234)
>   at 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.close(SlotManager.java:251)
>   at 
> org.apache.flink.runtime.resourcemanager.ResourceManager.postStop(ResourceManager.java:224)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.postStop(AkkaRpcActor.java:105)
>   at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.postStop(FencedAkkaRpcActor.java:40)
>   at akka.actor.Actor$class.aroundPostStop(Actor.scala:515)
>   at akka.actor.UntypedActor.aroundPostStop(UntypedActor.scala:95)
>   at 
> akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>   at 
> akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>   at akka.actor.ActorCell.terminate(ActorCell.scala:374)
>   at 

[GitHub] flink pull request #6373: [FLINK-9838][logging] Don't log slot request failu...

2018-07-19 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/6373

[FLINK-9838][logging] Don't log slot request failures on the ResourceManager

## What is the purpose of the change

Reduce log clutter by not logging slot request failures on the 
`ResourceManager`.

## Verifying this change

- Verified manually

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink 
fixSlotAllocationFailureLogging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6373


commit f02f0231ba1fe047086fdf90227bf4dac9697d87
Author: Till Rohrmann 
Date:   2018-07-19T11:07:44Z

[FLINK-9838][logging] Don't log slot request failures on the ResourceManager




---

